Text Information Management and Analysis Group
Software
Modern Text Analysis (MeTA) Toolkit
Extended standard retrieval models for appropriate lower-bounding of TF (BM25+, Dir+, PL2+, and others)
Positional Language Models
Positional Relevance Models
Opinosis Opinion/Text Summarizer (Ganesan & Zhai COLING 2010)
UCAIR Toolbar (an Internet Explorer plug-in to support client-side personalized search, developed by the UCAIR team)
Search logger (browser plug-in to log your web search history), developed by Bin Tan: IE version (.msi), Firefox version (.xpi). See here for usage.
GeneRecognizer: A gene/protein name recognition tool, trained on the annotated data from BioCreative I Task 1A (written by Jing Jiang)
A Tokenization Tool for Biomedical Text (written by Jing Jiang)
C++ wrapper for WordNet's C API, written by Bin Tan (currently supports only HYPERPTR; makes it easy to find the most frequent synset by tagged count)
Data Sets
Data Set for Mobile App Retrieval, used in (Park et al. SIGIR 2015).
Data Set for Latent Aspect Rating Analysis, used in (Wang & Zhai KDD 2011) and (Wang & Zhai KDD 2010).
Data Set for Online Discussion Structure Learning, used in (Wang & Zhai SIGIR 2011).
Data Set for Forum Post Retrieval, used in (Duan & Zhai ECIR 2011).
Data Set for Implicit Feedback, used in (Shen et al. SIGIR 2005).
Data Set for Evolutionary Theme Pattern Discovery, used in (Mei & Zhai KDD 2005).
Data Set for Multi-Aspect Review Assignment, used in (Karimzadehgan & Zhai CIKM 2008).
Data Set for Contrastive Summarization, used in (Kim & Zhai CIKM 2009).
Data Set for Opinion Matching
Data Set for Opinosis, used in (Ganesan & Zhai COLING 2010).
Presentations
MIAS 2011 Tutorial on IR and Web Information Access
HLT 2007 Tutorial on LM for IR