REFORM: Robust, EFfective, and Optimal Retrieval Models
[ Team ]
[ Results ]
[ Publications ]
[ Funding ]
1. Introduction
Although many different retrieval models have been proposed and studied since the beginning of the field of information retrieval (IR), no single model has proven to be the best. Theoretically well-motivated models all need heuristic modifications to perform well empirically. Developing principled retrieval models that also perform well empirically has been a long-standing scientific challenge.
Existing retrieval models have several fundamental limitations: (1) The performance of a retrieval model is highly sensitive to the document collections and queries in an unpredictable way. (2) A model that performs well on some data set may perform poorly on another data set. (3) Heavy parameter tuning must be done manually to achieve optimal performance.
In this project, we aim to develop novel retrieval models that are robust (w.r.t. variation in document collections and queries), effective (in terms of retrieval accuracy), and able to guarantee optimality to a certain extent.
2. Team members
- Current members
- Past members
- Azadeh Shakery (Ph.D., 2008)
- Hui Fang (Ph.D., 2007, Ohio State University)
- Tao Tao (Ph.D., 2007, Microsoft)
3. Research Results
We have developed and/or studied many different retrieval models. At a high level, our main research results are in the following three directions:
- Statistical language models
Statistical language models have recently been applied to information retrieval with considerable success. Due to their solid statistical foundation, they make it possible to automatically tune retrieval parameters through statistical estimation. We have been developing new language models that are more robust and effective than existing models, and have also been applying language models to non-traditional retrieval tasks such as expert finding and review assignment.
- Axiomatic approaches to information retrieval
We have established a novel axiomatic retrieval framework that opens up new directions in studying retrieval models. We show that intuitive retrieval heuristics can be captured by formally defined constraints on retrieval functions and that, by analyzing these constraints, we can predict the empirical behavior of a retrieval method analytically. Using this framework, we have developed several new effective retrieval models.
- Hypertext retrieval model
A major challenge in developing models for hypertext retrieval is to effectively combine content information with the link structure available in hypertext collections. Although several link-based ranking methods have been developed to improve retrieval results, none of them fully exploits both the discriminative power of content and all useful link structures. We developed a general relevance propagation framework for combining content and link information. We further generalized the framework to support structure-based propagation of content information in a general way, which makes it possible to apply the framework to many other retrieval tasks.
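The idea of combining content scores with link structure, as described for hypertext retrieval above, can be illustrated with a minimal Python sketch. This is not the framework developed in this project. The function name `propagate_relevance`, the mixing weight `alpha`, the choice to average the scores of in-linking pages, and the toy data are all assumptions made purely for illustration.

```python
from typing import Dict, List


def propagate_relevance(
    content_scores: Dict[str, float],
    in_links: Dict[str, List[str]],
    alpha: float = 0.8,
    iterations: int = 10,
) -> Dict[str, float]:
    """Hypothetical propagation scheme: blend each page's own content score
    with the average current score of the pages that link to it."""
    scores = dict(content_scores)
    for _ in range(iterations):
        updated = {}
        for doc, own in content_scores.items():
            sources = in_links.get(doc, [])
            # Average score of in-linking pages; 0.0 when nothing links here.
            if sources:
                neighbor = sum(scores.get(s, 0.0) for s in sources) / len(sources)
            else:
                neighbor = 0.0
            # alpha weights the page's own content, (1 - alpha) the propagated part.
            updated[doc] = alpha * own + (1.0 - alpha) * neighbor
        scores = updated
    return scores


# Toy example (assumed data): page "a" links to "b" and "c"; page "b" links to "c".
content_scores = {"a": 0.9, "b": 0.1, "c": 0.5}
in_links = {"b": ["a"], "c": ["a", "b"]}
ranked = sorted(
    propagate_relevance(content_scores, in_links).items(),
    key=lambda kv: kv[1],
    reverse=True,
)
print(ranked)
```

In this toy setting, page "b" has weak content of its own but gains score from the strongly matching page "a" that links to it. That is the kind of effect a content-plus-link model is meant to capture. The propagation direction and the weighting function are design choices that a real framework would need to justify and tune.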
4. Publications
- Qiaozhu Mei, Duo Zhang, ChengXiang Zhai, Smoothing Language Models with Document and Word Graphs, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08), to appear. (17% acceptance)
- Xuanhui Wang, Hui Fang, ChengXiang Zhai, A study of methods for negative relevance feedback, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08), to appear. (17% acceptance)
- Azadeh Shakery, ChengXiang Zhai, Smoothing Document Language Models with Probabilistic Term Count Propagation, Information Retrieval Journal, to appear.
- Qiaozhu Mei, Hui Fang, ChengXiang Zhai, A Study of Poisson Query Generation Model for Information Retrieval, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07), pages 319-326. (18% acceptance)
- Hui Fang, ChengXiang Zhai, Probabilistic Models for Expert Finding, Proceedings of the 29th European Conference on Information Retrieval (ECIR'07), pages 418-430. (19% acceptance)
- Tao Tao, ChengXiang Zhai, An Exploration of Proximity Measures in Information Retrieval, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07), pages 295-302. (18% acceptance)
- Jing Jiang and ChengXiang Zhai, Extraction of coherent relevant passages using hidden Markov models, ACM Transactions on Information Systems, 24(3), July 2006, pages 295-319.
- Tao Tao, ChengXiang Zhai, Regularized Estimation of Mixture Models for Robust Pseudo-Relevance Feedback, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'06), pages 162-169. (19% acceptance)
- Tao Tao, Xuanhui Wang, Qiaozhu Mei, ChengXiang Zhai, Language Model Information Retrieval with Document Expansion, Proceedings of HLT/NAACL 2006, pages 407-414. (25% acceptance)
- ChengXiang Zhai and John Lafferty, A risk minimization framework for information retrieval, Information Processing and Management (IP&M), 42(1), Jan. 2006, pages 31-55.
- Hui Fang, ChengXiang Zhai, Semantic Term Matching in Axiomatic Approaches to Information Retrieval, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'06), pages 115-122. (19% acceptance)
- Hui Fang, ChengXiang Zhai, An Exploration of Axiomatic Approach to Information Retrieval, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'05), pages 480-487, 2005. (19% acceptance)
- Hui Fang, Tao Tao, ChengXiang Zhai, A formal study of information retrieval heuristics, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'04), pages 49-56, 2004. Best Paper Award. (22% acceptance)
- ChengXiang Zhai, William W. Cohen, and John Lafferty, Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'03), pages 10-17, 2003. (17% acceptance)
- ChengXiang Zhai, John Lafferty, A study of smoothing methods for language models applied to information retrieval, ACM Transactions on Information Systems (ACM TOIS), Vol. 22, No. 2, April 2004, pages 179-214.
- John Lafferty and ChengXiang Zhai, Probabilistic relevance models based on document and query generation, In Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, Vol. 13, 2003.
- ChengXiang Zhai and John Lafferty, Model-based feedback in the language modeling approach to information retrieval, Proceedings of the Tenth ACM International Conference on Information and Knowledge Management (CIKM'01), pages 403-410, 2001. (25% acceptance)
5. Funding Support
- National Science Foundation, CAREER grant IIS-0347933
- Alfred P. Sloan Research Fellowship
- UIUC Faculty Startup