REFORM: Robust, EFfective, and Optimal Retrieval Models [ Team ] [ Results ] [ Publications ] [ Funding ]

1. Introduction

Although many retrieval models have been proposed and studied since the beginning of the field of information retrieval (IR), no single model has proven to be the best. Theoretically well-motivated models all need heuristic modifications to perform well empirically, and developing principled retrieval models that also perform well empirically remains a long-standing scientific challenge. Existing retrieval models have several fundamental limitations: (1) The performance of a retrieval model is highly sensitive to the document collection and the queries, in unpredictable ways. (2) A model that performs well on one data set may perform poorly on another. (3) Heavy manual parameter tuning is needed to achieve optimal performance. In this project, we aim to develop novel retrieval models that are robust (with respect to variation in document collections and queries), effective (in terms of retrieval accuracy), and able to guarantee optimality to a certain extent.

2. Team members

Current members

ChengXiang Zhai (Professor)
Yuanhua Lv (Ph.D.)
Qiaozhu Mei (Ph.D. student)
Xuanhui Wang (Ph.D.)

Past members

Azadeh Shakery (Ph.D., 2008)
Hui Fang (Ph.D., 2007, Ohio State University)
Tao Tao (Ph.D., 2007, Microsoft)

3. Research Results

We have developed and/or studied many different retrieval models. At a high level, our main research results fall into the following three directions:

Statistical language models

Statistical language models have recently been applied to information retrieval with considerable success. Because of their solid statistical foundation, they make it possible to tune retrieval parameters automatically through statistical estimation. We have been developing new language models that are more robust and effective than existing models, and we have also been applying language models to non-traditional retrieval tasks such as expert finding and review assignment.
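
As a rough illustration of the language modeling approach, the sketch below ranks documents by query likelihood with Dirichlet prior smoothing, one of the smoothing methods studied in the publications listed on this page. It is a minimal sketch, not the project's implementation: the function name, the toy documents, and the fixed value of mu are assumptions made for the example, and mu is set by hand here rather than estimated.

```python
import math
from collections import Counter


def dirichlet_query_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log query likelihood of `query` under a Dirichlet-smoothed document model.

    p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu)
    The value of mu is an illustrative assumption, not a tuned setting.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        p_collection = collection_counts.get(w, 0) / collection_len
        if p_collection == 0.0:
            # Assumption for this sketch: ignore query words unseen in the collection.
            continue
        p_doc = (doc_counts[w] + mu * p_collection) / (doc_len + mu)
        score += math.log(p_doc)
    return score


# Toy collection (hypothetical data, for illustration only).
docs = [
    "language models for information retrieval".split(),
    "link analysis for hypertext collections".split(),
]
collection_counts = Counter(w for d in docs for w in d)
collection_len = sum(collection_counts.values())

query = "language retrieval".split()
scores = [
    dirichlet_query_likelihood(query, d, collection_counts, collection_len)
    for d in docs
]
```

In this toy collection the first document contains both query words, so it receives the higher score; the second document still gets a nonzero probability for each query word because of the collection-based smoothing.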

Axiomatic approaches to information retrieval

We have established a novel axiomatic retrieval framework that opens up new directions in studying retrieval models. We show that intuitive retrieval heuristics can be captured by formally defined constraints on retrieval functions, and that by analyzing these constraints we can predict the empirical behavior of a retrieval method analytically. Using this framework, we have developed several new effective retrieval models.
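
To make the idea of a constraint on a retrieval function concrete, the sketch below tests a simple term-frequency heuristic: for a single-term query and documents of equal length, a document with more occurrences of the term should score higher. The framework described above analyzes such constraints analytically; this example only checks the property numerically over a finite range, and the function names and the two candidate scoring functions are hypothetical.

```python
import math


def satisfies_tf_constraint(score, doc_len=100, max_tf=20):
    """Return True if score(tf, doc_len) strictly increases with tf at fixed length.

    This is a numeric spot check over tf = 0..max_tf, not an analytical proof.
    """
    values = [score(tf, doc_len) for tf in range(max_tf + 1)]
    return all(a < b for a, b in zip(values, values[1:]))


def log_tf(tf, doc_len):
    # Candidate scoring function: sublinear growth in term frequency.
    return math.log(1 + tf)


def binary_tf(tf, doc_len):
    # Candidate scoring function: only presence or absence of the term matters.
    return 1.0 if tf > 0 else 0.0


log_ok = satisfies_tf_constraint(log_tf)
binary_ok = satisfies_tf_constraint(binary_tf)
```

Here the logarithmic function passes the check, while the binary function fails it because it cannot distinguish one occurrence from many.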

Hypertext retrieval model

A major challenge in developing models for hypertext retrieval is to combine content information effectively with the link structure available in hypertext collections. Although several link-based ranking methods have been developed to improve retrieval results, none of them fully exploits both the discriminative power of content and all useful link structures. We developed a general relevance propagation framework for combining content and link information. We further generalized the framework to support structure-based propagation of content information, which makes it possible to apply the framework to many other retrieval tasks.
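
One generic way to combine content scores with link structure is to let each document keep part of its own content score and receive the rest from the documents that link to it. The sketch below illustrates that idea only; it is not the propagation framework developed in this project, and the function name, the averaging rule, the value of alpha, and the toy graph are all assumptions.

```python
def propagate_relevance(content_scores, in_links, alpha=0.8, iterations=10):
    """Iteratively mix each document's content score with scores from its in-links.

    new(d) = alpha * content(d) + (1 - alpha) * mean(current scores of pages linking to d)
    Documents without in-links receive nothing from neighbors in this sketch.
    """
    scores = dict(content_scores)
    for _ in range(iterations):
        updated = {}
        for doc, own in content_scores.items():
            neighbors = in_links.get(doc, [])
            if neighbors:
                received = sum(scores[n] for n in neighbors) / len(neighbors)
            else:
                received = 0.0
            updated[doc] = alpha * own + (1 - alpha) * received
        scores = updated
    return scores


# Toy graph (hypothetical): page "a" links to page "b"; page "c" is isolated.
content_scores = {"a": 0.9, "b": 0.1, "c": 0.1}
in_links = {"b": ["a"]}
final = propagate_relevance(content_scores, in_links)
```

Pages "b" and "c" start with the same content score, but "b" ends up ranked above "c" because it is linked from the highly relevant page "a".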

4. Selected Publications (See all publications)

Qiaozhu Mei, Duo Zhang, ChengXiang Zhai. Smoothing Language Models with Document and Word Graphs, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08), to appear. (17% acceptance)
Xuanhui Wang, Hui Fang, ChengXiang Zhai. A study of methods for negative relevance feedback, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08), to appear. (17% acceptance)
Azadeh Shakery, ChengXiang Zhai. Smoothing Document Language Models with Probabilistic Term Count Propagation, Information Retrieval Journal, to appear.
Qiaozhu Mei, Hui Fang, ChengXiang Zhai, A Study of Poisson Query Generation Model for Information Retrieval, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07), pages 319-326. (18% acceptance) pdf
Hui Fang, ChengXiang Zhai, Probabilistic Models for Expert Finding, Proceedings of the 29th European Conference on Information Retrieval (ECIR'07), pages 418-430. (19% acceptance) pdf
Tao Tao, ChengXiang Zhai, An Exploration of Proximity Measures in Information Retrieval, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07), pages 295-302. (18% acceptance) pdf
Jing Jiang and ChengXiang Zhai, Extraction of coherent relevant passages using hidden Markov models, ACM Transactions on Information Systems, 24(3), July 2006, pages 295-319. URL
Tao Tao, ChengXiang Zhai, Regularized Estimation of Mixture Models for Robust Pseudo-Relevance Feedback, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'06), pages 162-169. (19% acceptance) pdf
Tao Tao, Xuanhui Wang, Qiaozhu Mei, ChengXiang Zhai, Language Model Information Retrieval with Document Expansion, Proceedings of HLT/NAACL 2006, pages 407-414. (25% acceptance) pdf
ChengXiang Zhai and John Lafferty, A risk minimization framework for information retrieval, Information Processing and Management (IP&M), 42(1), Jan. 2006, pages 31-55. URL
Hui Fang, ChengXiang Zhai, Semantic Term Matching in Axiomatic Approaches to Information Retrieval, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'06), pages 115-122. (19% acceptance) pdf
Hui Fang, ChengXiang Zhai, An Exploration of Axiomatic Approach to Information Retrieval, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'05), pages 480-487, 2005. pdf (19% acceptance)
Hui Fang, Tao Tao, ChengXiang Zhai, A formal study of information retrieval heuristics, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'04), pages 49-56, 2004. Best Paper Award. pdf (22% acceptance)
ChengXiang Zhai, William W. Cohen, and John Lafferty, Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'03), pages 10-17, 2003. ps, pdf (17% acceptance)
ChengXiang Zhai, John Lafferty, A study of smoothing methods for language models applied to information retrieval, ACM Transactions on Information Systems (ACM TOIS), Vol. 22, No. 2, April 2004, pages 179-214. ps
John Lafferty and ChengXiang Zhai, Probabilistic relevance models based on document and query generation, In Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, Vol. 13, 2003. ps, pdf
ChengXiang Zhai and John Lafferty, Model-based feedback in the language modeling approach to information retrieval, Proceedings of the Tenth ACM International Conference on Information and Knowledge Management (CIKM'01), pages 403-410, 2001. ps, pdf (25% acceptance)

5. Funding Support

National Science Foundation, CAREER grant IIS-0347933
Alfred P. Sloan Research Fellowship
UIUC Faculty Startup
