QueryClinic: Improve Search Accuracy for Difficult Queries
[ Team ]
[ Results ]
[ Publications ]
[ Funding ]
1. Introduction
With the explosive growth of online information, such as news articles, email messages, scientific
literature, government documents, and information about all kinds of products on the Web,
search engines have become essential tools in nearly every aspect of our lives, and
their effectiveness directly affects our productivity and quality of life.
Current-generation search engines are very useful, but they tend to work well only for
easy queries (e.g., finding homepages or popular, well-known topics) and perform poorly
on difficult queries (e.g., queries with ambiguous words or queries about unfamiliar topics).
It is common for a user to reformulate a query many times
and still not obtain satisfactory results. In such cases, the user must
either spend a significant amount of time searching or simply give up. Improving search
accuracy for such difficult queries can therefore bring significant benefits to users.
This project aims to develop an interactive search environment, called QueryClinic,
to better support users in finding such hard-to-find information. The main idea behind QueryClinic
is to make search a collaborative process in which QueryClinic and the user
interact with each other and work together to improve the search results. This contrasts with the
current search process, in which a search engine passively responds to a user's query with a list of results. Specifically, QueryClinic actively involves the user in the search process,
so that the user can provide more input and, in turn, receive more guidance and assistance
in reformulating queries. In this way, the system can collect more informative feedback
and analyze the entire interaction session as a whole (rather than just the few words in the query)
to understand the user's information need more precisely, which in turn helps the system
direct the search process and improve search accuracy.
2. Team members
- ChengXiang Zhai (Professor)
- Alexander Kotov (Ph.D. student)
- Yuanhua Lv (Ph.D. student)
- Parikshit Sondhi (Ph.D. student)
- V.G.Vinod Vydiswaran (Ph.D. student)
- Xuanhui Wang (Ph.D. student)
3. Selected Major Research Results
- Negative feedback: When the initial search results are extremely poor, all of the top-ranked documents
may be non-relevant. To help users in such situations, we studied how to learn from the top-ranked non-relevant documents (i.e., negative feedback) to improve the ranking of the unseen documents, and we proposed effective
methods for negative feedback [Wang et al. SIGIR 2008].
- Browsing with multi-resolution topic maps (systematic query suggestion): When querying does not work well, browsing can be very useful.
Current search engines support browsing mainly in two ways: through hyperlinks and through manually defined, fixed categories. We proposed
building multi-resolution topic maps that let a user navigate to relevant information without having to formulate queries, so that
users can still benefit from browsing when they cannot formulate effective queries. A topic map can also be regarded as a way to
systematically suggest queries to users. We further proposed constructing multi-resolution topic maps from search logs, so that future users can follow the "footprints" left by previous users in the information space, an approach we call social surfing. Evaluation shows that users can find
relevant information more effectively through navigation than through query reformulation, especially when the queries are difficult [Wang & Zhai CIKM 2008, Wang et al. CIKM 2009].
- Natural question-guided search: We studied how to engage users naturally in an
interactive retrieval system by presenting clarification
questions to them as supplementary information alongside regular search results.
Such a question-guided search paradigm not only enables a user to reach
the answers to questions related to the information need directly, but
also serves as a natural way to refine the user's interests. Accurate
determination of these questions may substantially improve the quality
of search results and the usability of search interfaces. We proposed a new
framework for question-guided search, in which a retrieval
system automatically generates potentially interesting
questions for the users. Since the answers to such questions
are known to exist in the search results, these questions can
guide the users directly to the answers they are
looking for, eliminating the need to scan the documents in
the results list. Moreover, in the case of imprecise or ambiguous
queries, automatically generated questions can naturally
engage the users in feedback cycles to refine their information
need and guide them toward their search goals. Implementing
the proposed strategy raises new challenges
in content indexing, question generation, ranking, and feedback.
We proposed new methods to address these challenges,
implemented them in a prototype system built on a subset of
Wikipedia, and evaluated them with a user study.
Experimental results show that the proposed method for
question-based query refinement allows users to
navigate search results more easily and to explore the
results space effectively in an interactive and natural way [Kotov & Zhai WWW 2010].
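The negative-feedback idea can be illustrated with a minimal sketch. It assumes a plain bag-of-words vector space with cosine similarity and a single Rocchio-style penalty: each unseen document's score is its similarity to the query minus `beta` times its similarity to the centroid of the judged non-relevant documents. This is one simple strategy chosen for illustration, not necessarily the method evaluated in [Wang et al. SIGIR 2008]; all function names and the `beta` weight are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average term vector (the 'negative model'); empty if no vectors."""
    if not vectors:
        return Counter()
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: w / len(vectors) for t, w in total.items()})


def rerank_with_negative_feedback(query, unseen_docs, nonrelevant_docs, beta=0.5):
    """Hypothetical Rocchio-style negative feedback: penalize unseen documents
    that resemble the judged non-relevant ones. Ties keep the input order."""
    q = tf_vector(query)
    neg = centroid([tf_vector(d) for d in nonrelevant_docs])
    scored = [
        (cosine(q, tf_vector(d)) - beta * cosine(neg, tf_vector(d)), d)
        for d in unseen_docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [d for _, d in scored]
```

For example, if the seen results for "jaguar speed" were "jaguar car dealer price" and "jaguar car speed review" (both judged non-relevant), the unseen document "jaguar cat running speed" is promoted above "jaguar car top speed", even though both match the query equally well.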
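A topic map built from search logs can be approximated, very roughly, as follows. This hypothetical sketch assumes the log has been reduced to a mapping from each query to the set of documents clicked for it; two queries are linked when they share at least `min_shared` clicked documents, and linked queries are grouped transitively with union-find. Running it at several thresholds gives a crude multi-resolution view, since a lower threshold links more queries into larger, more general regions. It is a simplification for illustration, not the construction used in [Wang et al. CIKM 2009].

```python
from collections import defaultdict
from itertools import combinations


def cluster_queries(clicks, min_shared):
    """Group queries whose clicked-document sets share at least `min_shared`
    documents, transitively (union-find). Returns sorted lists of queries."""
    parent = {q: q for q in clicks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(clicks, 2):
        if len(clicks[a] & clicks[b]) >= min_shared:
            parent[find(b)] = find(a)  # merge the two groups

    groups = defaultdict(list)
    for q in clicks:
        groups[find(q)].append(q)
    return sorted(sorted(g) for g in groups.values())


def multi_resolution_map(clicks, thresholds):
    """Hypothetical helper: one clustering per threshold, where lower
    thresholds yield coarser topic regions."""
    return {t: cluster_queries(clicks, t) for t in thresholds}
```

On a toy log where "jaguar car" and "jaguar price" share two clicked documents and "jaguar dealer" shares only one with each of them, threshold 2 keeps just the first two together, while threshold 1 merges all three into one broader region.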
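The key property of question-guided search, that every suggested question has an answer somewhere in the results, can be sketched as follows. The sketch assumes that (subject, verb, object) facts have already been extracted from result sentences (the `Fact` type and its fields are hypothetical) and fills two fixed templates. It ignores the indexing, ranking, and feedback components mentioned above and is not the system described in [Kotov & Zhai WWW 2010].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical (subject, verb, object) triple assumed to be extracted
    from a sentence in the search result identified by doc_id."""
    subject: str
    verb_base: str  # e.g. "develop", used in "What did X develop?"
    verb_past: str  # e.g. "developed", used in "Who developed X?"
    obj: str
    doc_id: str


def generate_questions(query, facts):
    """Return (question, answer, doc_id) tuples for facts mentioning the query.

    Each question is built from a fact found in the results, so its answer is
    known to exist there. Only two fixed templates are used, and "Who"
    assumes the subject is a person.
    """
    q = query.lower()
    questions = []
    for f in facts:
        if f.subject.lower() == q:
            questions.append((f"What did {f.subject} {f.verb_base}?", f.obj, f.doc_id))
        elif f.obj.lower() == q:
            questions.append((f"Who {f.verb_past} {f.obj}?", f.subject, f.doc_id))
    return questions
```

Selecting a suggested question such as "What did Einstein develop?" can then take the user straight to the stored answer and its source document instead of a list of pages to scan.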
4. Publications
- Maryam Karimzadehgan, ChengXiang Zhai,
Estimation of Statistical Translation Models Based on Mutual Information for Ad Hoc Information Retrieval,
Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR'10), pages 323-330, 2010.
(16.7% acceptance)
- Yuanhua Lv, ChengXiang Zhai, Positional Relevance Model for Pseudo-Relevance Feedback,
Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR'10), pages 579-586, 2010.
(16.7% acceptance)
- Alexander Kotov, ChengXiang Zhai, Towards Natural Question-Guided Search,
Proceedings of the 19th International Conference on World Wide Web (WWW'10), pages 541-550, 2010.
- Xuanhui Wang, Bin Tan, Azadeh Shakery, ChengXiang Zhai, Beyond Hyperlinks: Organizing Information Footprints in Search Logs to Support Effective Browsing,
Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM'09), pages 1237-1246, 2009.
(full paper, 14.5% acceptance)
- Yuanhua Lv, ChengXiang Zhai, Adaptive Relevance Feedback in Information Retrieval,
Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM'09), pages 255-264, 2009.
(full paper, 14.5% acceptance)
- Yuanhua Lv, ChengXiang Zhai, Positional Language Models for Information Retrieval,
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR'09), pages 299-306, 2009.
(16% acceptance)
- Xuanhui Wang, ChengXiang Zhai, Mining Term Association Patterns from Search Logs for Effective Query Reformulation,
Proceedings of the 17th ACM International Conference on Information and Knowledge Management (CIKM'08), pages 479-488, 2008.
(17% acceptance)
- Xuanhui Wang, Hui Fang, ChengXiang Zhai, A Study of Methods for Negative Relevance Feedback,
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR'08), pages 219-226, 2008.
(17% acceptance)
5. Funding Support
- National Science Foundation, grant IIS-0713581
- Microsoft Research and Microsoft adCenter Research, "Beyond Search" grant
- Yahoo! Ph.D. Fellowship
- Alfred P. Sloan Research Fellowship