Maria Esch, Jinbo Chen, Stephan Weise, Keywan Hassani-Pak, Uwe Scholz, Matthias Lange
October 18, 2016
Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of life sciences. Here, keyword based query systems are popular user interfaces. In turn, to a large extend, the used query phrases determine the quality of the search result and the effort a scientist has to invest for query refinement. In this context, computer aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects like spelling correction, query refinement or query expansion. However, the majority of the front-ends only make limited use of enhanced IR algorithms to implement comprehensive and computer aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages compared with state-of-the-art systems.