Abstract
Modern search is heavily powered by knowledge bases, but users still query using keywords or natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We have previously proposed the graph-of-entity as a purely graph-based representation and retrieval model; however, this model scaled poorly. We tackle the scalability issue by adapting the model so that it can be represented as a hypergraph. This enables a significant reduction in the number of (hyper)edges relative to the number of nodes, while capturing nearly the same amount of information. Moreover, such a higher-order data structure can capture richer types of relations, including n-ary connections such as synonymy or subsumption. We present the hypergraph-of-entity as the next step in the graph-of-entity model, and explore a ranking approach based on biased random walks. We evaluate the approaches using a subset of the INEX 2009 Wikipedia Collection. While performance is still below the state of the art, we were able, in part, to achieve a MAP score similar to TF-IDF and to greatly improve indexing efficiency over the graph-of-entity.
© 2019 José Devezas et al., published by De Gruyter Open
This work is licensed under the Creative Commons Attribution 4.0 Public License.