Improving web search relevance with learning structure of domain concepts
This paper addresses the problem of improving the relevance of search engine results in a vertical domain. The proposed algorithm is built on a structured taxonomy of keywords. Taxonomy construction starts from seed terms (keywords) and mines the available source domains for new terms associated with these entities. The new terms are formed in several steps. First, the snippets of answers returned by the search engine are parsed, producing parse trees. Then commonalities among these parse trees are found using a machine learning algorithm. The resulting commonality expressions form new keywords, attached as parameters of existing keywords, and become new seeds at the next learning iteration. To match natural language (NL) expressions between source and target domains, the algorithm uses syntactic generalization, an operation that finds a set of maximal common sub-trees of the constituency parse trees of these expressions. The evaluation showed improved search relevance in both vertical and horizontal domains: the learned taxonomy contributed significantly in a vertical domain, and a hybrid system (combining the taxonomy with syntactic generalization) contributed noticeably in horizontal domains. An industrial evaluation of the hybrid system indicates that the proposed algorithm is suitable for integration into industrial systems. The algorithm is implemented as a component of the Apache OpenNLP project.
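The iterative taxonomy-learning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain word overlap between snippets for syntactic generalization of parse trees, and the function names (`tokens`, `common_terms`, `extend_taxonomy`), the stop-word list, the `min_support` threshold, and the sample snippets are all hypothetical, chosen only to show how commonalities among search snippets could become child terms and new seeds.

```python
from itertools import combinations

# Hypothetical stop-word list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "of", "for", "in", "on", "to", "and", "is", "are", "how"}


def tokens(text):
    """Lowercase the text, strip surrounding punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def common_terms(snippet_a, snippet_b, seed_path):
    """Words shared by two snippets, excluding words already on the seed path.

    Assumption: word overlap stands in for the maximal common sub-trees
    that syntactic generalization would compute on parse trees.
    """
    shared = set(tokens(snippet_a)) & set(tokens(snippet_b))
    return shared - set(seed_path)


def extend_taxonomy(taxonomy, seed_path, snippets, min_support=2):
    """Attach terms recurring across snippet pairs as children of seed_path.

    Returns the extended paths, which serve as seeds for the next iteration.
    """
    counts = {}
    for a, b in combinations(snippets, 2):
        for term in common_terms(a, b, seed_path):
            counts[term] = counts.get(term, 0) + 1
    children = sorted(t for t, c in counts.items() if c >= min_support)
    taxonomy[tuple(seed_path)] = children
    return [tuple(seed_path) + (child,) for child in children]


# Hypothetical snippets, as if returned by a search for the seed "tax deduction".
snippets = [
    "How to file a tax deduction for business expenses",
    "Tax deduction for business travel and medical expenses",
    "Medical expenses qualify for a tax deduction",
]
taxonomy = {}
new_seeds = extend_taxonomy(taxonomy, ("tax", "deduction"), snippets)
print(new_seeds)  # [('tax', 'deduction', 'expenses')]
```

In this sketch each returned path would be issued as the next search query, so the taxonomy grows one level per iteration. A faithful version would replace `common_terms` with a generalization over constituency parse trees.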
Galitsky, B. A., & Kovalerchuk, B. (2014). Improving Web Search Relevance with Learning Structure of Domain Concepts. In F. Aleskerov, B. Goldengorin, & P. M. Pardalos (Eds.), Clusters, Orders, and Trees: Methods and Applications (pp. 341–376). Springer Science. https://doi.org/10.1007/978-1-4939-0742-7_21
© Springer Science+Business Media New York 2014