
How Google Might Connect Topics to Wikipedia Articles Using Probabilistic Entity Linking

The Hummingbird update and the Knowledge Graph form a vital part of Google's semantic search technology. The way Google predicts answers to searchers' questions is remarkable, but Wikipedia and Freebase form the base of this technology. Google likely uses a probabilistic entity linking technique to connect topics to existing Wikipedia articles. This is what we see for queries that return results based on the Knowledge Graph.

Google probably uses a novel, efficient Gibbs sampling scheme that can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity linking on the Aida-CoNLL dataset.

The ‘entity-linking’ task involves annotating phrases, also known as mentions, with unambiguous identifiers, referring to topics, concepts or entities. Mapping text to unambiguous references provides a first scalable handle on long-standing problems such as language polysemy and synonymy, and more generally on the task of semantic grounding for language understanding.

The constructed LDA model has each topic associated with a Wikipedia article. Using this ‘Wikipedia-interpretable’ LDA model, the topic-word assignments discovered during inference qualify directly for entity linking. The topics are constructed using Wikipedia, and the corresponding parameters remain fixed. This model has one topic per Wikipedia article, resulting in over 4 million topics. Furthermore, the vocabulary size, including mention unigrams and phrases, is also on the order of millions. To ensure efficient inference, the approach relies on a novel Gibbs sampling scheme that exploits sparsity in the Wikipedia-LDA model.

Have a look at the example below:

Here, the word "Croft" could refer to two entities, namely Lara Croft and Robert Croft. But by linking England, Bat and Inning together, Google can easily recognize that the intended entity is Robert Croft, who is related to cricket and England.
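
To make the idea concrete, here is a minimal toy sketch in Python of how a Gibbs sampler could resolve the "Croft" example. It is not Google's implementation. Everything in it is an assumption made for illustration: the hand-made probabilities in `WORD_TOPICS`, the article pairs in `LINKS`, the simplified scoring rule, and the function names `related` and `gibbs_link`. Each word maps only to the few articles that could generate it, so the sampler never has to consider millions of topics at once, and the links stand in for side information from the Wikipedia graph.

```python
import random
from collections import Counter

# Hypothetical topic-word probabilities, fixed in advance (as if built from
# Wikipedia). Stored sparsely: each word lists only its candidate articles.
WORD_TOPICS = {
    "croft":   {"Lara Croft": 0.6, "Robert Croft": 0.4},
    "england": {"England": 0.7, "England cricket team": 0.3},
    "bat":     {"Bat (animal)": 0.5, "Cricket bat": 0.5},
    "inning":  {"Innings": 1.0},
}

# Hypothetical side information: article pairs assumed to be linked
# in the Wikipedia graph.
LINKS = {
    frozenset(pair) for pair in [
        ("Robert Croft", "England cricket team"),
        ("Robert Croft", "Cricket bat"),
        ("Robert Croft", "Innings"),
        ("England cricket team", "Cricket bat"),
        ("England cricket team", "Innings"),
        ("Cricket bat", "Innings"),
    ]
}


def related(a, b):
    """True if two articles are identical or linked in the toy graph."""
    return a == b or frozenset((a, b)) in LINKS


def gibbs_link(mentions, sweeps=200, alpha=0.1, seed=0):
    """Return the most frequently sampled article for each mention."""
    rng = random.Random(seed)
    # Start from a random candidate article for every mention.
    assign = [rng.choice(sorted(WORD_TOPICS[m])) for m in mentions]
    tallies = [Counter() for _ in mentions]
    for sweep in range(sweeps):
        for i, mention in enumerate(mentions):
            candidates = sorted(WORD_TOPICS[mention])
            weights = []
            for topic in candidates:
                # Simplified score: word probability times the support the
                # topic gets from the other mentions' current assignments.
                support = sum(
                    related(topic, assign[j])
                    for j in range(len(mentions)) if j != i
                )
                weights.append(WORD_TOPICS[mention][topic] * (alpha + support))
            assign[i] = rng.choices(candidates, weights=weights)[0]
            if sweep >= sweeps // 2:  # ignore the first half as burn-in
                tallies[i][assign[i]] += 1
    return [t.most_common(1)[0][0] for t in tallies]


mentions = ["croft", "england", "bat", "inning"]
for mention, article in zip(mentions, gibbs_link(mentions)):
    print(mention, "->", article)
```

Even though "Lara Croft" starts with the higher word probability in this toy table, the cricket-related context words keep pulling the sampler toward "Robert Croft", which is the behaviour described above.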
