Wednesday, July 24, 2013

Taxonomic Classification Will Help Google Find the Real Context of Words

One of the most difficult tasks for search engines is determining the real context of the words contained in a web document. For example, the word 'close' has two different meanings: 'to shut' and 'nearby'. Because both senses are spelled the same, it is difficult for a search engine to determine which meaning is intended.

Taxonomic Classification and the Golden Set

Taxonomic classification is a technique that uses a hierarchical, tree-like structure to classify every word that falls under a category contained in the taxonomy. The taxonomy is built from a large set of documents: human readers identify specific words within a known set of documents and assign those words to separate labels, and the labeled documents then form a set known as the "golden set" or "training set". A new classifier model is developed in conjunction with the original taxonomy.

In layman's terms, the model identifies particular labels and arranges them in a hierarchical structure of "top-level" and successively "lower-level" categories, which can then be used to identify the correct meaning of a word. This is a continuous process: the model keeps growing as new words are added as labels.


Here is a short example of how this model will work:

Foreign or Domestic Cars
Make of the Car
Model of the Car
Color of the Car
Car Price
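
The levels above can be pictured as a tree in which each label sits under a broader one. The Python sketch below is only an illustration of that idea, not the structure described in the patent: the TaxonomyNode class, its methods and the sample cars are all hypothetical, and the price level is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One label in the hierarchy; children are the lower-level categories."""
    label: str
    children: dict = field(default_factory=dict)

    def add_path(self, path):
        """Insert a chain of labels, creating missing nodes along the way."""
        node = self
        for label in path:
            node = node.children.setdefault(label, TaxonomyNode(label))
        return node

    def paths(self, prefix=()):
        """Yield every root-to-leaf path as a tuple of labels."""
        here = prefix + (self.label,)
        if not self.children:
            yield here
        for child in self.children.values():
            yield from child.paths(here)


# Hypothetical sample data: origin > make > model > color.
root = TaxonomyNode("Cars")
root.add_path(["Domestic", "Ford", "Mustang", "Red"])
root.add_path(["Domestic", "Ford", "Focus", "Blue"])
root.add_path(["Foreign", "Toyota", "Corolla", "White"])

for path in root.paths():
    print(" > ".join(path))
```

Each printed line is one complete route from the top-level category down to the most specific label, which is the kind of path a classifier would assign to a word or document.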

The interesting part of this classification is that the data in these documents will be obtained from websites. A branded website with proper categorization of cars and models might be used to produce a golden set, or data from several websites might be combined for this task.
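
To make the idea of a golden set more concrete, here is a minimal Python sketch. It assumes the page texts and their category paths have already been collected from a well-categorized site; the sample data, the train and classify functions, and the simple word-overlap scoring are all assumptions made for illustration and are not the method claimed in the patent.

```python
from collections import Counter, defaultdict

# Hypothetical golden set: (page text, label path) pairs, as if taken from
# a site that already files each page under a category.
golden_set = [
    ("ford mustang coupe v8 engine for sale", ("Domestic", "Ford")),
    ("ford focus hatchback low mileage", ("Domestic", "Ford")),
    ("toyota corolla sedan fuel economy", ("Foreign", "Toyota")),
    ("toyota camry hybrid sedan", ("Foreign", "Toyota")),
]


def train(examples):
    """Count how often each word appears under each label path."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def classify(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    best = max(scores, key=scores.get)
    # No overlap with any label means the text cannot be placed.
    return best if scores[best] > 0 else None


model = train(golden_set)
print(classify(model, "used toyota sedan"))
print(classify(model, "mustang v8"))
```

A real system would need far more labeled documents and a stronger model, but the flow is the same: labeled pages go in, and new text comes out with a path in the taxonomy.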

Summary of the Patent


Full patent information can be viewed here:

Training Set Construction for Taxonomic Classification

Inventors: Philo Juang (Los Angeles, CA), Christopher Testa (Venice, CA), Nicolaus Mote (Los Angeles, CA)

Assignee: Google Inc. (Mountain View, CA)
Family ID: 45572099
Appl. No.: 13/350,213
Filed: January 13, 2012

Also see:

Location Relevance System and Relevancy Score to Power Local Search Results
Google Patent to Identify Erroneous Business Listings
Google Granted Patent for Detecting Hidden Texts and Hidden Links
New Google Patent to Identify Spam in Information Collected From a Source
Google Patent Named Ranking Documents to Penalize Spammers
Rich Snippets in Google
How to Add Ratings and Review Stars on Google Search Results
Query Highlighting on Google Search Results
List of Google Search Operators
New Google Search Quality Updates
Google Search Tips and Tricks