Taxonomic Classification Will Help Google Find the Real Context of Words

One of the most difficult tasks for search engines is finding the real context of the words in a web document. For example, the word ‘close’ has two different meanings: ‘to shut’ and ‘nearby’. Because both senses are spelled the same way, it is difficult for a search engine to determine which meaning is intended.

Taxonomic Classification and the Golden Set

Taxonomic classification is a technique that uses a hierarchical, tree-like structure to classify each word under a category in the taxonomy. It relies on a large, known set of documents: human readers identify specific words within those documents and associate them with separate labels, and these labeled words form a separate set known as the “golden set” or “training set”. A new classifier model is then developed from this set in conjunction with the original taxonomy.
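The post does not say which classifier is trained on the golden set. As a minimal sketch, assuming a multinomial naive Bayes model (a swapped-in, illustrative choice) and a hypothetical golden set in which human readers have labeled two senses of ‘close’, training and prediction might look like this:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (an illustrative choice)."""

    def fit(self, golden_set):
        # golden_set: list of (text, label) pairs assigned by human readers.
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in golden_set:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) + sum over words of log P(word | label), smoothed.
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical golden set: two senses of "close", labeled by hand.
golden_set = [
    ("close the door and lock it", "shut"),
    ("the store will close at nine", "shut"),
    ("the hotel is close to the beach", "nearby"),
    ("we live close to the station", "nearby"),
]
clf = NaiveBayesClassifier().fit(golden_set)
print(clf.predict("please close the door"))      # shut
print(clf.predict("a cafe close to the beach"))  # nearby
```

Naive Bayes is used here only because it fits in a few lines; any classifier trained on the same (text, label) pairs would illustrate the idea equally well.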

In layman's terms, the model identifies particular labels and arranges them into a hierarchy of “top-level” and progressively “lower-level” categories, which can then be used to determine the correct meaning of a word. This is a continuous process: the model keeps growing as new words are added as labels.
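The top-down use of “top-level” and “lower-level” categories can be sketched as a tree walk. This is a minimal illustration, assuming each category carries a hypothetical set of cue words and that simple word overlap stands in for a trained classifier at each level; the `Category` class, the `classify` function, and the category names are made up for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    keywords: set = field(default_factory=set)   # hypothetical cue words
    children: list = field(default_factory=list)

    def add_child(self, child):
        """Attach a lower-level category; the tree can keep growing."""
        self.children.append(child)
        return child


def classify(root, text):
    """Descend from the root, picking the best-matching child at each level.

    Stops when no child shares a word with the text. Ties go to the child
    added first, because max() returns the first maximal element.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    path, node = [root.name], root
    while node.children:
        best = max(node.children, key=lambda c: len(c.keywords & words))
        if not (best.keywords & words):
            break
        path.append(best.name)
        node = best
    return path


root = Category("Root")
autos = root.add_child(Category("Automobiles", {"car", "sedan", "engine", "dealer"}))
travel = root.add_child(Category("Travel", {"hotel", "flight", "beach", "trip"}))
autos.add_child(Category("Car Prices", {"price", "cost", "cheap", "deal"}))
autos.add_child(Category("Car Colors", {"red", "blue", "black", "paint"}))
travel.add_child(Category("Hotels", {"hotel", "room", "suite"}))

print(classify(root, "Cheap sedan from a local dealer"))
# ['Root', 'Automobiles', 'Car Prices']

# A new lower-level label can be added later without rebuilding the tree.
autos.add_child(Category("Electric Cars", {"electric", "battery", "charging"}))
print(classify(root, "Battery range of an electric sedan"))
# ['Root', 'Automobiles', 'Electric Cars']
```

The last two lines mirror the "ever-growing" aspect: adding a child category immediately makes it reachable by later classifications.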

Example

Here is a short example of how this model might work:

Automobiles
|
Foreign or Domestic Cars
|
Make of the Car
|
Model of the Car
|
Color of the Car
|
Car Price
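One way to represent this chain in code is a nested dictionary with one level per attribute. The sketch below is an assumption for illustration only: the attribute names in `LEVELS`, the helper functions, and the listings are hypothetical.

```python
# Levels below the "Automobiles" root, in the order of the example above.
LEVELS = ["origin", "make", "model", "color", "price"]


def insert_listing(tree, listing):
    """Add one car listing to a nested-dict taxonomy, one level per attribute."""
    node = tree
    for level in LEVELS:
        node = node.setdefault(listing[level], {})
    return tree


def paths(tree, prefix=("Automobiles",)):
    """Yield every root-to-leaf path as a tuple of category labels."""
    if not tree:
        yield prefix
        return
    for label, subtree in tree.items():
        yield from paths(subtree, prefix + (label,))


# Hypothetical listings; the values are placeholders, not real data.
listings = [
    {"origin": "Foreign", "make": "Toyota", "model": "Corolla", "color": "Blue", "price": "Budget"},
    {"origin": "Foreign", "make": "Toyota", "model": "Corolla", "color": "Red", "price": "Budget"},
    {"origin": "Domestic", "make": "Ford", "model": "Mustang", "color": "Red", "price": "Mid-range"},
]
tree = {}
for listing in listings:
    insert_listing(tree, listing)

for path in paths(tree):
    print(" > ".join(path))
# Automobiles > Foreign > Toyota > Corolla > Blue > Budget
# Automobiles > Foreign > Toyota > Corolla > Red > Budget
# Automobiles > Domestic > Ford > Mustang > Red > Mid-range
```

Listings that share a prefix (here, the two Corollas) share the same upper branches, which is what keeps the hierarchy compact as more items are added.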

The interesting part of this classification is that the documents will be obtained from websites. A branded website with a proper categorization of cars and models might be used to produce a golden set, or data from several websites might be combined for this task.
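A website's own category pages can be read as ready-made labels. The following sketch assumes each page exposes a breadcrumb trail such as “Cars > Foreign > Toyota > Corolla”; the site names, the `build_golden_set` helper, and its `min_sites` agreement filter are hypothetical, not details from the patent.

```python
from collections import Counter


def breadcrumb_to_labels(breadcrumb, sep=">"):
    """Turn a breadcrumb such as 'Cars > Foreign > Toyota' into a label path."""
    return tuple(part.strip() for part in breadcrumb.split(sep) if part.strip())


def build_golden_set(sites, min_sites=1):
    """Collect (page text, label path) training pairs from one or more sites.

    sites maps a site name to a list of (breadcrumb, page_text) pairs.
    A label path is kept only if at least `min_sites` distinct sites use it,
    a hypothetical way to favour categories that several sites agree on.
    """
    sites_per_label = Counter()
    examples = []
    for pages in sites.values():
        labels_on_site = set()
        for breadcrumb, text in pages:
            labels = breadcrumb_to_labels(breadcrumb)
            examples.append((text, labels))
            labels_on_site.add(labels)
        sites_per_label.update(labels_on_site)
    return [(text, labels) for text, labels in examples
            if sites_per_label[labels] >= min_sites]


# Hypothetical sites and pages.
sites = {
    "dealer-a.example": [
        ("Cars > Foreign > Toyota > Corolla", "Compact sedan with great mileage"),
        ("Cars > Domestic > Ford > Mustang", "Two-door sports coupe"),
    ],
    "dealer-b.example": [
        ("Cars > Foreign > Toyota > Corolla", "Reliable family sedan"),
    ],
}
print(len(build_golden_set(sites)))               # 3 (single-site mode)
print(len(build_golden_set(sites, min_sites=2)))  # 2 (only the shared Corolla path)
```

With `min_sites=1` a single branded site is enough; raising it is one possible way to combine several sites, at the cost of discarding categories only one site uses.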

Summary of the Patent


Full patent information:


Training Set Construction for Taxonomic Classification


Inventors: Juang; Philo (Los Angeles, CA), Testa; Christopher (Venice, CA), Mote; Nicolaus (Los Angeles, CA)
Applicant:
Name                 City          State  Country
Juang; Philo         Los Angeles   CA     US
Testa; Christopher   Venice        CA     US
Mote; Nicolaus       Los Angeles   CA     US
Assignee: Google Inc. (Mountain View, CA) 
Family ID: 45572099
Appl. No.: 13/350,213
Filed: January 13, 2012


