Co-occurrence Frequencies - How Google May Identify Substitute Terms for a Query Term

Google may use a query revision engine to revise search queries so that they include substitute terms for the original query terms. The goal is to identify web documents that are responsive to the original query. The query revision engine consists of query revisers that pick candidate substitute terms. The co-occurrence frequencies of each candidate are then compared with those of the original query term, and this comparison decides which candidates become the final substitute terms.

Google has been granted a patent that describes in detail how substitute terms are found so that users get results closer to what they are looking for. This is not the same as finding synonyms. Instead, a substitute term is matched to the original query term only after the context of that term has been determined using various methods.

The Process of Finding Substitute Queries

Step 1 - The search query enters the query revision engine.
Step 2 - The query revisers find candidate substitute terms according to various criteria.
Step 3 - A substitute term is selected by determining the co-occurrence frequencies of each candidate substitute term and comparing them with the co-occurrence frequencies of the original query term.
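
The three steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the method claimed in the patent: the names `table_reviser` and `revise_query`, the lookup tables, the toy scores and the threshold are all assumptions made up for this example. The scoring function is passed in from outside, so any comparison of co-occurrence frequencies could be plugged in.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a reviser maps one query term to candidate substitutes.
Reviser = Callable[[str], List[str]]


def table_reviser(table: Dict[str, List[str]]) -> Reviser:
    """Build a toy reviser backed by a lookup table (illustrative only)."""
    return lambda term: table.get(term, [])


def revise_query(
    query: str,
    revisers: List[Reviser],
    score: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, List[str]]:
    """Return, for each query term, the candidates that pass the score check."""
    substitutes: Dict[str, List[str]] = {}
    for term in query.split():  # Step 1: the query enters the engine
        candidates: List[str] = []
        for reviser in revisers:  # Step 2: every reviser proposes candidates
            for candidate in reviser(term):
                if candidate != term and candidate not in candidates:
                    candidates.append(candidate)
        # Step 3: keep candidates whose comparison score is high enough
        substitutes[term] = [c for c in candidates if score(term, c) >= threshold]
    return substitutes


# Toy data (assumed): two revisers and hand-written similarity scores.
revisers = [
    table_reviser({"car": ["auto", "cart"]}),
    table_reviser({"car": ["automobile", "auto"]}),
]
toy_scores: Dict[Tuple[str, str], float] = {
    ("car", "auto"): 0.9,
    ("car", "automobile"): 0.8,
    ("car", "cart"): 0.1,
}

result = revise_query(
    "cheap car",
    revisers,
    score=lambda term, candidate: toy_scores.get((term, candidate), 0.0),
    threshold=0.5,
)
print(result)
```

In this toy run, "cart" is proposed as a candidate but dropped in the last step because its score falls below the assumed threshold.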

What are Co-occurrence Frequencies?

Google compares the words that occur in documents containing the original query term with the words that occur in documents containing the candidate substitute term. This is done by constructing a vector of co-occurrence frequencies for each term and comparing the two vectors.

Here is a summary of the patent that will make things clearer.
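
To make the idea of comparing co-occurrence vectors more concrete, here is a minimal Python sketch. The tiny corpus is invented, and cosine similarity is used only as one common way of comparing two vectors. The article does not say which comparison Google actually uses, so treat `cooccurrence_vector` and `cosine_similarity` as hypothetical helpers.

```python
import math
from collections import Counter
from typing import Iterable, List


def cooccurrence_vector(term: str, documents: Iterable[List[str]]) -> Counter:
    """Count the words that appear in the same documents as `term`."""
    counts: Counter = Counter()
    for tokens in documents:
        if term in tokens:
            counts.update(t for t in tokens if t != term)
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy corpus: each document is a list of tokens.
docs = [
    ["cheap", "car", "rental"],
    ["cheap", "auto", "rental"],
    ["car", "insurance", "quote"],
    ["auto", "insurance", "quote"],
    ["cart", "wheel", "wood"],
]

car = cooccurrence_vector("car", docs)
auto = cooccurrence_vector("auto", docs)
cart = cooccurrence_vector("cart", docs)

print("car vs auto:", cosine_similarity(car, auto))
print("car vs cart:", cosine_similarity(car, cart))
```

In this corpus "auto" appears alongside the same words as "car", so their vectors point in the same direction, while "cart" shares no neighbouring words with "car" and scores zero. This matches the earlier point that a substitute term is chosen by context rather than by spelling or dictionary synonymy.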

[Image: Google co-occurrence summary]

[Image: Google co-occurrence summary, continued]

Full patent information:

Evaluation of substitute terms

Inventors: Ikeda; Daisuke (Tokyo, JP), Yang; Ke (Cupertino, CA)

Assignee: Google Inc. (Mountain View, CA)
Appl. No.: 13/438,743
Filed: April 3, 2012

Recommended Reading: Google May Substitute Query Terms with Co-occurrence
