Tuesday, May 22, 2012

Latent Semantic Indexing

Search engines store web pages in large databases and apply a number of techniques to retrieve the stored data. One such retrieval technique is Latent Semantic Indexing, popularly known as “LSI”. It is based on a mathematical technique known as singular value decomposition (SVD), applied to a matrix that records which terms occur in which documents. LSI is mainly used to find words that are used in the same contexts. This helps to extract the conceptual content of a body of text by noticing connections between words, for example between synonyms.
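To make the SVD step more concrete, here is a minimal sketch in Python using NumPy. The tiny term-document matrix, the word list and the choice of k = 2 latent dimensions are all assumptions made up for illustration; real search engines work with far larger matrices and usually weight the counts (for example with TF-IDF) first.

```python
import numpy as np

# Hypothetical vocabulary; each row of A is one term.
terms = ["dog", "canine", "food", "diet", "car", "engine"]

# Each column is one made-up document:
# "dog food diet", "canine food diet", "car engine", "car engine"
A = np.array([
    [1, 0, 0, 0],  # dog
    [0, 1, 0, 0],  # canine
    [1, 1, 0, 0],  # food
    [1, 1, 0, 0],  # diet
    [0, 0, 1, 1],  # car
    [0, 0, 1, 1],  # engine
], dtype=float)

# Singular value decomposition: A = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (the "latent" dimensions).
k = 2
term_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per term


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


i = terms.index
# "dog" and "canine" never appear in the same document...
raw_sim = cosine(A[i("dog")], A[i("canine")])
# ...but they share the same context words ("food", "diet").
latent_sim = cosine(term_vecs[i("dog")], term_vecs[i("canine")])
unrelated_sim = cosine(term_vecs[i("dog")], term_vecs[i("car")])

print("raw similarity dog/canine:   ", round(raw_sim, 3))
print("latent similarity dog/canine:", round(latent_sim, 3))
print("latent similarity dog/car:   ", round(unrelated_sim, 3))
```

In this toy corpus “dog” and “canine” have a raw similarity of zero because they never share a document, yet in the reduced space they end up pointing in almost the same direction, while “dog” and “car” stay unrelated. That is the sense in which LSI picks up synonyms.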

Example of Latent Semantic Indexing

An easy way to understand this concept is through the following example:

Suppose we have two different web pages containing information related to dog food. The main content (the semantically related words used) of the two pages is as follows:
Page A: words used (dogs, dog food, meat, diet, pedigree, foods, breed, breeding, canine, meds, and cat)
Page B: words used (dogs, dog meal, pets, dog food information, pet food, nutrition, dog health, breeders, Great Dane, German Shepherd, Pug, Cocker Spaniel, grains, meats, Quaker Oats, bone, meal, raw food, samples, biscuits, wheat gluten, Meat Inspection Act, etc.)
When these two pages are evaluated against the user query “dog food”, an LSI-based system would likely rate Page B as the more relevant one, because it contains a larger number of terms that LSI associates with the topic of the query.

Please note: LSI can return relevant documents that don't contain the query keyword at all, because it matches on related terms rather than on exact words.
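The point about missing keywords can be sketched in a few lines of Python with NumPy. Everything here is an assumption for illustration: the four short documents, the helper names `fold_in` and `cosine`, the use of raw word counts and the choice of k = 2. The query is projected into the latent space with the standard "folding-in" formula (query vector times U_k, divided by the singular values) and then compared with each document.

```python
import numpy as np

# Made-up mini corpus; document 1 never mentions the word "dog".
docs = [
    "dog food nutrition",
    "canine food nutrition",
    "car engine",
    "car engine repair",
]

# Build a term-document count matrix (rows = terms, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# Truncated SVD: keep the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k].T  # one k-dimensional vector per document


def fold_in(query):
    """Project a query string into the same latent space as the documents."""
    q = np.array([query.split().count(t) for t in vocab], dtype=float)
    return (q @ U_k) / s_k


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


q_vec = fold_in("dog")
scores = [cosine(q_vec, dv) for dv in doc_vecs]

for text, score in zip(docs, scores):
    print(f"{score:6.3f}  {text}")
```

In this sketch the document “canine food nutrition” scores about as high as the one that actually contains “dog”, while the two car documents score close to zero. A plain keyword match would have missed it.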