Latent Semantic Indexing

Search engines use large databases to store web pages and apply a number of techniques to retrieve the stored data. One such retrieval technique is Latent Semantic Indexing, popularly known as “LSI”. It is based on a mathematical technique known as singular value decomposition (SVD) and is mainly used to find words that occur in the same context. This helps to extract the conceptual content of a body of text by noticing connections between words, for example synonyms.
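To make the role of singular value decomposition more concrete, here is a minimal sketch in Python using NumPy. The four-document corpus, the term list, and the choice of k = 2 latent dimensions are all made up for illustration and are not taken from any real search engine. It shows two words that never appear in the same document ("dog" and "canine") ending up close together once the term-document matrix is reduced with SVD, because both appear alongside "food".

```python
import numpy as np

# Toy corpus (hypothetical): two pet-related documents, two car-related ones.
terms = ["dog", "canine", "food", "car", "engine"]
docs = ["dog food", "canine food", "car engine", "car"]

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# Singular value decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # latent dimensions to keep; an arbitrary choice for this toy corpus
term_vectors = U[:, :k] * s[:k]  # each row is a term in the k-dimensional space


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def term_similarity(t1, t2):
    """Similarity of two terms in the reduced (latent) space."""
    return cosine(term_vectors[terms.index(t1)], term_vectors[terms.index(t2)])


# In the raw matrix, "dog" and "canine" share no document.
raw_similarity = cosine(A[terms.index("dog")], A[terms.index("canine")])

print("raw    dog vs canine:", raw_similarity)
print("latent dog vs canine:", term_similarity("dog", "canine"))
print("latent dog vs car   :", term_similarity("dog", "car"))
```

In this toy setup the raw similarity between "dog" and "canine" is zero, while in the reduced space they point in essentially the same direction, and "dog" and "car" stay unrelated. Real systems work with far larger matrices and usually weight the counts (for example with TF-IDF) before the decomposition.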
Example of Latent Semantic Indexing

An easy way to understand this concept is through an example. Suppose we have two different web pages containing information related to dog food. The main content (the semantically related words used) of the two pages is as follows:
Page A - Words used: (dogs, dog food, meat, diet, pedigree, foods, breed, breeding, canine, meds, and cat)
Page B - Words used: (dogs, dog meal, pets, dog food information, pet food, nutrition, dog health, breeders, Great Dane, German Shepherd, Pug, Cocker Spaniel, grains, meats, Quaker Oats, bone meal, raw food, samples, biscuits, wheat gluten, Meat Inspection Act, etc.)
When retrieved from the database, these two pages would indicate that Page B is more relevant to the user query “dog food”, because it contains more of the related terms that LSI associates with that query.
Please note: LSI often returns relevant documents that don’t contain the keyword at all.
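The note above can be illustrated with a small, self-contained Python sketch using NumPy. Everything in it is an assumption made for illustration: the toy corpus, the helper name `lsi_scores`, the choice of k = 2, and the common "fold-in" method of projecting a query into the latent space. It is not how any particular search engine implements LSI.

```python
import numpy as np

# Toy corpus (hypothetical). Note that docs[1] does not contain the word "dog".
terms = ["dog", "canine", "food", "car", "engine"]
docs = ["dog food", "canine food", "car engine", "car"]

# Term-document count matrix: rows are terms, columns are documents.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# Decompose and keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # arbitrary choice for this toy corpus
U_k, s_k = U[:, :k], s[:k]
docs_k = Vt[:k].T  # each row is a document in the latent space


def lsi_scores(query):
    """Cosine similarity between the query and every document in latent space."""
    q = np.array([query.split().count(t) for t in terms], dtype=float)
    q_k = (q @ U_k) / s_k  # fold the query into the same k-dimensional space
    return [
        float(q_k @ d / (np.linalg.norm(q_k) * np.linalg.norm(d)))
        for d in docs_k
    ]


scores = lsi_scores("dog")
for doc, score in zip(docs, scores):
    print(f"{score:6.2f}  {doc}")
```

For the query "dog", the document "canine food" scores about as highly as "dog food" even though it never mentions "dog", while the two car-related documents score close to zero. A plain keyword match would have missed "canine food" entirely.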