Monday, June 4, 2012

Web Crawlers, Spiders, Bots, Indexers

A web crawler is an automated program that browses the web and stores references to the pages it finds along the way. Search engines use crawlers to create and update their web index. A search engine runs several programs to index and process web pages, but the main task of the crawler is to find references to web pages in the form of hyperlinks and visit them, so that the search engine can build its index.
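
To make "finding references in the form of hyperlinks" concrete, here is a minimal Python sketch that pulls the links out of one page. It is only an illustration, not how any particular search engine does it. The names `LinkExtractor` and `extract_links` are made up for this example, and it assumes the page's HTML has already been downloaded into a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs of all hyperlinks in the given HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Calling `extract_links("http://example.com/index.html", page_html)` returns a list of absolute URLs that a crawler could visit next.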

Other names for a web crawler

Spider
Bots
Indexer
Robots
Document Indexer
Hit list

How Does a Web Crawler Work?

The process of web crawling is described below:
1- The web crawler finds new hyperlinks on a web page with the help of the URL server.
2- The crawler follows the links and sends information to another server, which stores a reference to each page along with its inner links.
3- It repeats the first two steps for every new hyperlink that it finds.
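
The three steps can be sketched as a simple loop in Python. This is a rough illustration under some assumptions: a queue stands in for the URL server, a dictionary stands in for the server that stores references, and `fetch_links` is a hypothetical function supplied by the caller that returns the hyperlinks found on a given page. The `max_pages` limit is also an addition so that the loop always stops.

```python
from collections import deque


def crawl(seed_url, fetch_links, max_pages=100):
    """Breadth-first crawl starting from seed_url.

    fetch_links is a caller-supplied (hypothetical) function that returns
    the list of hyperlinks found on a given URL.
    """
    frontier = deque([seed_url])  # URLs waiting to be visited
    seen = {seed_url}             # URLs that were already queued
    index = {}                    # stored references: URL -> its links

    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        links = fetch_links(url)  # step 1: find hyperlinks on the page
        index[url] = links        # step 2: store the reference and links
        for link in links:        # step 3: repeat for every new hyperlink
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index
```

For a quick trial without touching the network, `fetch_links` can simply look pages up in a small dictionary that maps each URL to its list of links.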

Examples of Web Crawlers

Google - Googlebot
Yahoo - Yahoo Slurp
Bing - Bingbot

Types of Crawling

General Crawling - The crawler starts with a single URL, then finds and fetches the new hyperlinks it discovers.
Distributed Crawling - The crawling work is distributed among several crawlers.
Focused Crawling - The crawler fetches only a predefined set of hyperlinks.
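
As a rough illustration of the last two types, the Python sketch below shows one possible way to split URLs among several crawlers and one possible way to restrict a crawl to a predefined set. Both functions are hypothetical examples. Hashing the host name and filtering by an allowed-hosts set are assumptions chosen for simplicity, not the only way to do it.

```python
import hashlib
from urllib.parse import urlparse


def assign_crawler(url, num_crawlers):
    """Distributed crawling: pick a crawler number by hashing the URL's host.

    All URLs on the same host go to the same crawler.
    """
    host = urlparse(url).netloc
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_crawlers


def is_in_scope(url, allowed_hosts):
    """Focused crawling: keep only URLs on a predefined set of hosts."""
    return urlparse(url).netloc in allowed_hosts
```

A general crawler would skip both checks and simply follow every new link it finds from its starting URL.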
