History:
The very first tool used for searching on the Internet was Archie, developed in 1990 by
Alan Emtage. The first Web search engine was Wandex, developed by Matthew Gray in 1993.
Another very early search engine, Aliweb, also appeared in 1993. The first "full text" crawler-based search engine was WebCrawler, which came out in 1994. Lycos was also developed in 1994. AltaVista followed in 1995 and Inktomi in 1996; both became very famous search engines. Google launched commercially in 1998.
Definition:
- A searchable database of Internet files collected by a computer program called a “spider”.
- Or: a program that searches documents for specified keywords and returns a list of the documents in which those keywords were found, drawn from its index of web pages.
Types of Search Engines:
1. Individual Search Engines: Use a spider to build their own searchable index.
Example: Google
2. Meta Search Engines: Search multiple individual engines simultaneously. They
do not have their own index database.
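As a rough illustration of how a meta search engine might combine answers, the sketch below merges ranked result lists from two engines. The scoring rule (each URL earns 1/rank from every engine that returns it), the function name `merge_results`, and the example URLs are all assumptions made for this example, not the method of any particular meta search engine.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Merge ranked URL lists from several engines into one ranked list."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            # A URL earns more credit the higher an engine ranks it.
            scores[url] += 1.0 / rank
    # Highest total score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists standing in for answers from two engines.
engine_a = ["a.example/page", "b.example/page", "c.example/page"]
engine_b = ["b.example/page", "d.example/page"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

A URL returned by both engines (here `b.example/page`) rises to the top because its scores add up, and the meta engine itself never needs an index of its own.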
Classification based on indexing Technique:
1. Spider- or Crawler-Based: Send crawlers out into cyberspace. A crawler visits a
website and follows the links on it.
2. Human-Powered Search Engines or Directories: Rely on human review and
submission of information about a webpage. Example: Yahoo! Search
3. Hybrid Search Engines: Employ both crawlers and human review. Example: MSN
Working of Search Engine:
A search engine operates in the following order:
1. Web Crawling: The engine sends out automated programs (robots, scooters, or spiders). Spiders fetch
web pages from cyberspace and feed them to the search engine. Large search engines such as “AltaVista”
send out many spiders in parallel. Indexing follows: after a page is crawled, its contents are
stored in a giant database in an indexed format. This content can be the text of the web page and
its meta tags (links, title, and description). (Index: a database containing a copy of each
web page.)
2. Searching (or Query Processing): Driven by user queries. The engine examines its index to find
the documents that best match the query.
3. Page Ranking: Different search engines use their own proprietary algorithms. The engine calculates the most
relevant results for the user's query and displays them in order of relevance.
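The three steps above can be sketched in miniature. This is only a toy under stated assumptions: the "web" is a hypothetical in-memory dictionary (`PAGES`) rather than real HTTP fetches, the index is a simple word-to-pages map, and ranking just counts matching words, which is far simpler than any real engine's proprietary algorithm.

```python
from collections import defaultdict, deque
import re

# Hypothetical miniature "web": URL -> (page text, outgoing links).
PAGES = {
    "home": ("search engines index the web", ["spiders", "ranking"]),
    "spiders": ("spiders crawl the web and follow links", ["home"]),
    "ranking": ("ranking orders search results", []),
    "orphan": ("no page links here", []),
}

def crawl(start):
    """Step 1 (crawling): visit pages breadth-first by following links."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        yield url, text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(crawled):
    """Step 1 (indexing): map each word to {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in crawled:
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Steps 2 and 3: find pages containing any query word, then rank
    them by the total number of query-word occurrences."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(crawl("home"))
print(search(index, "web spiders"))
```

Note that the page `orphan` never enters the index: no crawled page links to it, so a spider that only follows links cannot discover it.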
What is Page Rank?
- Page Rank is a “vote” by all the other pages on the web about how important a page is.
- A link to a page counts as a vote of support.
- Google uses an algorithm known as “PageRank” for its page ranking.
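The "links as votes" idea can be illustrated with a simplified iterative calculation. The sketch below is not Google's production algorithm; the damping factor of 0.85, the fixed iteration count, the function name `page_rank`, and the three-page link graph are assumptions chosen for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated vote redistribution.

    links maps every page to the list of pages it links to; every
    link target must itself appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = page_rank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C ranks highest because it receives votes from two pages, A comes next because it receives C's vote, and B, which nobody links to, keeps only the baseline share.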
Google:
Google began as a research project in January 1996 by Larry Page and Sergey Brin.
It was originally nicknamed “BackRub” because the system checked backlinks to estimate a site’s importance. Pages are collected for indexing by a program called Googlebot. The names of the spiders used by different search engines can be read from the HTTP_USER_AGENT value sent with their requests. Some terms associated with Google:
- robots.txt:
- A standard file that tells Googlebot not to download specified
information from your server.
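To show how a robots.txt file is interpreted, here is a small sketch using Python's standard `urllib.robotparser` module. The file contents and the `www.example.com` URLs are made up for the example; the rules block Googlebot from a hypothetical `/private/` directory and leave every other crawler unrestricted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as it might be served from a site's root.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler identifies itself by name (its user agent) when asking
# whether it may fetch a URL.
blocked = parser.can_fetch("Googlebot", "https://www.example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "https://www.example.com/index.html")
other = parser.can_fetch("OtherBot", "https://www.example.com/private/report.html")

print(blocked, allowed, other)
```

The rules are matched against the crawler's user-agent name, which is why the same URL is refused to Googlebot but permitted to the hypothetical `OtherBot`. Compliance is up to the crawler: robots.txt states a request and does not enforce it.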