Saturday, 25 June 2016


What are Search Engines?
Search engine has become a "generic" term used to refer to both crawler-based (Google) and directory-based (Yahoo!) site search companies. The two types gather their listings in significantly different ways and can therefore produce widely varying results. Crawler-based search engines create listings automatically by crawling websites and compiling information about each site into an index. When one of these indices is searched, results are returned with emphasis on specific words. The methodology that determines which results are selected and how they are delivered is known as an algorithm, and these algorithms are proprietary and confidential to each search engine company.

Crawling a Site
Search engines that use bots or spiders to crawl a site create references automatically. When people type in keywords, those words are looked up in the engine's list of referenced words, and the crawler-based engine serves the pages where those words appear.
These engines like to see site content change frequently, since changes are taken as a sign of new and revised information. An engine that uses bots or spiders to crawl sites eventually finds these changes, and that can affect your ranking in its listings. How you lay out your pages matters: page titles, page content, links to your site, and other items all help determine how relevant your pages rank for the words being referenced.

Crawler-Based Search Engines Dissected
The first of three major components of a crawler-based search engine is the spider (or bot), known as the crawler. A bot visits a site, reads the main words while discarding connector words, and follows every link it can find to other pages within the site. If you change one of your web pages, search engines eventually find those changes, which can affect how you are listed. Bots return to sites on a schedule known only to the search engine company, but more frequently to sites that update content continually.
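To make this concrete, here is a minimal single-site crawler sketch in Python, using only the standard library. The stop-word list, page limit, and breadth-first visiting order are illustrative assumptions, not how any real engine works.

    # Minimal crawler sketch: fetch pages, keep meaningful words, follow in-site links.
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}  # "connector words"

    class PageParser(HTMLParser):
        """Collects the words and outgoing links found on one page."""
        def __init__(self):
            super().__init__()
            self.words, self.links = [], []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

        def handle_data(self, data):
            self.words += [w.lower() for w in data.split() if w.isalpha()]

    def crawl(start_url, max_pages=10):
        """Visit pages breadth-first, staying within the starting site."""
        site = urlparse(start_url).netloc
        to_visit, seen, pages = [start_url], set(), {}
        while to_visit and len(pages) < max_pages:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url).read().decode("utf-8", errors="replace")
            except (OSError, ValueError):
                continue  # skip unreachable or malformed URLs
            parser = PageParser()
            parser.feed(html)
            pages[url] = [w for w in parser.words if w not in STOP_WORDS]
            for link in parser.links:
                absolute = urljoin(url, link)
                if urlparse(absolute).netloc == site:  # stay within the site
                    to_visit.append(absolute)
        return pages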
The words found by the bots are placed in a database called an index or catalog, the second component of a crawler-based search engine. This is where the keywords of each indexed page are searched to produce a results page and to determine the order in which pages are displayed. If a page changes, the index is updated with the new information the next time a bot visits the site.
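At its simplest, such an index is an inverted index: a map from each word to the set of pages containing it. A toy sketch, with made-up page data:

    # Toy inverted index: word -> set of pages containing it. Re-indexing a
    # changed page simply overwrites its old entries.
    from collections import defaultdict

    def build_index(pages):
        """pages: {url: [words]}, as a crawler might produce."""
        index = defaultdict(set)
        for url, words in pages.items():
            for word in words:
                index[word].add(url)
        return index

    pages = {
        "example.com/a": ["search", "engine", "index"],
        "example.com/b": ["search", "crawler"],
    }
    index = build_index(pages)
    print(index["search"])  # {'example.com/a', 'example.com/b'} (order may vary)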
New sites and new pages can take some time before a spider finds and adds them to the index, and depending on how the site is constructed, and on whether a spider can "read" a page or the links to it, a page may never be found, read, or indexed. Pages are not added to the index immediately after they have been crawled; it may take some time to get from the read stage to the index. Until the content is added to the index, it is not available to searchers.
Search software may be the component that matters most to the user, since it produces relevant results from the indexed content. The software searches through billions of pages of indexed content, finding matches and determining their relevance to the terms searched. Each search engine handles this task differently, which is why each may produce varying results: you may rank near the top on one engine and be nearly invisible on another.
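As a rough illustration of why engines disagree, here is a deliberately naive ranking sketch that scores pages by raw term frequency. Real algorithms are proprietary and weigh many more signals (links, titles, freshness), so this is a stand-in, not any engine's actual method:

    # Naive ranking sketch: score each page by how often the query terms occur.
    def rank(query, pages):
        terms = query.lower().split()
        scores = {
            url: sum(words.count(t) for t in terms)
            for url, words in pages.items()
        }
        # Highest score first; pages with no matches are dropped.
        return sorted((u for u, s in scores.items() if s > 0),
                      key=lambda u: -scores[u])

    pages = {
        "example.com/a": ["search", "engine", "search"],
        "example.com/b": ["search", "crawler"],
    }
    print(rank("search engine", pages))  # ['example.com/a', 'example.com/b']

Swap in a different scoring formula and the same pages come back in a different order, which is exactly what happens across real search engines.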

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. They accept queries supplied by web users and return a list of resources that best fit the query of the user. There are differences in the ways various search engines work, but they all perform three basic tasks:
  • They search the Internet -- or select pieces of the Internet -- based on important words.
  • They keep an index of the words they find, and where they find them.
  • They allow users to look for words or combinations of words found in that index.
Early search engines held an index of a few hundred thousand pages and documents, and received perhaps one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages and respond to tens of millions of queries per day.
A search engine is a coordinated set of programs that includes:
  • A spider (also called a "crawler" or a "bot") that goes to every page or representative pages on every Web site that wants to be searchable and reads it, using hypertext links on each page to discover and read a site's other pages
  • A program that creates a huge index (sometimes called a "catalog") from the pages that have been read
  • A program that receives your search request, compares it to the entries in the index, and returns results to you

Major components of crawler-based search engines
Crawler-based search engines have three major components.
1) The crawler: Also called the spider. The spider visits a web page, reads it, and then follows links to other pages within the site. The spider will return to the site on a regular basis, such as every month or every fifteen days, to look for changes.
2) The index: Everything the spider finds goes into the second part of the search engine, the index. The index will contain a copy of every web page that the spider finds. If a web page changes, then the index is updated with new information.
3) The search engine software: This is the program that accepts the user-entered query, interprets it, sifts through the millions of pages recorded in the index to find matches, ranks them in order of what it believes is most relevant, and presents them in a customizable manner to the user.
All crawler-based search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results. Our comparisons will then be based on these differences in all three parts.
All search engines contain the following main components:
  • Spider: A browser-like program that downloads web pages.
  • Crawler: A program that automatically follows all of the links on each web page.
  • Indexer: A program that analyzes the web pages downloaded by the spider and the crawler.
  • Database: Storage for the downloaded and processed pages.
  • Results engine: Extracts search results from the database.
  • Web server: A server responsible for interaction between the user and the other search engine components.
Different types of search engines
When people mention the term "search engine", it is often used generically to describe both crawler-based search engines and human-powered directories. However, these two types of search engines gather their listings in radically different ways and are therefore inherently different.
Crawler-based search engines create their listings automatically by using a piece of software to “crawl” or “spider” the web and then index what it finds to build the search base. Changes to web pages can be caught dynamically by crawler-based search engines and will affect how those pages are listed in the search results. These engines use automated software agents (called crawlers) that visit a Web site, read the information on the actual site, read the site's meta tags, and follow the links that the site connects to, indexing all linked Web sites as well. The crawler returns all that information to a central repository, where the data is indexed. The crawler periodically returns to the sites to check for any information that has changed; the frequency with which this happens is determined by the administrators of the search engine.
Crawler-based search engines are good when you have a specific search topic in mind and can be very efficient at finding relevant information in that situation. However, when the search topic is general, crawler-based search engines may return hundreds of thousands of irrelevant responses to simple search requests, including lengthy documents in which your keyword appears only once.
Human-powered directories, such as the Yahoo! Directory, the Open Directory, and LookSmart, depend on human editors to create their listings. Typically, webmasters submit a short description of their websites to the directory, or editors write one for the sites they review, and these manually edited descriptions form the search base. Therefore, changes made to individual web pages have no effect on how those pages are listed in the search results. Human-powered search engines rely on humans to submit information, which is subsequently indexed and catalogued; only information that is submitted is put into the index.
Human-powered directories are good when you are interested in a general topic of search. In this situation, a directory can guide and help you narrow your search and get refined results. Therefore, search results found in a human-powered directory are usually more relevant to the search topic and more accurate. However, this is not an efficient way to find information when a specific search topic is in mind.
Meta-search engines, such as Dogpile, Mamma, and MetaCrawler, transmit user-supplied keywords simultaneously to several individual search engines, which actually carry out the search. Meta-search engines can integrate the results returned from all the engines, eliminate duplicates, and add features such as clustering the results by subject.
Meta-search engines are good for saving time by searching in only one place and sparing the need to use and learn several separate search engines. But since meta-search engines do not allow input of many search variables, their best use is to find hits on obscure items or to see whether something can be found on the Internet at all.
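Here is a sketch of the merge-and-deduplicate step a meta-search engine performs. The two fetch functions are hypothetical placeholders; a real implementation would query each engine's API or results page:

    # Meta-search sketch: fan one query out to several engines, merge the
    # result lists, and drop duplicates while preserving first-seen order.
    def fetch_engine_a(query):
        return ["example.com/a", "example.com/b"]  # placeholder results

    def fetch_engine_b(query):
        return ["example.com/b", "example.com/c"]  # placeholder results

    def meta_search(query, engines):
        seen, merged = set(), []
        for engine in engines:
            for url in engine(query):
                if url not in seen:      # eliminate duplicates across engines
                    seen.add(url)
                    merged.append(url)
        return merged

    print(meta_search("search engines", [fetch_engine_a, fetch_engine_b]))
    # ['example.com/a', 'example.com/b', 'example.com/c']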
[A numbered table of meta-search engines (Name, Language, Website) appeared here; the recoverable entries are Yippy (formerly Clusty) and Ixquick (StartPage, multilingual).]

Web Search Engines

Typically, Web search engines work by sending out a spider to fetch as many documents as possible. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document. Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.

Definition of: browser rendering engine

Software that renders HTML pages (Web pages). It turns the HTML layout tags in the page into the appropriate commands for the operating system, which produce the text characters and images for screen and printer. Also called a "layout engine," a rendering engine is used by a Web browser to render HTML pages, by mail programs that render HTML e-mail messages, and by any other application that needs to render Web page content. For example, Trident is the rendering engine for Microsoft's Internet Explorer, and Gecko is the engine in Firefox. Trident and Gecko are also incorporated into other browsers and applications. Following is a sampling of browsers and rendering engines.

 Browser             Rendering Engine    Source

 Internet Explorer   Trident             Microsoft
 AOL Explorer        Trident             Microsoft
 Firefox             Gecko               Mozilla
 Netscape            Gecko               Mozilla
 Safari              WebKit              WebKit
 Chrome              WebKit              WebKit
 Opera               Presto              Opera
 Konqueror           KHTML               KHTML


Building a Search

Searching through an index involves a user building a query and submitting it through the search engine. The query can be quite simple, a single word at minimum. Building a more complex query requires the use of Boolean operators that allow you to refine and extend the terms of the search.
The Boolean operators most often seen are:
  • AND - All the terms joined by "AND" must appear in the pages or documents. Some search engines substitute the operator "+" for the word AND.
  • OR - At least one of the terms joined by "OR" must appear in the pages or documents.
  • NOT - The term or terms following "NOT" must not appear in the pages or documents. Some search engines substitute the operator "-" for the word NOT.
  • FOLLOWED BY - One of the terms must be directly followed by the other.
  • NEAR - One of the terms must be within a specified number of words of the other.
  • Quotation Marks - The words between the quotation marks are treated as a phrase, and that phrase must be found within the document or file.
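To see how these operators work mechanically, here is a small sketch that maps AND, OR, and NOT onto set operations over a toy inverted index (word to set of pages). The index contents are made up for illustration:

    # Boolean query sketch: AND/OR/NOT become set intersection, union, difference.
    index = {
        "search":  {"page1", "page2", "page3"},
        "engine":  {"page1", "page3"},
        "crawler": {"page2"},
    }

    def AND(a, b): return index.get(a, set()) & index.get(b, set())
    def OR(a, b):  return index.get(a, set()) | index.get(b, set())
    def NOT(a, b): return index.get(a, set()) - index.get(b, set())

    print(AND("search", "engine"))   # {'page1', 'page3'}
    print(OR("engine", "crawler"))   # {'page1', 'page2', 'page3'}
    print(NOT("search", "crawler"))  # {'page1', 'page3'}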


