Google Is Not Necessarily Best Search Engine

Google Is Not Best!

Alex Tajirian
December 19, 2004

Google, the popular search engine, is not alone at the top. Although Google has a slightly larger number of indexed pages than its closest competitor, AlltheWeb -- 3.3 billion vs. 3.2 billion -- that number is misleading as a guide to how many pages a given search returns. In fact, 42% of keywords lead to more page results from AlltheWeb than from Google, according to a recent study by DomainMart. Furthermore, about 2.3% of keywords retrieve no page results on Google but retrieve at least one page on AlltheWeb.

This finding suggests that you may be better off using AlltheWeb for certain classes of keywords, or at least you should use both search engines to improve the results of your searches.

In general, the quality of search-engine results depends on whether you are interested in current pages or archived pages. Once a Web page is modified or deleted, the previous version is lost when the popular search engines update their databases. If you are interested in archived Web pages, you should use the Internet Archive’s Wayback Machine.

For the most recent information, the quality of results depends on:

(1) the number of indexed pages relevant to the keyword searched.
(2) the criteria used to rank the relevancy of pages for a given keyword search. (Google’s initial success was due to assessing the relevance of a Web site based on the number of other sites that link to it.)
(3) whether the criteria used to rank sponsored advertising take into account users as well as advertisers. If ranking is based solely on bids, for example, it might discourage users from clicking on ads, which in turn reduces advertising, leads to a drop in profits for both advertisers and search engines, and can mean the demise of the search engine.
(4) the frequency of updating the ranking index for new content.
(5) the frequency of modifying the ranking criteria, as users start devising techniques to fool the search engines into indexing irrelevant pages. (The first such threat came from Google bombs through blogs.) As with the concept of stock-market efficiency, you cannot keep a winning strategy a secret for a long time. Thus, ranking criteria need to be frequently modified.
(6) a search engine’s ability to organize results according to their position among recognized authoritative sites on a topic (as Teoma does) or to use the sources of information (such as products, news, eBay) to refine results (as Vivísimo does).
(7) a search engine’s ability to learn from the behavior of users (as the startup Mooter does).

Thus, even if a search engine employs a better ranking technique, it is not necessarily better when a significant number of pages are omitted from its search database. Conversely, indexing a large number of pages does not necessarily improve the quality of top search results if the ranking methodology is flawed.
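To make the interplay between ranking and coverage concrete, here is a minimal sketch in Python. It assumes a deliberately simplified link-counting rule (rank a page by how many distinct other pages link to it), which is only a rough stand-in for the idea mentioned in the list above and not any search engine's actual algorithm. The function name, the link data, and the `*.example` page names are all hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links, indexed_pages):
    """Rank indexed pages by how many distinct other pages link to them.

    links: iterable of (source, target) pairs (hypothetical crawl data).
    indexed_pages: set of pages present in the engine's database.
    """
    inbound = Counter()
    for source, target in set(links):  # count each distinct link once
        if source != target and target in indexed_pages:
            inbound[target] += 1
    # Pages with no inbound links still appear, with a count of 0.
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(indexed_pages, key=lambda page: (-inbound[page], page))


# Hypothetical link graph: d.example is the most linked-to page.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("a.example", "d.example"),
    ("b.example", "d.example"),
    ("c.example", "d.example"),
]

full_index = {"a.example", "b.example", "c.example", "d.example"}
partial_index = full_index - {"d.example"}  # same rule, smaller database

ranking_full = rank_by_inbound_links(links, full_index)
ranking_partial = rank_by_inbound_links(links, partial_index)
```

With the full index the most linked-to page comes first. With the partial index that page cannot appear at all, however good the ranking rule is.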

DomainMart’s study was motivated by our continuous effort to improve domain-name pricing through understanding search-engine results in relation to keyword composition of domain names. This study was directed at investigating quality feature (1) above—i.e., the number of indexed pages relevant to the keyword searched.

The sample used in the study consisted of 551 keywords representing domain names sold between January 2003 and January 2004. For each keyword, we recorded the number of search results from Google, AlltheWeb, and AltaVista search engines.

Since the numbers of pages indexed by Google and AlltheWeb are almost the same, as noted earlier, and since we did not analyze how many pages the two engines return in common for a given keyword, we decided that a test for equality of averages between two samples did not make a lot of sense. In theory, the page results from Google and from AlltheWeb can have no pages in common, which makes such tests irrelevant. Moreover, such statistical tests (the standard t-test and the Wilcoxon nonparametric signed-rank test) require specific assumptions about the structure of the data that we did not believe appropriate for our data. Thus, we decided to look at intuitive descriptors of the data.

The following table summarizes some of our findings. It shows the proportion of AlltheWeb and AltaVista search results that were larger than Google’s, as well as the proportion of results greater than Google’s by at least 25%, 50%, and 75%. For example, 42% of the keyword searches on AlltheWeb had greater page results than Google, and 32% of those surpassed Google’s page results by at least 25%.

Proportion (%) of page results greater than Google’s by at least:

Search Engine	25%	50%	75%
AlltheWeb	32	25	20
AltaVista	2	1.8	1.8
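The descriptors in the table can be computed along the following lines. This is a sketch in Python under stated assumptions, not the code used in the study: the function name is hypothetical, the keyword counts are made up for illustration, and "greater by at least 25%" is read as "strictly greater than Google's count and at least 1.25 times it".

```python
def proportion_exceeding(engine_counts, google_counts, margin=0.0):
    """Percentage of keywords whose result count on another engine exceeds
    Google's count by at least `margin` (0.25 means 25%).

    Both arguments map keyword -> number of page results (illustrative data).
    A keyword with zero Google results and at least one result on the other
    engine is treated as exceeding Google by any margin (an assumption).
    """
    hits = 0
    for keyword, google in google_counts.items():
        other = engine_counts[keyword]
        if other > google and other >= google * (1 + margin):
            hits += 1
    return 100 * hits / len(google_counts)


# Made-up result counts for five keywords; not data from the study.
google = {"alpha": 1000, "beta": 2000, "gamma": 0, "delta": 500, "epsilon": 300}
alltheweb = {"alpha": 1500, "beta": 1800, "gamma": 40, "delta": 550, "epsilon": 300}

greater = proportion_exceeding(alltheweb, google)           # any excess
by_25 = proportion_exceeding(alltheweb, google, margin=0.25)
by_50 = proportion_exceeding(alltheweb, google, margin=0.50)
by_75 = proportion_exceeding(alltheweb, google, margin=0.75)
```

Every percentage here is taken over all keywords in the sample. That is one possible reading of the table, chosen for simplicity.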

Hence, the study indicates that you will find a significant number of pages using AlltheWeb that are not available using Google.

Further research is needed to determine whether there are classes of keywords that yield more search results on a particular search engine. For example, how do the results differ for single-word vs. multiple-word keywords?

Encouraged by the above results, we are studying methodologies to improve searches by combining information from multiple search engines.
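One simple way to combine ranked results from several engines is reciprocal rank fusion, sketched below in Python. This is a well-known, generic technique, offered only as an illustration of what "combining information" could look like. It is not the methodology DomainMart is studying, which the article does not describe. The result lists, the `*.example` names, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked result lists into one.

    Each page scores 1 / (k + rank) for every list it appears in, so pages
    ranked highly by more than one engine rise to the top. Assumes each
    list contains a page at most once.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical top results for the same keyword from two engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example", "x.example"]

merged = reciprocal_rank_fusion([engine_a, engine_b])
```

The merged list contains every page that either engine returned, so pages missing from one index are still found, while pages both engines rank highly are listed first.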

Topic tags: Google, search engines
