Google, the popular search engine, is not alone
at the top. Although Google has a slightly larger number of indexed
pages than its closest competitor, AlltheWeb -- 3.3 billion vs.
3.2 billion -- the number is misleading in terms of search-result
pages. In fact, 42% of keywords lead to more page results from AlltheWeb
than from Google, according to a recently conducted study by DomainMart.
Furthermore, about 2.3% of keywords retrieve no page results on
Google, but retrieve at least 1 page on AlltheWeb.
This finding suggests that you may be better off
using AlltheWeb for certain classes of keywords, or at least you
should use both search engines to improve the results of your searches.
In general, the quality of search-engine results
depends on whether you are interested in current pages or archived
pages. Once a Web page is modified or deleted, the previous version
is lost when the popular search engines update their databases.
If you are interested in archived Web pages, you should use the Internet Archive’s Wayback Machine.
For the most recent information, the quality of
results depends on:
(1) the number of indexed pages relevant to the keyword searched.
(2) the criteria used to rank the relevancy of pages for a given
keyword search. (Google’s initial success was due to assessing the
relevance of a Web site based on the number of other sites that
link to it.)
(3) whether the criteria used to rank sponsored advertising take
into account users as well as advertisers. If ranking is based
solely on bids, for example, it might discourage users from
clicking on ads, which in turn reduces advertising, leads to a drop
in profits for both advertisers and search engines, and can mean
the demise of the search engine.
(4) the frequency of updating the ranking index for new content.
(5) the frequency of modifying the ranking criteria, as users start
devising techniques to fool the search engines into indexing
irrelevant pages. (The first such threat came from Google bombs
through blogs.) As with the concept of stock-market efficiency, you
cannot keep a winning strategy a secret for a long time. Thus,
ranking criteria need to be frequently modified.
(6) a search engine’s ability to organize results according to
their position among recognized authoritative sites on a topic
(such as Teoma) or to use the sources of information (such as
products, news, eBay) to refine results (as with Vivísimo).
(7) a search engine’s ability to learn from the behavior of users
(such as the startup Mooter).
Thus,
even if a search engine employs a better ranking technique, it is
not necessarily better when a significant number of pages are omitted
from its search database. Conversely, indexing a large number of
pages does not necessarily improve the quality of top search results
if the ranking methodology is flawed.
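The link-based criterion mentioned in the list above (judging a site by the number of other sites that link to it) can be illustrated with a minimal sketch. This is not Google's actual algorithm: the function name `rank_by_inbound_links` and the sample link data are hypothetical, and a real search engine weighs many more signals.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank sites by how many distinct other sites link to them.

    `links` is an iterable of (source_site, target_site) pairs.
    Sites with no inbound links do not appear in the result.
    """
    # Count each (source, target) pair once and ignore self-links,
    # so a site cannot raise its own score.
    unique_links = {(src, dst) for src, dst in links if src != dst}
    inbound = Counter(dst for _, dst in unique_links)
    # Highest inbound count first; ties are broken by site name.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link data, for illustration only.
example_links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "a.example"),
    ("b.example", "c.example"),  # duplicate pair, counted once
    ("a.example", "a.example"),  # self-link, ignored
]
ranking = rank_by_inbound_links(example_links)
```

A site that is missing from the index can never appear in such a ranking, however good the criterion is, which is the point made in the paragraph above.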
DomainMart’s study was motivated by our continuous
effort to improve domain-name pricing through understanding search-engine
results in relation to keyword composition of domain names. This
study was directed at investigating quality feature (1) above—i.e.,
the number of indexed pages relevant to the keyword searched.
The sample used in the study consisted of 551 keywords
representing domain names sold between January 2003 and January
2004. For each keyword, we recorded the number of search results
from Google, AlltheWeb, and AltaVista search engines.
Since the numbers of pages indexed by Google
and AlltheWeb are almost the same, as noted earlier, and since we
did not analyze possible commonality among search-result pages in
response to a keyword, we decided that a test for equality of averages
between two samples did not make a lot of sense. In theory, the
page-result data from Google and from AlltheWeb can have no common
pages, which makes such tests irrelevant. Moreover, such statistical
tests (the standard t-test and Wilcoxon nonparametric signed-rank
test) require making specific assumptions about the structure of
data that we did not believe appropriate for our data. Thus, we decided to look at intuitive descriptors
of the data.
The following table summarizes some of our findings.
It shows the proportion of AlltheWeb and AltaVista search results
that were larger than Google’s, as well as the proportion of results
greater than Google’s by at least 25%, 50%, and 75%. For example,
42% of the keyword searches on AlltheWeb had greater page results
than Google, and 32% of those surpassed Google page results by at least 25%.
Proportion (%) of page results greater than Google’s by at least:

| Search Engine | 25% | 50% | 75% |
| --- | --- | --- | --- |
| AlltheWeb | 32 | 25 | 20 |
| AltaVista | 2 | 1.8 | 1.8 |
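Descriptors of this kind can be computed directly from per-keyword result counts. The following is a minimal sketch, not the code used in the study: the function name `proportion_exceeding`, the data layout, and the sample counts are all hypothetical. It assumes that a keyword with zero results on the baseline engine and at least one on the other engine counts as exceeding the baseline at every margin.

```python
def proportion_exceeding(counts, engine, baseline="google", margin=0.0):
    """Percentage of keywords for which `engine` returns more results
    than `baseline`, by at least the fraction `margin` (0.25 = 25%).

    `counts` maps each keyword to a dict of {engine_name: result_count}.
    """
    if not counts:
        return 0.0
    hits = 0
    for per_engine in counts.values():
        ours = per_engine[engine]
        theirs = per_engine[baseline]
        # Must be strictly greater, and greater by at least the margin.
        if ours > theirs and ours >= theirs * (1.0 + margin):
            hits += 1
    return 100.0 * hits / len(counts)


# Hypothetical result counts, for illustration only (not study data).
sample_counts = {
    "keyword1": {"google": 100, "alltheweb": 130},  # 30% more
    "keyword2": {"google": 200, "alltheweb": 150},  # fewer
    "keyword3": {"google": 0, "alltheweb": 5},      # nothing on Google
    "keyword4": {"google": 400, "alltheweb": 420},  # 5% more
}
summary = {
    margin: proportion_exceeding(sample_counts, "alltheweb", margin=margin)
    for margin in (0.0, 0.25, 0.5, 0.75)
}
```

Because the measure only compares counts keyword by keyword, it says nothing about whether the two engines return the same pages.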
Hence, the study indicates that you will find a
significant number of pages using AlltheWeb that are not available
using Google.
Further research is needed to determine whether
there are classes of keywords that will yield greater search results
if used with a particular search engine. For example, what are the
result differences for single vs. multiple keywords?
Encouraged by the above results, we are studying
methodologies to improve searches by combining information from
multiple search engines.
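One simple way to combine results, shown here only as an illustrative sketch and not as the methodology under study, is to interleave the ranked lists from each engine and drop duplicate URLs. The function name `merge_results` and the sample URLs are hypothetical, and the sketch assumes each engine's results are already available as an ordered list of URLs.

```python
def merge_results(*ranked_lists):
    """Interleave ranked URL lists round-robin, keeping the first
    occurrence of each URL and dropping later duplicates."""
    merged = []
    seen = set()
    longest = max((len(results) for results in ranked_lists), default=0)
    for position in range(longest):
        # Take the result at this rank from each engine in turn.
        for results in ranked_lists:
            if position < len(results):
                url = results[position]
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged


# Hypothetical result lists, for illustration only.
engine_a = ["http://a.example/", "http://b.example/", "http://c.example/"]
engine_b = ["http://b.example/", "http://d.example/"]
combined = merge_results(engine_a, engine_b)
```

Round-robin interleaving treats every engine's ranking as equally trustworthy; a weighted scheme would be needed if one engine were known to rank a class of keywords better.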