Search Engine Data Collection Differences


Overlay
12-29-2009, 05:44 PM
Google has become a household name with respect to search engines, but I notice that I usually get a vastly greater number of responses on search terms by using MSN's Bing. What accounts for the large-scale differences in the number of responses reported? Is a significant portion of the Bing responses repetitive, inactive, or bogus in some way?

bigmack
12-29-2009, 07:17 PM
A good way to A/B is here:
http://www.bing-vs-google.com/

I don't look for quantity as much as results that come close to my intended goal.

wilderness
12-29-2009, 11:18 PM
The former MSN search had reduced the quantity of returned references to a search even prior to the introduction of Bing.

Sometimes there's a difference between the two, for better or for worse.

Frequently I take searches from my own websites' visitor logs and examine the results (if it's a topic I'm looking to learn more about).
Then, immediately afterwards, I take the search terms provided and hone them into a more effective search for what the person was looking for.

Both items are then sent in a sort of FYI to a harness discussion list.

Here's a recent comparison (as you're able to see, the results are quite different):

Somebody did a search using MS-Bing with a subtle emphasis on the "fifties":
roosevelt raceway in the fifties (http://www.bing.com/search?q=roosevelt+raceway+in+the+fifties&FORM=MSNH11&qs=n)

Dismal results.
Bing does NOT have a date-range option (1950..1959, i.e., the fifties) in its advanced search options the way Google does.

Google performed much better when I enclosed "Roosevelt raceway" in quotes and added the 1950..1959 range (http://www.google.com/#hl=en&source=hp&q=%22roosevelt+raceway%22+1950..1959&btnG=Google+Search&aq=f&aqi=&oq=&fp=e8aec8f715611eed)
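
For anyone who wants to generate that kind of query rather than type it, here is a minimal Python sketch. The helper names (build_query, google_search_url) are made up for illustration, and the plain /search?q= URL form is an assumption that drops the extra parameters in the link above. It only builds the string; it does not run the search.

```python
from urllib.parse import quote_plus


def build_query(phrase, start_year, end_year):
    """Quote the phrase and append a numeric range such as 1950..1959."""
    return f'"{phrase}" {start_year}..{end_year}'


def google_search_url(query):
    """Build a search URL (assumed simplified form: only the q parameter)."""
    # quote_plus turns spaces into '+' and double quotes into %22
    return "http://www.google.com/search?q=" + quote_plus(query)


query = build_query("roosevelt raceway", 1950, 1959)
url = google_search_url(query)
print(query)
print(url)
```

The encoded q value comes out the same as the q= part of the Google link above.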

wilderness
12-29-2009, 11:25 PM
Most people when doing simple searches FAIL to utilize quotes around names.

Here's an explanation I've had online for some time (http://www.mi-harness.com/srchhlp.html)

The difference between putting quotes around "harness racing" in a search and searching for harness racing without the quotes can be night and day.
You may also combine multiple quoted terms: the first quoted term, then a blank, then a + sign, then another blank, then the second quoted term.
EX:
"John Campbell" + "harness racing"
or
"john Campbell" + "New Jersey"

I've even used three quoted multi-word terms.
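
The pattern above (each term in quotes, joined by blank, plus sign, blank) is easy to sketch in Python. The function name quoted_query is hypothetical and only illustrates the string format; how a given search engine treats the + is up to that engine.

```python
def quoted_query(*phrases):
    """Wrap each phrase in double quotes and join them with ' + '."""
    return " + ".join(f'"{phrase}"' for phrase in phrases)


two_terms = quoted_query("John Campbell", "harness racing")
three_terms = quoted_query("John Campbell", "harness racing", "New Jersey")
print(two_terms)
print(three_terms)
```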

chickenhead
12-31-2009, 02:40 PM
I've read elsewhere that Bing's index is roughly 1/5 the size of Google's. The numbers both return are not real; they include all kinds of crud, misspellings, etc. The best way to attempt to see the index size is to search for things that return very few results.

I saw as an example: Glycobiosciences

Google reports 4,000 or so. But employing Google's duplicate filtering, it is really 114. Apples to apples, Bing is only 34.

If I do a site search for my humble site, Google returns 120 hits. If I do the same search on Bing, it returns 24. 24 is surprisingly low: many more than 24 of my pages have off-site links pointing to them directly, so apparently Bing is missing those other sites as well, or the spider would have hit the pages they are pointing to. Since Bing doesn't know about those links, it lowers the index ranking of the pages it does know about. Bad Bing.
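
As a rough check on the "1/5" figure, the two small samples above can be turned into ratios. This is only arithmetic on the counts quoted in this post (34 vs. 114 and 24 vs. 120); the function and label names are made up for illustration, and two samples say nothing definitive about overall index size.

```python
def coverage_ratio(bing_count, google_count):
    """Fraction of Google's filtered result count that Bing also reports."""
    return bing_count / google_count


# (Bing count, Google count) pairs quoted in the post above
samples = {
    "Glycobiosciences": (34, 114),
    "site search": (24, 120),
}

for label, (bing, google) in samples.items():
    print(f"{label}: Bing/Google = {coverage_ratio(bing, google):.2f}")
```

The site search works out to exactly one fifth and the Glycobiosciences search to a bit under a third, so both are at least in the neighborhood of the figure I read.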

I have also noticed Bing crawls the pages it does know about extremely infrequently; I have had changes that weren't reflected in Bing's search result snippets for months after they were made.

I think Bing tries to compensate by hitting the top 10,000 or 100,000 sites very hard, so the most common sites are up to date and well indexed -- they just don't do as good a job of finding the smaller nuggets that are out there.

chickenhead
12-31-2009, 02:52 PM
Just as an example of how stupid Bing's spider is, here are the results for my site name followed by Crystal Vision, which is a tiny, I believe defunct, racing partnership:

Bing: http://www.bing.com/search?q=north+american+racing+partnerships+crystal+vision&go=&form=QBRE&qs=n

Google: http://www.google.com/search?hl=en&lr=&safe=off&rlz=1C1GGLS_enUS358US358&q=north+american+racing+partnerships+crystal+vision&aq=f&oq=&aqi=

Bing's top result is a page on my site that has a link to each of the subpages for each partnership. So on the page they return there is a link to the Crystal Vision page on my site. All the spider had to do when it crawled that page was follow each of those links, and it would know all of them. But Bing has never crawled 90% of those subpages, even though it hit the links to them, so it can't return them.
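
To show what "follow each of those links" amounts to, here is a toy Python sketch of a breadth-first crawl. It is not how Bing's or Google's spider actually works; the page paths and HTML are hypothetical stand-ins held in memory, and nothing is fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl(pages, start):
    """Breadth-first crawl over an in-memory {path: html} mapping."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        for link in extract_links(pages.get(path, "")):
            # Only follow links to pages that exist and are not yet visited
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical stand-in for a hub page that links to each subpage
PAGES = {
    "/partnerships.html": (
        '<a href="/crystal-vision.html">Crystal Vision</a> '
        '<a href="/other.html">Another partnership</a>'
    ),
    "/crystal-vision.html": "<p>Crystal Vision partnership page</p>",
    "/other.html": "<p>Another partnership page</p>",
}

indexed = crawl(PAGES, "/partnerships.html")
print(sorted(indexed))
```

Starting from the hub page alone, the crawl reaches every subpage it links to, which is the behavior I would expect from any spider that has already read the hub page.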

Google on the other hand returns the Crystal Vision page as the top result, and the second result, indented, is the page that points to all the partnership pages.

Why is Bing too lazy to reliably follow simple links? I have no idea, but it definitely affects result quality.

For comparison's sake, Yahoo is the worst of the lot. Here are its results for the same search; my site doesn't even hit the top 10:

http://search.yahoo.com/search?p=north+american+racing+partnerships+crystal+vision&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-701

There is a very good reason Google has the dominant position.