#### by Scott Moore (samoore in BIT330, Fall 2009)

## Summary data

The following table contains the precision of the top 10 results returned for the queries and the number of documents shared between the top 10 results of different search engines.

Web search (comparing top 10 results)

| | Ask | Bing | Google | Yahoo |
|---|---|---|---|---|
| Ask | 4.3 | 1.5 | 3.9 | 1.8 |
| Bing | | 4.5 | 2.4 | 2.3 |
| Google | | | 5.7 | 2.5 |
| Yahoo | | | | 5.1 |
| All | 1.1 | | | |

Let's make sure that we know what this table tells us.

- Diagonal values
  - Consider the cell which contains "4.5". This tells us that, on average, the top 10 results returned by Bing contain 4.5 relevant documents.
- Off-diagonal values
  - Consider the cell which contains "3.9". This says that, on average, the top 10 results returned by Ask contained 3.9 documents that also appeared in the top 10 results of Google.
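To make the two kinds of cells concrete, here is a minimal sketch of how each value could be computed for a single query. The function names, the example URLs, and the relevance judgments are all hypothetical; the table itself reports averages of such per-query values across students.

```python
def relevant_in_top_k(results, relevant, k=10):
    """Diagonal-style value: how many of the top-k results were judged relevant."""
    return sum(1 for doc in results[:k] if doc in relevant)


def overlap_in_top_k(results_a, results_b, k=10):
    """Off-diagonal-style value: documents appearing in the top k of both lists."""
    return len(set(results_a[:k]) & set(results_b[:k]))


# Hypothetical data for one query (URLs are made up for illustration).
engine_a = [f"http://example.com/{i}" for i in range(10)]      # documents 0..9
engine_b = [f"http://example.com/{i}" for i in range(6, 16)]   # documents 6..15
relevant = {f"http://example.com/{i}" for i in (0, 2, 4, 6, 8, 10, 12)}

print(relevant_in_top_k(engine_a, relevant))   # relevant documents in A's top 10
print(relevant_in_top_k(engine_b, relevant))   # relevant documents in B's top 10
print(overlap_in_top_k(engine_a, engine_b))    # documents shared by A and B
```

Averaging the first function over all students gives a diagonal cell; averaging the second gives an off-diagonal cell.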

The apparent result is that just over half of the results for Google and Yahoo are relevant, while just over 3/7 of the Ask and Bing results are relevant. These figures are remarkably consistent with last year's experiment, in which we looked at the top 20 results of Microsoft Live Search (43%), Google (54%), and Yahoo (52%). The standard deviations of this year's precision values range from 2.3 to 2.9.

Now let's consider the overlap data. Ask's results have much in common with Google (while at the same time providing the lowest level of precision, which is a real trick). When looking at the three search engines that we examined last year, overlap has increased from about one-fifth of the results to one-fourth of the results.

## Results

### Explanation of statistics

For the individual results, I show the number of students for whom the precision was better for the first search engine, better for the second search engine, or the same for the two search engines. For the Student's paired t, I test the hypothesis that the mean difference in precision between the two search engines is equal to zero; this test assumes that the differences are normally distributed. I used this table of values to test the hypotheses. For the Wilcoxon signed rank test, I test the hypothesis that the precisions for the two search engines are drawn from the same distribution (no matter what that distribution might be). I used the method described on this page to calculate this statistic.
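The three statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact procedure used for the numbers on this page: the data are hypothetical, the function names are mine, and the Wilcoxon $z$ uses the common normal approximation $z = W / \sqrt{n(n+1)(2n+1)/6}$, which may differ from the method on the page referenced above, so its values need not match those reported here. As in the results, $n$ counts only the students whose two precision values differ (for example, 12 + 15 = 27 in the Ask vs. Bing comparison).

```python
import math


def tally(a, b):
    """Count pairs where a is higher, b is higher, or the two are equal."""
    a_wins = sum(x > y for x, y in zip(a, b))
    b_wins = sum(x < y for x, y in zip(a, b))
    return a_wins, b_wins, len(a) - a_wins - b_wins


def paired_t(a, b):
    """Student's paired t for the mean of the differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)    # sample variance
    return mean / math.sqrt(var / n), n - 1            # (t, degrees of freedom)


def wilcoxon_signed_rank(a, b):
    """Signed-rank sum W over non-zero differences, plus a normal-approximation z."""
    d = [x - y for x, y in zip(a, b) if x != y]        # drop pairs with no difference
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1                     # tied |d| share an average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w = sum(r if x > 0 else -r for r, x in zip(ranks, d))
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)   # assumed normal approximation
    return w, n, w / sigma


# Hypothetical precision values for eight students (not the class data).
engine_a = [6, 5, 7, 4, 6, 5, 8, 3]
engine_b = [4, 5, 5, 5, 3, 5, 6, 2]

print(tally(engine_a, engine_b))
print(paired_t(engine_a, engine_b))
print(wilcoxon_signed_rank(engine_a, engine_b))
```

The resulting $t$ and $z$ are then compared against critical values from the appropriate tables, as in the results that follow.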

### Differences in precision of the top 10 results

**Ask vs. Bing**: Test hypothesis that Ask and Bing are equivalent. The first test indicates that Bing might be slightly better. The second test is not able to refute the hypothesis. The third test concludes strongly that Ask and Bing are different.

- Individual results (A/B/=): 12/15/6
- Student's paired t: $t = -0.50$
- Wilcoxon: $W_{27} = -70 \Rightarrow z = -4.84,\ |z| = 4.84 > z_{99.9} = 3.291$

**Google vs. Ask**: Test hypothesis that Google is better than Ask. All tests provide strong support for this statement.

- Individual results (A/G/=): 5/21/7
- Student's paired t: $t = 3.20 > t_{30,99.75} = 3.030$
- Wilcoxon: $W_{26} = 208 \Rightarrow z = 18.4 > z_{99.95} = 3.291$

**Yahoo vs. Ask**: Test hypothesis that Yahoo is better than Ask. All tests provide strong support for this statement.

- Individual results (A/Y/=): 11/18/4
- Student's paired t: $t = 1.69 > t_{30,90.0} = 1.310$
- Wilcoxon: $W_{29} = 147 \Rightarrow z = 9.99 > z_{99.95} = 3.291$

**Google vs. Bing**: Test hypothesis that Google is better than Bing. All tests provide strong support for this statement.

- Individual results (B/G/=): 7/22/4
- Student's paired t: $t = 3.07 > t_{30,99.75} = 3.030$
- Wilcoxon: $W_{29} = 246 \Rightarrow z = 18.8 > z_{99.95} = 3.291$

**Bing vs. Yahoo**: Test hypothesis that Yahoo is better than Bing. The first two tests provide some support for this hypothesis, while the third test provides strong support for it.

- Individual results (B/Y/=): 11/17/5
- Student's paired t: $t = 1.57 > t_{30,90.0} = 1.310$
- Wilcoxon: $W_{28} = 120 \Rightarrow z = 8.53 > z_{99.95} = 3.291$

**Google vs. Yahoo**: Test hypothesis that Google is better than Yahoo. Again, the first two tests provide some support for this hypothesis, while the third test provides strong support for it.

- Individual results (G/Y/=): 17/8/8
- Student's paired t: $t = 1.36 > t_{30,90.0} = 1.310$
- Wilcoxon: $W_{25} = 122 \Rightarrow z = 8.60 > z_{99.95} = 3.291$

## Discussion

So, what we have is G > Y > B > A with some level of significance. Any relationship among search engines that "jumps" one position is highly significant. Last year's experiment showed that Google and Yahoo were clearly better than Microsoft Live, with Google being slightly better than Yahoo. Note that, while Google was better for one-half of you, Yahoo provided better results for one-quarter of you, and the two were the same for the other quarter. I'm guessing that you will not know in advance whether Yahoo or Google will be better for any one query, so I would look at both Google and Yahoo if I'm doing anything more than superficial research.

Let's look in a little more depth at why I recommend this. For any in-depth research you do, suppose you do as I say and look at the top 10 (though 20 would be better) results in both Google and Yahoo. On average in those top 10 results, you will get 5.7 relevant documents from Google and 5.1 from Yahoo, with 1.2 relevant documents in common (the relevant half of the usual 2.5 documents that they share); thus, you will get 5.7 + 5.1 - 1.2 = 9.6 relevant documents from the two searches. Since Google and Yahoo index documents differently (note the minimal overlap in results), you will get a lot of relevant results, and they will come from more parts of the Web than if you had used only one search engine.
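The arithmetic in that estimate can be written out as a small sketch. The function name and the `shared_relevant_fraction` parameter are hypothetical; the value 0.5 encodes the assumption made above that about half of the shared documents are relevant.

```python
def expected_relevant_union(rel_a, rel_b, shared, shared_relevant_fraction=0.5):
    """Expected distinct relevant documents when two top-10 lists are combined.

    rel_a, rel_b: average relevant documents in each engine's top 10.
    shared: average documents the two top-10 lists have in common.
    shared_relevant_fraction: assumed share of the common documents that
        are relevant (0.5 follows the estimate in the text).
    """
    shared_relevant = shared * shared_relevant_fraction
    # Relevant documents found by both engines are counted once, not twice.
    return rel_a + rel_b - shared_relevant


# Google (5.7) and Yahoo (5.1) with 2.5 documents in common, from the summary table.
total = expected_relevant_union(5.7, 5.1, 2.5)
print(f"{total:.2f}")
```

With the unrounded overlap of 1.25 the result is about 9.55; using the rounded 1.2, as in the text, gives 9.6.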

As for Bing and Ask? If you enjoy using them, then go ahead and use them. They both have useful interface elements that make working with them beneficial in their own way. You should be aware of what you are giving up by using them.