Search Tool Overlap Data

The due date for this assignment is 9/17. This looks like a ton of work, but it's actually not that bad; I was just very thorough with the directions. (I hope they end up being clear enough for you; if not, please let me know sooner rather than later.)

In this assignment you will determine where the top 5 (and then the top 10 and top 20) results of one search engine appear in the results of the other search engine. The idea is that we are trying to determine whether a top result in one search engine is more likely than a lower result to appear in the other search engine. It's a reasonable conjecture, but we want to see if it's actually the case.

Web Search Engines
Google and Yahoo Web
Blog Search Engines
Google Blog Search and Bloglines

1. Defining the question and queries

You will use the same questions and the same queries that you used in the previous assignment.

2. Gathering your data

You will use the same data and the same data print-outs that you used in the previous assignment.

3. Analyzing your data

Report on the results in the following way:

Web search engines

  1. For the first 5 results in Google, if the result appears in the Yahoo results, then write G5 next to the Yahoo result.
  2. For the next 5 results in Google (that is, results 6-10), if the result appears in the Yahoo results, then write G10 next to the Yahoo result.
  3. For the next 10 results in Google (that is, results 11-20), if the result appears in the Yahoo results, then write G20 next to the Yahoo result.
  4. For the first 5 results in Yahoo, if the result appears in the Google results, then write Y5 next to the Google result.
  5. For the next 5 results in Yahoo (that is, results 6-10), if the result appears in the Google results, then write Y10 next to the Google result.
  6. For the next 10 results in Yahoo (that is, results 11-20), if the result appears in the Google results, then write Y20 next to the Google result.
  7. Pick up the Yahoo results list. You are going to determine values for the GY table.
    • In the top 5 results of the Yahoo results,
      • over(5,5): count the number of times G5 appears.
      • over(10,5): count the number of times either G5 or G10 appears.
      • over(20,5): count the number of times either G5 or G10 or G20 appears.
    • In the top 10 results of the Yahoo results,
      • over(5,10): count the number of times G5 appears.
      • over(10,10): count the number of times either G5 or G10 appears.
      • over(20,10): count the number of times either G5 or G10 or G20 appears.
    • In the top 20 results of the Yahoo results,
      • over(5,20): count the number of times G5 appears.
      • over(10,20): count the number of times either G5 or G10 appears.
      • over(20,20): count the number of times either G5 or G10 or G20 appears.
  8. Pick up the Google results list. You are going to determine values for the YG table.
    • In the top 5 results of the Google results,
      • over(5,5): count the number of times Y5 appears.
      • over(10,5): count the number of times either Y5 or Y10 appears.
      • over(20,5): count the number of times either Y5 or Y10 or Y20 appears.
    • In the top 10 results of the Google results,
      • over(5,10): count the number of times Y5 appears.
      • over(10,10): count the number of times either Y5 or Y10 appears.
      • over(20,10): count the number of times either Y5 or Y10 or Y20 appears.
    • In the top 20 results of the Google results,
      • over(5,20): count the number of times Y5 appears.
      • over(10,20): count the number of times either Y5 or Y10 appears.
      • over(20,20): count the number of times either Y5 or Y10 or Y20 appears.
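
If you would like to double-check your hand counts, the marking and counting steps above can be sketched in Python. This is only a rough illustration, not part of the assignment: the function names (band, mark, count_marks) and the sample result lists are made up, and it assumes each result is represented by a string that is identical in both lists and that no list contains the same result twice.

```python
def band(rank):
    """Return the band (5, 10, or 20) for a 1-based rank, or None past 20."""
    if rank <= 5:
        return 5
    if rank <= 10:
        return 10
    if rank <= 20:
        return 20
    return None


def mark(source, target, letter):
    """Steps 1-6: label each target result that also appears in the source list.

    Returns a list parallel to `target`. Each entry is a label such as "G5",
    or None when the result is not in the top 20 of `source`.
    """
    labels = {}
    for rank, result in enumerate(source[:20], start=1):
        labels.setdefault(result, f"{letter}{band(rank)}")
    return [labels.get(result) for result in target]


def count_marks(marks, letter, a, b):
    """Steps 7-8: over(a,b) counts marks of band <= a in the top b target results."""
    wanted = {f"{letter}{n}" for n in (5, 10, 20) if n <= a}
    return sum(1 for m in marks[:b] if m in wanted)


# Hypothetical, made-up result lists (only 6 results each, for brevity).
google = ["a", "b", "c", "d", "e", "f"]
yahoo = ["c", "x", "a", "f", "y", "z"]

# Write G5/G10/G20 next to the Yahoo results, then count for the GY table.
yahoo_marks = mark(google, yahoo, "G")
over_5_5 = count_marks(yahoo_marks, "G", 5, 5)
over_10_5 = count_marks(yahoo_marks, "G", 10, 5)
print(yahoo_marks, over_5_5, over_10_5)
```

Calling mark(yahoo, google, "Y") gives the marks for the Google list in the same way, and the same functions work for the blog search engines with "G" and "B" as the letters.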

Blog search engines

For the blog search engines you'll go through the same steps.

  1. For the first 5 results in Google Blog Search, if the result appears in the Bloglines results, then write G5 next to the Bloglines result.
  2. For the next 5 results in Google Blog Search (that is, results 6-10), if the result appears in the Bloglines results, then write G10 next to the Bloglines result.
  3. For the next 10 results in Google Blog Search (that is, results 11-20), if the result appears in the Bloglines results, then write G20 next to the Bloglines result.
  4. For the first 5 results in Bloglines, if the result appears in the Google Blog Search results, then write B5 next to the Google result.
  5. For the next 5 results in Bloglines (that is, results 6-10), if the result appears in the Google Blog Search results, then write B10 next to the Google result.
  6. For the next 10 results in Bloglines (that is, results 11-20), if the result appears in the Google Blog Search results, then write B20 next to the Google result.
  7. Pick up the Bloglines results list. You are going to determine values for the GB table.
    • In the top 5 results of the Bloglines results,
      • over(5,5): count the number of times G5 appears.
      • over(10,5): count the number of times either G5 or G10 appears.
      • over(20,5): count the number of times either G5 or G10 or G20 appears.
    • In the top 10 results of the Bloglines results,
      • over(5,10): count the number of times G5 appears.
      • over(10,10): count the number of times either G5 or G10 appears.
      • over(20,10): count the number of times either G5 or G10 or G20 appears.
    • In the top 20 results of the Bloglines results,
      • over(5,20): count the number of times G5 appears.
      • over(10,20): count the number of times either G5 or G10 appears.
      • over(20,20): count the number of times either G5 or G10 or G20 appears.
  8. Pick up the Google Blog Search results list. You are going to determine values for the BG table.
    • In the top 5 results of the Google Blog Search results,
      • over(5,5): count the number of times B5 appears.
      • over(10,5): count the number of times either B5 or B10 appears.
      • over(20,5): count the number of times either B5 or B10 or B20 appears.
    • In the top 10 results of the Google Blog Search results,
      • over(5,10): count the number of times B5 appears.
      • over(10,10): count the number of times either B5 or B10 appears.
      • over(20,10): count the number of times either B5 or B10 or B20 appears.
    • In the top 20 results of the Google Blog Search results,
      • over(5,20): count the number of times B5 appears.
      • over(10,20): count the number of times either B5 or B10 appears.
      • over(20,20): count the number of times either B5 or B10 or B20 appears.

4. Summarize your data

Create four tables with the following structures:

This table provides a measure of how many of Google's responses are reproduced by Yahoo.

GY           Yahoo
Google       5            10            20
5            over(5,5)    over(5,10)    over(5,20)
10           over(10,5)   over(10,10)   over(10,20)
20           over(20,5)   over(20,10)   over(20,20)

This table provides a measure of how many of Yahoo's responses are reproduced by Google.

YG           Google
Yahoo        5            10            20
5            over(5,5)    over(5,10)    over(5,20)
10           over(10,5)   over(10,10)   over(10,20)
20           over(20,5)   over(20,10)   over(20,20)

This table provides a measure of how many of Bloglines' responses are reproduced by Google Blog Search.

BG           Google
Bloglines    5            10            20
5            over(5,5)    over(5,10)    over(5,20)
10           over(10,5)   over(10,10)   over(10,20)
20           over(20,5)   over(20,10)   over(20,20)

This table provides a measure of how many of Google Blog Search's responses are reproduced by Bloglines.

GB           Bloglines
GBlog        5            10            20
5            over(5,5)    over(5,10)    over(5,20)
10           over(10,5)   over(10,10)   over(10,20)
20           over(20,5)   over(20,10)   over(20,20)

over(a,b) is the overlap between the top a results of the search engine named on the left of the table and the top b results of the search engine named across the top. In this case, we're not going to do percentages; the overlap values should simply be counts. The diagonal terms in the top two tables should be the same, and the diagonal terms in the bottom two tables should be the same as well. Further, the over(20,20) term should be the same as the overlap figure that you calculated in the previous assignment.
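
As a sanity check on the definition, over(a,b) can also be computed directly as the size of a set intersection. The sketch below is only an illustration under stated assumptions: the names (CUTOFFS, over, overlap_table) and the result lists are hypothetical, integers stand in for result URLs, and no list contains the same result twice. It also shows why the diagonal terms have to agree: each table is the other one flipped across its diagonal.

```python
CUTOFFS = (5, 10, 20)


def over(left, top, a, b):
    """Count the results shared by the top a of `left` and the top b of `top`."""
    return len(set(left[:a]) & set(top[:b]))


def overlap_table(left, top):
    """Rows follow the left engine's cutoffs; columns follow the top engine's."""
    return [[over(left, top, a, b) for b in CUTOFFS] for a in CUTOFFS]


# Hypothetical result lists; each integer stands in for one result URL.
google = list(range(1, 21))
yahoo = [3, 40, 1, 7, 41, 42, 12, 2, 43, 44] + list(range(50, 60))

gy = overlap_table(google, yahoo)  # Google on the left, Yahoo across the top
yg = overlap_table(yahoo, google)  # Yahoo on the left, Google across the top

for row in gy:
    print(row)

# Each table is the transpose of the other, so the diagonals must match.
is_transpose = all(gy[i][j] == yg[j][i] for i in range(3) for j in range(3))
print(is_transpose)
```

If your hand-counted diagonals do not match between the two tables of a pair, a mark was probably missed or placed in the wrong band.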

5. What you need to do

You need to add your data to this page. You also need to turn in the following pages to me:

  • A title page with the following:
    • Search Tool Overlap Data
    • BIT330: Fall 2008
    • The date
    • Your name
    • Your uniqname
  • One page containing the following information. Keep a copy of this for your records, as you will need it for your analysis write-up.
    1. The text description of the query that you submitted to the Web search engines.
    2. The search queries that you submitted to each of the three Web search engines.
    3. The text description of the query that you submitted to the blog search engines.
    4. The search queries that you submitted to each of the three blog search engines.
  • An appendix to this report containing the three sets of your original (and fully-marked-up) query results. This is the key piece of this assignment that you must turn in so that you can get credit for this assignment.
  • Your report should be stapled in the upper-left corner. I do not want any clips, folded papers, or report covers. I will bring a stapler to class that day.