Search Tool Data

The due date for this assignment is 9/15. This looks like a ton of work but it's actually not that bad — I was just really obsessive-compulsive with the directions. (I hope they end up being clear enough for you; if not, I hope you'll let me know sooner rather than later.)

You will compare three Web search engines (see below) and three blog search engines (again, see below) in a controlled, quantitative manner. The point of this exercise is for you to determine whether or not you can recommend one tool over the others.

Web Search Engines
Windows Live, Google, and Yahoo Web
Blog Search Engines
Technorati, Google Blog Search, and Bloglines

1. Defining the question and queries

  1. Define two separate questions that you want to have answered; you will submit one of these questions to the Web search engines and one of these to the blog search engines. For each one write the question you are trying to answer in a short paragraph, one that you might present to a librarian if he were to conduct the search for you. It would be better if the questions are related to one of the topics you are researching this semester for this class.
  2. For each question, define a suitable query to submit to your search tool. The query can be different in form for each search tool but should be equivalent in content. I would be surprised if the query consisted of a single word or wasn't sophisticated in some way, but I won't count off for simple queries. Again, be sure that the queries are as equivalent as possible across the three different search engines.

2. Gathering your data

  1. Submit the query to each search tool. Do this within 30 minutes from beginning to end. (Of course, it could be a different 30 minutes for the web and blog searches.) The reason for this requirement is that the databases change over time and we want each search engine to have an equal opportunity.
    • Print out a list of the first 20 resources as returned by each search engine. It would be best if you configured the search engine to return at least 20 results on the same Web page. You will be using these pages to do your analysis (see below). Change your question if an appropriate query doesn't return at least 20 results for all of the search engines.
    • Before closing the Web page, save the page to your file space. The reason for doing that is so that you can use the links embedded in the page later in the assignment. The printouts themselves are often not sufficient for finding the Web page itself; some parts don't print sometimes.
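If you would rather pull the links out of a saved results page than copy them by hand, here is a minimal sketch using only Python's standard library. It assumes the saved page is ordinary HTML; the function name `extract_links` is made up for illustration. Note that it collects every absolute link on the page, including the search engine's own navigation links, so you would still need to pick out the 20 actual results yourself.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                # Keep only absolute http/https links; skip anchors like "#top".
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)


def extract_links(html_text):
    """Return all absolute links found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links
```

To use it, read your saved file into a string (for example with `open(path, encoding="utf-8").read()`, where `path` is wherever you saved the page) and pass that string to `extract_links`.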

3. Analyzing your data

Report on the results in the following way:

Web search engines

  1. For these first 20 resources returned by the search engine, determine the precision of the results. That is, determine which of the Web pages are applicable and useful. (If you consider 15 of the resources for Google to be useful, then the precision would be 75%.) You should look at each Web page (not just their summaries) in order to determine this. On the printout you should write a P in the right margin next to a Web page that you consider to be applicable.
  2. Determine the overlap of the results returned by the different search engines. You will report 4 numbers here: how much the results from A overlap with B (that is, what percentage of the 20 results from A are found in the results of B), how much B overlaps with C, how much A overlaps with C, and how much A, B, and C overlap all together.
    • Live/Google
      • On the Live printout, go through each resource, and if you find it on the Google results, then put a G in the right margin.
      • Count up the number of Gs that you wrote.
    • Live/Yahoo
      • On the Live printout, go through each resource, and if you find it on the Yahoo results, then put a Y in the right margin.
      • Count up the number of Ys that you wrote.
    • Google/Yahoo
      • On the Google printout, go through each resource, and if you find it on the Yahoo results, then put a Y in the right margin.
      • Count up the number of Ys that you wrote.
    • Live/Google/Yahoo
      • On the Live printout, go through each resource and count up how many have both a G and a Y next to them.
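The counting described above can be sketched in a few lines of Python, in case you want to double-check your hand tallies. This is only an illustration under two assumptions: each result list is a list of URL strings, and two results count as "the same" only if their URLs match exactly (on paper you may reasonably treat near-identical URLs as the same page). The function names are made up for this sketch.

```python
def precision(marked_p):
    """Percentage of results marked P.

    marked_p is a list of booleans, one per result (True = you wrote a P).
    """
    return 100 * sum(marked_p) / len(marked_p)


def overlap(a, b):
    """Percentage of the results in list a that also appear in list b."""
    b_set = set(b)
    return 100 * sum(1 for url in a if url in b_set) / len(a)


def overlap_all(a, b, c):
    """Percentage of the results in list a that appear in both b and c."""
    b_set, c_set = set(b), set(c)
    return 100 * sum(1 for url in a if url in b_set and url in c_set) / len(a)
```

With 20 results per engine, `overlap(live, google)` corresponds to the number of Gs on the Live printout divided by 20, and `overlap_all(live, google, yahoo)` to the number of Live results that have both a G and a Y.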

Blog search engines

For the blog search engines you'll go through the same steps.

  1. For these first 20 resources returned by the search engine, determine the precision of the results. That is, determine which of the blog entries are applicable and useful. (If you consider 15 of the resources for Technorati to be useful, then the precision would be 75%.) You should look at each blog entry (not just their summaries) in order to determine this. On the printout you should write a P in the right margin next to a blog entry that you consider to be applicable.
  2. Determine the overlap of the results returned by the different search engines. You will report 4 numbers here: how much the results from A overlap with B (that is, what percentage of the 20 results from A are found in the results of B), how much B overlaps with C, how much A overlaps with C, and how much A, B, and C overlap all together.
    • Technorati/Google Blog
      • On the Technorati printout, go through each resource, and if you find it on the Google Blog results, then put a G in the right margin.
      • Count up the number of Gs that you wrote.
    • Technorati/Bloglines
      • On the Technorati printout, go through each resource, and if you find it on the Bloglines results, then put a B in the right margin.
      • Count up the number of Bs that you wrote.
    • Google Blog/Bloglines
      • On the Google Blog printout, go through each resource, and if you find it on the Bloglines results, then put a B in the right margin.
      • Count up the number of Bs that you wrote.
    • Technorati/Google Blog/Bloglines
      • On the Technorati printout, go through each resource and count up how many have both a G and a B next to them.

4. Summarize your data

Create two tables with the following structures:

Web search   Live      Google      Yahoo Web
Live         prec(l)   over(l,g)   over(l,y)
Google                 prec(g)     over(g,y)
Yahoo Web                          prec(y)
All          over(l,g,y)

Blog search   Technorati   Google Blog   Bloglines
Technorati    prec(t)      over(t,g)     over(t,b)
Google Blog                prec(g)       over(g,b)
Bloglines                                prec(b)
All           over(t,b,g)

prec(x) is the precision of search engine X. over(a,b) is the overlap between search engines A and B. These numbers are percentages, that is, numbers between 0 and 100 inclusive.
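If you keep your numbers in a script, one way to lay them out in this table shape is sketched below. The helper names `summary_rows` and `format_table` are hypothetical, and the numbers in the usage example are made-up placeholders, not real results.

```python
def summary_rows(label, engines, prec, over, over_all):
    """Build the rows of one summary table.

    engines:  the three engine names, in column order
    prec:     dict mapping engine name -> precision percentage
    over:     dict mapping (engine_a, engine_b) -> overlap percentage
    over_all: percentage of results shared by all three engines
    """
    rows = [[label] + list(engines)]
    for i, a in enumerate(engines):
        row = [a]
        for j, b in enumerate(engines):
            if j < i:
                row.append("")                       # below the diagonal: left blank
            elif j == i:
                row.append(f"{prec[a]:.0f}")         # diagonal: precision
            else:
                row.append(f"{over[(a, b)]:.0f}")    # above the diagonal: overlap
        rows.append(row)
    rows.append(["All", f"{over_all:.0f}", "", ""])
    return rows


def format_table(rows):
    """Render rows as plain text with aligned columns."""
    widths = [max(len(r[c]) for r in rows) for c in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in rows
    )


# Usage with made-up placeholder numbers (not real data):
engines = ["Live", "Google", "Yahoo Web"]
prec = {"Live": 75, "Google": 80, "Yahoo Web": 60}
over = {("Live", "Google"): 30, ("Live", "Yahoo Web"): 25, ("Google", "Yahoo Web"): 40}
print(format_table(summary_rows("Web search", engines, prec, over, 15)))
```

The same two functions work for the blog table; only the label, engine names, and numbers change.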

5. What you need to do

You need to add your data to this page. You also need to turn in the following pages to me:

  • A title page with the following:
    • Search Tool Data Collection
    • BIT330: Fall 2008
    • The date
    • Your name
    • Your uniqname
  • One page containing the following information. Keep a copy of this page for your records, as you will need it for your analysis write-up.
    1. The text description of the query that you submitted to the Web search engines.
    2. The search queries that you submitted to each of the three Web search engines.
    3. The text description of the query that you submitted to the blog search engines.
    4. The search queries that you submitted to each of the three blog search engines.
  • You will not turn in the appendix with your annotated printouts just yet. You will turn this in with the next set of data.
  • Staple your pages together in the upper left corner. I do not want any clips or folded papers or report covers. I will bring a stapler to class that day.

Clarifications to integrate into next year's assignment

A couple of notes related to the data gathering assignment and, especially, the questions and queries:

  • Don't use a query that references a small, relatively unknown product or company. This simplifies your search too much.
  • Write more than 1/2 line describing your question. Tell me what types of documents would be most useful and what would be less useful.
  • The queries should be the same across the three search engines, except where the search syntax of one search engine differs from another. So, if you use quotes in one query, use quotes in the other queries, too. Further, if you search in the title for one search engine, search the same way in the other search engines.
  • When you are showing what query you used, don't include quotes in the query unless you used quotes in the query. (Hmmmm.) For example, suppose your search query was fastest car acceleration. Don't write “fastest car acceleration” unless you used quotes at the beginning and end of the query when you typed it into the search engine. If you simply queried using those three words, then a standard way of representing this is to start and end your query with square brackets; thus, you would write [fastest car acceleration].