04 Search Techniques
We go over several standard search techniques and strategies.
Class held on 09/21/2009. (student notes; possible questions).
Class structure
- Go through “At beginning of class” info
- Lecture through the slides (as a PDF)
- Talk through the examples
- Go through “At end of lecture”
At beginning of class
- Look at announcements made since the previous class
- If you're going to ask me a question via twitter or email, first do the following:
- Look at my previous twitter messages at drsamoore
- Look at my recent announcements on the wiki
- If you ask me about a wiki page, use http://bit.ly to send me the link to that page so that I can look at it.
- Do not wait until the last minute to start your assignments.
- There are technical issues that you have to learn related to wikidot. This can't be taught very well over twitter. Maybe you've noticed this?
- My office hours are in the Winter Garden on MTW from 3:30-4:30 (or, generally, when students stop coming by, so I might leave early if no one is there or I might leave late if I'm busy).
- Check who is doing what:
- Notes & questions
- Special blogs
- Let's look at what these folks did for today.
- Blog template trouble
- Many of you are having issues with this. You should have the following pages on your wiki (substitute for myWiki and nameOfMyFirstBlog):
- myWiki.wikidot.com/blog:_template
- myWiki.wikidot.com/blog:nameOfMyFirstBlog
- myWiki.wikidot.com/bloglist
- You need to understand the relationship among these three pages.
- Also, let's fix some blog formatting issues.
- First paragraphs, mainly.
- But also paragraph separation.
- File history information
- Grade tool (what to do???)
- Your first possible blog entry (on today's exercises) could be turned in next class (see the schedule-2009 for details on the timing of blog entries)
- From two classes ago: Why do search engines return different results?
My notes
Search techniques
These are most of the search techniques that we'll cover in today's class.
- Special search syntax — These operators let you target your searches at specific parts of documents. Since text in different parts of a document means different things and performs different functions, you can use these operators to raise the precision of your queries.
- Full text search engines
- Title — intitle:
- Site — site:
- Top-level domain — site:
- URL contents — inurl:
- Links — link:
- Unique words and phrases — Using multiple unique words and phrases is key both to reducing the number of documents retrieved and to raising the precision of your queries. Further, using multiple words and phrases increases the chances of retrieving content-filled documents (that is, it increases the number of “meaty” documents).
- They can be used to focus in on more specialized pages that would use those terms
- Gather related words using summaries
- Use search engines to find related words
- Example at Ask.com (both “Narrow your search” and “Expand your search”)
- Google
- Google Suggest feature
- “Related searches” at bottom of search results window
- Yahoo
- Yahoo Search Assist feature
- “Also try” at top or bottom of search results window
- Yahoo Directory (we'll cover this in a future class) can point in the right direction
- Use "means" queries
- Query specificity
- Narrow to more general: this is when you have a really good idea of what you're looking for.
- More general to narrow: this is when you don't know what you're looking for.
- Alternative naming
- People
- Using different name forms can return different information
- Sometimes you have to use other information to differentiate two identically named people
- Also, search specifiers can help target the information (intitle, site type, include, exclude)
- Places
- Use addresses (streets, zips, area codes, phone numbers)
- Use "official"
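The special search syntax listed above can be sketched as a small query builder. This is only an illustration: the function names and the `OPERATORS` table are hypothetical, and which operators actually work varies from one search engine to another.

```python
# Operator prefixes as listed in these notes; support varies by search engine.
OPERATORS = {
    "title": "intitle:",
    "site": "site:",
    "top-level domain": "site:",
    "url contents": "inurl:",
    "links": "link:",
}


def quote_if_phrase(value: str) -> str:
    """Wrap multi-word values in double quotes so they are searched as a phrase."""
    return f'"{value}"' if " " in value else value


def restrict(target: str, value: str) -> str:
    """Return one operator-restricted term, e.g. restrict('title', 'tigers')."""
    return OPERATORS[target] + quote_if_phrase(value)


def build_query(*parts: str) -> str:
    """Join plain words and restricted terms into a single query string."""
    return " ".join(parts)


print(build_query("animal", restrict("title", "tigers"), restrict("top-level domain", "org")))
# prints: animal intitle:tigers site:org
```

The same helper covers the title, site, and top-level-domain cases because all of them are just a prefix attached to a word or a quoted phrase.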
Sites
These are the best summaries of the major general search engines that I could come up with. I have also linked to several useful help pages for each site.
- Google
- The best, most reliable, fastest, most wide-ranging general purpose search engine. Nice features: Showable "Options" on the left with lots of choices (especially time-related and Related Searches switch). When you're serious about searching, you have to make at least one stop here.
- Useful pages
- Yahoo
- Historically, the second best search engine in terms of returning relevant results. Nice feature: the hideable "Search Assist" box at the top that also shows Related Searches.
- Useful pages
- Ask
- A great search engine for exploring a topic. Nice features: the "Related searches" on the right, the binoculars hiding the page preview and page statistics; also larger images appear on mouse-over. Notice there are sponsored results at the top and bottom of the page.
- Useful pages
- Advanced search tips
- Site features: 1/2
- Bing
- A search engine that focuses on the user experience during the search. Nice features: "More on this page" and "Popular Links" in the pop-up bar on the right; "Related Searches" immediately available on left.
- Useful pages
- Help center
- Tour of Bing's features (video)
Useful settings
Each of these search engines provides a way to set up an account and, thereby, set up preferences. I generally use the following preferences:
- 30-50 results per page — I like the ability to scan more information more quickly
- Filtering (moderate on Google) — don't want this stuff popping up in the middle of class or a group meeting
- Open search results in new browser window — this keeps the search results up and available so that they're not so easily lost or closed
- Turn on search suggestions — I find these to be amazingly useful as I structure queries.
In-class examples
For most of the following I will (by default) use Google as the search engine as a demonstration of the search technique. For the most part, each of these search engines (other than Bing) could have been used.
Special search syntax example: Information about tigers
- tigers (31.9mm)
- tigers -"Detroit Tigers" (29.0mm)
- tigers animal (4.61mm)
- animal intitle:tigers (1.45mm)
- Tigers (the animal but not any sports teams):
- Google: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -"Louisiana State" (14.4mm)
- Bing: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -"Louisiana State" (5.21mm)
- Yahoo: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -"Louisiana State" (49.7mm)
- Ask: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -Louisiana -State (4.25mm)
- What's wrong with this page?
- Information from an organization
- animal intitle:tigers site:org (25.6k)
- Information from an organization or a government
- Information from a zoo
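The long exclusion queries in the tigers example are easier to maintain as a list of unwanted terms. The sketch below (the function name and the `split_phrases` flag are made up for illustration) rebuilds the query used on Google, Bing, and Yahoo, and the variant used on Ask, where the notes exclude the two words of the phrase separately.

```python
def exclude_query(base: str, unwanted: list[str], split_phrases: bool = False) -> str:
    """Append a -term exclusion for every unwanted word or phrase.

    split_phrases=True excludes each word of a phrase separately, mirroring
    the Ask query in the notes (-Louisiana -State).
    """
    terms = []
    for item in unwanted:
        if " " in item and split_phrases:
            terms.extend(f"-{word}" for word in item.split())
        elif " " in item:
            terms.append(f'-"{item}"')  # exclude the exact phrase
        else:
            terms.append(f"-{item}")
    return " ".join([base, *terms])


SPORTS = ["detroit", "memphis", "missouri", "baseball", "lsu", "football",
          "athletics", "sports", "mlb", "soccer", "Louisiana State"]

print(exclude_query("tigers", SPORTS))
print(exclude_query("tigers", SPORTS, split_phrases=True))
```

Keeping the unwanted terms in one list makes it simple to add another team or sport when a new kind of irrelevant result shows up.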
Unique words and phrases
- Bunch of birds example
- "flock of seagulls" "gaggle of geese" sparrows turkeys
- Lesson: put what you know in the search
- Use "means" and "definition" queries: Hydrocephalus
- Ask — hydrocephalus (300k) — look at "Related searches"
- Yahoo directory — hydrocephalus
- Google — hydrocephalus — 2.0mm documents (2.34mm in 2008; 2.26mm in 2007); note the "Refine results" part of the page. Also note the “definition” link near the top of the page.
- Google — hydrocephalus means — 1.15mm documents (385k in 2008; 789k in 2007)
- Google — "hydrocephalus means" — 3280 documents (844 in 2008; 415 in 2007)
- Google — intitle:hydrocephalus (intitle:means OR intitle:definition) — 1460 documents (470 in 2008; 200 in 2007)
- Google — "hydrocephalus means" (site:edu OR site:org OR site:gov) — 1020 documents (44 in 2008; 131 in 2007).
- Google — define hydrocephalus (359k documents)
- Related words: Investment guidance
- investment guidance — 4.05mm (487k in 2008; 4.48mm in 2007)
- "investment guidance" — 44.1k (82.8k in 2008; 71.7k in 2007)
- investment guidance financial goals stocks bonds portfolio — 600k (235k in 2008; 1.62mm in 2007)
- "investment guidance" financial goals stocks bonds portfolio — 872 documents (13.1k in 2008; 10.9k in 2007)
- Fun with quotes
- "statistical analysis" means — 10.4mm documents (26mm in 2008; 21.5mm in 2007)
- "statistical analysis" mean — 6.56mm documents
- "statistical analysis" "means" — 5.55mm documents (4.73mm in 2008; 7.04mm in 2007)
- "statistical analysis" "mean" — 6.57mm documents
- define:"statistical analysis"
- Lyrics
- Google — "big rock stars" nickelback lyrics "we all just" "drugs come cheap" — 34 results (6 results in 2007, and they were all good)
- Google — rockstar nickelback intitle:official video
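To compare how much each refinement narrows a search, the result counts quoted in these examples can be turned into plain numbers. This sketch assumes the shorthand means k = thousand and mm = million; the function names are hypothetical.

```python
import re
from decimal import Decimal

# Assumed meaning of the suffixes used in these notes: k = thousand, mm = million.
MULTIPLIERS = {"": 1, "k": 1_000, "mm": 1_000_000}


def parse_count(text: str) -> int:
    """Convert shorthand such as '1.15mm', '385k' or '9,310' to an integer."""
    match = re.fullmatch(r"([\d.,]+)\s*(mm|k)?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    return int(number * MULTIPLIERS[match.group(2) or ""])


def narrowing_factor(before: str, after: str) -> float:
    """How many times smaller the result set became after refining a query."""
    return parse_count(before) / parse_count(after)


# Counts reported above for "hydrocephalus means": unquoted vs. as a phrase.
print(parse_count("1.15mm"), parse_count("3280"))  # prints: 1150000 3280
print(round(narrowing_factor("1.15mm", "3280")))   # prints: 351
```

A factor of roughly 351 for simply adding the phrase quotes is a concrete way to see what "raising the precision" of a query means, keeping in mind that reported counts are only estimates.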
Query specificity
- Dog breed information
- Google — dog breed cavalier king charles spaniel — 220k documents (355k in 2008; 888k in 2007)
- Google — dog breed "cavalier king charles spaniel" — 195k documents (890k in 2008; 535k in 2007)
- Google — dog breed intitle:"cavalier king charles spaniel" — 40.1k documents (26.2k in 2008; 15.4k in 2007)
- Yahoo Directory — dog breed "cavalier king charles spaniel" — 67 documents (69 documents in 2008 and 2007)
- Dog breed disease information
- Google — "cavalier king charles spaniel" "heart problem" OR "heart murmur" OR "mitral valve" — 4.54k documents (7,710 in 2008; 22,900 in 2007)
- Google — intitle:"cavalier king charles spaniel" "heart problem" OR "heart murmur" OR "mitral valve" — 2.9k documents (250 in 2008)
- Yahoo Directory — dog breed "cavalier king charles spaniel" "heart problem" — no documents in the directory
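The disease queries combine one required phrase with several alternative phrases joined by OR. A minimal sketch of building them, assuming double quotes mark a phrase (the helper names are invented for this example):

```python
def phrase(text: str) -> str:
    """Quote text so the search engine treats it as an exact phrase."""
    return f'"{text}"'


def any_of(phrases: list[str]) -> str:
    """Join alternative phrases with OR; a page needs to match only one."""
    return " OR ".join(phrase(p) for p in phrases)


BREED = "cavalier king charles spaniel"
HEART_TERMS = ["heart problem", "heart murmur", "mitral valve"]

# Broad version: the breed name may appear anywhere on the page.
broad = f"{phrase(BREED)} {any_of(HEART_TERMS)}"
# Narrow version: the breed name must appear in the page title.
narrow = f"intitle:{phrase(BREED)} {any_of(HEART_TERMS)}"

print(broad)
print(narrow)
```

Going from the broad to the narrow form is the "more general to narrow" strategy: keep the alternatives, tighten where the main phrase must appear.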
Alternative naming
People
- George Washington information
- "George Washington" biography -site:com -"Carver" — 941k documents (1.22mm in 2008; 1.06mm in 2007).
- intitle:"George Washington" biography -site:com -"Carver" — 293k documents (218k in 2008; 240k in 2007)
- "George Washington": — one whole category on George Washington, plus 84 other related categories
- Stephen Hawking (as a name example)
- Stephen Hawking — 1.93mm documents (3.61mm in 2008; 2.27mm in 2007)
- "Stephen Hawking" — 1.86mm documents (3.86mm in 2008; 2.12mm in 2007)
- Note that the 2008 results make no sense when compared with the previous result. At least not given my understanding of how Google should operate.
- intitle:"Stephen Hawking" — 73.4k documents (61.3k in 2008; 63.1k in 2007)
- intitle:"Stephen * Hawking" — 3.1mm documents (9,310 in 2008; 9,190 in 2007)
- intitle:"Stephen * Hawking" OR intitle:"Stephen Hawking" — 628k documents (62,900 in 2008; 75,200 in 2007)
- Note that the 2009 results really make no sense. Look at the number of results for the previous two queries.
- "Hawking, Stephen" — 270k documents (535k in 2008; 241k in 2007) — library and books, mostly
- "Hawking, Stephen W." — 47.6k documents (72.1k in 2008; 53.2k in 2007) — again, library and books, mostly.
- "Hawking, Stephen William" — 75.2k documents (20,400 in 2008; 13,900 in 2007) — lots of encyclopedia type entries.
- Levi Strauss (since there are two/three of them)
- "Levi Strauss" — 1.77mm documents (3.97mm in 2008; 2.24mm in 2007)
- "Levi Strauss" -french -france -philosopher — 1.19mm documents (2.21mm in 2008; 2.06mm in 2007)
- intitle:"Levi Strauss" — 56.6k documents (78,100 in 2008; 68,200 in 2007)
- intitle:"Levi Strauss" -french -france -philosopher — 46.2k documents (66,300 in 2008; 53,700 in 2007)
- intitle:"Levi Strauss" (french OR france OR philosopher) — 9,420 documents (9,680 in 2008; 15,900 in 2007)
- intitle:"Levi Strauss" claude (french OR france OR philosopher) — 883 documents (1,160 in 2008; 556 in 2007)
- intitle:"Levi Strauss" bavaria germany — 310 documents (241 in 2008; 48 in 2007)
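The name forms tried for Stephen Hawking follow a pattern that can be generated for any person. The function below is a hypothetical helper, not a feature of any search engine; it simply produces the same five quoted forms used above.

```python
def name_forms(first: str, middle: str, last: str) -> list[str]:
    """Return quoted name variants: natural order, wildcard, and catalog order."""
    return [
        f'"{first} {last}"',                # everyday form
        f'"{first} * {last}"',              # wildcard stands in for a middle name or initial
        f'"{last}, {first}"',               # library and catalog style
        f'"{last}, {first} {middle[0]}."',  # catalog style with middle initial
        f'"{last}, {first} {middle}"',      # full formal name, common in encyclopedias
    ]


for form in name_forms("Stephen", "William", "Hawking"):
    print(form)
```

Each form can also be prefixed with intitle: to limit matches to page titles, as in the examples above.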
Places
- Pizza places in Ann Arbor
- pizza "ann arbor" — 763k documents (1.19mm in 2008; 887k in 2007).
- Look at all of the information this query has available at the top of the results page.
- pizza "ann arbor" william — 102k documents (547k in 2008; 629k in 2007)
- (734) 669-6973
- pizza 734 "ann arbor" — 109k documents
- The Sears Tower (as a landmark)
- "sears tower" — 1.12mm documents (1.44mm in 2008; 1.49mm in 2007)
- "sears tower" official — 1.12mm documents (1.11mm in 2008; 257k in 2007)
- intitle:"sears tower" — 170k documents (28,800 in 2008; 19k in 2007)
- intitle:"sears tower" official — 37.9k documents (28.7k in 2008; 1,440 in 2007)
- intitle:"willis tower" official — 15.7k documents
At end of lecture
- Start working on today's exercises. The exercises are on this page. You should work on them for no more (but, probably, no less) than another hour outside of class; we will have more time in the next class after the lecture to continue working on them before going on to that day's exercises.
page_revision: 16, last_edited: 1254924073 (Unix time; 07 Oct 2009, 14:01 UTC)