11 The Deep Web

Introduction

The Deep Web is also known as the “invisible web” because it is invisible to general search engines. People also refer to it as the academic web, given the nature of the material that is often hidden. The Deep Web is much larger than the surface web (20-100 billion documents). There are around 10 million documents in Google. The Deep Web includes 450,000 queryable databases and 7,500 terabytes of data. What we want to do is figure out how to access this data.

The standard web search resources give us access to millions of documents, but students cannot reach all of the high-quality resources through Google and Yahoo.

Specialized Tools

We must learn to use specialized tools to gain access to some sources. The Google database gives us access to HTML pages, which is what Google can read. Google can’t read Flash files, Shockwave files, and similar formats. Google also cannot reach documents that only appear after a form has been filled out. Real-time information and proprietary information also present problems for web search engines.
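To make the form problem concrete, here is a minimal sketch in Python of what a simple link-following crawler sees on a page. It is only an illustration, not how Google actually works: the sample page, its URLs, and the names LinkAndFormFinder and survey are all made up for this example. The point is that links can be followed automatically, while a search form leads nowhere until someone types a query into it.

```python
from html.parser import HTMLParser


class LinkAndFormFinder(HTMLParser):
    """Collects what a simple link-following crawler can and cannot reach."""

    def __init__(self):
        super().__init__()
        self.links = []  # URLs in <a href="...">: a crawler can follow these
        self.forms = []  # form targets: results exist only after input is submitted

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


# Hypothetical library page: two ordinary links and one catalog search form.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About the library</a>
  <a href="/hours.html">Opening hours</a>
  <form action="/catalog/search" method="post">
    <input type="text" name="title">
    <input type="submit" value="Search">
  </form>
</body></html>
"""


def survey(html):
    """Return (links, forms) found in the given HTML text."""
    finder = LinkAndFormFinder()
    finder.feed(html)
    return finder.links, finder.forms


links, forms = survey(SAMPLE_PAGE)
print("Reachable by following links:", links)
print("Hidden behind forms:", forms)
```

In this sketch the catalog records behind /catalog/search never show up in the list of links, which is why database content of this kind tends to end up in the Deep Web.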

What we want from these tools is to be able to run a search without missing anything.

Google Scholar

Google Scholar is Google’s attempt at making the Deep Web searchable. This is actually pretty bad software. There are, however, a few highlights. Google Books allows full-text searches of books. A significant number of journal databases have also opened up their access to Google Scholar. This means the full text of journals and books, not just excerpts, and in multiple languages. There are also library links.

Google does not tell us which journals and books are included in this tool and which are not, and it gives us no quantitative data about what can be accessed through Google Scholar. The search software is bad. Really, really bad. At one time, the most prevalent author on Google Scholar was “Password”. Google Scholar also has no specific input fields for a search; adding them would be a good improvement.

Alternatives

During the class, we will be exploring the use of Turbo 10, Scirus, BNET, and Librarians’ Internet Index.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License