Diving Through the Deep Web

By samoore (Aug 30, 2009)

Google Can't Pick Up Everything

I was surprised to learn that if you can't find something on Google, it doesn't mean it doesn't exist. There is apparently a significantly larger body of data beyond what Google and other dominant search engines can pick up.

Deep%20Web.jpg

These documents, which cannot be queried from the main search engines, make up "The Deep Web" and consist of about:

  • 20-100 billion documents
  • 250 queryable databases
  • 7,500 terabytes of data

(Keep in mind that these figures are very approximate.)

Google and other main search engines are unable to gain access to the Deep Web because their "spiders" cannot track certain kinds of pages, such as proprietary pages, Flash pages, and real-time information.
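To picture why a spider misses these pages, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Google actually works: the page names are made up, and it assumes a spider discovers pages purely by following links, so anything that only appears after you type a query into a form is never found.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
# None of these page names come from a real site.
LINKS = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],        # results appear only after a query is typed in
    "journal-article-1": [],  # reachable only through the search form
    "journal-article-2": [],
}

def crawl(start, links):
    """Breadth-first link following, the way a simple spider discovers pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

surface = crawl("home", LINKS)   # pages the spider can reach by links alone
deep = set(LINKS) - surface      # pages that exist but are never discovered

print("Indexed by the spider:", sorted(surface))
print("Never discovered:", sorted(deep))
```

In this toy example the two journal articles exist on the site, but since no link points to them, the spider never adds them to its index.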

Most of the Deep Web consists of academic journals, which is why I am so surprised that I hadn't known sooner about the search engines that can gain access to these sites. I am excited to know that for future research, I can use alternative search engines to explore parts of the web that I had never been able to reach before.

Why It Is Time To Branch Out

Thus far, whenever I had "serious" research to do, I would feel very clever using specialized tools, such as Google Scholar or JSTOR. JSTOR works well if you have access to it and need to retrieve certain specific resources, such as history of art journals. But for more general and free-of-charge searches, I always assumed Google Scholar was sufficient, especially because I had no other tools to compare it to.

102948694_428880c205.jpg

Google Scholar has some useful features, such as full text of books, support for multiple languages, and links to your school's library with every retrieval. However, it also has some huge drawbacks that make it an altogether ineffective search engine. For one, it has huge coverage gaps, and Google makes it impossible to know what kind of coverage is being left out. Nothing on the interface reveals where the coverage comes from, such as a feature that shows whether the retrievals are journals, articles, or books. Likewise, Google prevents you from knowing what sources it pools from, which prevents you from knowing how reliable your retrievals might be. Google Scholar also does not retrieve documents the way the general Google Search does, which makes it hard to get consistent results from similar queries.

I was quite surprised to learn exactly how much information and how many possible resources Google Scholar leaves out, and why that is not acceptable when you need credible and reliable sources for research. I had no idea that there were search engines that could provide comprehensive, integrated, and transparent results that reveal exactly where the information is coming from. I also didn't realize how many other search engines had interfaces that were easier on the eyes and easier to follow.

For searching the Deep Web, here are some of the sites that I explored as an alternative to Google Scholar:

Making New Friends

Turbo10

turbo10-search.gif

One of the main advantages of Turbo10 is its search clusters. Once you search for a general term, Turbo10 helps you refine your search by offering clusters that you can select to narrow it down. I found this extremely helpful, as I rarely know exactly what I am looking for. When I am in the early stages of research, the clusters help me explore what related topics I can search for within my broader search. Likewise, you can select which search engines you want to use if that will increase the relevance of the search.

I also like the interface of Turbo10 compared to other basic search engines. It is much simpler (with only about 5 retrievals per page), and it shows a small graphic to give you an idea of what each site looks like before you click on it. This helps you save time and quickly see whether or not the site might be useful.

Scirus

scirusfig3.jpg

Scirus is by far the most transparent of these specialized search engines. The About Us section of its site lists all of the other databases it pulls from, something Google does not offer. This gives viewers a clear picture of the credibility and quality of the results, and signals what Scirus has and what it doesn't have.

Scirus maintains this transparency for each query by showing how many retrievals fall into each of the categories "journals," "preferred web," and "other web." Likewise, it shows what types of pages have been retrieved: HTML, PDF, or Word. This helps you quickly evaluate what kind of results are coming back, whether or not you need to rethink your query, and how far into the documents you should look before trying again.

With the sorting and post-query features, I honestly felt like Scirus was able to tell what I was thinking. The topic pages connected with Scirus really help you narrow down your search, which has been very beneficial in the beginning stages of my research projects. Although Scirus does not have the most attractive of interfaces, it probably has the easiest interface to navigate.

lii0911.png

Librarian's Internet Index

Librarian's Internet Index had my favorite interface of all the alternative Deep Web search engines. On the front page, it lets you pick general search categories if you are having trouble getting started.

The results themselves for Librarian's Internet Index are much fewer in number than on most sites, making the search process seem less overwhelming. For example, when I searched for "cookies," it only came back with 31 retrievals, compared with the 3,256,355 retrievals that came back from Scirus.

The only disadvantage I found with Librarian's Internet Index was that it didn't state where the sources were coming from; the query results page didn't have a feature that showed how many of the sites were journals or other formats.

BNet

bnet1.jpg

I was excited to find out about BNet, especially because so many of my research projects are related to business.

I found BNet's interface to be very impressive and straightforward. Whenever I searched for something, it showed where the content came from on the right-hand side. Likewise, I can see how the discussion thread that follows every piece could be very useful for hearing the perspectives of "everyday" users.

Other helpful features I liked on BNet were:

  • The Business Library Topics (on the lower left-hand side of each of the headings on the main page)
  • The featured articles for every heading
  • The Video Library

For someone who has trouble keeping up with current events, this seems like a great site for following current issues with the economy.

The Bottom Line

Although it will be impossible to cut Google out of your life entirely, I am hoping that this blog will encourage you to branch out a little bit. Although all of the sites mentioned above have pros and cons depending on your needs, they will generally work 100 times better than Google Scholar for conducting research. They should raise your standards for what you expect out of a search engine! These search tools are so helpful that they might inspire you to learn about things that you don't even need to know, just because it's so much easier to find information :)
