Smarter Searching

You may not have noticed yet, but this website does not support keyword search. Keyword search can be very helpful in locating relevant pages on a website. I could have added a search form but decided it was not worth the effort. Instead, let's try searching this website using Google. To do that, enter the keywords of interest, followed by the operator "site:" and the web address. Below is a query that searches for pages containing the word "programming" on this site:

programming site:

Try it! You will not learn anything without practice. Copy, or preferably type, the above text into a Google search box. Feel free to change "programming" to any word or words of your choice. Note that there is no space between "site:" and the web address. If you enter a space, the search will most likely return no results.
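The rule about not putting a space after "site:" is easy to get wrong when typing, so here is a minimal Python sketch that builds the query for you. The names site_query and search_url are made up for this illustration, "example.com" is only a placeholder address, and the search URL layout (a "q" parameter on www.google.com/search) is an assumption about how Google accepts queries in the address bar.

```python
from urllib.parse import urlencode


def site_query(keywords, domain):
    """Build a query restricted to one site, e.g. 'programming site:example.com'."""
    domain = domain.strip()
    # A space inside or after "site:" breaks the operator, so reject it early.
    if not domain or any(ch.isspace() for ch in domain):
        raise ValueError("domain must be non-empty and contain no spaces")
    return f"{keywords.strip()} site:{domain}"


def search_url(query):
    """Turn a query into a search URL (assumed layout: /search?q=...)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# "example.com" is a placeholder; substitute the site you want to search.
query = site_query("programming", "example.com")
print(query)
print(search_url(query))
```

Pasting the printed URL into a browser should have the same effect as typing the query into the search box.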

The search probably worked pretty well in this case. However, searching a website using Google does not always work. To be able to search a website, the webpages of that site must be indexed by Google. If Google is not aware of the existence of a webpage, it cannot return that page in its results.

The other problem case occurs when Google knows about the website but has not indexed it recently. If any material was added recently, Google is not going to mention it in the search results. As of the writing of this article, this website had three webpages about Minesweeper, but the following search came up empty:

minesweeper site:

The good news is that Google indexes most well-trafficked websites every day. This website is fairly new and not all that popular, so it gets indexed only occasionally. Google indexes more than 3 million popular websites every day, so you are not likely to run into a website like this one very often.

Another reason to use Google to search specific websites is that many websites have broken search functionality or clumsy search interfaces of their own. By using Google to search such sites, you can circumvent these problems.

A nice feature provided by Google is the ability to search all sites whose web address only partially matches the one you give. I used the following query to locate some nice lecture notes on databases:

databases lecture site:edu filetype:pdf

The above query searches university sites, i.e., sites whose address ends with "edu". If you want to search sites specific to a country, for example the UK, you can do that by adding "site:uk" to your query.

You are probably wondering about the "filetype:pdf" fragment in the above query. This limits the search to Adobe PDF files only. Most lecturers put their notes in PDF format. If I were interested in lecture slides, I could have searched inside PowerPoint files using "filetype:ppt". The following are some useful file formats allowed by Google:

File Extension    Query               Description of format
txt               filetype:txt        Plain text format
doc               filetype:doc        MS Word document
rtf               filetype:rtf        Rich Text Format
pdf               filetype:pdf        PDF file
ps                filetype:ps         PostScript (PDF alternative)
xls               filetype:xls        MS Excel spreadsheet
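To see how "filetype:" combines with keywords and a partial "site:" match, here is a small Python sketch. The function name filetype_query and the FILETYPES dictionary are invented for this example; the dictionary simply copies the table above and is not a complete list of what Google accepts.

```python
# Extensions copied from the table above; this is not an exhaustive list.
FILETYPES = {
    "txt": "Plain text format",
    "doc": "MS Word document",
    "rtf": "Rich Text Format",
    "pdf": "PDF file",
    "ps": "PostScript (PDF alternative)",
    "xls": "MS Excel spreadsheet",
}


def filetype_query(keywords, extension, site=None):
    """Build a query limited to one file format and, optionally, to matching sites."""
    ext = extension.lower().lstrip(".")
    if ext not in FILETYPES:
        raise ValueError(f"extension not in the table: {extension!r}")
    parts = [keywords.strip()]
    if site:
        # A partial address such as "edu" or "uk" works here too.
        parts.append(f"site:{site}")
    parts.append(f"filetype:{ext}")
    return " ".join(parts)


# Reproduces the lecture-notes query used earlier in this article.
print(filetype_query("databases lecture", "pdf", site="edu"))

# One query per format in the table, for the same keywords.
for ext, description in FILETYPES.items():
    print(f"{filetype_query('databases lecture', ext):40} ({description})")
```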

One cool thing to do is to count the number of webpages belonging to a specific site. For example, to count the number of pages on this site, run the following query:

+the site:

The above query searches for the word "the" and lists all pages that contain it. This works because almost any page with text is going to contain at least one occurrence of "the". The plus symbol before "the" forces Google to include the word in its query. Without the plus symbol, Google will filter out "the" because it appears too frequently in documents. The count of pages belonging to the site is displayed above the search results. The count is not exact; it is a rough estimate, and the actual number is likely to be smaller.

The same idea can be used to find all files of a specific type on a website. The following query finds all pdf files on and counts them too:

+the filetype:pdf

Similar to the plus symbol is the minus symbol. It excludes from the search all documents that contain the word following it. The ability to exclude documents this way is only occasionally useful, and I personally do not use it all that often. The following query displays all pages on my website that do not contain the word "programming":

-programming site:

You can combine the plus and minus symbols and use each of them several times in the same query. There are some other operators, and you can find out about them on the Google website.
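As a rough illustration of combining the two symbols, the Python sketch below assembles a query from a list of required words and a list of excluded words. The function name build_query is hypothetical and "example.com" is a placeholder address. The sketch only builds the query text as described in this article; it does not contact Google.

```python
def build_query(required=(), excluded=(), site=None):
    """Join +word and -word terms, plus an optional site: restriction."""
    parts = [f"+{word}" for word in required]
    parts += [f"-{word}" for word in excluded]
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


# Pages on the placeholder site that contain "the" but not "programming".
print(build_query(required=["the"], excluded=["programming"], site="example.com"))

# Several plus and minus terms in the same query.
print(build_query(required=["the", "search"], excluded=["minesweeper", "programming"]))
```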

The best way to get better at searching the web is to practice using all the above options. The time spent on learning to search well will be quickly rewarded. You are not only going to save time but are also going to find pages that you could not have found otherwise.

If you are interested in learning more about searching the web, you will find Search Lores to be a great resource. The searchlores site has many pages so you might want to use some of the techniques described in this article to search and find the relevant ones.

by Usman Latif  [Oct 27, 2003]