Understanding Web Addresses

Website addresses matter to anyone who uses a browser, yet most people understand them poorly and cannot use them effectively. Taking a little time to learn how they work can save you time and give you greater access to internet content.

Website addresses are technically known as Uniform Resource Locators (URLs). I purposely did not use the acronym URL in the title of this tutorial, as that would scare away many people. URLs cover more than just web addresses: the internet offers resources other than webpages, and URLs can be used to access those resources too. In this tutorial I will limit myself to webpage addresses only, and I am going to use web/webpage/website interchangeably from here on.

Let us first examine some web addresses and take them apart to get at their constituents. Below is the web address of the page you are viewing:
Note: This website no longer uses the above URL format.

On the line below, I have separated the distinct parts of the above address with spaces. The table that follows gives a description of each part.

http:// / index.php ? content=15
Part                Description
http://             protocol to use
[domain]            domain name
/                   path to the resource to be accessed
index.php           web resource user is trying to access
?                   signifies that a request for information follows
content=15          request for "content" labeled "15"
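
The same breakdown can be reproduced with Python's standard library. This is only a minimal sketch: the address is made up, with www.example.com standing in as a placeholder domain, and the variable names are my own.

```python
from urllib.parse import urlsplit

# Hypothetical address; www.example.com is a placeholder domain.
address = "http://www.example.com/index.php?content=15"

parts = urlsplit(address)

# The path holds both the folder and the resource name; split at the last "/".
folder, _, resource = parts.path.rpartition("/")

print(parts.scheme)   # protocol to use
print(parts.netloc)   # domain name
print(folder + "/")   # path to the resource
print(resource)       # resource being accessed
print(parts.query)    # request that follows the "?"
```

Note that urlsplit reports the protocol without the "://" and the request without the "?", since those characters only act as separators.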

The first part of the address is the scheme or protocol specification. It tells the browser what type of resource you are trying to locate and how to retrieve it. The "http://" protocol signifies a web page. If the protocol is unspecified, web browsers generally assume that you want to access a webpage, so you can save a little typing by leaving out the "http://" before web addresses.
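
The default described above can be imitated in a few lines. This is a rough sketch of the idea, not how any particular browser is implemented; the function name and the example addresses are hypothetical.

```python
def with_default_scheme(typed):
    """Prefix "http://" when the typed address names no protocol."""
    # An address that already contains "://" states its own protocol.
    if "://" in typed:
        return typed
    return "http://" + typed

print(with_default_scheme("www.example.com/index.php?content=15"))
print(with_default_scheme("ftp://ftp.example.com/files/"))
```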

The next part is the domain name. It identifies the website, and used on its own it leads to the site's main webpage. The domain name continues until the first "/" in the address. Some websites allow the www portion to be skipped. If you type the address of this webpage without the "http://www" portion everything will work fine. There is nothing wrong with doing that and you can save some keystrokes.

The path to the resource follows the domain name. It starts with a "/" and identifies the folder on the "server" where the resource resides. A server is a computer that sends out pages when people request them by typing web addresses in their browsers. In the case of this webpage the path on the server is just a single "/", which implies the main folder containing web resources. A more complicated example is the following webpage:

Here the path to the resource is "/files/". The "files" portion is the name of a folder under the main web folder. The cool thing is that many websites allow you to browse folders. You can browse the "files" folder by not specifying the resource name. Typing the following will get you access to the files folder:

The ability to browse folders can be very useful. Suppose you did a web search and the search returned a link to a useful PowerPoint presentation or a PDF document. It is reasonable to expect the website containing the document to have many more interesting resources. Unfortunately, PowerPoint and PDF documents do not always contain links back to the parent site. The web address of the document might look like:

Using what you now know about browsing folders, you can try browsing the folder containing the file and see if it contains anything useful. If you get an error message, try the folder one level up, and so on. The following sequence of addresses might be a good way to go about finding the parent site:
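
The walk up the folder tree described above can be sketched in Python. The function name and the document address are hypothetical, with www.example.com as a placeholder domain; the sketch only builds the list of addresses to try and does not check whether any of them can actually be browsed.

```python
from urllib.parse import urlsplit, urlunsplit

def parent_addresses(address):
    """List folder addresses from the file's own folder up to the main folder."""
    parts = urlsplit(address)
    # Drop the file name (the last piece) and keep only the folder names.
    folders = [name for name in parts.path.split("/")[:-1] if name]
    addresses = []
    while True:
        path = "/" + "/".join(folders) + ("/" if folders else "")
        addresses.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        if not folders:
            break
        folders.pop()  # move one folder level up
    return addresses

# Hypothetical document address returned by a search.
for candidate in parent_addresses("http://www.example.com/docs/talks/slides.ppt"):
    print(candidate)
```

Each address printed is one level closer to the main page, so they can be tried in order until one of them works.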

Now we come to the resource part of the web address. This is the name of a file or program that you want to access. A good number of web pages are created by programs on demand. In the case of this website, "index.php" is the program which created the webpage you are looking at.

Most programs that create webpages take inputs which specify the page to be created. The inputs are passed using a "?" after the resource name.

After the "?" there are typically some pairs of names and values. In the case of this webpage there is only one pair, "content=15". The "content=15" part tells the "index.php" program to create the webpage corresponding to the value 15 for the label "content". Changing the part after the "=" instructs the "index.php" program to create a different webpage. The following are some valid webpages on this website:

Some web addresses have more than one pair of names and values. In such an address the "&" character separates the pairs. This website doesn't use more than one pair, but adding something superfluous does not typically cause any problems. Here is an example of a web address with two name-value pairs:
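
To see how a program might read such pairs, here is a minimal Python sketch. The address is made up: www.example.com is a placeholder domain and the "extra=1" pair is a superfluous pair added purely for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical address with two name-value pairs separated by "&".
address = "http://www.example.com/index.php?content=15&extra=1"

# parse_qsl splits the part after the "?" into (name, value) pairs.
pairs = parse_qsl(urlsplit(address).query)

for name, value in pairs:
    print(name, "=", value)
```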

The ability to change the value part (the part after "=") can be very handy at times. Suppose I were writing a monthly column and you had retrieved the current column at the web address:

This web address has two name-value pairs, "month=oct" and "year=2003". The labels "month" and "year" don't have to mean anything, but programmers typically use logical names. A reasonable expectation would be that the column for January 2002 can be retrieved using the address:

Remember that when you modify a web address in the above manner, you are making a guess. A guess is not right all the time, and you will be wrong fairly often. This shouldn't discourage you: there is no harm in trying, and at times you will save a lot of time and effort by doing so.
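
This kind of guess can also be made programmatically. The following Python sketch assumes a hypothetical column address on the placeholder domain www.example.com; the program name "column.php", the function name, and the value "jan" are all assumptions of mine, and the resulting address is just as much a guess as one typed by hand.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_values(address, **changes):
    """Return the address with the given name-value pairs replaced."""
    parts = urlsplit(address)
    pairs = dict(parse_qsl(parts.query))
    pairs.update(changes)  # overwrite the values we want to guess at
    return urlunsplit(parts._replace(query=urlencode(pairs)))

# Hypothetical address of the current column.
current = "http://www.example.com/column.php?month=oct&year=2003"

guess = with_values(current, month="jan", year="2002")
print(guess)
```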

There is a lot more to web addresses than has been mentioned in this tutorial. If you have a particularly nasty web address that you want me to dissect, send me an email. I will post the explanation as an addendum to this tutorial.

If you want to learn more about web addresses the following links will come in handy.

RFC 1738: A comprehensive document about URLs.

by Usman Latif  [Oct 09, 2003]