
Uniform Resource Locators

What is a URL?

A Uniform Resource Locator (URL) is a pointer to specific information on the Web.

To use a URL, you start your browser, select the option that allows you to enter a location, and enter the URL. Your browser then requests the information from the server that the URL identifies. You also use a URL when you click on a hotspot, or hyperlink, in an HTML document. The URL is embedded in the hyperlink itself.

Each hyperlink in an HTML document has two parts: the anchor text or graphic that you click on to trigger the hyperlink, and the URL that describes what to do once you activate the hyperlink.

Think of a URL as the address or location of the page. A URL identifies the protocol, the server, and the information type, providing a link that the browser uses to find and display the page.

Every file on the Internet is uniquely addressable by its URL.

Sections of a URL

A URL has three basic sections: the protocol, the server ID, and the file ID.

See the section Local URLs with an Absolute Pathname for more information.

Figure 6 Sections of a URL (graphic not displayed)

A fully-qualified URL describes the protocol to use, the server to contact, and the file to request. The file section differs depending on the type of file.
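As a rough illustration of the three sections, the sketch below splits a URL with Python's standard urllib.parse module. The URL is assembled from the pieces that the text attributes to Figure 6; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

# URL assembled from the server ID and file ID quoted for Figure 6.
url = "http://cluster.mycompany.com:88/datadirectory/test/file1.html#topicname"

parts = urlsplit(url)

protocol = parts.scheme    # the protocol section
server_id = parts.netloc   # the server ID section (host plus optional port)

# urlsplit reports the section name (fragment) separately from the path,
# so rejoin them to get the file ID as this document defines it.
file_id = parts.path
if parts.fragment:
    file_id += "#" + parts.fragment

print(protocol)   # http
print(server_id)  # cluster.mycompany.com:88
print(file_id)    # /datadirectory/test/file1.html#topicname
```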

Protocol Portion of the URL

The protocol portion of the URL specifies how the browser communicates with the server. Browsers and servers support a growing number of protocols. In most cases, the protocol is either http (for files on a server) or file (for local files).

There are two instances in which the protocol type might be omitted from a URL. The protocol is left off when using a browser that can detect the protocol type from the remote server. The protocol is also unnecessary when referring to a file located on the same server as the referring link. In these cases, the URL is referred to as a partial URL (described later).

Syntax

In the protocol portion of a URL, specify the protocol followed by a colon and two forward slashes, as in: http://

Server ID Portion of the URL

The server ID portion of the URL identifies the server where the information you want resides, for example, //cluster.mycompany.com:88/ in Figure 6. The server ID itself consists of three parts: the system name, the domain name, and the port number.

Most servers use the default port number (80), so you can usually omit the port number from a URL. The domain name is also optional if the target server is in the same domain as the client system.

The Internet Domain Name System (DNS) resolves these names into numerical Internet addresses. In some cases, a system might not be registered, or your local domain name server could be having problems. As an alternative, you can specify a system by its numerical Internet address, rather than by the combination of system name and domain name.
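One way to tell whether a URL names its server by numerical address rather than by name is sketched below, using Python's standard ipaddress module. The function name and the address 192.0.2.10 (taken from a range reserved for documentation) are assumptions for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_numeric_address(url):
    """Return True if the server in the URL is given as a numerical address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        # ip_address raises ValueError for anything that is not a valid address.
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


print(uses_numeric_address("http://192.0.2.10:88/home.html"))
print(uses_numeric_address("http://cluster.mycompany.com:88/home.html"))
```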

See the section Local URLs with an Absolute Pathname for more information.

Syntax:

In the server ID portion of a URL, separate the system name and domain name by a period and separate the domain name and port number by a colon (:).
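The syntax rule above can be sketched in Python as follows. This is a minimal illustration that assumes an http URL and a host given by name; the function name split_server_id is hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80


def split_server_id(url):
    """Return (system_name, domain_name, port) for an http URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Everything before the first period is the system name;
    # the remainder, if any, is the domain name.
    system_name, _, domain_name = host.partition(".")
    # Fall back to the default port when the URL does not give one.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    return system_name, domain_name, port


print(split_server_id("http://cluster.mycompany.com:88/"))
print(split_server_id("http://cluster.mycompany.com/"))
```

A host written as a numerical address would be split incorrectly by this sketch, so a real implementation would check for that case first.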

File ID Portion of the URL

The file ID in a URL specifies the directory, subdirectory (if any), and filename. There are two different kinds of file ID. Data files are the most common file types and are usually HTML files. The data file ID can include an optional section name. The data file ID part of the URL in Figure 6 is:

/datadirectory/test/file1.html#topicname

Executable program file IDs identify a document that runs a program when accessed. An executable file ID can include an optional program parameter. An example of a URL containing a request for an executable program file is:

http://cluster.mycompany.com:88/cgi/carb.exe?VBcar.exe

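Both types consist of three parts: a directory path, a filename, and an optional final part, which is a section name for a data file or a program parameter for an executable file.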

Syntax:

To specify a file ID, begin with a forward slash, then separate the directory, subdirectory, and filename with a forward slash (/), and include the file extension (for example, .html).

You do not need any punctuation at the end of a fully-qualified URL.

Directory Path

The directory path in a fully-qualified URL is always an absolute path, relative to the server’s home directory. It must begin with a slash to separate it from the server ID.

Filename

The filename specifies the file for the transfer. This is most commonly an HTML file. However, the file can also be an executable file or some other type, depending on the protocol being used. The Common Gateway Interface (CGI) allows a URL to specify an executable file that the server executes. The result of the program is served out as an HTML file.

Section Name

If the file specified is an HTML file that has defined sections, the final part of the file ID can be a section name. In this case, the filename is followed by a number sign (#) and the name of the section. Most HTML files are not broken into sections, so you usually do not need to use a section name. In fact, the only effect that including a section name has is to set the browser’s screen scrolling so that the defined section is at the top of the browser’s display.
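The section name can be separated from the rest of the URL with urldefrag from Python's standard library, as in the sketch below. It uses the Figure 6 file ID quoted earlier. Browsers generally keep the section name to themselves and request only the part before the number sign, which is consistent with the section name affecting only scrolling.

```python
from urllib.parse import urldefrag

url = "http://cluster.mycompany.com:88/datadirectory/test/file1.html#topicname"

# urldefrag returns the URL without the section name, and the section name.
document_url, section_name = urldefrag(url)

print(document_url)  # http://cluster.mycompany.com:88/datadirectory/test/file1.html
print(section_name)  # topicname
```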

Program Parameter

If the file specified is an executable program, the final part of the file ID can be a parameter passed to the program. In this case, the filename is followed by a question mark (?) and the parameter. The following example shows this sort of URL with a call to the carb.exe program that gets VBcar.exe as a parameter.

http://cluster.mycompany.com:88/cgi/carb.exe?VBcar.exe
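For illustration, the same URL can be taken apart with Python's urlsplit, which reports the text after the question mark as the query. The variable names are illustrative only.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://cluster.mycompany.com:88/cgi/carb.exe?VBcar.exe")

program_path = parts.path   # the executable file named in the URL
parameter = parts.query     # the text after the question mark

print(program_path)  # /cgi/carb.exe
print(parameter)     # VBcar.exe
```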

Path Information

Path information is extra path data supplied by the client. It is the trailing part of the URL after the script name but before the query string (if any).
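The sketch below shows one way this split could be computed when the script name is already known. The function name split_path_info and the /reports/1996 path are hypothetical; a real server decides where the script name ends from its own configuration.

```python
from urllib.parse import urlsplit


def split_path_info(url, script_name):
    """Return (script_name, path_info, query_string) for a CGI-style URL."""
    parts = urlsplit(url)
    path = parts.path
    # The script name must match a whole leading portion of the path.
    if path != script_name and not path.startswith(script_name + "/"):
        raise ValueError("URL path does not start with the script name")
    path_info = path[len(script_name):]
    return script_name, path_info, parts.query


# Hypothetical URL with extra path data between the script and the query.
example = "http://cluster.mycompany.com:88/cgi/carb.exe/reports/1996?VBcar.exe"
print(split_path_info(example, "/cgi/carb.exe"))
```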

Partial URLs

Not all URLs have all three sections of a fully-qualified URL. There are two general types of partial URLs: URLs without a filename and local URLs.

URLs Without a Filename

A default page URL does not specify a file. It includes a protocol and a server ID and might include the directory path portion of the file ID, but it does not include a filename.

For example:

http://cluster.mycompany.com/datadirectory/test/

This example includes a directory path, but not a filename. By default, the server looks for a particular filename; the name depends upon the server being used or how it is configured.

A server that receives this type of URL takes these actions:

  1. Looks for a default file, for example, default.html or default.htm in the directory. If it exists, returns that file to the Web browser.
  2. If the default file does not exist, determines if the directory is a browsable directory. If so, returns a directory listing to the Web browser.
  3. If the directory is not browsable, returns an error message to the Web browser.
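The three steps above can be sketched as a small Python function. This is only an approximation under stated assumptions: the function name, the return values, and the browsable flag are hypothetical, and real servers make both the default filenames and the browsing rule configurable.

```python
from pathlib import Path

# Default filenames taken from the example in step 1.
DEFAULT_FILES = ("default.html", "default.htm")


def resolve_directory_request(directory, browsable):
    """Decide what to return for a URL that names a directory."""
    directory = Path(directory)
    # Step 1: return the default file if one exists.
    for name in DEFAULT_FILES:
        if (directory / name).is_file():
            return ("file", name)
    # Step 2: otherwise return a listing if the directory is browsable.
    if browsable:
        return ("listing", sorted(p.name for p in directory.iterdir()))
    # Step 3: otherwise report an error.
    return ("error", "directory is not browsable")
```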

End this kind of URL with a slash. If you omit the slash, the server could misinterpret the directory name in the URL as a filename. The following example could be interpreted as an error.

http://cluster.mycompany.com/datadirectory/test

If test is a filename, this is a valid URL. However, to correctly point this URL to a file, test should be named test.html (or test.htm). If test is a directory name, then this URL is not properly formed.

Web servers are usually smart enough to figure out that a URL like this is referring to a directory and not a file, but it is not a good idea to depend on the server to fix such mistakes.
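A missing trailing slash also changes how partial URLs inside the page are resolved against it. The sketch below uses Python's urljoin, which follows the standard resolution rules, to show the difference; file1.html is used here only as a sample link target.

```python
from urllib.parse import urljoin

with_slash = "http://cluster.mycompany.com/datadirectory/test/"
without_slash = "http://cluster.mycompany.com/datadirectory/test"

# With the slash, "test" is treated as a directory and the link lands inside it.
inside = urljoin(with_slash, "file1.html")

# Without the slash, "test" is treated as a filename and is replaced by the link.
beside = urljoin(without_slash, "file1.html")

print(inside)  # http://cluster.mycompany.com/datadirectory/test/file1.html
print(beside)  # http://cluster.mycompany.com/datadirectory/file1.html
```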

When you are creating an HTML document with hyperlinks, it is best to be sure you explicitly specify a directory or a file in the URL. In addition, if you explicitly specify a directory, ensure that a default.html file exists for that directory.

Local URLs

A file ID alone is a valid partial URL. The terminology for partial URLs is not yet standardized. To avoid confusion, this document calls a URL that consists of only a file ID a local URL. This is the preferred type of URL for a group of locally related HTML files that refer to each other. [Local URLs are also known as relative URLs. Because that term can be confused with a relative pathname, the term relative URL is not used in this document.]

See the section Local URLs with a Relative Pathname for more information.

Local URLs with an Absolute Pathname

If a local URL begins with a slash, it is an absolute pathname. The path specified starts at the server’s default data directory—the root of the directories available to the server.

This is similar to the absolute pathnames in a file system except that tracing the directory path does not begin at the actual root directory of the server’s file system. It is the server’s default data directory that serves as the root of available files.

The leading slash alone is sufficient to serve as a path. A URL of /home.html always refers to a file named home.html in the current server’s default data directory.
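As a small illustration, Python's urljoin resolves such a URL against the current document's URL. The current document used here is the Figure 6 file; the leading slash discards its directory path but keeps the protocol and server ID.

```python
from urllib.parse import urljoin

current_document = "http://cluster.mycompany.com:88/datadirectory/test/file1.html"

resolved = urljoin(current_document, "/home.html")

print(resolved)  # http://cluster.mycompany.com:88/home.html
```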

Local URLs with a Relative Pathname

If a local URL does not begin with a slash, then the path specified starts at the same directory as the current file—the same as with relative pathnames in a file system.

A filename alone (with no pathname) is the minimum amount of information needed for a URL naming a file that resides in the same directory on the same server as the current file.

The following list shows the syntax for some local URLs with relative pathnames. Note that none begin with a slash, but that a slash separates subdirectory names:
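As a hedged illustration of how relative pathnames resolve, the sketch below uses Python's urljoin. The file and subdirectory names (file2.html, sub, other) are hypothetical and are not taken from the original list.

```python
from urllib.parse import urljoin

current_document = "http://cluster.mycompany.com:88/datadirectory/test/file1.html"

# A filename alone: same directory as the current file.
same_dir = urljoin(current_document, "file2.html")

# A subdirectory and a filename: one level below the current directory.
below = urljoin(current_document, "sub/file3.html")

# Two dots step up one directory, as in a file system.
sibling = urljoin(current_document, "../other/file4.html")

print(same_dir)  # http://cluster.mycompany.com:88/datadirectory/test/file2.html
print(below)     # http://cluster.mycompany.com:88/datadirectory/test/sub/file3.html
print(sibling)   # http://cluster.mycompany.com:88/datadirectory/other/file4.html
```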

Mapping URLs to Documents

URLs map to documents in the document tree on a Web server. The minimal URL that can reach the server is:

http://server-name

If you are running your Web server on a port other than the default port, include the nonstandard port number in the URL, for example:

http://server-name:1300

As the HTTP server translates URLs to real directories, it looks at the beginning of the URL path for any virtual paths. If the server finds one, it replaces the virtual path with the real directory and processes the request. Virtual paths let you specify different trees for different kinds of information.
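The translation step might look roughly like the sketch below. Every directory name in it is hypothetical, and the prefix-matching rule is an assumption about how a server could behave, not a description of any particular product.

```python
# Hypothetical virtual-path table: URL path prefix -> real directory.
VIRTUAL_PATHS = {
    "/cgi/": "/usr/local/www/programs/",
    "/images/": "/data/graphics/",
}

# Hypothetical default data directory for everything else.
DOCUMENT_ROOT = "/usr/local/www/docs/"


def translate(url_path):
    """Map the path part of a URL to a location in the file system."""
    for virtual, real in VIRTUAL_PATHS.items():
        if url_path.startswith(virtual):
            # Replace the virtual prefix with the real directory.
            return real + url_path[len(virtual):]
    # No virtual path matched: resolve under the default data directory.
    return DOCUMENT_ROOT + url_path.lstrip("/")


print(translate("/cgi/carb.exe"))  # /usr/local/www/programs/carb.exe
print(translate("/home.html"))     # /usr/local/www/docs/home.html
```

A production server would also reject paths that try to climb out of these trees, which this sketch does not attempt.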

URLs and Servers Running on Non-standard Ports

Sometimes a server is running on a non-standard port. In this case you need to include the port number as part of the URL. Table 1 lists the standard ports:

Table 1 Standard Services and their Port Numbers

Service        Port Number
FTP            21
Gopher         70
HTTP           80
TELNET         23
SSL (HTTPS)    443

The following URL shows how to specify an FTP server running on port 1375:

ftp://branch.mycompany.com:1375/
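The rule of including the port only when it is non-standard can be sketched as follows. The function name build_url is hypothetical, and the dictionary holds only a subset of the standard ports, keyed by protocol name.

```python
# A subset of the standard ports, keyed by the protocol name used in URLs.
STANDARD_PORTS = {"ftp": 21, "gopher": 70, "http": 80, "https": 443}


def build_url(protocol, host, port=None, path="/"):
    """Build a URL, adding the port only when it is non-standard."""
    if port is None or port == STANDARD_PORTS.get(protocol):
        return f"{protocol}://{host}{path}"
    return f"{protocol}://{host}:{port}{path}"


print(build_url("ftp", "branch.mycompany.com", 1375))  # ftp://branch.mycompany.com:1375/
print(build_url("http", "cluster.mycompany.com", 80))  # http://cluster.mycompany.com/
```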

Interpreting URLs

The status message area on most browsers displays the URL for each hyperlink in the document. The URL can tell you many things about the resource to which it refers. Once you know the standard URL syntax, you can easily interpret a URL. You can gain the most information from the protocol, domain name, and filename endings in a URL.

Frequently Used Protocols

Table 2 lists some frequently used protocols that might be referenced in the status message area. This table also lists some other protocols that you could use occasionally.

Table 2 Frequently Used Protocols

Protocol    Description
HTTP        HyperText Transfer Protocol: you can find HTML files and possibly other kinds of files at URLs specifying this access method
FTP         File Transfer Protocol: you can find files and directories of files at FTP sites
Gopher      Gopher Information: a menu system where you can find menus and choices; selecting menu items can let you reach other resources
WAIS        Wide Area Information Server
SSL         Secure Sockets Layer

Frequently Used File Name Extensions

Table 3 lists some frequently used filename extensions that might be referenced in the status message area or that you could use occasionally.

Table 3 Frequently Used Filename Endings

Ending           Description
.html or .htm    An HTML (HyperText Markup Language) file
.ps              A PostScript file that prints formatted fonts and graphics
.txt             An ASCII text file
.tex             A TeX or LaTeX file, written in a typesetting language that uses a set of tags for coding
.jpg or .jpeg    A JPEG bitmap image file
.mpg or .mpeg    A video file
.gif             A CompuServe bitmap image file
#anchor          A URL that points to a specific place within a document

Frequently Used Internet Domain Name Endings

Table 4 lists some frequently used domain name endings for Internet host names that might be referenced in the status message area or in URLs that you receive from time to time. The endings listed in this table are mostly used in the United States.

The URL ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/country-codes has information about international endings.

Table 4 Frequently Used Domain Name Endings

Ending    Description
.com      Commercial organizations
.edu      Education organizations
.gov      Government organizations
.mil      Military organizations
.org      Nonprofit organizations
.net      Network administration sites
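To show how the protocol, domain name ending, and filename ending combine when reading a URL, here is a minimal Python sketch. The lookup dictionaries paraphrase a few rows from the tables in this section, and the function name interpret is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

PROTOCOLS = {
    "http": "HyperText Transfer Protocol",
    "ftp": "File Transfer Protocol",
    "gopher": "Gopher menu system",
}
FILE_ENDINGS = {
    ".html": "HTML file", ".htm": "HTML file",
    ".ps": "PostScript file", ".txt": "ASCII text file",
    ".jpg": "JPEG image", ".jpeg": "JPEG image",
    ".gif": "bitmap image",
}
DOMAIN_ENDINGS = {
    "com": "commercial", "edu": "education", "gov": "government",
    "mil": "military", "org": "nonprofit", "net": "network administration",
}


def interpret(url):
    """Describe a URL by its protocol, domain name ending, and filename ending."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The domain name ending is the text after the last period in the host name.
    domain_ending = host.rpartition(".")[2]
    # splitext looks only at the last path component for an extension.
    extension = posixpath.splitext(parts.path)[1].lower()
    return {
        "protocol": PROTOCOLS.get(parts.scheme, "unknown"),
        "organization": DOMAIN_ENDINGS.get(domain_ending, "unknown"),
        "file_type": FILE_ENDINGS.get(extension, "unknown"),
    }


print(interpret("ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/country-codes"))
```

For the FTP URL above, the file type comes back as unknown because the final path component has no ending, which shows the limits of guessing from a name alone.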

