HyperText Transfer Protocol

Introducing HTTP

Web browsers and servers communicate with each other using the HyperText Transfer Protocol (HTTP). The World Wide Web has used HTTP since 1990.

Practical information systems require more functionality than simple retrieval, including search, front-end update, and annotation. HTTP allows an open-ended set of methods that indicate the purpose of a request. It builds on the reference provided by the Uniform Resource Identifier (URI), as a location (URL), or name (URN), for indicating the resource on which a method is to be applied.

This generic, object-oriented protocol accommodates distributed, collaborative hypermedia information systems. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

HTTP is also used as a generic protocol for communication between user agents and proxies or gateways to other Internet protocols. These include SMTP, NNTP, FTP, Gopher, and WAIS, which were already in use on the Internet. In this way, HTTP allows basic hypermedia access to resources available from diverse applications and simplifies the implementation of user agents.

When sending a request to the server, the browser includes a list of the file types it understands. The server can then ensure that it transmits the document as one of those file types. This means that servers and browsers can cope with the existing mass of graphic formats such as GIF, TIFF, and JPEG, to name but a few. You can also exchange all types of data using HTTP. For example, you can exchange documents that contain text, graphics, sound, and video.

The Web uses HTTP for many tasks, such as name servers and distributed object management systems, through extension of its request methods.

HTTP Request Methods

HTTP allows the use of an open-ended set of methods to indicate the purpose of a request. It uses information provided in the Uniform Resource Locator (URL) to find a resource. HTTP passes messages in a format similar to that used by Internet mail and the Multipurpose Internet Mail Extensions (MIME).

Every time a browser requests a document, it sends the server a list of the MIME types it can support. The server maps file types and file extensions to standard MIME data types. In this manner, Web browsers and servers can negotiate the file type that is transmitted between them.

MIME conventions identify a file type by a major type name and a subtype name, separated by a slash. For example, a text file in the HyperText Markup Language (HTML) has the type text and the subtype html, written as text/html, and a .GIF image file is identified as image/gif. Web servers and browsers support many other data types as well.
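The mapping and negotiation described above can be sketched in a few lines of Python. This is only an illustration: the extension table, the fallback type application/octet-stream, and the function names mime_type_for and is_acceptable are assumptions made for the example, not part of any particular server.

```python
import os

# Hypothetical extension table; a real server would load a much larger one.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
}


def mime_type_for(filename):
    """Map a file name to a MIME type using its extension."""
    extension = os.path.splitext(filename)[1].lower()
    # Assumed fallback for unknown extensions.
    return EXTENSION_TO_MIME.get(extension, "application/octet-stream")


def is_acceptable(filename, accepted_types):
    """Check a file's MIME type against the types listed by the browser."""
    mime = mime_type_for(filename)
    major = mime.split("/")[0]
    # A browser may list an exact type, a whole major type, or everything.
    return (
        mime in accepted_types
        or f"{major}/*" in accepted_types
        or "*/*" in accepted_types
    )
```

For example, is_acceptable("DEFAULT.HTM", ["text/plain", "text/html"]) is true, while is_acceptable("logo.gif", ["text/plain"]) is false, so a server in that situation would have to send a different representation or report an error.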

See the chapter entitled Multipurpose Internet Mail Extensions for more information.

HTTP Communications

HTTP communication employs a disciplined and consistent system for referencing different resources through the use of Uniform Resource Locators (URLs). See the earlier chapter in this book and RFC 1738 for more information about URLs. HTTP also:

HTTP Transactions

All HTTP transactions take place over a TCP/IP connection, usually on the default port, 80. A simple HTTP transaction consists of the following four phases.

Connection

In this phase, the browser attempts to connect with the Web server. You can observe this happening on most browsers by watching the status line for a message indicating that it is connecting to the HTTP server.

Request

Once the connection between the browser and server occurs, the browser sends a request to the server. This request specifies:

You can use your Web browser to get information about the various HTTP methods at:
http://www.w3.org/hypertext/WWW/Protocols/HTTP/Methods.html

Every time a browser requests a page, it sends the server a list of the MIME types it can support.

Response

When the server fulfills the request, it sends a response to the client. At this stage, your browser might display a "reading response" message on the status line. Depending on your browser, you might also see a "transferring" message on the status line.

The server responds with a status line, including the message’s protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta information, and possible body content.

The server tries to send only the MIME types the browser supports. The server responds first with the MIME type of the data that it is sending, then it sends the data.

Close

After the server sends the response, either the client, the server, or both close the connection.
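The four phases can be followed in a short Python sketch that uses a plain TCP socket. It is a minimal illustration, not a complete client: the helper names build_request, split_response, and fetch are invented for the example, and the code assumes an HTTP/1.0 server that closes the connection once the response has been sent.

```python
import socket


def build_request(path, accept_types):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"Accept: {mime}" for mime in accept_types]
    # An empty line terminates the list of headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_response(raw):
    """Split a raw response into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def fetch(host, path, port=80, accept_types=("text/html",)):
    """Run one transaction: connection, request, response, close."""
    # Phase 1, connection: open a TCP connection to the server.
    with socket.create_connection((host, port), timeout=10) as sock:
        # Phase 2, request: send the request line and headers.
        sock.sendall(build_request(path, accept_types))
        # Phase 3, response: read until the server closes its side
        # (assumed behavior of the HTTP/1.0 server).
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Phase 4, close: leaving the with block closes the client side.
    return split_response(b"".join(chunks))
```

A call such as fetch("www.example.com", "/") would run all four phases against a real host; the host name here is only a placeholder.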

As shown in Figure 7, most HTTP communications are initiated by a user agent and consist of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished through a single connection between the user agent and the origin server.

Figure 7 Simple HTTP Communication Transaction


Complex HTTP Transactions

HTTP allows hypermedia access to existing Internet protocols such as FTP (File Transfer Protocol), NNTP (Network News Transfer Protocol), Gopher, and WAIS (Wide Area Information Server). It also allows communication between user agents and gateways to pass through a proxy server without any loss of data.

A more complicated situation than described in the last section occurs when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel.

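In Figure 8, three intermediaries (A, B, and C) are shown between the user agent and the origin server. A request or response message that travels the whole chain must pass through four separate connections. This distinction is important because some HTTP communication options could apply only to the connection with the nearest non-tunnel neighbor, only to the end points of the chain, or to all connections along the chain.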

Note that each participant could be engaged in multiple, simultaneous communications. For example, B could be receiving requests from many clients other than A and forwarding requests to servers other than C, while also handling requests from A.

Figure 8 Complex HTTP Communication Transaction


Any party to the communication that is not acting as a tunnel can employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request.

Figure 9 shows the resulting chain if B has a cached copy of an earlier response from the origin server (via C) for a request that has not been cached by the user agent or A.

Figure 9 Complex HTTP Communication Transaction With Caching


Not all responses are cacheable, and some requests might contain modifiers that place special requirements on cache behavior.
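The way a cache shortens the request/response chain can be modeled with a small Python simulation. The Node class and its handle method are hypothetical names chosen for this sketch, and the model deliberately ignores expiry, cacheability rules, and request modifiers.

```python
class Node:
    """One participant in the chain; upstream is the next hop toward the origin."""

    def __init__(self, name, upstream=None, caching=True):
        self.name = name
        self.upstream = upstream
        self.caching = caching
        self.cache = {}

    def handle(self, url, path_taken):
        """Return the response for url, recording every node that was visited."""
        path_taken.append(self.name)
        if self.caching and url in self.cache:
            # A cached response ends the chain here.
            return self.cache[url]
        if self.upstream is None:
            # The origin server produces the response itself.
            return f"content of {url}"
        response = self.upstream.handle(url, path_taken)
        if self.caching:
            self.cache[url] = response
        return response


# Chain: user agent -> A -> B -> C -> origin server.
origin = Node("origin", caching=False)
c = Node("C", upstream=origin)
b = Node("B", upstream=c)
a = Node("A", upstream=b)

# An earlier request from some other client enters the chain at B.
earlier_path = []
b.handle("/DEFAULT.HTM", earlier_path)

# The same URL requested through A now stops at B's cache.
later_path = []
a.handle("/DEFAULT.HTM", later_path)
```

After this runs, earlier_path records B, C, and the origin server, while later_path records only A and B, which corresponds to the shortened chain in Figure 9.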

Sample HTTP Transaction

HTTP Request

Client: GET /DEFAULT.HTM HTTP/1.0
Accept: text/plain
Accept: image/gif
Accept: image/x-portable-bitmap
User-Agent: NCSA WinMosaic 1.0

HTTP Response

Server: HTTP/1.0 200 OK
MIME-Version: 1.0
Content-Type: text/html
Date: Thursday, 3-AUG-96 23:37:8 GMT
Content-Length: 1618

<TITLE>PURVEYOR ENCRYPT WEBSERVER</TITLE>
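A request like the one in the sample can be taken apart with a few lines of Python. The request text below is modeled on the sample, written with a leading slash in the path and the type image/gif, both of which are assumptions about the intended form. The function name parse_request is invented for this sketch.

```python
SAMPLE_REQUEST = (
    "GET /DEFAULT.HTM HTTP/1.0\r\n"
    "Accept: text/plain\r\n"
    "Accept: image/gif\r\n"
    "Accept: image/x-portable-bitmap\r\n"
    "User-Agent: NCSA WinMosaic 1.0\r\n"
    "\r\n"
)


def parse_request(text):
    """Split a request into method, target, version, headers, and body."""
    # The first empty line separates the headers from any body.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # A header such as Accept may appear several times, so keep a list.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, target, version, headers, body


method, target, version, headers, body = parse_request(SAMPLE_REQUEST)
```

Header names are lowercased here so that lookups such as headers["accept"] do not depend on how the client capitalized them.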

HTTP Request Fields

The client can send one or more of the following header lines in an HTTP transaction. The list of headers is terminated by an empty line. Table 5 lists some commonly used request header fields. Note that HTTP defines many more header fields. Refer to the URL http://www.w3.org/hypertext/WWW/Protocols/ for additional information.

Table 5 Commonly Used Header Request Fields

Header

Description

From

The name of the requesting user, in Internet mail address format. The address in this field does not have to correspond to the Internet host that issued the request.

For example, when a request is passed through a gateway, the original issuer's address should be used.

Accept

This field contains a list of Content-Type values the client can accept in the response to this request.

Example: Accept: text/plain, text/html

If no Accept field is present, then it is assumed that text/plain and text/html are accepted. The set given may vary from request to request from the same user.

Accept-Encoding

Similar to Accept, but lists the Content-Encoding types which are acceptable in the response. For example:
Accept-Encoding: x-compress
or
Accept-Encoding: x-zip

Accept-Language

Lists the Language values preferable in the response. A response in an unspecified language is not illegal.

User-Agent

If present, this field identifies the software program used by the original client. It is used for statistical purposes and to trace protocol violations.

For example: User-Agent: LII-Cello/1.0 libwww/2.5

Referer

This optional header field allows the client to specify, for the server's benefit, the address (URI) of the document from which the URI in the request was obtained.

This allows a server to generate lists of back-links to documents, for interest, logging, etc. It allows bad links to be traced for maintenance. For example,
Referer: http://www.w3.org/hypertext/DataSources/Overview.html

Authorization

If this line is present, it contains authorization information.

Basic scheme

Password and IP address protection; part of the WWW Common Library

User/Password

The user name, typically taken from a USER environment variable or prompted for, with an optional password separated by a colon. Without a password, this provides very weak security. With a password, it still provides only weak security. For example: Authorization: user fred:mypassword

Charge-To

If present, this field contains account information for the costs of applying the requested method.

If-Modified-Since

This request header is used with the GET method to make the request conditional. If the requested document has not changed since the time specified in this field, the document is not sent. Instead, the server sends a 304 (Not Modified) reply.

Pragma

Pragma directives should be understood by servers to which they are relevant, e.g. a proxy server. Currently only one pragma is defined: no-cache

When present, the proxy server should not return a document from the cache even though it has not expired, but it should always request the document from the actual server.

Pragmas should be passed through by proxies even though they might have significance to the proxy itself. This is necessary in cases when the request has to go through many proxies, and the pragma should affect all of them.
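As an illustration of one of the fields in Table 5, the If-Modified-Since check can be sketched in Python as follows. The function name conditional_status is hypothetical, and treating a missing or unparsable date as an ordinary GET is an assumption of this sketch rather than a rule stated above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the document is unchanged since the given date, else 200.

    last_modified is a timezone-aware datetime for the document;
    if_modified_since is the raw header value, or None if the header is absent.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Assumption: an unparsable date is ignored and the GET proceeds.
        return 200
    if since.tzinfo is None:
        # Treat a date without zone information as GMT.
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200
```

For example, a document last modified on 1 August 1996 and requested with If-Modified-Since: Sat, 03 Aug 1996 23:37:08 GMT yields 304, so only the status line and headers need to be sent.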

Internationalization

A large portion of information available on the Web is in English and the western-European writing system; however, use of other writing systems and languages is increasing. Organizations and standards groups are enhancing the Web protocols, standards, and tools to meet global community needs. Much of this work centers around language negotiation and character set encoding.

Character Set Parameters

Documents transmitted with HTTP that are of type text/* (text/html, text/plain, etc.) can have a charset parameter that specifies the character encoding of the document. The default for text/html is ISO-8859-1; for other text documents it is US-ASCII.

Any character encoding that has been registered with IANA can be used. However, it may be too much to ask of a browser to understand all of them.
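A receiver might determine the character encoding of a text document along the following lines. This sketch relies on Python's standard email.message module to parse the Content-Type value; the function name charset_of is invented for the example, and the defaults are simply the ones stated above.

```python
from email.message import Message


def charset_of(content_type):
    """Return the character encoding implied by a Content-Type value."""
    message = Message()
    message["Content-Type"] = content_type
    declared = message.get_content_charset()  # lowercased, or None if absent
    if declared:
        return declared
    # Defaults as described in the text: ISO-8859-1 for text/html,
    # US-ASCII for other text documents.
    if message.get_content_type() == "text/html":
        return "iso-8859-1"
    return "us-ascii"
```

For example, charset_of("text/html; charset=ISO-8859-5") gives "iso-8859-5", and charset_of("text/plain") falls back to "us-ascii".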

HTTP Status Codes

HTTP status codes fall into five categories. See Appendix A for a list of the codes in each category. The categories are:

  • Informational (1xx codes): This class of status code indicates a provisional response, consisting only of the Status-Line and optional headers, and terminated by an empty line. Because HTTP/1.0 did not define any 1xx status codes, servers must not send a 1xx response to an HTTP/1.0 client except under experimental conditions.

  • Successful (2xx codes): This class of status code indicates that the client's request was successfully received, understood, and accepted.

  • Redirection (3xx codes): This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required may be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

  • Client Error (4xx codes): This class of status code is intended for cases in which the client seems to have erred. If the client has not completed the request when it receives a 4xx code, it should immediately cease sending data to the server. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition.

  • Server Error (5xx codes): This class of status code indicates cases in which the server is aware that it has erred or is incapable of performing the request. If the client has not completed the request when it receives a 5xx code, it should immediately cease sending data to the server. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition.
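Because the category is determined by the first digit of the code, a client can classify any status code with a simple lookup. The following Python sketch shows one way to do this; the function name status_category is invented for the example.

```python
CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_category(code):
    """Return the category name for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The hundreds digit selects the category.
    return CATEGORIES[code // 100]
```

For example, status_category(200) returns "Successful" and status_category(304) returns "Redirection".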

HTTP Reference

You can use your Web browser to get more information about HTTP. A good place to start is with RFC 1945 or with the following URL:

http://www.w3.org/hypertext/WWW/Protocols/HTTP/HTTP2.html

