Web browsers and servers communicate with each other using the HyperText Transfer Protocol (HTTP). The World Wide Web has used HTTP since 1990.
Practical information systems require more functionality than simple retrieval, including search, front-end update, and annotation. HTTP allows an open-ended set of methods that indicate the purpose of a request. It builds on the reference provided by the Uniform Resource Identifier (URI), as a location (URL), or name (URN), for indicating the resource on which a method is to be applied.
This generic, object-oriented protocol accommodates distributed, collaborative hypermedia information systems. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet protocols. Such protocols include SMTP, NNTP, FTP, Gopher, and WAIS, which were already in use on the Internet. Through these protocols, HTTP allows basic hypermedia access to resources available from diverse applications and simplifies the implementation of user agents.
When sending a request to the server, the browser includes a list of the file types it understands. The server can then ensure that it transmits the document as one of those file types. This means that servers and browsers can cope with the existing mass of graphic formats such as GIF, TIFF, and JPEG, to name but a few. You can also exchange all types of data using HTTP. For example, you can exchange documents that contain text, graphics, sound, and video.
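The selection step described above can be sketched in a few lines. This is only an illustration, assuming the browser's list is in preference order and that exact type matches are enough; the function name `choose_type` and the sample lists are hypothetical.

```python
def choose_type(accepted, available):
    """Return the first MIME type the client accepts that the server can send.

    `accepted` is the client's list, taken here to be in preference order.
    `available` lists the formats the server holds for the document.
    Returns None when the two lists have nothing in common.
    """
    for wanted in accepted:
        if wanted in available:
            return wanted
    return None


# Hypothetical example: the browser understands JPEG and GIF,
# while the server holds the image as TIFF and GIF.
browser_accepts = ["image/jpeg", "image/gif"]
server_has = ["image/tiff", "image/gif"]
print(choose_type(browser_accepts, server_has))
```

With these sample lists the only common format is image/gif, so that is what the sketch selects.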
The Web uses HTTP for many tasks, such as name servers and distributed object management systems, through extension of its request methods.
HTTP allows the use of an open-ended set of methods to indicate the purpose of a request. It uses information provided in the Uniform Resource Locator (URL) to find a resource. HTTP passes messages in a format similar to that used by Internet mail and the Multipurpose Internet Mail Extensions (MIME).
Every time a browser requests a document, it sends the server a list of the MIME types it can support. The server maps file types and file extensions to standard MIME data types. In this manner, Web browsers and servers can negotiate the file type that is transmitted between them.
MIME conventions include a major type name and then a subtype name, separated by a slash, to identify file types. For example, the file type for a text file in the HyperText Markup Language (HTML) is identified as text/html, and a .GIF image file is identified as image/gif.
HTML uses the MIME type text and the subtype html (written as text/html). Web servers and browsers support many other data types such as image/gif.
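As a rough sketch of how a server might map file extensions to MIME types and split a type into its two parts: the extension table, the function names, and the fallback type `application/octet-stream` are assumptions made for this illustration, not taken from any particular server.

```python
import os

# Hypothetical extension table; a real server reads a much longer
# list from its configuration.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
}


def mime_type_for(filename, default="application/octet-stream"):
    """Map a file name to a MIME type by its extension (case-insensitive)."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(extension, default)


def split_mime_type(mime_type):
    """Split 'type/subtype' into its major type and subtype."""
    major, _, minor = mime_type.partition("/")
    return major, minor


print(mime_type_for("DEFAULT.HTM"))
print(split_mime_type("image/gif"))
```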
See the chapter entitled Multipurpose Internet Mail Extensions for more information.
HTTP communication employs a disciplined and consistent system for referencing different resources through the use of Uniform Resource Locators (URLs). See the earlier chapter in this book and RFC 1738 for more information about URLs. HTTP also:
All HTTP transactions take place over a TCP/IP connection and usually through the default port 80. An HTTP transaction consists of four phases. The following sequence outlines a simple HTTP transaction.
In this phase, the browser attempts to connect with the Web server. You can observe this happening on most browsers by watching the status line for a message indicating that it is connecting to the HTTP server.
Once the connection between the browser and server occurs, the browser sends a request to the server. This request specifies:
You can use your Web browser to get information about the various HTTP methods at:
Every time a browser requests a page, it sends the server a list of the MIME types it can support.
When the server fulfills the request, it sends a response to the client. At this stage, your browser might display a "reading response" message on the status line. Again, depending on your browser, you might see a "transferring" message on the status line.
The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible body content.
The server tries to send only the MIME types the browser supports. The server responds first with the MIME type of the data that it is sending, then it sends the data.
After the server sends the response, either the client, the server, or both close the connection.
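The four phases can be traced in a short client sketch. It assumes a plain HTTP/1.0 exchange over a TCP socket, with the server closing the connection once the response is complete; the function names are illustrative, and many present-day servers additionally expect a Host header, which is not shown here.

```python
import socket


def build_request(path, accept_types):
    """Build an HTTP/1.0 GET request; the header list ends with an empty line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"Accept: {mime_type}" for mime_type in accept_types]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.0 200 OK' into its three parts."""
    parts = line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


def fetch(host, path, port=80, accept_types=("text/html", "text/plain")):
    # Phase 1: connection.
    with socket.create_connection((host, port), timeout=10) as conn:
        # Phase 2: request.
        conn.sendall(build_request(path, accept_types))
        # Phase 3: response. Read until the server stops sending.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Phase 4: close. Leaving the `with` block closes the client side.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return parse_status_line(status_line), body
```

A call such as `fetch("www.example.com", "/")` (a placeholder host) would return the parsed status line together with the raw body bytes.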
As shown in Figure 7, most HTTP communications are initiated by a user agent and consist of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished through a single connection between the user agent and the origin server.
Figure 7 Simple HTTP Communication Transaction
HTTP also allows hypermedia access to existing Internet protocols such as FTP (File Transfer Protocol), NNTP (Network News Transfer Protocol), Gopher, and WAIS (Wide Area Information Server). In addition, communication between user agents and gateways can go through a proxy server without any loss of data.
A more complicated situation than described in the last section occurs when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel.
In Figure 8, three intermediaries between the user agent and origin server are shown. A request or response message that travels the whole chain MUST pass through four separate connections. This distinction is important because some HTTP communication options could apply:
Note that each participant could be engaged in multiple, simultaneous communications. For example, B could be receiving requests from many clients other than A and forwarding requests to servers other than C while it handles requests from A.
Figure 8 Complex HTTP Communication Transaction
Any party to the communication that is not acting as a tunnel can employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request.
Figure 9 shows the resulting chain if B has a cached copy of an earlier response from the Origin server (via C) for a request that has not been cached by User Agent or A.
Figure 9 Complex HTTP Communication Transaction With Caching
Not all responses are cacheable, and some requests might contain modifiers that place special requirements on cache behavior.
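The shortening effect of a cache can be illustrated with a toy model of the chain. This is a sketch only: the class names are invented for the example, the participants A, B, and C follow the figures, and the model deliberately ignores cacheability rules and request modifiers.

```python
class Origin:
    """The origin server: always produces the response itself."""

    def __init__(self, documents):
        self.documents = documents
        self.hits = 0  # how many requests reached the origin

    def handle(self, path):
        self.hits += 1
        return self.documents[path]


class Intermediary:
    """A proxy or gateway that forwards upstream unless its cache can answer."""

    def __init__(self, name, upstream, caching=False):
        self.name = name
        self.upstream = upstream
        self.cache = {} if caching else None

    def handle(self, path):
        if self.cache is not None and path in self.cache:
            return self.cache[path]  # the chain ends here
        response = self.upstream.handle(path)
        if self.cache is not None:
            self.cache[path] = response
        return response


origin = Origin({"/index.html": "<TITLE>Home</TITLE>"})
c = Intermediary("C", origin)
b = Intermediary("B", c, caching=True)
a = Intermediary("A", b)

a.handle("/index.html")  # travels A -> B -> C -> origin
a.handle("/index.html")  # answered from B's cache
print(origin.hits)  # 1
```

The second request never reaches C or the origin server, which is the shortened chain described above.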
Client : GET DEFAULT.HTM HTTP/1.0
HTTP Response
<TITLE>PURVEYOR ENCRYPT WEBSERVER</TITLE>
The client can send one or more of the following header lines in an HTTP transaction. The list of headers is terminated by an empty line.
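To make the "terminated by an empty line" rule concrete, here is a minimal parsing sketch. The function name is hypothetical, the sample request reuses the header values from this chapter's example, and the leading slash in the requested path is an assumption.

```python
def parse_request(raw):
    """Split a raw request into its request line and a list of header pairs.

    Header parsing stops at the first empty line. Repeated fields such as
    Accept are kept as separate (name, value) pairs.
    """
    lines = raw.split("\r\n")
    request_line = lines[0]
    headers = []
    for line in lines[1:]:
        if line == "":
            break  # the empty line ends the header list
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return request_line, headers


raw_request = (
    "GET /DEFAULT.HTM HTTP/1.0\r\n"
    "Accept: text/plain\r\n"
    "Accept: image/gif\r\n"
    "User-Agent: NCSA WinMosaic 1.0\r\n"
    "\r\n"
)
request_line, headers = parse_request(raw_request)
print(request_line)
for name, value in headers:
    print(name, "=", value)
```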
Table 5 lists some commonly used header request fields. Note that HTTP defines many more header fields. Refer to the URL http://www.w3.org/hypertext/WWW/Protocols/ for additional information.
Table 5 Commonly Used Header Request Fields

| Header | Description |
| --- | --- |
| From | In Internet mail format, the name of the requesting user. The Internet mail address in this field does not have to correspond to the Internet host that issued the request. |
| Accept | This field contains a list of Content-Type values the client can accept in the response to this request. |
| Accept-Encoding | Similar to Accept, but lists the Content-Encoding types that are acceptable in the response. For example: Accept-Encoding: x-zip |
| Accept-Language | Lists the Language values preferable in the response. A response in an unspecified language is not illegal. |
| User-Agent | If present, gives the software program used by the original client. This is for statistical purposes and to trace protocol violations. |
| Referer | This optional header field allows the client to specify, for the server's benefit, the address (URI) of the document from which the URI in the request was obtained. |
| Authorization | If this line is present, it contains authorization information. Basic scheme: password and IP address protection; part of the WWW Common Library. User/Password: typically from a USER environment variable or prompted for, with an optional password separated by a colon. Without a password, this provides very low-level security. With the password, it provides low-level security. For example, Authorization: user fred:mypassword |
| Charge-To | If present, contains account information for the costs of the application of the method requested. |
| If-Modified-Since | This request header is used with the GET method to make it conditional. If the requested document has not changed since the time specified in this field, the document is not sent. Instead, a 304 Not Modified reply is sent. |
| Pragma | Pragma directives should be understood by servers to which they are relevant, e.g. a proxy server. Currently only one pragma is defined: no-cache |
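The If-Modified-Since field described in Table 5 lends itself to a short server-side sketch. This is an illustration under assumptions: the function name, the dictionary of headers, and the sample dates are invented for the example, and an unparseable date is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get(headers, last_modified):
    """Return (status_code, send_body) for a GET with optional If-Modified-Since."""
    value = headers.get("If-Modified-Since")
    if value is not None:
        try:
            since = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat the request as unconditional
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, False  # Not Modified: no document is sent
    return 200, True


# Hypothetical document last changed on 1 August 1996 at 12:00 GMT.
last_modified = datetime(1996, 8, 1, 12, 0, 0, tzinfo=timezone.utc)

print(conditional_get({"If-Modified-Since": "Sat, 03 Aug 1996 23:37:08 GMT"}, last_modified))
print(conditional_get({"If-Modified-Since": "Mon, 01 Jul 1996 00:00:00 GMT"}, last_modified))
```

The first call yields a 304 with no body because the document is older than the date supplied; the second yields a normal 200 because the document changed after that date.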
A large portion of information available on the Web is in English and the western-European writing system; however, use of other writing systems and languages is increasing. Organizations and standards groups are enhancing the Web protocols, standards, and tools to meet global community needs. Much of this work centers around language negotiation and character set encoding.
Documents transmitted with HTTP that are of type text/* (text/html, text/plain, etc.) can have a charset parameter that specifies the character encoding of the document. The default for text/html is ISO-8859-1; for other text documents it is US-ASCII.
Any character encoding that has been registered with IANA can be used. However, it may be too much to ask of a browser to understand all of them.
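As a small sketch of how a client might read the charset parameter and apply the defaults stated above: the function name is hypothetical, and returning None for non-text types is a choice made for this example.

```python
def charset_of(content_type):
    """Return the character encoding named by a Content-Type value.

    Falls back to ISO-8859-1 for text/html and US-ASCII for other text/*
    types. Returns None for non-text types with no charset parameter.
    """
    media_type, *params = [part.strip() for part in content_type.split(";")]
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"')
    media_type = media_type.lower()
    if media_type == "text/html":
        return "ISO-8859-1"
    if media_type.startswith("text/"):
        return "US-ASCII"
    return None


print(charset_of("text/html"))
print(charset_of("text/plain"))
print(charset_of("text/html; charset=ISO-2022-JP"))
```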
The HTTP status codes fall into five categories. See Appendix A for a list of the codes in each category. Basically, the categories are:
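The category of a status code follows from the first digit of the three-digit code. A minimal sketch, with category names as given in RFC 1945 and with the table and function names invented for the illustration:

```python
# Category names follow RFC 1945; the first digit selects the category.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_category(code):
    """Return the category name for a three-digit HTTP status code."""
    try:
        return CATEGORIES[code // 100]
    except KeyError:
        raise ValueError(f"not a valid HTTP status code: {code}") from None


for code in (200, 304, 404, 500):
    print(code, status_category(code))
```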
You can use your Web browser to get more information about HTTP. A good place to start is RFC 1945 or the following URL:
http://www.w3.org/hypertext/WWW/Protocols/HTTP/HTTP2.html
Accept: text/plain
Accept: image/gif
Accept: image/x-portable-bitmap
User-Agent: NCSA WinMosaic 1.0
Server : HTTP/1.0 200 OK
MIME-Version: 1.0
Content-Type: text/html
Date: Thursday, 3-AUG-96 23:37:8 GMT
Content-Length: 1618
HTTP Request Fields
From field: For example, when a request is passed through a gateway, the original issuer's address should be used.
Accept field: Example: Accept: text/plain, text/html. If no Accept field is present, it is assumed that text/plain and text/html are accepted. The set given may vary from request to request from the same user.
Accept-Encoding field: For example, Accept-Encoding: x-compress or Accept-Encoding: x-zip
User-Agent field: For example, User-Agent: LII-Cello/1.0 libwww/2.5
Referer field: This allows a server to generate lists of back-links to documents, for interest, logging, etc. It allows bad links to be traced for maintenance. For example, Referer: http://www.w3.org/hypertext/DataSources/Overview.html
Pragma field (no-cache): When present, the proxy server should not return a document from the cache even though it has not expired, but should always request the document from the actual server. Pragmas should be passed through by proxies even though they might have significance to the proxy itself. This is necessary when the request has to go through many proxies and the pragma should affect all of them.
Internationalization
Character Set Parameters
HTTP Status Codes
HTTP Reference