Given the variety of computer hardware and software in use throughout the world, creating Web documents might have become a daunting task if it weren't for what can be called a universal markup language, the HyperText Markup Language, or HTML.
You use HTML to create an easy point-and-click interface for network documents served on the Web. HTML combines many old, familiar commands to produce something entirely new. As a markup language with HyperText logic and multimedia integration, HTML allows a file to contain many diverse commands.
HTML 3.2 is the World Wide Web Consortium's (W3C) new specification for HTML. W3C is a Massachusetts Institute of Technology-based group whose membership comprises many companies at the forefront of Internet technology, and whose responsibilities include the development of new HTML specifications.
HTML Version 3.2 is the latest Internet draft and is compatible with HTML Version 2.0.
You use, or read, documents on
the Web very differently than you read a printed book. You move
from one point of information to another as needed, not in a linear
fashion as you would read a novel. HyperText
allows you to do this; you can go forward,
backward, jump to related topics with hyperlinks, or just navigate
the text based on what interests you at the moment. With the Web,
you can jump from a document located on a server in New Zealand to
one on a server in Finland to yet another on a server in Greece.
(HyperText can also be called HyperMedia because it now includes
graphics and sound, but the term HyperText is still used most often.)
An HTML file may also contain
links to other files. Using your browser, you
can point-and-click to select a link and the browser automatically
executes the associated command. The usual command is "GET"
another HTML document. Documents link to other documents with links
to still other documents and so on. These links form the World Wide
Web of interconnected documents.
In addition to the links to other documents, HTML
documents may contain embedded graphics, sound files, animation,
or anything else a computer system can handle. A set of media type
conventions (MIME) defines what kind of information
is in each file that might be linked to an HTML document.
HyperText has been a useful concept for some time, but
combining the logic of HyperText with the communications of the Internet
created a whole new array of powerful communications tools.
A pointer to a document gets the latest version
of that document, whether at the same site or at some other site
across the world.
Creating HyperText documents with HTML means you can
combine the logic of HyperText with the communications capabilities
of the Web. With HTML, you can create links to
other documents, and embed graphics, sound, animation, or anything
else a computer system can handle. MIME defines
the kind of information in each file that can be linked to an HTML
document.
As with any tool, however, there are certain
advantages and disadvantages.
Markup languages have many specific advantages.
The source files are compact and portable and do not depend on
specific font installations or output capabilities unavailable
on some output systems.
In a markup language environment, the same source
file can be processed anywhere with acceptable output results. The
formatters, systems, and system capabilities can all be different
from one location to another. While each produces different results,
all are acceptable for the output environment. The source file, using
a markup language such as HTML, is designed to
avoid system-dependent formatting commands.
The compact size, portability, and flexibility of
markup languages were the reasons that this approach was chosen as
the method for serving documents to network browsers
via the Web.
When you create a source file
using a markup language, you do not have any direct control over
the final, formatted appearance of an HTML document. With
HTML, the source file only defines what the
displayable entities are, such as heading levels. The final
formatting is done by the browser program that reads the source file.
If you are used to all the format control of a
WYSIWYG editor, this concept can be very hard
to understand. You can mark a piece of text as a level 1 head,
but you cannot define how a level 1 head appears on someone's
screen. It might be left-justified or centered, bold or italic,
spaced away from the following text or right next to it. All
these choices are up to the browser. Even the location of line
breaks is a browser function unless you specify a hard break
in the source file.
The source file contains defined paragraph breaks,
but the way those breaks are executed is entirely up to the
user's browser program. You cannot define paragraph spacing
or indentation or other formatting principles.
HTML is an application of the
Standard Generalized Markup Language (SGML). Using HTML, you
create documents, or pages, for distribution on the World Wide
Web with a formatted "look." You can even include
sound if you want. Plain text on the Internet was never like
this. You can add sound and graphic commands to documents and
link them to other documents on the Web to create multimedia
pages.
SGML was
developed in 1985 as a way to define the content and structure
of documents. Markup languages use a series of "tags"
to define, or mark, each text element; for example, to mark text
as a paragraph, a bulleted list item, a numbered list item, or a
hyperlink.
You use HTML to mark up the document; that is, you embed your instructions
about the document contents directly in the file. The instructions
range from designating paragraph text, a title line, and different heading
levels and sections to including graphics and hyperlinks to other
documents. This file is the "source" document.
You then process the source file using a
formatting program to produce the final result.
What the document looks like (that is, the
type style and size, how much a paragraph indents, and whether a bullet
or square is used to set off list items) depends on the
browser the reader is using. While this means an HTML
author cannot control the exact output of a Web document, it also means
that anyone anywhere can access it.
Unlike earlier word processors that are used to
design how a document looks, HTML defines what the
elements of a document are and how they come together to form a
document. What the elements look like on the screen or a printed
page is determined by your browser software.
To create an HTML source file, it is essential
that you use a text editor that fits your needs.
Do you need ease of finding, moving, or changing text within one
document and from different windows? Or do you need specific
scientific, mathematical, or multilingual capabilities?
Some software editing tools are compliant with
HTML completely, and some to lesser degrees. The best choice is
an editor that is SGML, and therefore HTML,
compliant. This causes fewer problems; it is easier to maintain
and re-use existing documents created with SGML.
The capability of HTML is expanding as new commands
and revisions are being tested and added to the HTML standard.
HTML tags are not case-sensitive.
You may use uppercase letters, lowercase, or any mixture without
altering the meaning of the command. The only exception is the
special character tags, which require lower case letters.
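As a small illustration of the case rule, the three heading lines in this sketch should be treated identically by a browser, while the special character name in the last line is written in lower case. The heading text is made up for the example.

```html
<H1>Trip Report</H1>
<h1>Trip Report</h1>
<H1>Trip Report</h1>
<P>Write &lt; in lower case to display a less than sign.
```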
HTML uses
keywords called tags to mark the source file text. The tags are
what define the text elements. Tags have these characteristics:
A file written in "strict" HTML
tags has the structure shown in Figure
10. The text in bold in the example is for information only
and is not part of the actual code.
Every HTML 3.2 document begins with the following
<!DOCTYPE> declaration
tag to distinguish HTML 3.2 from other versions of HTML. Follow
<!DOCTYPE> with the <HTML>, <HEAD>, <TITLE>,
and <BODY> tags. The TITLE tag is required. All
other tags are optional.
Figure 10 Strict HTML Tag Use
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">    Version number
<HTML>                                               Begins an HTML file.
<HEAD><TITLE> Example of Strict HTML Structure       Indicates document title
<BODY> A paragraph goes here.                        Begins the document text.
<P> Another paragraph goes here.                     Separates text from the previous paragraph.
Note that in practice, many documents don't contain a
<!DOCTYPE> declaration. This makes it difficult
for browsers, validation tools, and other software to determine the
version of HTML used in the document.
As the browser encounters each tag it prepares the
output accordingly. A browser uses a different style for the title
than for a paragraph and changes styles according to the start and
end tags.
Although most browsers treat some of these tags as
optional, some browsers and conversion utilities require them.
Consequently, this is called the "strict"
HTML format, since it strictly meets
all requirements.
The tag <HTML>
means "begin HTML" and </HTML>
means "end HTML." These should be at the beginning and
end of the file.
HTML uses the angle bracket (< and >)
symbols to enclose a tag. Also, most tags are "bracketed"
tags: a start tag and an end tag, where the end tag has a forward slash added just
after the opening angle bracket.
The paragraph separator tag
is <P>
and is an exception to the bracketing convention. It separates
paragraphs, rather than serving as a "begin and end"
delimiter.
Line breaks have no meaning in an HTML file. Using line breaks judiciously can make your source
file more readable, but they have no effect on the final output that
the reader sees. This is why the paragraph separator is so critical.
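The following sketch illustrates the point about line breaks; the sentence text is invented for the example. The first three source lines should display as one continuous run of text, and only the <P> tag starts a new paragraph.

```html
<BODY>
This sentence is split
across three source lines
but displays as one run of text.
<P>
This text starts a new paragraph because of the P tag above it.
</BODY>
```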
Group the tags and associated text together, and
separate the group from others with blank lines. This structure
means you can quickly tell if you have left out any start or end
tags or other necessary parts of the file or text.
The <HEAD> tag
defines a document heading that is not part of the paragraph text.
The tag begins with <HEAD> and ends with </HEAD>. Although
other information may be included, a document title is normally the
only qualifier. The title begins with the <TITLE>
tag and ends with the </TITLE> tag.
You can use the following qualifiers with the <HEAD>
tag:
TITLE defines the document title, and is always needed.
ISINDEX for simple keyword searches; see the PROMPT attribute.
BASE defines the base URL for resolving relative URLs.
STYLE reserved for future use with style sheets.
SCRIPT reserved for future use with scripting languages.
META used to supply meta information as name/value pairs.
LINK used to define relationships with other documents.
The TITLE, STYLE, and SCRIPT qualifiers
require both start and end tags. The other tags do not use end tags.
Note that conforming browsers do not display the contents of STYLE
and SCRIPT tags.
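A minimal sketch of a document head that uses several of these qualifiers might look like the following. The title, the example.com URL, the keyword values, and the file name chapter2.html are all hypothetical.

```html
<HEAD>
<TITLE>Trail Guide</TITLE>
<BASE HREF="http://www.example.com/guide/">
<META NAME="keywords" CONTENT="hiking, trails">
<LINK REL="next" HREF="chapter2.html">
</HEAD>
```

Only TITLE uses an end tag here; BASE, META, and LINK are written as start tags alone.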
The <BODY> tag
identifies the beginning of the document
body. The key attributes include: BACKGROUND,
BGCOLOR, TEXT, LINK, VLINK, and ALINK. You can use these to set
a repeating background image, plus background and foreground
colors for normal text and HyperText links. Colors are given
in RGB as hexadecimal numbers (for example, "#C0FFC0")
or as one of 16 color names:
aqua, black, blue, fuchsia, gray, green, lime, maroon,
navy, olive, purple, red, silver, teal, white, and yellow.
These colors are the original standard 16
colors supported with the Windows VGA palette.
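As an illustration, the sketch below sets a background image, a background color, and colors for text, unvisited links (LINK), visited links (VLINK), and links being selected (ALINK). The file names paper.gif and next.html and the color choices are assumptions made for the example.

```html
<BODY BACKGROUND="paper.gif" BGCOLOR="#FFFFFF" TEXT="#000000"
      LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
Body text appears in black;
<A HREF="next.html">an unvisited link</A> appears in blue.
</BODY>
```

A color name can stand in for any of the hexadecimal values, for example BGCOLOR=white.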
Most tags that can appear in the
document body fall into one of two groups: block level
tags, which cause line breaks, and text level tags that
do not. Common block level tags include <H1> to <H6>
(headings), <P> (paragraphs), <LI> (list items), and
<HR> (horizontal rules). Common text level tags include
<EM>, <I>, <B> and <FONT> (character
emphasis), <A> (hypertext links), <IMG> and <APPLET>
(embedded objects) and <BR> (line breaks).
Block tags generally act as containers for
text level and other block level tags, excluding heading
and address tags. Text level tags can contain other text
level tags and attributes only. The exact model depends
on the tag.
HTML has six heading levels, but some browsers
format the "lower" levels so poorly that it is best
to use only the top three levels. The tags are in the bracket
format. The top level (largest) heading is level 1; the start
tag is <H1> and the
end tag is </H1>. A level 2 heading begins with <H2>,
and so on through <H6> for the sixth level.
Extending the previous example yields the HTML
file with the heading tags shown in
Figure 11.
Figure 11 HTML File with Head Tags
Ending a heading includes an implied line break.
The actual choice of the font, point size, justification, and
other elements of typography is left to the
user's browser. The HTML markup simply designates the
text that is logically the text of the heading.
Your individual file may be part of a logical
structure of many files, so you do not have to start with a
level 1 heading. You should not skip levels, however. Some
browsers cannot tolerate going from a level 2 heading to a
level 4 without first encountering a level 3.
You can use the ALIGN attribute to set
the text alignment within a heading, for example:
<H1 ALIGN=CENTER> ... centered heading ... </H1>
You use the ADDRESS
tag to enter information about the
author of the document. It requires start and end tags.
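For instance, an author block could be marked up as in this sketch; the name and mail address are placeholders.

```html
<ADDRESS>
Pat Example<BR>
<A HREF="mailto:pat@example.com">pat@example.com</A>
</ADDRESS>
```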
Table 8 contains a list and description of the block level
tags.
List items can
contain block and text level items, although you cannot
use headings and address tags. Unordered lists take the
form shown in Figure 12.
Figure 12 Unordered List Tags in HTML
You can use the TYPE attribute
to set the bullet style for the unnumbered list identifier.
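A short sketch of the TYPE attribute on an unordered list follows. To the best of our reading of HTML 3.2, the accepted values are DISC, SQUARE, and CIRCLE; how each is drawn is still up to the browser.

```html
<UL TYPE=SQUARE>
<LI> first list item
<LI> second list item
</UL>
```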
Ordered (numbered) lists take the form shown in
Figure 13.
Figure 13 Ordered List Tags in HTML
You can use the START attribute of the OL tag to
initialize the sequence number. You can reset it later
with the VALUE attribute of the list item (LI) tags.
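The numbering controls can be sketched as follows; the item text simply states the number each item is expected to receive.

```html
<OL START=5>
<LI> numbered 5
<LI> numbered 6
<LI VALUE=10> numbered 10
<LI> numbered 11
</OL>
```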
Definition lists take the form shown in
Figure 14.
Figure 14 Definition List Tags in HTML
The DT tags can act only as a container
for text level tags. The DD tag can hold block level tags
as well, excluding heading and address tags.
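As an example of these containment rules, the sketch below puts a text level tag inside a DT and a block level paragraph inside a DD. The definitions themselves are only sample text.

```html
<DL>
<DT><B>HTML</B>
<DD>HyperText Markup Language.
<P>Used to mark up documents for the Web.
<DT>SGML
<DD>Standard Generalized Markup Language.
</DL>
```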
HTML 3.2 supports
tables. Tables take the general form shown in Figure
15.
Figure 15 Table Tags in HTML
The attributes associated with
<TABLE> in this example
are all optional. The default TABLE formatting has
no surrounding border. Generally, the table is sized
automatically, but you can also set the table width
using the WIDTH attribute. The BORDER, CELLSPACING,
and CELLPADDING attributes provide further control
over the table's appearance. Captions can be at the
top or bottom of the table depending on the ALIGN
attribute.
Each table row starts
with a <TR> tag; you can omit
the end tag. You define table cells with the <TD>
tag for data and <TH> tag for headers.
As with <TR>, you do not have to include the trailing end tags,
</TH> and </TD>.
The <TH> and <TD> tags support
several attributes. These attributes include ALIGN
and VALIGN for aligning cell content, and ROWSPAN
and COLSPAN for cells that span more than one row or
column. A cell can contain a wide variety of other block and
text level tags, including form fields and other tables.
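The cell attributes can be sketched with a small table. The cell contents are invented; the first header spans both columns and the "Europe" header spans two rows.

```html
<TABLE BORDER=1>
<TR><TH COLSPAN=2>Server locations
<TR><TH ROWSPAN=2 VALIGN=TOP>Europe <TD ALIGN=CENTER>Finland
<TR><TD ALIGN=CENTER>Greece
</TABLE>
```

The third row has only one cell because its first column is already occupied by the header that spans two rows.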
Text level tags do not cause paragraph breaks.
Text level tags control character formatting.
Generally, you can nest text level tags that define character
styles. They can contain other text level tags but not block
level tags.
Character formatting is
split into physical format tags
and logical format tags. There is
considerable debate about which method is better, but the
physical approach is certainly simpler. Text level tags also
include form field tags.
In addition to the default text font (usually
a proportional font such as Times New Roman), the HTML standard
supports the following physical character formats:
<B>       Bold text (usually the bold font of the standard body text)
<I>       Italic text (usually the italic font of the standard body text)
<TT>      Typewriter or monospaced text (a fixed-space font such as Courier New)
<STRIKE>  Strike-through text style
<BIG>     Places text in a large font
<SMALL>   Places text in a small font
<SUB>     Places text in subscript style
<SUP>     Places text in superscript style
These all require the matching end tags; for example,
</B>, </I>, and </TT>.
Some browsers are rather fussy about nesting these
tags properly. If you begin a selection of bold and italic text with
<B><I>, you should end it with the sequence </I></B>
so the italic section is properly nested within the bold section.
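The difference between nested and overlapping tags can be sketched as follows; the sentence text is a placeholder.

```html
<!-- Properly nested: the italic section closes before the bold section. -->
<P>This is <B>bold and <I>bold italic</I></B> text.

<!-- Overlapping tags: avoid this form; some browsers mishandle it. -->
<P>This is <B>bold and <I>bold italic</B></I> text.
```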
A logical character format
leaves the actual decision about physical format to the specific
browser. Many browsers allow you to define exactly how the logical
formats are represented. In most cases, the four most common
logical formats map directly to the physical formats:
<STRONG>  Strong emphasis; typically a bold font.
<EM>      Basic emphasis; typically an italic font.
<CODE>    Used for extracts from program code (same as TT).
<SAMP>    Used for sample output from programs and scripts (same as TT).
In addition, there are less common logical formats.
One of them is usually rendered as a combination of bold and typewriter:
<KBD>     Used to represent text that the user types.
The following logical tags are usually rendered as italic:
<DFN>     Used in association with a word being defined.
<CITE>    Used in association with citations or references to a book, film, etc.
<VAR>     Used for variables or arguments to tags.
Remember that all the logical formats are subject to
the definitions of the individual browser, including any customization
of formats that the user has chosen.
The logical tags also have a matching end tag formed
by adding a slash character (</KBD>, for example).
The form
field tags include <INPUT>,
<SELECT>,
and <TEXTAREA>.
No end tag is required with the INPUT tag.
<INPUT>, <SELECT>, and <TEXTAREA> tags are
allowed only within FORM tags.
You can use the <INPUT> tag for a variety
of form fields including single line text fields, password fields,
checkboxes, radio buttons, submit and reset buttons, hidden fields,
file upload, and image buttons.
The <SELECT> tag requires
the start and end tags and contains one or more <OPTION> tags.
You can use the <SELECT> tag for single or multiselection menus.
<TEXTAREA> tags require
start and end tags. You can use <TEXTAREA> to define multiline
text fields. The content of the tag initializes the field.
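A minimal sketch of a form that combines the three field tags is shown below. The ACTION path /cgi-bin/feedback and all field names are hypothetical; a real form needs a matching program on the server.

```html
<FORM ACTION="/cgi-bin/feedback" METHOD=POST>
Name: <INPUT TYPE=TEXT NAME="visitor" SIZE=30>
<P>Topic:
<SELECT NAME="topic">
<OPTION SELECTED>HTML
<OPTION>Browsers
</SELECT>
<P>
<TEXTAREA NAME="comments" ROWS=4 COLS=40>Type your comments here.</TEXTAREA>
<P>
<INPUT TYPE=SUBMIT VALUE="Send"> <INPUT TYPE=RESET>
</FORM>
```

The text between the TEXTAREA start and end tags is what initially appears in the multiline field.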
The
special text-level tags include A(nchor), IMG, APPLET, FONT,
BR, and MAP.
You use the <A> (anchor) tag to define HyperText links
and their location; for example:
The way to <a href="Tolkien.html">middle
earth</a>.
The anchor attributes include NAME, HREF, REL, REV and TITLE.
Use HREF to supply a URL identifying the linked document or image, etc.
Use NAME to associate a name with this part of a document
for use with URLs that specify a named section of a document.
Do not nest anchors.
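To illustrate how NAME and HREF work together, the sketch below names a section and then links to it, first from within the same file and then from another file. The file name guide.html and the section name lists are made up for the example.

```html
<H2><A NAME="lists">Lists</A></H2>
<P>This section describes lists.

<P>See the section on <A HREF="#lists">lists</A> in this document,
or reach it from another file with
<A HREF="guide.html#lists">the lists section of the guide</A>.
```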
The <APPLET> tag requires start
and end tags. All Java-enabled browsers support this tag.
It allows you to embed a Java applet in HTML
documents, for example, to include an animation.
The attributes associated with <APPLET> include: CODE,
CODEBASE, NAME, ALT, ALIGN, WIDTH, HEIGHT, HSPACE and VSPACE.
<APPLET> uses associated PARAM tags to pass parameters to the applet.
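An applet with two parameters might be embedded as in this sketch. The class file Ticker.class, the applets/ directory, and the parameter names are hypothetical and depend entirely on the applet being used.

```html
<APPLET CODE="Ticker.class" CODEBASE="applets/" WIDTH=300 HEIGHT=40
        ALT="Scrolling news ticker">
<PARAM NAME="speed" VALUE="5">
<PARAM NAME="message" VALUE="Welcome to the Web">
Your browser does not run Java applets.
</APPLET>
```

The plain text before the end tag is intended for browsers that cannot run the applet.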
Text Flow Around Images
You use the <IMG>
tag to insert images into HTML
documents. Using an end tag with this tag is forbidden.
The attributes are: SRC, ALT, ALIGN, WIDTH, HEIGHT, BORDER,
HSPACE, VSPACE, USEMAP and ISMAP, for example:
<IMG SRC="canyon.gif"
ALT="Grand Canyon">
You can position images vertically relative to the
current text line or float them to the left or right. You
can also control the text flow by using the <BR>
tag with the CLEAR attribute. Using an end tag with the <BR>
tag is forbidden.
The <FONT> tag requires
start and end tags. This tag allows
you to change the font size and/or color for the enclosed
text. The attributes include: SIZE and COLOR. Colors are given
as RGB in hexadecimal notation or as one of 16 color names
(see The BODY Tag for additional information).
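A brief sketch of the FONT tag follows. It assumes, per our reading of HTML 3.2, that SIZE accepts either an absolute value from 1 to 7 or a relative value such as "+1"; the relative form is quoted because of the plus sign.

```html
<FONT SIZE="+1" COLOR="#FF0000">Larger red text</FONT> followed by
<FONT SIZE=2 COLOR=navy>smaller navy text</FONT>.
```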
Note that you use the <BR>
tag to force a line break. You can
use the CLEAR attribute to move down past floating images
on either margin, for example:
<BR CLEAR=LEFT>.
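Putting the image and line break attributes together, a floated image with text beside it and text below it could be sketched as follows. The WIDTH, HEIGHT, and HSPACE values are assumptions for the example.

```html
<IMG SRC="canyon.gif" ALT="Grand Canyon" ALIGN=LEFT
     WIDTH=120 HEIGHT=90 HSPACE=8>
This text flows down the right side of the image.
<BR CLEAR=LEFT>
This text starts below the image, at the left margin.
```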
The <MAP> tag requires
start and end tags. This tag allows you to define
client-side image maps. The <MAP> tag
contains one or more AREA tags that specify hotspots
on the associated image map and binds
these hotspots to URLs.
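A client-side image map with two rectangular hotspots might look like this sketch. The image navbar.gif, its dimensions, and the target files are hypothetical; the USEMAP attribute on the image refers to the NAME of the map.

```html
<IMG SRC="navbar.gif" ALT="Navigation bar" WIDTH=200 HEIGHT=40
     USEMAP="#navbar">
<MAP NAME="navbar">
<AREA SHAPE=RECT COORDS="0,0,99,39" HREF="home.html" ALT="Home">
<AREA SHAPE=RECT COORDS="100,0,199,39" HREF="index.html" ALT="Index">
</MAP>
```

For a rectangle, COORDS lists the left, top, right, and bottom pixel positions within the image.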
Table 9 lists the special characters included in the HTML
specification from the Numeric and Special Graphic entity
set, along with the character name, syntax for use, and
description. These four main special characters
are specifically included in RFC 1866.
Glyph   Name   Syntax   Description
<       lt     &lt;     Less than sign
>       gt     &gt;     Greater than sign
&       amp    &amp;    Ampersand
"       quot   &quot;   Double quote sign
The leading ampersand is required.
The ampersand and semicolon delimit an entity name
which the user agent replaces with a special character.
The trailing semicolon is necessary when the character
following the entity is not a space or end of line. It
is never incorrect to include the trailing semicolon.
Some browsers always require the
trailing semicolon. RFC 1866 specifies &quot;
as being a double quote, but some older browsers
display it as a single quote. The remaining special
character entity names defined in RFC 1866, as well
as some proposed special character entity names, are
listed at:
http://www.sandia.gov/sci_compute/symbols.html
Use them to find out what they produce
on your browser. Not all of those special characters
are recognized by all browsers.
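As a quick check of the four main special characters, a test file along the lines of this sketch can be opened in a browser; the sentences are only sample text. Each entity is written with its leading ampersand and trailing semicolon.

```html
<P>In HTML source, write 3 &lt; 5 and 5 &gt; 3.
<P>AT&amp;T is written with the amp entity.
<P>She said &quot;hello&quot;.
```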
All entity names are defined as case sensitive. The entity
name of many of the special characters intentionally includes
mixed case that must be entered exactly as specified. Since
most browsers are insensitive to case for HTML names, many
browsers do not require the entity names of the main four
special characters to be lower case.
You can use your Web browser to get
more information about HTML. Some good places to start
include:
http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
Be aware that not all servers and browsers
support all of the HTML tags you can find as you explore the
capabilities of this markup language. The HTML language is
under development and new standards have not been solidified.
Markup Languages
Advantages of Markup Languages
Disadvantages of Markup Languages
HTML Basics
HTML Tags
Structure
Figure 10 Strict HTML Tag Use (continued)
</TITLE></HEAD>                                      Ends the document title.
<P> Add as many paragraphs as you need, then end the document.
</BODY>                                              Ends the document text.
</HTML>                                              Ends the HTML file.
The HEAD Tag
The BODY Tag
gray, green, lime, maroon,
navy, olive, purple, red,
silver, teal, white, and yellow
Block and Text Level Tags
Headings
Figure 11 HTML File with Head Tags
<HTML>
<HEAD><TITLE>Sample File</TITLE></HEAD>
<BODY>
<H1>Sample File</H1>
This is a sample of an HTML file.
<H2>Structure</H2>
It includes the structure tags such as HEAD and
BODY.
<P>
The paragraph separator (P) is also sometimes considered
a structure tag.
<H2>Headings</H2>
This file also shows some heading tags. </BODY></HTML>
The ADDRESS Tag
Block Level Tags
Lists
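Figure 12 Unordered List Tags in HTML
<UL>
<LI> ... first list item
<LI> ... second list item
repeat as necessary
</UL>
Figure 13 Ordered List Tags in HTML
<OL>
<LI> ... first list item
<LI> ... second list item
...
</OL>
Figure 14 Definition List Tags in HTML
<DL>
<DT> term name
<DD> term definition
</DL>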
Tables
Figure 15 Table Tags in HTML
<TABLE BORDER=3 CELLSPACING=2 CELLPADDING=2 WIDTH="80%">
<CAPTION> ... table caption ... </CAPTION>
<TR><TD> first cell <TD> second cell
<TR> ...
...
</TABLE>
Text Level Tags
Physical Format Tags
Logical Formats
Form Field Tags
Special Text-Level Tags
Special Characters
HTML Reference
http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html#A1.1