
HTML Page Design and Creation

What is HTML?

Given the variety of computer hardware and software in use throughout the world, creating Web documents might have become a daunting task if it weren’t for what can be called a universal markup language, the HyperText Markup Language, or HTML.

You use HTML to create an easy point-and-click interface for network documents served on the Web. HTML combines many old, familiar commands to produce something entirely new. As a markup language with HyperText logic and multimedia integration, HTML allows a file to contain many diverse commands.

HTML 3.2 is the World Wide Web Consortium’s (W3C) new specification for HTML. W3C is a Massachusetts Institute of Technology-based group whose membership comprises many companies in the forefront of Internet technology, and whose responsibilities include the development of new HTML specifications.

HTML Version 3.2 is the latest Internet draft and is compatible with HTML Version 2.0.

HyperText, the Web, and HTML

You use, or read, documents on the Web very differently than you read a printed book. You move from one point of information to another as needed, not in a linear fashion as you would read a novel. HyperText allows you to do this; you can go forward, backward, jump to related topics with hyperlinks, or just navigate the text based on what interests you at the moment. (HyperText can also be called HyperMedia because it now includes graphics and sound, but the term HyperText is still used most often.) With the Web, you can jump from a document located on a server in New Zealand to one on a server in Finland to yet another on a server in Greece.

An HTML file may also contain links to other files. Using your browser, you can point-and-click to select a link and the browser automatically executes the associated command. The usual command is to "GET" another HTML document. Documents link to other documents with links to still other documents and so on. These links form the World Wide Web of interconnected documents.

In addition to the links to other documents, HTML documents may contain embedded graphics, sound files, animation, or anything else a computer system can handle. A set of conventions for identifying content types (MIME) defines what kind of information is in each file that might be linked to an HTML document.

HyperText has been a useful concept for some time, but combining the logic of HyperText with the communications of the Internet created a whole new array of powerful communications tools. A pointer to a document gets the latest version of that document, whether at the same site or at some other site across the world.

Markup Languages

Creating HyperText documents with HTML means you can combine the logic of HyperText with the communications capabilities of the Web. With HTML, you can create links to other documents, and embed graphics, sound, animation, or anything else a computer system can handle. MIME defines the kind of information in each file that can be linked to an HTML document.

As with any tool, however, there are certain advantages and disadvantages.

Advantages of Markup Languages

Markup languages have many specific advantages. The source files are compact and portable and do not depend on specific font installations or output capabilities unavailable on some output systems.

In a markup language environment, the same source file can be processed anywhere with acceptable output results. The formatters, systems, and system capabilities can all be different from one location to another. While each produces different results, all are acceptable for the output environment. The source file, using a markup language such as HTML, is designed to avoid system-dependent formatting commands.

The compact size, portability, and flexibility of markup languages were the reasons that this approach was chosen as the method for serving documents to network browsers via the Web.

Disadvantages of Markup Languages

When you create a source file using a markup language, you do not have any direct control over the final, formatted appearance of an HTML document. With HTML, the source file only defines what the displayable entities are, such as heading levels. The final formatting is done by the browser program that reads the source file.

If you are used to all the format control of a WYSIWYG editor, this concept can be very hard to understand. You can mark a piece of text as a level 1 head, but you cannot define how a level 1 head appears on someone’s screen. It might be left-justified or centered, bold or italic, spaced away from the following text or right next to it. All these choices are up to the browser. Even the location of line breaks is a browser function unless you specify a hard break in the source file.

The source file contains defined paragraph breaks, but the way those breaks are executed is entirely up to the user’s browser program. You cannot define paragraph spacing or indentation or other formatting principles.

HTML Basics

HTML is an application of the Standard Generalized Markup Language (SGML). Using HTML, you create documents, or pages, for distribution on the World Wide Web with a formatted "look." You can even include sound if you want. Plain text on the Internet was never like this. You can add sound and graphic commands to documents and link them to other documents on the Web to create multimedia pages.

SGML was developed in 1985 as a way to define the content and structure of documents. Markup languages use a series of "tags" to define, or mark, each text element; for example, to mark text as a paragraph, a bulleted list item, a numbered list item, or a hyperlink.

You use HTML to mark up a document, that is, to embed your instructions about the document contents directly in the file. The instructions range from designating paragraph text, a title line, and different heading levels and sections to including graphics and hyperlinks to other documents. This file is the "source" document.

You then process the source file using a formatting program to produce the final result.

What the document looks like (the type style and size, how much a paragraph indents, whether a bullet or a square sets off list items) depends on the browser the reader is using. While this means an HTML author cannot control the exact output of a Web document, it also means that anyone anywhere can access it.

Unlike earlier word processors that are used to design how a document looks, HTML defines what the elements of a document are and how they come together to form a document. What the elements look like on the screen or a printed page is determined by your browser software.

To create an HTML source file, it is essential that you use a text editor that fits your needs. Do you need ease of finding, moving, or changing text within one document and from different windows? Or do you need specific scientific, mathematical, or multilingual capabilities?

Some software editing tools are compliant with HTML completely, and some to lesser degrees. The best choice is an editor that is SGML, and therefore HTML, compliant. This causes fewer problems; it is easier to maintain and re-use existing documents created with SGML.

The capability of HTML is expanding as new commands and revisions are being tested and added to the HTML standard.

HTML tags are not case-sensitive. You may use uppercase letters, lowercase, or any mixture without altering the meaning of the command. The only exception is the special character entity names, which are case-sensitive; the four main ones require lowercase letters.
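As a small illustration of case-insensitive tags, the three lines below are a hypothetical fragment (the heading text is invented for the example). A browser should treat all three as the same level 1 heading.

```html
<H1>Sample Heading</H1>
<h1>Sample Heading</h1>
<H1>Sample Heading</h1>
```

Mixing case within a start and end tag pair, as in the third line, is legal, but choosing one convention and keeping to it makes the source file easier to read.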

HTML Tags

HTML uses keywords called tags to mark the source file text. The tags are what define the text elements. Tags have these characteristics:

Structure

A file written in "strict" HTML tags has the structure shown in Figure 10. The annotation to the right of each line in the example is for information only and is not part of the actual code.

Every HTML 3.2 document should begin with the following <!DOCTYPE> declaration, which distinguishes HTML 3.2 from other versions of HTML. Follow <!DOCTYPE> with the <HTML>, <HEAD>, <TITLE>, and <BODY> tags. The TITLE tag is required. All other tags are optional.

Figure 10 Strict HTML Tag Use


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">    Version number
<HTML>                                               Begins an HTML file.
<HEAD><TITLE> Example of Strict HTML Structure       Indicates document title.
</TITLE></HEAD>                                      Ends the document title.
<BODY> A paragraph goes here.                        Begins the document text.
<P> Another paragraph goes here.                     Separates text from the previous paragraph.
<P> Add as many paragraphs as you need, then end the document.
</BODY>                                              Ends the document text.
</HTML>                                              Ends the HTML file.


Note that in practice, many documents don't contain a <!DOCTYPE> declaration. This makes it difficult for browsers, validation tools, and other software to determine the version of HTML used in the document.

As the browser encounters each tag it prepares the output accordingly. A browser uses a different style for the title than for a paragraph and changes styles according to the start and end tags.

Although most browsers treat some of these tags as optional, some browsers and conversion utilities require them. Consequently, this is called the "strict" HTML format, since it strictly meets all requirements.

The tag <HTML> means "begin HTML" and </HTML> means "end HTML." These should be at the beginning and end of the file.


HTML uses the angle bracket (< and >) symbols to enclose a tag. Most tags are "bracketed" tags that come in pairs: a start tag and an end tag, where the end tag has a forward slash added just after the opening angle bracket.

The paragraph separator tag is <P> and is an exception to the bracketing convention. It separates paragraphs, rather than serving as a "begin and end" delimiter.

Line breaks have no meaning in an HTML file. Using line breaks judiciously can make your source file more readable, but they have no effect on the final output that the reader sees. This is why the paragraph separator is so critical.
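To illustrate, here is a minimal sketch with invented text. The two versions differ only in where the source lines break, so a browser should render both as the same two paragraphs.

```html
<!-- Version 1: each paragraph on one source line -->
A first paragraph of text.
<P>
A second paragraph of text.

<!-- Version 2: the same content with different source line breaks -->
A first
paragraph of
text. <P> A second paragraph
of text.
```

Only the <P> tag separates the paragraphs in the output; the blank line and the extra line breaks in the source are ignored.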

Group the tags and associated text together, and separate the group from others with blank lines. This structure means you can quickly tell if you have left out any start or end tags or other necessary parts of the file or text.

The HEAD Tag

The <HEAD> tag defines a document heading that is not part of the paragraph text. The tag begins with <HEAD> and ends with </HEAD>. Although other information may be included, a document title is normally the only qualifier. The title begins with the <TITLE> tag and ends with the </TITLE> tag.

You can use the following qualifiers with the <HEAD> tag:

TITLE — defines the document title, and is always needed.

ISINDEX — for simple keyword searches, see PROMPT attribute.

BASE — defines base URL for resolving relative URLs.

STYLE — reserved for future use with style sheets.

SCRIPT — reserved for future use with scripting languages.

META — used to supply meta information as name/value pairs.

LINK — used to define relationships with other documents.

The TITLE, STYLE, and SCRIPT qualifiers require both start and end tags. The other tags do not use end tags. Note that nonconforming browsers do not display the contents of STYLE and SCRIPT tags.
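The fragment below sketches a document head that uses several of these qualifiers. The title, URL, author name, and file name are hypothetical values chosen for illustration only.

```html
<HEAD>
<TITLE>Widget Catalog</TITLE>
<!-- BASE: relative URLs in this document resolve against this address -->
<BASE HREF="http://www.example.com/catalog/index.html">
<!-- META: a name/value pair describing the document -->
<META NAME="author" CONTENT="A. N. Author">
<!-- LINK: a relationship to another document -->
<LINK REL="contents" HREF="toc.html">
</HEAD>
```

Only TITLE carries an end tag here; BASE, META, and LINK are written as start tags alone.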

The BODY Tag

The <BODY> tag identifies the beginning of the document body. The key attributes include: BACKGROUND, BGCOLOR, TEXT, LINK, VLINK, and ALINK. You can use these to set a repeating background image, plus background and foreground colors for normal text and HyperText links. Colors are given in RGB as hexadecimal numbers (for example, "#C0FFC0") or as one of 16 color names:

aqua, black, blue, fuchsia,
gray, green, lime, maroon,
navy, olive, purple, red,
silver, teal, white, and yellow

These colors are the original standard 16 colors supported with the Windows VGA palette.
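A minimal sketch of a BODY tag that sets these attributes follows. The image file name tile.gif and the page text are assumptions made for the example; the color values come from the hexadecimal example and the color names listed above.

```html
<BODY BACKGROUND="tile.gif" BGCOLOR="#C0FFC0" TEXT="black"
      LINK="blue" VLINK="purple" ALINK="red">
<H1>Colored Page</H1>
Normal text appears in black. Unvisited links are blue, visited
links are purple, and a link turns red while it is being selected.
</BODY>
```

Supplying BGCOLOR along with BACKGROUND gives the browser a color to fall back on if the background image cannot be loaded.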

Block and Text Level Tags

Most tags that can appear in the document body fall into one of two groups: block level tags, which cause line breaks, and text level tags, which do not. Common block level tags include <H1> to <H6> (headers), <P> (paragraphs), <LI> (list items), and <HR> (horizontal rules). Common text level tags include <EM>, <I>, <B>, and <FONT> (character emphasis), <A> (hypertext links), <IMG> and <APPLET> (embedded objects), and <BR> (line breaks).

Block tags generally act as containers for text level and other block level tags, excluding heading and address tags. Text level tags can contain other text level tags and attributes only. The exact model depends on the tag.
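The following sketch, with invented text, shows the two groups working together: the block level <P> and <HR> tags start new lines, while the text level tags flow inside the paragraph and nest within one another.

```html
<!-- Block level: the paragraph starts on a new line -->
<P>
Text level tags such as <EM>emphasis</EM> and <B>bold</B> flow
within the paragraph, and can nest: <B>bold with <I>italic</I> inside</B>.
<!-- BR is a text level tag that forces a line break -->
<BR>
This sentence starts on a new line within the same paragraph.
<!-- Block level: a horizontal rule on its own line -->
<HR>
```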

Headings

HTML has six heading levels, but some browsers format the "lower" levels so poorly that it is best to use only the top three levels. The tags are in the bracket format. The top level (largest) heading is level 1; the start tag is <H1> and the end tag is </H1>. A level 2 heading begins with <H2>, and so on through <H6> for the sixth level.

Extending the previous example yields the HTML file with the heading tags shown in Figure 11.

Figure 11 HTML File with Heading Tags


<HTML>
<HEAD><TITLE>Sample File</TITLE></HEAD>
<BODY>
<H1>Sample File</H1>
This is a sample of an HTML file.
<H2>Structure</H2>
It includes the structure tags such as HEAD and
BODY.
<P>
The paragraph separator (P) is also sometimes considered
a structure tag.
<H2>Headings</H2>
This file also shows some heading tags.
</BODY>
</HTML>

Ending a heading includes an implied line break. The actual choice of the font, point size, justification, and other elements of typography is left to the user’s browser. The HTML markup simply designates the text that is logically the text of the heading.

Your individual file may be part of a logical structure of many files, so you do not have to start with a level 1 heading. You should not skip levels, however. Some browsers cannot tolerate going from a level 2 heading to a level 4 without first encountering a level 3.

You can use the ALIGN attribute to set the text alignment within a heading, for example:

<H1 ALIGN=CENTER> ... centered heading ... </H1>

The ADDRESS Tag

You use the ADDRESS tag to enter information about the author of the document. It requires start and end tags.
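A typical use might look like the sketch below. The author name and e-mail address are placeholders, not values from this document.

```html
<ADDRESS>
Page maintained by A. N. Author<BR>
<A HREF="mailto:author@example.com">author@example.com</A>
</ADDRESS>
```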

Block Level Tags

Table 8 contains a list and description of the block level tags.

Table 8 Block Level Tags

| Tag | Style | Description |
|---|---|---|
| H1 to H6 | Headings 1 through 6 | Require start and end tags. Use to establish heading levels one through six. You can use the ALIGN attribute to set the text alignment within a heading. |
| P | Paragraph | Requires a start tag, but does not require an end tag. Use the ALIGN attribute to set the text alignment for a paragraph. Example: <P ALIGN=RIGHT> |
| UL | Unordered lists | Require start and end tags, and contain one or more LI tags representing individual list items. |
| OL | Ordered (numbered) lists | Require start and end tags, and contain one or more LI tags representing individual list items. |
| DL | Definition lists | Require start and end tags, and contain at least one definition term (DT) tag and one definition description (DD) tag. |
| PRE | Preformatted text | Requires start and end tags. The enclosed text is rendered with a monospaced font, and the layout defined by white space and line break characters is preserved. |
| DIV | Document divisions | Requires start and end tags. It groups related tags together. You can use this tag with the ALIGN attribute to set the text alignment of the block tags it contains. ALIGN can be one of LEFT, CENTER, or RIGHT. |
| CENTER | Text alignment | Requires start and end tags. Use to center text lines enclosed by the CENTER tag. |
| BLOCKQUOTE | Quoted text | Requires start and end tags. Use to enclose extended quotations. Browsers typically render this format with indented margins. |
| FORM | Fill-out form | Requires start and end tags. Use to define a fill-out form for processing by HTTP servers. The attributes include ACTION, METHOD, and ENCTYPE. You cannot nest FORM tags. |
| ISINDEX | Primitive HTML forms | Not a container, so the end tag is forbidden. This tag predates FORM. Use for simple kinds of forms that have a single text input field, implied by this tag. |
| HR | Horizontal rules | Not a container, so the end tag is forbidden. Attributes include ALIGN, NOSHADE, SIZE, and WIDTH. |
| TABLE | Tables (can be nested) | Requires start and end tags. Each table starts with an optional CAPTION followed by one or more TR tags defining table rows. Each row has one or more cells defined by table header (TH) and table data (TD) tags. Attributes for the TABLE tag include WIDTH, BORDER, CELLSPACING, and CELLPADDING. |

Lists

List items can contain block and text level items, although you cannot use headings and address tags. Unordered lists take the form shown in Figure 12.

Figure 12 Unordered List Tags in HTML


<UL>
<LI> ... first list item
<LI> ... second list item
repeat as necessary
</UL>

You can use the TYPE attribute to set the bullet style for the unnumbered list identifier.
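For example, the sketch below asks for square bullets. The value square is one of the bullet styles HTML 3.2 defines for TYPE, alongside disc and circle; the list text is invented.

```html
<UL TYPE=square>
<LI> first item, marked with a square bullet
<LI> second item, marked the same way
</UL>
```

As with other presentation hints, a browser that does not support the attribute falls back to its default bullet.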

Ordered (numbered) lists take the form shown in Figure 13.

Figure 13 Ordered List Tags in HTML


<OL>
<LI> ... first list item
<LI> ... second list item
...
</OL>

You can use the START attribute of the OL tag to initialize the sequence number. You can reset it later with the VALUE attribute on individual LI tags.
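A short sketch of both attributes follows; the numbers are arbitrary values picked for the example.

```html
<OL START=5>
<LI> numbered 5
<LI> numbered 6
<LI VALUE=10> numbered 10
<LI> numbered 11
</OL>
```

After VALUE resets the sequence, numbering continues from the new value.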

Definition lists take the form shown in Figure 14.

Figure 14 Definition List Tags in HTML


<DL>
<DT> term name
<DD> term definition ...
</DL>

The DT tag can act only as a container for text level tags. The DD tag can hold block level tags as well, excluding heading and address tags.

Tables

HTML 3.2 supports tables. Tables take the general form shown in Figure 15.

Figure 15 Table Tags in HTML


<TABLE BORDER=3 CELLSPACING=2 CELLPADDING=2 WIDTH="80%">
<CAPTION> ... table caption ... </CAPTION>
<TR><TD> first cell <TD> second cell
<TR> ...
...
</TABLE>

The attributes associated with <TABLE> in this example are all optional. The default TABLE formatting has no surrounding border. Generally, the table is sized automatically, but you can also set the table width using the WIDTH attribute. The BORDER, CELLSPACING, and CELLPADDING attributes provide further control over the table's appearance. Captions can be at the top or bottom of the table depending on the ALIGN attribute.

Each table row starts with a <TR> tag; you can omit the end tag. You define table cells with the <TD> tag for data and the <TH> tag for headers. As with <TR>, you do not have to include the trailing end tags, </TH> and </TD>.

The <TH> and <TD> tags support several attributes. These include ALIGN and VALIGN for aligning cell content, and ROWSPAN and COLSPAN for cells that span more than one row or column. A cell can contain a wide variety of other block and text level tags, including form fields and other tables.
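The sketch below puts the cell attributes to work in a small table. The caption, labels, and figures are invented sample data.

```html
<TABLE BORDER=1>
<CAPTION ALIGN=BOTTOM>Quarterly totals</CAPTION>
<!-- "Region" spans two rows; "Sales" spans two columns -->
<TR><TH ROWSPAN=2>Region <TH COLSPAN=2>Sales
<TR><TH>Q1 <TH>Q2
<TR><TD>North <TD ALIGN=RIGHT>120 <TD ALIGN=RIGHT>135
<TR><TD>South <TD ALIGN=RIGHT VALIGN=TOP>98 <TD ALIGN=RIGHT>110
</TABLE>
```

The second row supplies only two header cells because the first column is already occupied by the "Region" cell that spans down from the first row.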

Text Level Tags

Text level tags do not cause paragraph breaks. Text level tags control character formatting. Generally, you can nest text level tags that define character styles. They can contain other text level tags but not block level tags.

Character formatting is split into physical format tags and logical format tags. There is considerable debate about which method is better, but the physical approach is certainly simpler. Text level tags also include form field tags.

Physical Format Tags

In addition to the default text font (usually a proportional font such as Times New Roman), the HTML standard supports the following physical character formats:

<B> — Bold text (usually the bold font of the standard body text)

<I> — Italic text (usually the italic font of the standard body text)

<TT> — Typewriter or monospaced text (a fixed-space font such as Courier New)

<STRIKE> — Strike-through text style

<BIG> — Places text in a large font

<SMALL> — Places text in a small font

<SUB> — Places text in subscript style

<SUP> — Places text in superscript style

These all require the matching end tags; for example, </B>, </I>, and </TT>.

Some browsers are rather fussy about nesting these tags properly. If you begin a selection of bold and italic text with <B><I>, you should end it with the sequence </I></B> so the italic section is properly nested within the bold section.
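The difference is easiest to see side by side. This is an illustrative fragment with invented text; the second line is shown only as the pattern to avoid.

```html
<!-- Properly nested: the italic section closes before the bold section -->
<B>bold, <I>bold and italic,</I> bold again</B>

<!-- Improperly nested: the end tags overlap, which some browsers mishandle -->
<B>bold, <I>bold and italic,</B> unpredictable</I>
```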

Logical Formats

A logical character format leaves the actual decision about physical format to the specific browser. Many browsers allow you to define exactly how the logical formats are represented. In most cases, the four most common logical formats map directly to the physical formats:

<STRONG> — Strong emphasis; typically a bold font.

<EM> — Basic emphasis; typically an italic font.

<CODE> — Used for extracts from program code (same as TT).

<SAMP> — Used for sample output from programs and scripts (same as TT).

In addition, there are less common logical formats. One of them is usually rendered as a combination of bold and typewriter text:

<KBD> — Used to represent text that the user types

The following logical tags are usually rendered as italic:

<DFN> — Used in association with a word being defined

<CITE> — Used in association with citations or references to a book or film, etc.

<VAR> — Used for variables or arguments to tags.

Remember that all the logical formats are subject to the definitions of the individual browser—including any customization of formats that the user has chosen.

The logical tags also have a matching end tag formed by adding a slash character (</KBD>, for example).
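As a rough sketch, the passage below marks up invented text with the logical tags described above. How each one appears depends on the browser and on any customization the reader has made.

```html
<P>
To list a directory, type <KBD>ls</KBD> followed by a <VAR>directory</VAR> name.
The system prints <SAMP>No such file or directory</SAMP> if the name is wrong.
<P>
A <DFN>tag</DFN> is a keyword that marks a text element, as described in
<CITE>HTML Page Design and Creation</CITE>.
<STRONG>Do not</STRONG> rely on a particular font; the rendering is
<EM>up to the browser</EM>.
```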

Form Field Tags

The form field tags include <INPUT>, <SELECT>, and <TEXTAREA>.

No end tag is required with the INPUT tag. <INPUT>, <SELECT>, and <TEXTAREA> tags are allowed only within FORM tags.

You can use the <INPUT> tag for a variety of form fields including single line text fields, password fields, checkboxes, radio buttons, submit and reset buttons, hidden fields, file upload, and image buttons.

The <SELECT> tag requires the start and end tags and contains one or more <OPTION> tags. You can use the <SELECT> tag for single or multiselection menus.

<TEXTAREA> tags require start and end tags. You can use <TEXTAREA> to define multiline text fields. The content of the tag initializes the field.
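The following sketch combines the three form field tags in one form. The ACTION URL, field names, and option text are hypothetical; a real form needs a server program at the ACTION address to process the submission.

```html
<FORM ACTION="/cgi-bin/register" METHOD=POST>
Name: <INPUT TYPE=text NAME="fullname" SIZE=30>
<P>
Password: <INPUT TYPE=password NAME="pw">
<P>
<INPUT TYPE=checkbox NAME="news" VALUE="yes" CHECKED> Send me news
<P>
Country:
<SELECT NAME="country">
<OPTION SELECTED>New Zealand
<OPTION>Finland
<OPTION>Greece
</SELECT>
<P>
<TEXTAREA NAME="comments" ROWS=4 COLS=40>Type your comments here.</TEXTAREA>
<P>
<INPUT TYPE=submit VALUE="Send"> <INPUT TYPE=reset VALUE="Clear">
</FORM>
```

The text between the TEXTAREA start and end tags is the initial content of the field, and the OPTION marked SELECTED is the menu's initial choice.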

Special Text-Level Tags

The special text-level tags include A(nchor), IMG, APPLET, FONT, BR, and MAP.
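A brief sketch of four of these tags in use is shown below. The URL, image file name, and text are placeholders chosen for the example; APPLET and MAP are omitted to keep it short.

```html
<P>
<!-- A: a hypertext link to another document -->
<A HREF="http://www.example.com/tour.html">Take the tour</A><BR>
<!-- IMG: an embedded image, with ALT text for browsers that cannot show it -->
<IMG SRC="logo.gif" ALT="Company logo" WIDTH=120 HEIGHT=40><BR>
<!-- FONT: a change of size and color for a run of text -->
<FONT SIZE="+1" COLOR="navy">Slightly larger navy text</FONT>
```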

Special Characters

Table 9 lists the special characters included in the HTML specification from the Numeric and Special Graphic entity set, along with the character name, syntax for use, and description. These four main special characters are specifically included in RFC 1866.

Table 9 Special Character Set

| Glyph | Name | Syntax | Description |
|---|---|---|---|
| < | lt | &lt; | Less than sign |
| > | gt | &gt; | Greater than sign |
| & | amp | &amp; | Ampersand |
| " | quot | &quot; | Double quote sign |

The leading ampersand is required. The ampersand and semicolon delimit an entity name which the user agent replaces with a special character. The trailing semicolon is necessary when the character following the entity is not a space or end of line. It is never incorrect to include the trailing semicolon.
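The fragment below is a small sketch, with invented text, that uses all four entities and always includes the trailing semicolon.

```html
<P>
To start a paragraph, type &lt;P&gt; in the source file.
<P>
AT&amp;T and &quot;quoted text&quot; use the other two entities.
```

The first paragraph displays the characters <P> as text instead of being read as a paragraph tag, which is the usual reason for needing these entities.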

Some browsers always require the trailing semicolon. RFC 1866 specifies &quot; as being a double quote, but some older browsers display it as a single quote. The remaining special character entity names defined in RFC 1866, as well as some proposed special character entity names, are listed in:

http://www.sandia.gov/sci_compute/symbols.html

Use them to find out what they produce on your browser. Not all of those special characters are recognized by all browsers. All entity names are defined as case sensitive. The entity name of many of the special characters intentionally includes mixed case that must be entered exactly as specified. Since most browsers are insensitive to case for HTML names, many browsers do not require the entity names of the main four special characters to be lower case.

HTML Reference

You can use your Web browser to get more information about HTML. Some good places to start include:

http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html#A1.1


Be aware that not all servers and browsers support all of the HTML tags you can find as you explore the capabilities of this markup language. The HTML language is under development and new standards have not been solidified.

