Markup.
HTML markup consists of several key components, including tags (and their attributes), character-based data types, character references and entity references. Another important component is the document type declaration, which triggers standards mode rendering. The following is an example of the classic Hello world program, a common test employed for comparing programming languages, scripting languages and markup languages. This example is made using 9 lines of code:
<!DOCTYPE html>
<html>
<head>
<title>This is a title</title>
</head>
<body>
<p>Hello world!</p>
</body>
</html>
The text between <html> and </html> describes the web page, and the text between <body> and </body> is the visible page content. The markup text "<title>This is a title</title>" defines the browser page title.
This Document Type Declaration is for HTML5. If the <!DOCTYPE html> declaration is not included, various browsers will revert to "quirks mode" for rendering.
Elements.
HTML documents imply a structure of nested HTML elements. These are indicated in the document by HTML tags, enclosed in angle brackets thus: <p>
In the simple, general case, the extent of an element is indicated by a pair of tags: a "start tag" <p> and an "end tag" </p>. The text content of the element, if any, is placed between these tags. Tags may also enclose further tag markup between the start and end, including a mixture of tags and text. This indicates further, nested, elements, as children of the parent element. The start tag may also include attributes within the tag. These indicate other information, such as identifiers for sections within the document, identifiers used to bind style information to the presentation of the document, and, for some tags such as the <img> tag used to embed images, the reference to the image resource.
Some elements, such as the line break <br>, do not permit any embedded content, either text or further tags. These require only a single empty tag (akin to a start tag) and do not use an end tag. Many tags, particularly the closing end tag for the very commonly used paragraph element <p>, are optional. An HTML browser or other agent can infer the closure for the end of an element from the context and the structural rules defined by the HTML standard. These rules are complex and not widely understood by most HTML coders.
The general form of an HTML element is therefore: <tag attribute1="value1" attribute2="value2">content</tag>. Some HTML elements are defined as empty elements and take the form <tag attribute1="value1" attribute2="value2">. Empty elements may enclose no content, for instance, the <br> tag or the inline <img> tag. The name of an HTML element is the name used in the tags. Note that the end tag's name is preceded by a slash character, "/", and that in empty elements the end tag is neither required nor allowed. If attributes are not mentioned, default values are used in each case.
Element Examples.
Header of the HTML document: <head>...</head>. The title is included in the head, for example:
<head>
<title>The Title</title>
</head>
Headings: HTML headings are defined with the <h1> to <h6> tags:
<h1>Heading level 1</h1>
<h2>Heading level 2</h2>
<h3>Heading level 3</h3>
<h4>Heading level 4</h4>
<h5>Heading level 5</h5>
<h6>Heading level 6</h6>
Paragraphs:
<p>Paragraph 1</p> <p>Paragraph 2</p>
Line breaks: <br>. The difference between <br> and <p> is that "br" breaks a line without altering the semantic structure of the page, whereas "p" sections the page into paragraphs. Note also that "br" is an empty element in that, while it may have attributes, it can take no content and it may not have an end tag.
<p>This <br> is a paragraph <br> with <br> line breaks</p>
Links: To make a link, use the <a> tag. The href attribute holds the URL of the link.
<a href="http://www.google.com/">A Link to Google!</a>
Comments:
<!-- This is a comment -->
Comments can help in the understanding of the markup and do not display in the webpage.
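The nesting of elements described above can be made visible with a short script. The following is a minimal sketch in Python using the standard library's html.parser module; the names OutlineParser, outline, SAMPLE and the partial VOID_ELEMENTS list are illustrative assumptions, not part of HTML itself. It assumes every non-empty element is closed explicitly and makes no attempt to infer optional end tags the way a browser would.

```python
from html.parser import HTMLParser

# Elements that take no content and no end tag (a partial list, for illustration).
VOID_ELEMENTS = {"br", "img", "hr", "meta", "link", "input"}


class OutlineParser(HTMLParser):
    """Record each start tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag name) pairs

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in VOID_ELEMENTS:
            self.depth += 1  # later tags are children of this element

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1


def outline(document):
    """Return the (depth, tag) outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(document)
    parser.close()
    return parser.outline


# Hypothetical sample, adapted from the Hello world document.
SAMPLE = (
    "<html><head><title>This is a title</title></head>"
    "<body><p>Hello<br>world!</p></body></html>"
)

for depth, tag in outline(SAMPLE):
    print("  " * depth + tag)
```

The printed outline indents each element under its parent; the "br" element appears as a child of "p" but never becomes a parent itself, because it is empty.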
There are several types of markup elements used in HTML:
- Structural markup describes the purpose of text. For example, <h2>Golf</h2> establishes "Golf" as a second-level heading. Structural markup does not denote any specific rendering, but most web browsers have default styles for element formatting. Content may be further styled using Cascading Style Sheets (CSS).
- Presentational markup describes the appearance of text, regardless of its purpose. For example, <b>boldface</b> indicates that visual output devices should render "boldface" in bold text, but gives little indication of what devices that are unable to do this (such as aural devices that read the text aloud) should do. In the case of both <b>bold</b> and <i>italic</i>, there are other elements that may have equivalent visual renderings but which are more semantic in nature, such as <strong>strong text</strong> and <em>emphasised text</em> respectively. It is easier to see how an aural user agent should interpret the latter two elements. However, they are not equivalent to their presentational counterparts: it would be undesirable for a screen reader to emphasize the name of a book, for instance, but on a screen such a name would be italicized. Most presentational markup elements have become deprecated under the HTML 4.0 specification in favor of using CSS for styling.
- Hypertext markup makes parts of a document into links to other documents. An anchor element creates a hyperlink in the document, and its href attribute sets the link's target URL. For example, the HTML markup <a href="http://www.google.com/">blogger</a> will render the word "blogger" as a hyperlink. To render an image as a hyperlink, an "img" element is inserted as content into the "a" element. Like "br", "img" is an empty element with attributes but no content or closing tag:
<a href="http://example.org"><img src="image.gif" alt="descriptive text" width="50" height="50" border="0"></a>
Attributes.
Most of the attributes of an element are name-value pairs, separated by "=" and written within the start tag of an element after the element's name. The value may be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML (but not XHTML). Leaving attribute values unquoted is considered unsafe. In contrast with name-value pair attributes, there are some attributes that affect the element simply by their presence in the start tag of the element, like the ismap attribute for the img element.
There are several common attributes that may appear in many elements:
- The id attribute provides a document-wide unique identifier for an element. This is used to identify the element so that style sheets can alter its presentational properties, and scripts may alter, animate or delete its contents or presentation. Appended to the URL of the page, it provides a globally unique identifier for the element, typically a sub-section of the page. For example, the ID "Attributes" in http://umairitguyblogspot.com/home/HTML#Attributes
- The class attribute provides a way of classifying similar elements. This can be used for semantic or presentation purposes. For example, an HTML document might semantically use the designation class="notation" to indicate that all elements with this class value are subordinate to the main text of the document. In presentation, such elements might be gathered together and presented as footnotes on a page instead of appearing in the place where they occur in the HTML source. Class attributes are used semantically in microformats. Multiple class values may be specified; for example, class="notation important" puts the element into both the "notation" and the "important" classes.
- An author may use the style attribute to assign presentational properties to a particular element. It is considered better practice to use an element's id or class attributes to select the element from within a style sheet, though sometimes this can be too cumbersome for a simple, specific, or ad hoc styling.
- The title attribute is used to attach subtextual explanation to an element. In most browsers this attribute is displayed as a tooltip.
- The lang attribute identifies the natural language of the element's contents, which may be different from that of the rest of the document. For example, in an English-language document:
<p>Oh well, <span lang="fr">c'est la vie</span>, as they say in France.</p>
The abbreviation element, abbr, can be used to demonstrate some of these attributes:
<abbr id="anId" class="jargon" style="color:purple;" title="Hypertext Markup Language">HTML</abbr>
This example displays as HTML; in most browsers, pointing the cursor at the abbreviation should display the title text "Hypertext Markup Language."
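To see how a parser reads name-value attributes and presence-only attributes, the sketch below uses Python's standard html.parser module. The AttributeCollector class, the collect_attributes helper and the MARKUP sample (which adds a second class value and an ismap attribute for illustration) are assumptions made for this example only.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the attributes of every start tag as (tag, dict) pairs."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a presence-only
        # attribute such as ismap arrives with the value None.
        self.elements.append((tag, dict(attrs)))


def collect_attributes(markup):
    """Return a list of (tag, attributes) for each start tag in markup."""
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.elements


# Hypothetical sample combining the abbr and img examples.
MARKUP = (
    '<abbr id="anId" class="jargon important" '
    'title="Hypertext Markup Language">HTML</abbr>'
    '<img src="image.gif" alt="descriptive text" ismap>'
)

elements = collect_attributes(MARKUP)
abbr_attrs = elements[0][1]
img_attrs = elements[1][1]

print(abbr_attrs["id"])             # the document-wide identifier
print(abbr_attrs["class"].split())  # class values are space-separated
print(img_attrs["ismap"])           # presence-only attribute has no value
```

Splitting the class value on whitespace yields the individual classes, which mirrors how class="notation important" places an element in two classes at once.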
Most elements also take the language-related attribute dir to specify text direction, such as with "rtl" for right-to-left text in, for example, Arabic, Persian or Hebrew.
Character And Entity References.
As of version 4.0, HTML defines a set of 252 character entity references and a set of 1,114,050 numeric character references, both of which allow individual characters to be written via simple markup, rather than literally. A literal character and its markup counterpart are considered equivalent and are rendered identically.
The ability to "escape" characters in this way allows for the characters < and & (when written as &lt; and &amp;, respectively) to be interpreted as character data, rather than markup. For example, a literal < normally indicates the start of a tag, and & normally indicates the start of a character entity reference or numeric character reference; writing it as &amp; or &#x26; or &#38; allows & to be included in the content of an element or in the value of an attribute. The double-quote character ("), when not used to quote an attribute value, must also be escaped as &quot; or &#x22; or &#34; when it appears within the attribute value itself. Equivalently, the single-quote character ('), when not used to quote an attribute value, must also be escaped as &#x27; or &#39; (or as &apos; in HTML5 or XHTML documents) when it appears within the attribute value itself. If document authors overlook the need to escape such characters, some browsers can be very forgiving and try to use context to guess their intent. The result is still invalid markup, which makes the document less accessible to other browsers and to other user agents that may try to parse the document, for example for search and indexing purposes.
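Escaping is usually left to a library rather than done by hand. As a minimal sketch, Python's standard html module provides escape and unescape functions; the sample string below is a made-up illustration.

```python
import html

# Hypothetical text containing characters that are special in HTML.
text = 'Fish & Chips <"to go">'

# Escape &, < and > for use in element content.
content_safe = html.escape(text, quote=False)
print(content_safe)

# Additionally escape " and ' for use inside a quoted attribute value.
attribute_safe = html.escape(text, quote=True)
print(attribute_safe)

# Named, hexadecimal and decimal references decode to the same character.
decoded = html.unescape("&amp; &#x26; &#38;")
print(decoded)

# Unescaping reverses the escaping.
print(html.unescape(attribute_safe) == text)
```

Note that html.escape also replaces the > character, which is harmless but not strictly required in most positions.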
Escaping also allows for characters that are not easily typed, or that are not available in the document's character encoding, to be represented within element and attribute content. For example, the acute-accented e (é), a character typically found only on Western European and South American keyboards, can be written in any HTML document as the entity reference &eacute; or as the numeric references &#xE9; or &#233;, using characters that are available on all keyboards and are supported in all character encodings. Unicode character encodings such as UTF-8 are compatible with all modern browsers and allow direct access to almost all the characters of the world's writing systems.
Data Types.
HTML defines several data types for element content, such as script data and style sheet data, and a plethora of types for attribute values, including IDs, names, URIs, numbers, units of length, languages, media descriptors, colors, character encodings, dates and times, and so on. All of these data types are specializations of character data.
Document Type Declaration.
HTML documents are required to start with a Document Type Declaration (informally, a "doctype"). In browsers, the doctype helps to define the rendering mode, particularly whether to use quirks mode. The original purpose of the doctype was to enable parsing and validation of HTML documents by SGML tools based on the Document Type Definition (DTD). The DTD to which the DOCTYPE refers contains a machine-readable grammar specifying the permitted and prohibited content for a document conforming to such a DTD. Browsers, on the other hand, do not implement HTML as an application of SGML and consequently do not read the DTD. HTML5 does not define a DTD; therefore, in HTML5 the doctype declaration is simpler and shorter:
<!DOCTYPE html>
An example of an HTML 4 doctype:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
This declaration references the DTD for the "strict" version of HTML 4.01. SGML-based validators read the DTD in order to properly parse the document and to perform validation. In modern browsers, a valid doctype activates standards mode as opposed to quirks mode. In addition, HTML 4.01 provides Transitional and Frameset DTDs, as explained below. Transitional type is the most inclusive, incorporating current tags as well as older or "deprecated" tags, with the Strict DTD excluding deprecated tags. Frameset has all tags necessary to make frames on a page along with the tags included in transitional type.
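A tool can check whether a document carries a doctype at all before deciding how to treat it. The following is a rough sketch using Python's standard html.parser module; DoctypeFinder and find_doctype are hypothetical names, and the check only reports the declaration text. It does not reproduce the rules browsers actually use to choose between standards mode and quirks mode, and it does not verify that the declaration comes first in the document.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Remember the first DOCTYPE declaration seen while parsing."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def find_doctype(document):
    """Return the doctype declaration text, or None if there is none."""
    finder = DoctypeFinder()
    finder.feed(document)
    finder.close()
    return finder.doctype


HTML5_DOC = "<!DOCTYPE html><html><body><p>Hello world!</p></body></html>"
HTML4_DOC = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><html></html>'
)
NO_DOCTYPE = "<html><body><p>Hello world!</p></body></html>"

for doc in (HTML5_DOC, HTML4_DOC, NO_DOCTYPE):
    print(find_doctype(doc))
```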