The advent of the World Wide Web transformed the Internet. Before the Web, the Internet was merely a research and academic network. It was highly successful in what it did and well-used by an elite international community, but it didn't impinge on the general consciousness.

With the coming of the Web, the Internet became mass market. In an extraordinarily short period, the Web remade the Internet into an almost unavoidable feature of everyday life.

This astounding change was largely due to the simplicity of HyperText Markup Language (HTML), the language used to create documents on the Web.

A simple success

When it first saw the light of day, HTML consisted of a handful of tags for marking up text so it could be displayed with some basic formatting. A program called a Web browser was used to read these tags and display the formatted page.

The 'hypertext' part of HTML's name referred to the inclusion of document links, called hyperlinks. Clicking a link on a page would transport you to the destination page referred to by the link, and display that page in your browser.

Thus, the Web developed as a series of interlinked, formatted documents.

While HTML did a good job of linking documents from day one, it has always been a pretty rotten markup language.

Markup languages

Markup languages have been with us for a long time. The term 'markup' comes from the printing and publishing industries. In order to turn plain text into a final printed product complete with bold, italics, different typefaces, headings, structured tables and so on, editors marked up copy with standard notations telling the typesetter how to format each element on the page.

That practice continues today with computer-based desktop publishing. For instance, when I'm writing an article for Australian PC User magazine, if I want the words 'markup language' to be printed in italics, as I type I'll enclose those words in italics tags: <I>markup language<I>. The magazine's desktop publishing program will automatically strip out the tags and convert the enclosed text to italics, thus: markup language. If you've tried your hand at Web page authoring, you'll recognise the similarity between PC User's typesetting tags and HTML's own tags.

The purist's approach

Markup language purists will point out that this is technically incorrect: the markup language is supposed to describe the structure of a document, not its presentation. Thus, a valid markup tag might take the form:

<page heading>Heading Text</page heading>

indicating that 'Heading Text' should be formatted in the standard page heading style. Note the tag doesn't indicate what this style is: it could be 24-point bold Garamond or, just as easily, 32-point green italic Times New Roman. The tag merely identifies what kind of page element the text is; how page headings are actually presented is defined separately from the markup itself.
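To make the distinction concrete, here is a minimal Python sketch built on assumptions: a made-up document using a hypothetical <pageheading> tag (XML tag names can't contain spaces, so the space is dropped) and two equally hypothetical house styles. The markup only says what each element is; swapping the style table changes how it would be typeset without touching the document.

```python
import xml.etree.ElementTree as ET

# Structural markup: the tags name *what* each element is, not how it looks.
# "pageheading" and "para" are hypothetical tags for illustration only.
DOC = "<doc><pageheading>Heading Text</pageheading><para>Body copy.</para></doc>"

# Two hypothetical house styles: presentation lives here, outside the document.
STYLE_A = {"pageheading": "24pt bold Garamond", "para": "11pt Garamond"}
STYLE_B = {"pageheading": "32pt green italic Times New Roman",
           "para": "12pt Times New Roman"}

def render(doc: str, style: dict) -> list:
    """Describe how each top-level element would be typeset under a style."""
    root = ET.fromstring(doc)
    return [f"{style[el.tag]}: {el.text}" for el in root]

# Same markup, two different presentations.
print(render(DOC, STYLE_A))
print(render(DOC, STYLE_B))
```

The point of the design is that a publisher can restyle every page heading in a whole collection of documents by changing one entry in the style table.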

That's why early HTML provided the <em> tag as an alternative to <I> (and browsers still support it today). The <em> tag indicates emphasis; it doesn't specify the style of emphasis to use (italics, bold, highlight), merely that the marked text is an emphasised element of the page.

SGML: mother of markup languages

In the computing world, the mother of all markup languages is Standard Generalised Markup Language (SGML). It grew out of GML, a markup language developed at IBM in the late 1960s, and became an international standard (ISO 8879) in 1986. SGML is a meta-language: that is, it's a language which can be used to define other markup languages. SGML is complex, unwieldy, powerful and infinitely extensible, and it has been used by the military, newspapers, large organisations and academics to define document standards for a variety of purposes. With SGML, you can define markup for everything from a memo to a complete book.

Had SGML been used as the basis of the World Wide Web, chances are you'd not be surfing the Web today. That's because SGML is about as accessible as Annapurna in mid-winter. It also has no built-in linking support and has to resort to another system, HyTime, to provide document links.

HTML, on the other hand, is about as simple as a markup language can be, at the opposite extreme from SGML. Technically, HTML is just one SGML document type: a single, fixed set of tags defined using SGML.

It is HTML's simplicity that gave it the mass accessibility and appeal that led to a World Wide Web of hundreds of millions of pages. That same simplicity has also created all sorts of headaches for Web developers and surfers alike.

Tag wars

The original version of HTML provided for little more than headings, paragraph breaks and indented lists.

It didn't take long for people to start clamouring for a little more spice in their Web documents. In particular, they wanted to be able to display images as well as text. Marc Andreessen (who later co-founded Netscape) came to the party by adding an <img> tag to his Mosaic browser.

Figure 1. How HTML's evolution has transformed the basic Web page: the Web "before" (an early Web page using basic HTML) and the Web "after" (the same page produced using current HTML).

That was the start of the tag war. Browser developers started adding support for new tags and attributes, such as <font>, the background attribute of <body>, and the shamefully abused <blink>. It wasn't long before we saw <table> and <frame>. Microsoft's Internet Explorer weighed in with <marquee> and <bgsound>, neither supported by other browsers. Netscape replied with its very own <layer> tag. It wasn't very pretty.

In the meantime, a group of people at the World Wide Web Consortium (W3C) was attempting to bring some sanity to the scene in the form of HTML standards. A series of revised standards appeared, adding the most popular and workable new features already incorporated in the rival browsers.

The mess we made

The end result? A Web where site designers spent an aggravating amount of time designing multiple versions of their pages for viewing by numerous incompatible browsers. A Web where surfers stumbled over sites that displayed poorly, if at all, in their own particular browser.

It was a shambles and we were all to blame.

There's no doubt Microsoft and Netscape were driven into a tag proliferation battle by the desire of Web users for something faster, neater, snazzier, louder, and more entertaining. The two companies tried to lure us with new tags offering better content. At the same time the W3C, working at a comparatively snail-like pace, tried to bring some order to the scene by revising the HTML standard. It's not surprising a committee focussed on getting it right was left behind by two highly competitive companies trying to get it delivered.

Structure and presentation

Apart from the strife engendered by non-standard HTML implementations, HTML had been suffering from another problem. As a markup language, HTML's original job was to define the structure of Web documents. But in the rush to produce a more riveting experience on the Web, this purpose had been lost. HTML was, and still is, used to control presentation as well as structure.

Web designers, faced with a lack of tools for controlling the layout of their pages, forced HTML way beyond its bounds. The italics tag replaced the emphasis tag; tables were used to position text and graphics; spacer graphics (invisible, 1-pixel GIF images) were used to create space between page elements. For several years, if you looked at the source code behind most Web sites, you would have seen HTML forced into Gumby-esque contortions.

The situation was exacerbated by the addition of incompatible implementations of scripting languages and controls (JavaScript, VBScript, ActiveX) used to add a degree of interactivity to Web sites.

Fixing HTML: CSS

The one good thing about HTML's parlous state is that everyone – from surfer to designer to standards-setter – eventually realised how hopeless things had become.

Microsoft and Netscape promised to mend their ways and adhere to standard HTML. They're still working on it, but at least there's been a marked improvement.

The most recent HTML standard from the W3C – HTML 4.01 – supports cascading style sheets. Style sheets allow Web page designers to separate presentation from structure, and give designers a much greater degree of control over the layout of their pages.

Browser support for style sheets has taken a long time to appear, but the most recent versions of both Netscape and Internet Explorer support much of CSS2, the W3C's second Cascading Style Sheet specification. CSS2 provides for extensive control over positioning of page elements, as well as control over fonts, colours, text spacing, interaction, and other stylistic features.

Fixing HTML: XML

XML is another acronym you'll need to add to your Web lexicon. It stands for eXtensible Markup Language, and it's the big step in evolving a Web language to accommodate all the new uses – multimedia, database publishing, interactive presentations, and so on – appearing on the Web.

The key word in the acronym is, of course, extensible. Extensible is just what HTML isn't, even though we've tried to make it so by pummelling and pulling it out of shape.

XML is the bridge between the power and huge flexibility of SGML and the simplicity and rigidity of HTML. Unlike HTML, which is merely a single SGML document type, XML is a genuine subset of SGML. Like SGML, XML is a meta-language: a language which can be used to define other languages.
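A quick way to see what 'meta-language' means in practice: the Python sketch below invents a tiny memo vocabulary (the tags are hypothetical, not part of any standard) and reads it with the standard library's generic XML parser. The parser knows nothing about memos, yet it can read any document that follows XML's syntax rules, which is what lets new XML-based languages be defined without new parsing software.

```python
import xml.etree.ElementTree as ET

# A hypothetical memo vocabulary: none of these tags exist in HTML,
# but the document follows XML syntax, so any XML parser can read it.
MEMO = """<memo>
  <to>All staff</to>
  <from>Editor</from>
  <subject>Deadlines</subject>
  <body>Copy is due Friday.</body>
</memo>"""

root = ET.fromstring(MEMO)

# Collect each child element's tag name and text into a dictionary.
fields = {child.tag: child.text for child in root}
print(root.tag)            # memo
print(fields["subject"])   # Deadlines
```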

XML is so central to the next evolution of the Web that you'll find support for it built into applications and operating systems, as well as browsers.

New dialects: CDF, CML, SMIL

Because it can be used to define other languages, XML provides almost limitless scope. Microsoft has already used it as the basis for its push content format, Channel Definition Format (CDF). The W3C's MathML (Mathematical Markup Language) was the first XML application the W3C published as a Recommendation. XML has also been used to create Chemical Markup Language (CML), which lets scientists and researchers publish documents containing chemical symbols and formulae.

Yet another XML by-product, Synchronised Multimedia Integration Language (SMIL), allows developers to produce interactive multimedia presentations on the Web. It lets developers treat text, audio, static images and video as separate streams and then combine them. SMIL provides control over the timing of the display of the various streams, much as desktop presentation software does.
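SMIL expresses timing with time containers: the children of a <seq> element play one after another, while the children of a <par> element play at the same time. The Python sketch below is a deliberately simplified model of that idea, not a SMIL player. It assumes every media element carries an explicit dur attribute in whole seconds, treats <body> like a <seq>, ignores SMIL's many other timing attributes, and uses made-up file names.

```python
import xml.etree.ElementTree as ET

# Simplified SMIL-style body (file names are made up). Assumption: every
# media element has an explicit dur in whole seconds, e.g. dur="3s".
PRESENTATION = """<body>
  <seq>
    <img src="title.png" dur="3s"/>
    <par>
      <video src="demo.mpg" dur="10s"/>
      <audio src="narration.wav" dur="8s"/>
    </par>
  </seq>
</body>"""

def schedule(el, start, timeline):
    """Record (src, begin, end) for each media element; return the end time."""
    if el.tag in ("body", "seq"):
        # Sequential container: each child starts when the previous one ends.
        t = start
        for child in el:
            t = schedule(child, t, timeline)
        return t
    if el.tag == "par":
        # Parallel container: children start together; it ends with the longest.
        return max((schedule(child, start, timeline) for child in el), default=start)
    # Media element: fixed duration such as "3s".
    dur = int(el.get("dur").rstrip("s"))
    timeline.append((el.get("src"), start, start + dur))
    return start + dur

timeline = []
total = schedule(ET.fromstring(PRESENTATION), 0, timeline)
for src, begin, end in timeline:
    print(f"{src}: {begin}s to {end}s")
print(f"total: {total}s")
```

Here the title image shows for three seconds, then the video and narration start together; the <par> block lasts as long as its longest stream, so the whole presentation runs 13 seconds.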

CDF, CML, MathML and SMIL signal the beginning of what may turn out to be a flood of XML applications, which will bring new capabilities to the Web. XML languages will offer enormous benefits to vertical markets and specialised industries.

XHTML

Luckily, Web site designers don't have to abandon their HTML coding expertise and suddenly try to learn half a dozen or more XML variants. Instead, HTML is evolving to meet XML. The evolution takes the form of XHTML: Extensible Hypertext Markup Language.

The first incarnation of XHTML is almost identical to HTML 4.01. XHTML 1.0 uses all the elements of HTML 4 combined with XML's syntax. Where HTML is fairly slack about enforcing its syntax rules, XHTML is far stricter: element and attribute names must be in lower case, every element must be closed (empty elements such as <br /> included), elements must nest properly, and attribute values must be quoted. The result is clean code and what the XMLers like to call well-formed documents. If you're interested in designing XHTML pages, or you'd simply like to understand the differences between XHTML and HTML, check out the W3Schools Introduction to XHTML.
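XML parsers refuse markup that isn't well-formed, and that is exactly where XHTML's strictness bites. The Python sketch below uses the standard library's xml.etree.ElementTree parser as a stand-in for a well-formedness checker. The fragments are made up and omit the XHTML namespace and DOCTYPE a real page would carry, and passing this check is not the same as validating a page against the XHTML specification.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """True if the markup parses as XML. This checks well-formedness only;
    it does not validate against the XHTML DTD."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML-era shortcuts: unclosed <p>, bare <br>, unquoted attribute value.
SLOPPY_HTML = "<body><p>First paragraph<p>Second<br><img src=logo.gif></body>"

# The same content following XHTML's rules: every element closed,
# empty elements self-closed, attribute values quoted, tags in lower case.
XHTML = ('<body><p>First paragraph</p>'
         '<p>Second<br /><img src="logo.gif" alt="logo" /></p></body>')

# XML is case-sensitive, so <P> and </p> don't form a matching pair.
MIXED_CASE = "<P>Shouting</p>"

print(is_well_formed(SLOPPY_HTML))  # False
print(is_well_formed(XHTML))        # True
print(is_well_formed(MIXED_CASE))   # False
```

A browser will happily render the sloppy version, which is why such shortcuts became so common; an XML-based toolchain will reject it outright.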

Preparing for XHTML

Although XHTML is already used on many Web sites, HTML is still with us and is showing no signs of disappearing immediately. That means if you're a Web designer, you still have plenty of time to prepare for the change.

If you haven't already done so, you should be learning how to separate style from content. Use Cascading Style Sheets – in the long run they'll make your life infinitely easier. WebMonkey has an excellent tutorial. And start learning XHTML using the article mentioned above or one of the many other tutorials on the Web. You can implement XHTML right now on your site and your visitors won't even know the difference. You will, though... and you'll be ready for the next big evolution on the Web.

Coping as a surfer

If you're a surfer rather than a designer, there's one big thing you can do for yourself: Install the very best browser your hardware will support. Get the latest version of Netscape or Internet Explorer if possible, or, if you're pressed for hard disk space, try Opera, a compact and very fast browser with CSS support. By using a current browser, you'll see the Web as the designers intended.

© 2002  Rose Vines


