Re: To XML or not XML?

From: Christian Wolfgang Hujer <Christian.Hujer@itcqis.com>
Date: Wed, 29 Aug 2001 14:32:09 +0200
To: "Mike Ginter" <mginter@ket.org>, <www-html@w3.org>
Message-ID: <000301c13086$9f183800$9c2750d9@andromeda>
Hello Mike,

> I am noticing that there seems to be no html 5 standards.
Yes.
What you call HTML 5 is XHTML 1.1.
What you might call HTML 6 will be XHTML 2.0.

> Instead, it looks to me that XML will become the new standard and that
> XHTML will be a transitional standard.
Surely not. HTML is based on SGML; XHTML is based on XML. It is that simple:
nothing more, nothing less.

> I cannot seem to find any pages that give an overview as to what all of us
> will have to deal with in the "NEAR" future 1-3 years (I know that nobody
> can predict this).

You only need to take a look at the HTML/XHTML roadmap published on w3.org.
You can find it at:
http://www.w3.org/MarkUp/xhtml-roadmap/

And I think it is very easy to predict the "NEAR" future of 1-3 years
regarding the further development of (X)HTML.

XML is not, and never was, intended to replace HTML.

XHTML 1.0 is a reformulation of HTML 4.01 based on XML instead of SGML.
XHTML Modularization splits HTML up into several modules.
XHTML 1.1 is an implementation of this module-based HTML and incorporates
the Ruby module as a new feature of HTML.
XHTML Basic 1.0 is a reduced version of XHTML 1.0 / HTML 4.01 Strict and
allows coding of HTML documents that can be displayed on more devices.
XHTML 2.0 will be a reformulation of XHTML 1.1 based on other XML standards.
SGML, for example, had no standard for linking; linking was an HTML feature,
not an SGML feature. XML's supporting technologies have features not found
in SGML, such as XML Linking, XPath, XML Base, XPointer and XInclude. XHTML
2.0 will make use of these features instead of proprietary HTML features.

XHTML is surely not intended to be a transitional standard.

> I have read that xml will be much more strict when it comes to formatting
> your code and that all tags should be lower cased.  Are these true?

Yes and no.

It is true that XML is much stricter when it comes to formatting code.
But the lowercase requirement does not come from XML itself.
The SGML declaration used for HTML made element and attribute names
case-insensitive.
XML is always case-sensitive.
When implementing HTML as an XML application instead of an SGML application,
the implementors decided to use lowercase, since most computer languages use
lowercase and it is easier to type.
The fact that XHTML tags must be lowercase follows from two points:
a) XML is case-sensitive.
b) XHTML tags are defined in lowercase.
XML itself requires nothing to be in lowercase except its own constructs,
such as the XML declaration.
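
As a small illustration of points a) and b), here is a sketch using the
xml.etree.ElementTree parser from the Python standard library. The choice of
Python and the variable names are assumptions made for this example; any
conforming XML parser should behave the same way.

```python
import xml.etree.ElementTree as ET

# Element names that differ only in case are different names in XML,
# so <HTML> is not the <html> element that XHTML defines.
upper = ET.fromstring("<HTML/>")
lower = ET.fromstring("<html/>")
names_differ = upper.tag != lower.tag

# For the same reason a start tag and its end tag must match in case.
try:
    ET.fromstring("<P>text</p>")
    mixed_case_accepted = True
except ET.ParseError:
    mixed_case_accepted = False

print(names_differ, mixed_case_accepted)  # prints: True False
```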

To migrate from HTML to XHTML, follow these rules:
1. Use only lower case for element and attribute names. Instead of <HTML>
write <html>.
2. Consistently close all elements. Do not write
<UL><LI>Topic 1<LI>Topic 2<LI>Topic 3</UL>
write
<ul><li>Topic 1</li><li>Topic 2</li><li>Topic 3</li></ul>
3. Close even empty elements like <BR>, <HR>, <META>, <IMG>, <INPUT> etc.
Do not close them with end tags; use the empty-element tag notation and
put a space in front of the slash to stay compatible with the bugs in
Netscape Navigator 4.x. So write <br />, <hr />, <meta ... />,
<img ... />, <input ... />.
4. Attributes always need values. Instead of <HR NOSHADE> (which is nonsense
anyway), write <hr noshade="noshade" />.
5. Use only ASCII characters in your document and encode all non-ASCII
characters, such as umlauts, with an entity or numeric character reference.
For example, instead of ü write &uuml; or &#252;. (ASCII characters are
those with a code less than or equal to 127; ASCII is a 7-bit character set.)
6. When writing documents, use the correct content model. Do not write
<form action="..."><input type="submit" /></form>
write
<form action="..."><div><input type="submit" /></div></form>
and do not write
<ol>
  <li>Topic 1</li>
  <ol>
    <li>Topic 1.1</li>
    <li>Topic 1.2</li>
  </ol>
  <li>Topic 2</li>
</ol>
This is wrong because the content model of <ol> allows only <li> elements as
its children. Write instead:
<ol>
  <li>
    Topic 1
    <ol>
      <li>Topic 1.1</li>
      <li>Topic 1.2</li>
    </ol>
  </li>
  <li>Topic 2</li>
</ol>
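
Several of these rules can be checked mechanically. The following is a
minimal sketch in Python (standard library only), not a replacement for a
validator: it tests well-formedness for rules 2 to 4, shows one way to
produce numeric character references for rule 5, and looks for list children
that are not <li> for rule 6. The helper names is_well_formed, to_ascii and
misplaced_list_children are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup (rules 2 to 4)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def to_ascii(text):
    """Replace every non-ASCII character by a numeric reference (rule 5)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def misplaced_list_children(markup):
    """Return the names of element children of <ol>/<ul> that are not <li>."""
    root = ET.fromstring(markup)
    return [
        child.tag
        for parent in root.iter()
        if parent.tag in ("ol", "ul")
        for child in parent
        if child.tag != "li"
    ]


# Rule 2: unclosed elements are rejected, closed ones are accepted.
print(is_well_formed("<UL><LI>Topic 1<LI>Topic 2</UL>"))
print(is_well_formed("<ul><li>Topic 1</li><li>Topic 2</li></ul>"))
# Rule 3: <br> must be written as an empty-element tag.
print(is_well_formed("<p>line<br></p>"), is_well_formed("<p>line<br /></p>"))
# Rule 4: attributes need a value.
print(is_well_formed("<hr noshade />"), is_well_formed('<hr noshade="noshade" />'))
# Rule 5: the umlaut becomes &#252;
print(to_ascii("für"))
# Rule 6: an <ol> directly inside an <ol> is reported.
print(misplaced_list_children("<ol><li>Topic 1</li><ol><li>Topic 1.1</li></ol></ol>"))
```

A well-formedness check only catches syntax errors. Whether a document
follows the XHTML content models in general is still a question for a
validating parser or the W3C validator.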


And never forget to validate your pages with the W3C validator that can be
found at
http://validator.w3.org/



> I am a web designer, but I am a reluctant one that doesn't like the tech
> side that much.
But you should. ;)

> For this reason I don't keep up with the latest WC3 standards like I
> should.
But you should. ;)

> I find WC3's site very confusing to sort through.
Oh, I think it is one of the most structured sites I've ever seen. Look,
it's quite simple.
The top menu bar contains the five most important links:
- Activities
  This shows what the W3C's activities are
- Technical Reports
  This leads you to the standards (Recommendations), the near-future
standards (Proposed and Candidate Recommendations), the far-future standards
still in development (Working Drafts) and annotations (Notes), which are
often written by third parties.
- Site Index
  (Self explanatory)
- About W3C
  (Self explanatory)
- Contact
  (Self explanatory)

The left menu bar shows the W3C from A to Z.
If you need information about CSS, (X)HTML, XSL etc., you find a good link
there to start with non-technical information (technical information can be
found in the appropriate technical report).

The right menu bar is more about the W3C as an organization. It is very
interesting for those interested in the W3C itself, not just its standards
and activities.

And the middle shows the most recent news from the W3C. I suggest taking a
look at the news on the W3C homepage regularly, at least once a week.


If you need, for your work, further information about HTML etc. itself, you
can use the following links:

http://www.w3.org/TR/xhtml11/
This is the current version of HTML.

http://www.w3.org/TR/xhtml-basic/
This also is the current version of HTML.

(I usually prefer xhtml-basic)

http://www.w3.org/TR/xhtml-modularization/
This document describes the modules xhtml11 and xhtml-basic are based on.

http://www.w3.org/TR/xhtml1/
This document describes the previous version of HTML. It is "the same" as
html401, but it is the first version that is based on XML instead of SGML.

http://www.w3.org/TR/html401
This document describes the semantics of the elements and attributes that
are used by xhtml1, xhtml-modularization, xhtml11 and xhtml-basic. It is a
revised version of html4.

http://www.w3.org/TR/REC-CSS2/
This document describes the Cascading Style Sheets Level 2 that are designed
to be used with HTML. They can also be used with XML, SMIL and SVG, though
the last two needed some extensions to CSS Level 2 that are described in the
documents about SMIL and SVG.


It is worth reading these documents. I read all of them.


If you want to be a really professional web designer, you should also take a
look at:

http://www.w3.org/TR/REC-xml
XML 1.0 (Second Edition). This document is the technical description of XML.
It uses a BNF-style grammar (EBNF) to describe the syntax. Though that is not
easy to understand for those not used to it, give it a try (and a second and
a third one). The notation is similar to regular expressions. It is worth
understanding what ",|()+*?" mean, not just for understanding documents that
describe languages like XML, but also for writing your own DTDs and for
programs and languages like Perl, vi, sed etc.
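
To make the similarity between these operators and regular expressions
concrete, here is a minimal sketch in Python. It assumes a simplified
content-model syntax (element names plus the operators , | ( ) + * ? and
nothing else, so no #PCDATA), and the helper names model_to_regex and
children_match are hypothetical. The three models used below follow the
declarations for ol, html and dl in the XHTML 1.0 Strict DTD.

```python
import re


def model_to_regex(model):
    """Turn a simple DTD-style content model into a regex over child names.

    Each child name is encoded as the name followed by one space, so the
    sequence ["head", "body"] becomes the string "head body ".
    """
    compact = re.sub(r"\s+", "", model)
    # A name becomes a group matching "name "; | ( ) + * ? keep their meaning.
    pattern = re.sub(
        r"[A-Za-z][\w.-]*",
        lambda m: "(?:" + re.escape(m.group()) + " )",
        compact,
    )
    # The comma means "followed by", which in a regex is plain concatenation.
    return re.compile(pattern.replace(",", ""))


def children_match(model, children):
    """Return True if the list of child element names fits the model."""
    encoded = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


print(children_match("(li)+", ["li", "li"]))          # True
print(children_match("(li)+", ["li", "ol", "li"]))    # False
print(children_match("(head, body)", ["body", "head"]))  # False
print(children_match("(dt|dd)+", ["dt", "dd", "dd"]))    # True
```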

http://www.w3.org/TR/xpath
http://www.w3.org/TR/xslt
These documents describe XPath and XSLT. These form the most powerful tool
for creating and managing large web sites.
Basically they were intended to allow "Transformations". A transformation is
a process creating one or more documents from one or more documents, similar
to im/export filters in word processors. With XPath and XSLT you can convert

<a href="http://www.teamone.de/selfhtml/" hreflang="de">SelfHTML</a>
to
<a href="http://www.teamone.de/selfhtml/" hreflang="de">SelfHTML
<img src="/gfx/flags/small/de.gif" alt="(de)"/></a>

Or even
<a href="http://www.informatik-lexikon.de/">Computer Science Lexikon</a>
to
<a href="http://www.informatik-lexikon.de/" hreflang="de">Computer Science
Lexikon <img src="/gfx/flags/small/de.gif" alt="(de)"/></a>
(For the second example, the destination document has to be read to find its
language, so it is required to be correct XHTML, *not* HTML.)
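
The logic of the first conversion is simple enough to sketch outside of
XSLT. The following Python version (standard library only) is a stand-in to
show the idea, not an XSLT stylesheet: it appends a flag image to every <a>
element that carries an hreflang attribute. The function name
add_language_flags is hypothetical, and the image path pattern is taken from
the example above.

```python
import xml.etree.ElementTree as ET


def add_language_flags(root):
    """Append a small flag image to every <a> element that has an hreflang."""
    # Collect the links first so the tree is not modified while iterating.
    for link in list(root.iter("a")):
        lang = link.get("hreflang")
        if lang is None:
            continue
        # Put a space between the existing link content and the image.
        if len(link):
            last = link[-1]
            last.tail = (last.tail or "") + " "
        else:
            link.text = (link.text or "") + " "
        ET.SubElement(
            link,
            "img",
            {"src": "/gfx/flags/small/" + lang + ".gif", "alt": "(" + lang + ")"},
        )
    return root


source = '<p><a href="http://www.teamone.de/selfhtml/" hreflang="de">SelfHTML</a></p>'
result = add_language_flags(ET.fromstring(source))
print(ET.tostring(result, encoding="unicode"))
```

In XSLT the same thing would be a template that matches a[@hreflang], copies
the element and adds the img element; the second conversion additionally
needs to read the destination document, which this sketch does not attempt.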

Or you may convert
<slide title="HTML and XHTML">
  <ul>
    <li>HTML: based on SGML</li>
    <li>XHTML: based on XML</li>
  </ul>
</slide>

to
<html xml:lang="en">
  <head>
    <title>HTML and XHTML</title>
    <style type="text/css">
      #frame  { width:100%;height:100%; }
      #header { height:4em;vertical-align:top; }
      #main   { padding:2em; }
      #footer { height:2em;text-align:center;vertical-align:bottom; }
    </style>
  </head>
  <body>
    <table id="frame">
      <tr><th id="header"><h1>HTML and XHTML</h1></th></tr>
      <tr><td id="main"><ul><li>HTML: based on SGML</li>
        <li>XHTML: based on XML</li></ul></td></tr>
      <tr><td id="footer">&#169; Copyright ...</td></tr>
    </table>
  </body>
</html>

These examples are heavily shortened for clarity.
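
The slide conversion can be sketched in the same way. This Python version is
again only a stand-in for an XSLT stylesheet and is simplified further: it
leaves out the layout table and the CSS and just produces the title, a
heading and a copy of the slide content. The function name slide_to_html is
hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def slide_to_html(slide):
    """Wrap the content of a <slide> element in a minimal HTML page."""
    title = slide.get("title", "")
    html = ET.Element("html", {XML_LANG: "en"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    # Copy the slide content unchanged into the body.
    for child in slide:
        body.append(copy.deepcopy(child))
    return html


slide = ET.fromstring(
    '<slide title="HTML and XHTML">'
    "<ul><li>HTML: based on SGML</li><li>XHTML: based on XML</li></ul>"
    "</slide>"
)
print(ET.tostring(slide_to_html(slide), encoding="unicode"))
```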
Using XSL formatting objects for output, together with XSLT and XPath, you
can even create PDF from XML (XHTML etc.).

XSLT is worth using; it saves a lot of time.


And a clarification: I am not involved with the W3C. I do not guarantee the
correctness of the information given here. I am just a fan of the W3C.

I hope this helps.

Greetings

Christian Hujer

-----Original Message-----
From: www-html-request@w3.org [mailto:www-html-request@w3.org] On Behalf Of
Mike Ginter
Sent: Monday, 27 August 2001 20:24
To: www-html@w3.org
Subject: To XML or not XML?

    I know the answers are probably there but I am having a hard time
finding and understanding, everything that is said.  Forgive my ignorance.
MG
mginter@ket.org
Received on Wednesday, 29 August 2001 08:38:15 GMT
