Re: Proposal for developing HTML 5 materials for Web *authors* from Maciej Stachowiak on 2007-11-21 (public-html@w3.org from November 2007)

From: Maciej Stachowiak <mjs@apple.com>
Date: Wed, 21 Nov 2007 05:55:05 -0800
To: Dean Edridge <dean@55.co.nz>
Cc: Karl Dubost <karl@w3.org>, "public-html@w3.org Tracking WG" <public-html@w3.org>, Roger Johansson <roger@456bereastreet.com>
Message-Id: <242600F4-720B-4A88-B0D1-E6910394E964@apple.com>

On Nov 21, 2007, at 5:29 AM, Dean Edridge wrote:

> Maciej Stachowiak wrote:
>>
>>
>> Making a single document that works in both serializations is  
>> significantly trickier than just using quotes around attributes.
>
> Really? What about this below? Only the mime type would need to be  
> changed:
>
> <!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
> <head>
>   <title>Demo</title>
> </head>
> <body>
>   <p class="top-paragraph" id="something">
>      Hello World
>   </p>
> </body>
> </html>

Your document, despite being a trivial example, does not conform to
either the HTML or the XML serialization of HTML5. Even experts can make
mistakes.

And it gets a lot more complicated if you do things like:
- Apply CSS styles to the body
- Reference an external stylesheet via <link>
- Reference an external script via <script>
- Attempt to use document.write

These are just a few of the most obvious pitfalls. Trying to keep them  
all in mind while authoring content is a whole lot of complexity.

>> A CMS that wants to generate both HTML and XHTML needs to work at a  
>> higher level of abstraction than string pasting and can therefore  
>> produce separate documents for each serialization.
>
> Yes I know, thanks.
> And why is this a reason to have discrepancies between the two  
> languages/serialisations?
> Surely, if anything, it is a good reason to encourage the reduction  
> of discrepancies between the languages.

The languages already have discrepancies. That is not in our power to  
change. Both classic HTML syntax and XML syntax were defined years ago  
and there are some incompatibilities that will probably never be  
resolved. Trying to write in the approximate common subset is really  
hard; most people who try get it wrong, even if they are experts.

>> In any case, a CMS that does target producing single chameleon  
>> markup documents will need to follow the right conventions.
>
> But that wouldn't be so difficult.

You'd think so, but a lot get it wrong, and I'm not sure any get it
totally right.

>> That doesn't necessarily mean those rules are right for authors  
>> writing pure HTML by hand, or for XML-only document processing  
>> systems.
>
> Why not?

For one thing, if I'm hand-authoring an HTML document, I shouldn't  
have to remember the magic URL talisman. For another, using XML  
minimized syntax in HTML documents is confusing, and restricting it  
only to HTML void elements in XML documents is needlessly restrictive.
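
To illustrate the XML half of that point, here is a minimal sketch using
Python's standard xml.etree.ElementTree module (the helper name "shape"
is made up for this example). It only shows that a generic XML parser
treats minimized and expanded forms identically for any element name,
void in HTML or not. It says nothing about how an HTML parser treats the
same text; in the HTML serialization, a trailing slash on a non-void
element such as div does not close it.

```python
import xml.etree.ElementTree as ET


def shape(markup):
    """Summarize the root element of an XML fragment.

    Returns (tag name, text content or "", number of child elements).
    """
    root = ET.fromstring(markup)
    return (root.tag, root.text or "", len(root))


# "br" is a void element in HTML; "div" is not. XML makes no such distinction.
minimized_void = shape("<br/>")
expanded_void = shape("<br></br>")
minimized_div = shape("<div/>")
expanded_div = shape("<div></div>")

# In XML the two spellings of each element produce the same tree.
same_for_void = minimized_void == expanded_void
same_for_div = minimized_div == expanded_div
```

Restricting the minimized form to HTML's void elements would therefore
be an extra rule layered on top of XML, not something XML itself needs.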

>> Anyway, my point is just that I think both ways of writing it are  
>> reasonable in different situations, and should be chosen based on  
>> circumstances.
>
> There is a method that is suitable for all circumstances, that's the  
> beauty of (X)HTML5:
>
> <!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
> <head>
>   <title>Demo</title>
> </head>
> <body>
>   <p class="top-paragraph" id="something">
>      Hello World
>   </p>
> </body>
> </html>
>
> Wouldn't it be better to encourage people to mark up their webpages
> like this?
> The less choosing the author has to do the better.

I'd rather encourage people to:

a) Validate, to find the errors in markup like the above.
b) Not include unnecessary cargo cult talismans like the xmlns  
declaration in HTML syntax, unless they actually really need it.

> Unnecessarily having two or more methods of quoting makes it much  
> more difficult to have HTML and XHTML in the world at the same time.

That horse is decades out of the barn. We are unlikely to get it back  
in. Some people may feel that always using quotes is a better  
practice, but I don't think concern about XML syntax is enough reason  
to declare those who disagree categorically wrong. Note that even XML  
has two ways to quote attribute values, double quotes and single  
quotes. It drops the option of leaving simple values unquoted.
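
The quoting difference can be sketched with Python's standard library.
This is a rough illustration under stated assumptions: html.parser is a
lenient tokenizer, not a conforming HTML5 parser, and the helper names
(AttrCollector, html_class, xml_class) are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the attributes of every start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def html_class(markup):
    """Return the class attribute as read by the lenient HTML tokenizer."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return dict(parser.attrs).get("class")


def xml_class(markup):
    """Return the class attribute as read by an XML parser, or None if
    the markup is not well-formed XML."""
    try:
        return ET.fromstring(markup).get("class")
    except ET.ParseError:
        return None


samples = [
    '<p class="top">Hi</p>',  # double quotes: fine in both syntaxes
    "<p class='top'>Hi</p>",  # single quotes: fine in both syntaxes
    "<p class=top>Hi</p>",    # unquoted: fine in HTML, rejected by XML
]
results = [(html_class(s), xml_class(s)) for s in samples]
```

All three spellings yield the same value on the HTML side; only the
unquoted one is rejected on the XML side.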

> I don't see what is to gain from having unneeded discrepancies  
> between HTML and XHTML.

Me neither, but we've had them since 1998 when XML became a REC and I  
don't expect them to go away any time soon.

> My point is this: in regards to the quoting of attributes there  
> doesn't need to be two or more different ways to write up a (X)HTML  
> document. Of course, I don't have a problem with authors leaving out  
> the namespace attribute when intending to author in text/html as  
> this is easily altered later if someone wanted to convert the  
> document to XHTML.

Just adding an xmlns attribute is not nearly enough to turn a  
nontrivial document into conforming XHTML5. Pretending so is a bad idea.
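
As a small sketch of why the attribute alone does not help, the Python
snippet below feeds two variants of a document to the standard
xml.etree.ElementTree parser. The markup, the file name style.css, and
the helper is_well_formed are made up for illustration. Note that this
checks only XML well-formedness, which is a much weaker property than
XHTML5 conformance.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup with the namespace declaration bolted on.
# <link> and <br> have no end tags, which is normal in HTML syntax.
html_style = (
    '<html xmlns="' + XHTML_NS + '">'
    '<head><title>Demo</title>'
    '<link rel="stylesheet" href="style.css">'
    '</head>'
    '<body><p>one<br>two</p></body>'
    '</html>'
)

# The same document with the empty elements written in XML syntax.
xml_style = (
    '<html xmlns="' + XHTML_NS + '">'
    '<head><title>Demo</title>'
    '<link rel="stylesheet" href="style.css"/>'
    '</head>'
    '<body><p>one<br/>two</p></body>'
    '</html>'
)

html_style_ok = is_well_formed(html_style)
xml_style_ok = is_well_formed(xml_style)
```

The first variant fails at the parser before any question of
conformance even comes up; the second merely clears that first hurdle.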

Anyway, if using XHTML-like syntax is right for you, then you are free  
to use it. The current draft actually makes much more XHTML syntax  
legal in the HTML serialization than previous versions of HTML.  
However, I think it's wrong to try to set it down as some sort of  
mandate for all content authors.

Seriously, does it make any real difference to anyone whether, for
instance, the Google homepage double-quotes its attributes? Would
there be any benefit to humanity if it were changed?

Regards,
Maciej
Received on Wednesday, 21 November 2007 13:55:54 UTC