Re: ISSUE-4 - versioning/DOCTYPEs

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Fri, 14 May 2010 10:28:25 -0400
Message-ID: <4BED5E09.5020705@mit.edu>
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: public-html@w3.org

On 5/14/10 10:00 AM, Leif Halvard Silli wrote:
> That is not typical for XHTML vs HTML syntax - XHTML syntax typically
> uses .html as extension.

Or more precisely, most things that are "XHTML syntax" are nothing of 
the sort; they just have a doctype that pretends the document is XHTML 
and some attempts at being XHTML, but aren't even well-formed XML, much 
less valid HTML.

The documents that browsers actually treat as XHTML most definitely do 
not have a .html extension.

> There are some exceptions - most notably in Web browsers

Right.  Who are typically the final consumers of the files in question.

>> Having a file with a .html extension would tend to mean you want it
>> treated as an HTML file on most of the currently-popular desktop
>> operating systems.
>
> For parsing, then yes. For editing, then less so.

If you're trying to maintain a polyglot document, agreed.  But the fact 
of the matter is that if you're doing that you need to tell your editor 
so.  The simplest way to do that for HTML5/XHTML5 documents, most 
likely, is to use a .xhtml extension and an HTML5 doctype, right?

>> Hold on.  We were just talking about wysiwyg HTML/XHTML editors, no?
>> Those are very much NOT text editors.
>
> Subject of e-mail: "ISSUE-4 - versioning/DOCTYPEs". KompoZer is an
> example of an editor that relies on the doctype when it decides the
> syntax to follow. Other editors, including both text editors and
> WYSIWYG editors, also seem to rely on the doctype for choosing syntax.

Yes, but is that a hard requirement?  That is, going forward they need 
to be modified anyway to handle whatever the HTML5/XHTML5 doctype(s) 
are.  Given that, does my proposal above to use .xhtml extension and 
HTML5 doctype for polyglot documents not work?

>> Yep.  Then again, the text editor I use on a regular basis does make
>> a quite clear distinction between HTML and XML modes.
>
> I will try to find out what editor you use. ;-)

Emacs.  It's all about modes.  ;)

> But, based on the file suffix *only*?

That's the simplest thing, yes, and the one set up by default, though of 
course you can set up your own conditions for picking the mode using a 
Turing-complete programming language that has full access to the file data.
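
To make the suffix-first, content-second idea concrete, here is a minimal sketch in Python (not Emacs Lisp, and not how Emacs actually implements it). The `SUFFIX_MODES` table, the mode names, and the `pick_mode` helper are all hypothetical and exist only for illustration.

```python
import os

# Hypothetical suffix-to-mode table; a real editor's defaults will differ.
SUFFIX_MODES = {".html": "html", ".htm": "html", ".xhtml": "xml", ".xml": "xml"}


def pick_mode(filename, text=""):
    """Pick an editing mode from the suffix, falling back to the file data."""
    suffix = os.path.splitext(filename)[1].lower()
    if suffix in SUFFIX_MODES:
        return SUFFIX_MODES[suffix]
    # Fallback condition: an XML declaration at the start suggests XML mode.
    if text.lstrip().startswith("<?xml"):
        return "xml"
    return "text"


print(pick_mode("index.html"))   # suffix decides
print(pick_mode("page.xhtml"))   # suffix decides
print(pick_mode("notes", "<?xml version='1.0'?><a/>"))  # content decides
```

The point of the sketch is only that the suffix is the cheap default and anything smarter is an explicit extra condition the user sets up.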

> I admit that it doesn't make
> sense to use HTML4 alike syntax in a .xhtml file. But the question is
> also about .html.

And again, unless the editor _parses_ your polyglot .html file as XML it 
will almost certainly fail to create a useful polyglot document when it 
saves. I have a hard time believing that most editors parse .html files 
as XML even if they sniff the XHTML doctype (again, because most such 
files are not well-formed XML).
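
As a rough illustration of why sniffing the doctype is not enough, an editor would have to test well-formedness before it could treat a .html file as XML at all. The sketch below uses Python's standard `xml.etree.ElementTree`; the `is_well_formed_xml` helper is a hypothetical name, and real documents with doctypes and named entities would need more care than this.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical "XHTML-looking" tag soup: the <br> is never closed.
print(is_well_formed_xml("<p>one<br>two</p>"))
# The same content written so that an XML parser accepts it.
print(is_well_formed_xml("<p>one<br />two</p>"))
```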

> Yes. But I think that, to a degree, some DOCTYPEs already cause
> polyglot mode. E.g. KompoZer turns <img></img> into <img />.

That's just a matter of the fact that Gecko's editor (and presumably 
KompoZer too, if in a different form) has a hardcoded list of empty HTML 
tags and tries to make use of it.  This doesn't even have to be a mode 
switch.  It could just be done all the time.
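
A minimal sketch of what "done all the time" could look like, assuming a serializer that consults a fixed list of empty elements. This is not Gecko's or KompoZer's code; `VOID_ELEMENTS` is an illustrative subset, `serialize` is a hypothetical helper, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Illustrative subset of the empty (void) HTML elements, not a complete list.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def serialize(elem):
    """Serialize so that the output is safe for both HTML and XML parsers."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    if elem.tag in VOID_ELEMENTS:
        # Void elements always get the self-closing form.
        out = f"<{elem.tag}{attrs} />"
    else:
        # Everything else always gets an explicit end tag, even when empty.
        inner = escape(elem.text or "") + "".join(serialize(c) for c in elem)
        out = f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"
    return out + escape(elem.tail or "")


doc = ET.fromstring('<div><img src="a.png"></img><p></p></div>')
print(serialize(doc))
```

No mode switch is involved: the same rule applies to every document, whatever its doctype says.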

> If we say that HTML4 vs XHTML1 is like HTML5 vs XHTML5, then it is
> simple to discern between HTML4 and XHTML1, but impossible to discern
> HTML5 versus XHTML5 (versus quirks-mode HTML).

You can easily tell what the document will be _consumed_ as for HTML5 vs 
XHTML5, no?
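
For instance, browsers choose the parser from the media type the document is served with, not from the doctype. A hedged sketch of that decision follows; the `consumed_as` helper and its return values are hypothetical, and it ignores content sniffing and local files.

```python
def consumed_as(content_type):
    """Map a Content-Type header value to the parser a browser would use."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    return "unknown"


print(consumed_as("text/html; charset=utf-8"))
print(consumed_as("application/xhtml+xml"))
```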

>>   Likewise for a
>> non-polyglot-aware X(HT)ML editor used on an XHTML document.
>
> Given the error correction in text/html, this has a much higher chance
> to work, IMHO.

No.  Your typical non-polyglot-aware XML editor will turn <div></div> 
into <div/> and then you lose in the HTML mode.
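
This is easy to reproduce with a stock XML toolkit. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for a non-polyglot-aware XML editor; the sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

source = "<body><div></div><p>after</p></body>"
tree = ET.fromstring(source)

# Default XML serialization collapses the empty element.
collapsed = ET.tostring(tree, encoding="unicode")
print(collapsed)

# Asking for explicit end tags keeps the HTML-safe form.
explicit = ET.tostring(tree, encoding="unicode", short_empty_elements=False)
print(explicit)
```

An HTML parser ignores the trailing slash on a non-void element, so the collapsed form is read as an open div that swallows the following paragraph.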

> Also, even if it is mostly harmless (except for <br></br> - though 2
> instead of 1 line break is also often pretty harmless), XHTML editors
> tend to prefer <element /> over <element></element> - at least when
> creating XHTML1 documents.

Right.  See above about <div>; it's not "mostly harmless" but a 
fundamental issue.

> That far - I don't know. ;-) But at least we are on the same page when
> it comes to 'polyglot mode' - such a mode is needed. And some editors
> might choose to offer only that mode, I think. The question is what to
> use to discern between those modes.

When would an editor that has a polyglot mode not want to use it?

-Boris
Received on Friday, 14 May 2010 14:29:01 UTC