Re: ISSUE-4 - versioning/DOCTYPEs

Boris Zbarsky, Fri, 14 May 2010 10:28:25 -0400:
> On 5/14/10 10:00 AM, Leif Halvard Silli wrote:
>> That is not typical for XHTML vs HTML syntax - XHTML syntax typically
>> uses .html as extension.
> Or more precisely, most things that are "XHTML syntax" are nothing of 
> the sort; they just have a doctype that pretends the document is 
> XHTML and some attempts at being XHTML, but aren't even well-formed 
> XML, much less valid HTML.

This is not quite precise, I think.

> The documents that browsers actually treat as XHTML most definitely 
> do not have a .html extension.

This is precise, when it comes to pure browsers.

There are many documents that pretend to be XHTML but which are nothing 
of that sort. But are there also many *editors* which pretend to create 
XHTML syntax but which fail to do so? OK, when it comes to text 
editors, then it is difficult (as it is human to err) unless one tests 
the page in an XHTML browser, which again typically requires the .xhtml 
suffix.

>> There are some exceptions - most notably in Web browsers
> Right.  Who are typically the final consumers of the files in question.

I think the fact that they save the file with .xhtml is not related to 
being "the final consumer" but perhaps is related to being a parser 
instead of being an editor.

>>> Having a file with a .html extension would tend to mean you want it
>>> treated as an HTML file on most of the currently-popular desktop
>>> operating systems.
>> For parsing, then yes. For editing, then less so.
> If you're trying to maintain a polyglot document, agreed.  

So, the current state of affairs is: for a text/html file, an XHTML 
doctype is a polyglot syntax signal.

> But the fact of the matter is that if you're doing that you need to tell 
> your editor so.  The simplest way to do that for HTML5/XHTML5 documents, 
> most likely, is to use a .xhtml extension and an HTML5 doctype, right?

The simplest way to tell my editor to use polyglot syntax is to use the 
.xhtml suffix? I am not sure that I have ever tried an editor that 
creates a different syntax based on whether the file has a .xhtml or 
.html suffix. (Except for that bug in KompoZer.) When I choose to create 
a new HTML file, all the editors I have tried offer, as far as I can 
recollect, very precise options between different doctypes, in HTML vs 
XHTML flavors. And nearly all of them offer the .html suffix by default 
for such files.

Of course, if you hand edit a file, then testing the file in XHTML 
parsing mode seems necessary, yes, in order to check that it really is 
well-formed XHTML. And for that, .xhtml or .xht is practical to use.
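
The well-formedness check does not strictly need a browser. A minimal 
sketch, assuming Python's standard xml.etree.ElementTree parser as a 
stand-in for "XHTML parsing mode"; the function name and both sample 
documents are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample: every element is closed, so an XML parser accepts it.
polyglot = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><p>one<br/>two</p></body></html>'
)

# Hypothetical sample: <br> is never closed, which is fine in text/html
# but a fatal error for an XML parser.
tag_soup = (
    '<!DOCTYPE html>\n'
    '<html><head><title>t</title></head>'
    '<body><p>one<br>two</p></body></html>'
)

print(is_well_formed_xml(polyglot))
print(is_well_formed_xml(tag_soup))
```

This only checks well-formedness. It says nothing about validity, or 
about whether the same file also behaves as intended in a text/html 
parser.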

>>> Hold on.  We were just talking about wysiwyg HTML/XHTML editors, no?
>>> Those are very much NOT text editors.
>> Subject of e-mail: "ISSUE-4 - versioning/DOCTYPEs". KompoZer is an
>> example of an editor that relies on the doctype when it decides the
>> syntax to follow. Other editors, including both text editors and
>> WYSIWYG editors, also seem to rely on the doctype for choosing syntax.
> Yes, but is that a hard requirement?  That is, going forward they 
> need to be modified anyway to handle whatever the HTML5/XHTML5 
> doctype(s) are.

I suppose they will handle all other doctypes the same way as they do now?

>  Given that, does my proposal above to use .xhtml 
> extension and HTML5 doctype for polyglot documents not work?

Are you saying that, for the XHTML MIME type, the presence of 
<!DOCTYPE html> should be seen as a signal to create polyglot XHTML?

Yes, that should work. Currently, though, there is no such decision 
anywhere. I don't even think that the polyglot specification under 
creation has made such a direct link between the doctype and the 
required syntax.

>>> Yep.  Then again, the text editor I use on a regular basis does make
>>> a quite clear distinction between HTML and XML modes.
>> I will try to find out what editor you use. ;-)
> Emacs.  It's all about modes.  ;)
>> But, based on the file suffix *only*?
> That's the simplest thing, yes, and the one set up by default, though 
> of course you can set up your own conditions for picking the mode 
> using a turing-complete programming language that has full access to 
> the file data.

But "out of the box", doesn't an XHTML1 doctype cause XHTML syntax if 
the suffix is .html?

>> I admit that it doesn't make
>> sense to use HTML4 alike syntax in a .xhtml file. But the question is
>> also about .html.
> And again, unless the editor _parses_ your polyglot .html file as XML 
> it will almost certainly fail to create a useful polyglot document 
> when it saves. I have a hard time believing that most editors parse 
> .html files as XML even if they sniff the XHTML doctype (again, 
> because most such files are not well-formed XML).

KompoZer doesn't. But I think some pure XML editors might.

>> Yes. But I think that, to a degree, some DOCTYPEs already cause
>> polyglot mode. E.g. KompoZer turns <img></img> into <img />.
> That's just a matter of the fact that Gecko's editor (and presumably 
> KompoZer too, if in a different form) has a hardcoded list of empty 
> HTML tags and tries to make use of it.  This doesn't even have to be 
> a mode switch.  It could just be done all the time.

So, what you say here means that there are some *advantages* to 
creating polyglot syntax in text/html mode, because the limitations are 
defined by text/html parsers more than by XHTML parsers. ;-)
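
To make the "hardcoded list of empty tags" idea concrete, here is a 
minimal sketch of a serializer that applies such a list all the time, 
with no mode switch. It is not how Gecko or KompoZer do it; the 
abbreviated VOID_ELEMENTS set and the function name are assumptions for 
illustration only:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Abbreviated, assumed list of empty (void) HTML elements; a real editor
# would carry the full list from the HTML specification.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def serialize_polyglot(elem: ET.Element) -> str:
    """Serialize an element so that both HTML and XML parsers read it alike."""
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in elem.attrib.items()
    )
    if elem.tag in VOID_ELEMENTS:
        # Void elements get the self-closing form; any content is dropped.
        out = f"<{elem.tag}{attrs} />"
    else:
        # Everything else always gets an explicit end tag, even when empty.
        inner = escape(elem.text or "")
        inner += "".join(serialize_polyglot(child) for child in elem)
        out = f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"
    return out + escape(elem.tail or "")


root = ET.fromstring('<div><img src="a.png"></img><p></p></div>')
print(serialize_polyglot(root))
```

The empty <p> keeps its end tag while <img></img> becomes <img />, which 
is the form a text/html parser and an XML parser both accept.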

>> If we say that HTML4 vs XHTML1 is like HTML5 vs XHTML5, then it is
>> simple to discern between HTML4 and XHTML1, but impossible to discern
>> HTML5 versus XHTML5 (versus quirks-mode HTML).
> You can easily tell what the document will be _consumed_ as for HTML5 
> vs XHTML5, no?

In the current state of that file? Yes. But I have many files, and I 
want to know, when I open a file, what syntax to use. I don't want to 
type <!-- use xhtml syntax --> in a comment or something. It could be 
that the file will be consumed by an XHTML parser somewhere in the 
chain, so that I had a reason to use XHTML syntax in a text/html file.
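
A rough sketch of the decision an editor faces when it opens a file, 
assuming it only looks at the suffix and the doctype. The rules, the 
function name and the return values are hypothetical and are not taken 
from any existing editor:

```python
import re
from pathlib import PurePath

# Captures everything between "<!DOCTYPE" and the closing ">".
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)


def guess_syntax(filename: str, text: str) -> str:
    """Guess which syntax to write: 'xhtml', 'html' or 'unknown'."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in {".xhtml", ".xht"}:
        return "xhtml"  # the suffix alone settles it
    match = DOCTYPE_RE.search(text[:1024])
    if match is None:
        return "html"  # no doctype: plain (quirks mode) HTML
    doctype = match.group(1).strip()
    if "XHTML" in doctype.upper():
        return "xhtml"  # an XHTML doctype in a .html file: polyglot signal
    if doctype.lower() == "html":
        return "unknown"  # <!DOCTYPE html> does not say which syntax
    return "html"


XHTML1 = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
          '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
HTML4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
         '"http://www.w3.org/TR/html4/strict.dtd">')

for name, head in [("a.html", XHTML1), ("a.html", HTML4),
                   ("a.html", "<!DOCTYPE html>"), ("a.xhtml", "<!DOCTYPE html>")]:
    print(name, guess_syntax(name, head))
```

The 'unknown' branch is the case I am talking about: for a .html file 
with <!DOCTYPE html>, nothing in the file tells the editor whether XHTML 
syntax is wanted.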

>>>   Likewise for a
>>> non-polyglot-aware X(HT)ML editor used on an XHTML document.
>> Given the error correction in text/html, this has a much higher chance
>> to work, IMHO.
> No.  Your typical non-polyglot-aware XML editor will turn <div></div> 
> into <div/> and then you lose in the HTML mode.

Then you lose, yes. But the page will still render, as a text/html 
file.  An XML file with an error might not render.

But .. yeah, it seems like the general issue of <div/> etc. is 
something that the polyglot spec should take into account ...
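
The collapsing behavior is easy to reproduce with a generic XML 
serializer. A small sketch, using Python's xml.etree.ElementTree as an 
assumed stand-in for a non-polyglot-aware XML editor; the sample markup 
is made up:

```python
import xml.etree.ElementTree as ET

source = '<body><div class="spacer"></div><p>text</p></body>'
root = ET.fromstring(source)

# Default XML serialization collapses the empty div to a self-closing tag,
# which a text/html parser treats as an unclosed <div>.
collapsed = ET.tostring(root, encoding="unicode")

# Asking for explicit end tags keeps <div></div> intact.
expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(collapsed)  # <body><div class="spacer" /><p>text</p></body>
print(expanded)   # <body><div class="spacer"></div><p>text</p></body>
```

Neither setting is polyglot-safe on its own: the second one would also 
write <br></br>, so a serializer needs to know which elements are empty 
in HTML and treat only those as self-closing.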

>> Also, even if it is mostly harmless (except for <br></br> - though 2
>> instead of 1 line break is also often pretty harmless), XHTML editors
>> tend to prefer <element /> over <element></element> - at least when
>> creating XHTML1 documents.
> Right.  See above about <div>; it's not "mostly harmless" but a 
> fundamental issue.

It is a fundamental difference in behavior, yes.

>> That far - I don't know. ;-) But at least we are on the same page when
>> it comes to 'polyglot mode' - such a mode is needed. And some editors
>> might choose to offer only that mode, I think. The question is what to
>> use to discern between those modes.
> When would an editor that has a polyglot mode not want to use it?

Good question. I don't know. It seems smart to always use it. And it 
seems smart to use it even in text/html. I don't know whether, on the 
XHTML side, anyone would wish not to use polyglot syntax. It is mostly 
on the text/html side that the choice between syntaxes is a practical 
problem.

leif halvard silli

Received on Friday, 14 May 2010 16:02:12 UTC