Re: ISSUE-4 - versioning/DOCTYPEs from Henri Sivonen on 2010-05-19 (public-html@w3.org from May 2010)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 19 May 2010 07:01:35 -0700 (PDT)
To: public-html@w3.org
Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Boris Zbarsky <bzbarsky@MIT.EDU>, Daniel Glazman <daniel.glazman@disruptive-innovations.com>, Sam Ruby <rubys@intertwingly.net>
Message-ID: <1488827282.221305.1274277695090.JavaMail.root@cm-mail03.mozilla.org>
"Sam Ruby" <rubys@intertwingly.net> wrote:

> On 05/17/2010 10:55 AM, Henri Sivonen wrote:
> > In my thinking, your blog and planet were instances of case #1. What
> > part of the description of #1 doesn't resonate with you?
> 
> Sometime before the end of 2004, I started serving my weblog as
> application/xhtml+xml.  At first, I wasn't very careful about it:
> 
> http://intertwingly.net/blog/2004/11/15/Vigilance
> 
> Sometime in 2006, I started experimenting with inline SVG:
> 
> http://intertwingly.net/blog/2006/06/17/Inline-SVG

Note how Jacques Distler wrote "atypical character" in the part you quoted. While it seems you have had a positive experience from transitioning from XHTML for the sake of XHTML to XHTML+SVG, your experience isn't typical. A number of other X-Philes (http://www.goer.org/Markup/TheXPhiles/) have gone back to text/html. Authoring SVG in vi is atypical too, so I think your experience isn't a good data point from which to extrapolate requirements for in-band indicators for editors like BlueGriffon and KompoZer.

> >> As for me, I simply want to be conservative in what I send.  This
> >> is the first half of the robustness principle.  This enables people
> >> who have off-the-shelf XML parsers to process my pages.  Not
> >> because they hold any special power over me, but simply because I
> >> enabled it.
> >
> > Interesting. I hadn't thought of your site as being an instance of
> > case #3 (without the power part).
> >
> > Do you know if people actually process your pages (as opposed to your
> > feeds) using off-the-shelf XML parsers without any prior arrangement
> > with you?
> 
> While I know that I do, I don't know specifically of anybody else who
> does.

In my thinking, you have power over yourself to keep your own site consumable by your own XML tools.

MediaWiki is an exception to the rule, but I think it is still very rare for one party to serve polyglot content as text/html and for another party to serendipitously benefit from being able to consume it as XML.

> If you drill down a few of those links, you will find that the pages are
> also extremely consistent in details such as the indentation I use --
> something that no self-respecting XML or HTML5 processor would care
> about.  By being conservative in what I send (and by that, I don't
> simply mean well-formed XML, but avoiding constructs which are likely to
> be confused when parsed using a different parser) I can enable a wide
> number of potential future uses, even ones that I have not thought of
> yet.

For most people out there who've made an effort to follow Appendix C for the past decade, the promise of future uses has been a Wrong Tomorrow. That is, in the light of the experience with Appendix C, I think promoting unspecified future uses isn't something that the WG should engage in.

> To sum things up: my markup is not just XML compliant, but also HTML5
> compliant, and fares well with a large variety of tag soup parsers out
> there.  I have found that conforming to a polyglot syntax is a good 
> first order approximation of what it takes to be universally
> consumable.

Fair enough. It doesn't follow that an in-band indicator is needed, though.

"Boris Zbarsky" <bzbarsky@MIT.EDU> wrote:

> On 5/17/10 4:57 AM, Henri Sivonen wrote:
> > I'm aware of three use cases for polyglot documents:
> ...
> 
> 4)  Serving content as text/html but using an XML toolchain on
>      the server side to generate and process the content.
> 
> seems like an obvious use case...  Not sure how common this is in
> practice, though.  Note that using XSLT to generate HTML from XML is not
> an example of this use case; this use case involves actually storing the
> XHTML source and processing it as such.

If you process stored content using an XML toolchain and then serve the output, the input doesn't need to be polyglot unless the files from the backing store are *also* served as-is without processing through the toolchain. Also, in the general case, generic XML toolchains aren't safe for producing content that gets served as text/html, so it's necessary for the serializer to be text/html compatible. The input doesn't need to be.
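
As a rough illustration of the serializer point: the sketch below uses Python's standard xml.etree.ElementTree as a stand-in for a generic XML toolchain (an assumption for illustration, not the setup discussed in this thread). The same tree is written once with the XML output method and once with the HTML output method. The sample markup and the function name `serialize_both` are made up for the example.

```python
import xml.etree.ElementTree as ET


def serialize_both(xml_source):
    """Parse well-formed XML, then serialize it with the XML and HTML methods."""
    root = ET.fromstring(xml_source)
    as_xml = ET.tostring(root, encoding="unicode", method="xml")
    as_html = ET.tostring(root, encoding="unicode", method="html")
    return as_xml, as_html


# Hypothetical sample input; no namespaces, to keep the output easy to read.
sample = '<div><script src="a.js"></script><br/><p></p></div>'
as_xml, as_html = serialize_both(sample)

# The XML method collapses empty elements into self-closing tags, which a
# text/html parser treats as an unclosed <script> start tag.
print(as_xml)   # <div><script src="a.js" /><br /><p /></div>

# The HTML method writes explicit end tags and leaves void elements unclosed.
print(as_html)  # <div><script src="a.js"></script><br><p></p></div>
```

The input is identical in both cases; only the choice of serializer decides whether the output is safe to serve as text/html.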

On my own site, I actually serve the backing store files as they are (as text/html) and use them for input to XML tools. However, the XML toolchain doesn't use an XML parser but an HTML parser. Even though it's a bit embarrassing that the parser isn't the one I've written, it's proof that XML parsers haven't been obsoleted just recently by HTML5-compliant parsers: They've already been obsoleted by non-compliant parsers that came before. (My setup uses TagSoup for parsing and GNU JAXP for serialization. The application is written in Jython. It has been working since 2005 and I haven't dared to touch it, because I've misplaced the sources for the exact GNU JAXP version...)
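
A minimal sketch of the general idea (an HTML parser in front, XML tooling behind), using only the Python standard library. This is not the TagSoup/GNU JAXP setup described above: the class `HtmlToElementTree`, the helper `parse_html_as_tree`, and the `VOID` set are hypothetical names for this example, and unlike TagSoup the sketch assumes the input tags are already well nested.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Void elements never get an end tag in text/html, so close them at once.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlToElementTree(HTMLParser):
    """Feed text/html tokens into an ElementTree builder."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        clean = {k: (v if v is not None else "") for k, v in attrs}
        self.builder.start(tag, clean)
        if tag in VOID:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse_html_as_tree(source):
    """Return the root Element for a well-nested text/html fragment."""
    parser = HtmlToElementTree()
    parser.feed(source)
    parser.close()
    return parser.builder.close()


root = parse_html_as_tree('<p>Hello<br>world <img src="x.png"></p>')
# From here on, ordinary XML tooling works on the tree.
print(ET.tostring(root, encoding="unicode"))
```

The source document here is not well-formed XML (`<br>` and `<img>` are unclosed), yet the resulting tree can be queried and serialized like any other.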

> On 5/17/10 1:52 PM, Sam Ruby wrote:
> > If there is to be a switch of some kind, I
> > would suggest that it be based on something that is likely to make an
> > operational difference, and that's why I prefer keying off of the
> > presence of the xmlns attribute on the html element.
> 
> That would make sense to me too.  Good idea!

FWIW, I'm OK with using the xmlns attribute on the root element as the in-band switch when an editor developer wants to offer an in-band switch for automatically making the output polyglot when saving a document that was opened from a .html file.
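
To make the proposed switch concrete, here is a hedged sketch of how an editor might check for it when opening a .html file. It uses Python's html.parser; the names `RootXmlnsDetector` and `wants_polyglot_output` are invented for this example, and real editors such as KompoZer or BlueGriffon may do something quite different.

```python
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"


class RootXmlnsDetector(HTMLParser):
    """Record whether the first start tag is <html> carrying the XHTML xmlns."""

    def __init__(self):
        super().__init__()
        self.seen_root = False
        self.wants_polyglot = False

    def handle_starttag(self, tag, attrs):
        if self.seen_root:
            return  # only the root element acts as the switch
        self.seen_root = True
        if tag == "html":
            self.wants_polyglot = dict(attrs).get("xmlns") == XHTML_NS


def wants_polyglot_output(source):
    """True if the document opts in via xmlns on the root html element."""
    detector = RootXmlnsDetector()
    detector.feed(source)
    detector.close()
    return detector.wants_polyglot


with_switch = ('<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
               '<head><title>t</title></head><body></body></html>')
without_switch = '<!DOCTYPE html><html><head><title>t</title></head><body></body></html>'

print(wants_polyglot_output(with_switch))     # True
print(wants_polyglot_output(without_switch))  # False
```

Note that the doctype plays no part in the decision; both sample documents use the same one.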

I'm a bit concerned about a slippery slope, though: if xmlns is "officially" suggested as an in-band switch for anything at all, the next request may be to use it as an in-band switch for a more fussy validation mode. The original rationale for permitting the xmlns attribute as a talisman was easing migration from Appendix C-ish "XHTML" to HTML5. If an editor uses the talisman as a switch for automating something, it's not making things harder for the user. If a validator emits more errors for hand-authored content when it has a talisman, migration doesn't get easier but harder (when the author wants to migrate to text/html-only HTML5--not to polyglot (X)HTML5).

"Leif Halvard Silli" <xn--mlform-iua@målform.no> wrote:

> Fact is that at least KompoZer and NVU produce Appendix C compatible 
> XHTML for documents in text/html mode if the document has an XHTML 
> DOCTYPE. Thus, KompoZer and NVU are dependent on the doctype in order
> to produce XHTML. 

What KompoZer and Nvu do is not a use case.

> > Leif, are there additional use cases that I'm missing?
> 
> Authoring.

That's too vague to be evaluated as a use case.

> Validation. Being able to offer validation quickly.

That's vague, too. How does polyglotness enable validation (more) quickly?

> Avoiding other versioning systems from develop.

Let's avoid using doctypes as a polyglot flag first and avoid crossing the other bridges when we get there.

> Why does Mac OS X use XML configuration files with Apple doctypes,
> if DOCTYPEs are useless?

I don't know, but my scientific guess is: Because the developers of the format had seen other XML formats use doctypes.

> KompoZer operates with a text/html DOM, but makes sure that it creates XHTML compatible output. 

So KompoZer doesn't support editing or even preserving XHTML+SVG or XHTML+MathML?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 19 May 2010 14:02:10 UTC