Re: NU’s polyglot possibilities (Was: The non-polyglot elephant in the room) from David Sheets on 2013-01-24 (www-tag@w3.org from January 2013)

From: David Sheets <kosmo.zb@gmail.com>
Date: Thu, 24 Jan 2013 15:29:48 -0800
To: Alex Russell <slightlyoff@google.com>
Cc: "Michael[tm] Smith" <mike@w3.org>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <CAAWM5TzSEm4kxvVU=Qof=fuDfyvSbK0miQjR=6mGyvt7QW=Tfg@mail.gmail.com>
On Thu, Jan 24, 2013 at 2:14 PM, Alex Russell <slightlyoff@google.com> wrote:
> I find myself asking (without an obvious answer): who benefits from the
> creation of polyglot documents?

Polyglot consumers benefit from needing only an HTML parser *or* an
XML parser to read a single representation.
Polyglot producers benefit from needing to produce only a single
representation for both HTML and XML consumers.
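To make the consumer side concrete, here is a minimal Python sketch. It assumes a small hand-written polyglot document (hypothetical, not taken from any spec). The same string is read once by an XML parser (xml.etree.ElementTree) and once by an HTML tokenizer (html.parser), and both report the same elements. Note that html.parser only tokenizes. It does not run the HTML5 tree-construction algorithm a browser uses, so this is a sanity check, not a polyglot conformance test.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical document written to be both well-formed XML and valid HTML.
POLYGLOT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="UTF-8"/><title>Example</title></head>
<body><p>Hello, <a href="/x">world</a>.</p></body>
</html>"""


def xml_element_names(doc: str) -> list[str]:
    """Element names in document order, as seen by an XML parser."""
    root = ET.fromstring(doc)
    # ElementTree reports qualified names as "{namespace}local"; keep the local part.
    return [el.tag.split("}")[-1] for el in root.iter()]


class _TagCollector(HTMLParser):
    """Collects start tags; self-closing tags like <meta/> also arrive here."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_element_names(doc: str) -> list[str]:
    """Element names in document order, as seen by an HTML tokenizer."""
    collector = _TagCollector()
    collector.feed(doc)
    collector.close()
    return collector.tags


print(xml_element_names(POLYGLOT))   # ['html', 'head', 'meta', 'title', 'body', 'p', 'a']
print(html_element_names(POLYGLOT) == xml_element_names(POLYGLOT))  # True
```

A consumer that only ever does one of these two reads never needs the other parser, which is the benefit described above.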

> If it's a closed ecosystem in which it's clear that all documents are XML
> (but which might be sent to the "outside" as HTML), then I don't understand
> why that ecosystem doesn't protect its borders by transforming HTML
> documents (via an HTML parser->DOM->XML serialization) to XML.

Why can't the publisher decide to allow both HTML and XML
interpretation? Why does sending documents to the "outside" as HTML
mean that they can no longer be well-formed XML?

> Other possible users/producers seem even less compelling: if there's an open
> ecosystem of documents that admit both HTML and XML, then it's always going
> to be necessary for consuming software to support HTML parsing (and likely
> also XML parsing).

No. Only one of an HTML or an XML parser is necessary, AFAICT. Why do
you *need* an HTML parser to consume a polyglot document?

You only need both if you are a general-purpose browser. Lots of
consuming software is not a general-purpose browser.

> If it's a world of HTML consumers that would like to
> access XML documents...well, just publish as (legacy) XHTML, no?

Generic legacy XHTML is not compatible with modern HTML. Defining the
intersection of the two is the point of polyglot markup.

> What am I missing? Under what conditions can the expectations of producers
> and consumers of polyglot documents be simplified by the addition of
> polyglot markup to their existing world/toolchain?

It is simpler to manage a single representation than two separate but
similar representations (consider that the author may not control how
their documents are served over HTTP).

Use case:
1. Browse documents in a third-party-managed repository with an HTML browser.
2. Save a polyglot document after viewing it.
3. Put the polyglot document into an XML system -- it works!
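Step 3 needs nothing browser-specific. As a minimal sketch, assume the saved page is the hypothetical document below (in practice you would load the file with ET.parse(path) rather than an inline string). An XML-only toolchain in Python can then pull out the title and the link targets with ElementTree, provided the XHTML namespace is mapped to a prefix. The prefix "h" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Map an arbitrary prefix to the XHTML namespace that polyglot documents declare.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical page saved from the third-party repository after browsing it.
SAVED_PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Repo index</title></head>
<body>
<ul>
<li><a href="a.html">A</a></li>
<li><a href="b.html">B</a></li>
</ul>
</body>
</html>"""


def summarize(doc: str) -> tuple[str, list[str]]:
    """Return the page title and all link targets using only an XML parser."""
    root = ET.fromstring(doc)
    title = root.findtext("h:head/h:title", default="", namespaces=XHTML_NS)
    links = [a.get("href", "") for a in root.iterfind(".//h:a", namespaces=XHTML_NS)]
    return title, links


print(summarize(SAVED_PAGE))  # ('Repo index', ['a.html', 'b.html'])
```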

The "addition" of polyglot is an extra set of invariants on the
document that reduces some consumers' burden. Of course, if it doesn't
suit you, you are free to not use it.

David

> On Thu, Jan 24, 2013 at 4:57 AM, Michael[tm] Smith <mike@w3.org> wrote:
>>
>> Leif Halvard Silli <xn--mlform-iua@målform.no>, 2013-01-24 01:23 +0100:
>>
>> > Michael[tm] Smith, Mon, 21 Jan 2013 23:47:40 +0900:
>> > > In the simplest implementation, the validator would need to
>> > > automatically parse and validate the document twice
>> >
>> > 1 Could you do that? Just guide the user through two steps:
>> >   HTML-validation + XHTML-validation?
>>
>> Of course doable. But I think it should be clear from my previous messages
>> that regardless of how feasible it is, I'm not interested in implementing
>> it. I don't want to put time into adding a feature that's intended to help
>> users more easily create conforming Polyglot documents, because I don't
>> think it's a good idea to encourage authors to create Polyglot documents.
>>
>> >   The second step could also
>> >   produce a comparison of the DOM produced by the two steps.
>>
>> That would require the validator to construct a DOM from the document.
>> Twice. The validator by design currently doesn't do any DOM construction
>> at all. It does streaming processing of documents, using SAX events.
>>
>> Anyway, with respect, I hope you can understand that I'm not very
>> interested in continuing a discussion of hypothetical functional details
>> for a feature that I'm not planning to ever implement.
>>
>> > 2 But if the author uses a good, XHTML5-aware authoring tool that
>> >   keeps the code well-formed, then a *single* validation as
>> >   text/html should already bring you quite far.
>>
>> True, I guess, if you're actually serving the document as text/html.
>>
>> But really what would get you even farther if you're using XML tools to
>> create your documents is to not try to check them as text/html at all but
>> instead serve them with an XML mime type, in which case the validator will
>> parse them as XML instead of text/html, and everything will work fine.
>>
>> Anyway, yeah, if somebody is manually using XML tools to create their
>> documents then I would think they'd already know whether they're
>> well-formed, and they'd not need to use the validator to tell them whether
>> they're well-formed or not. But of course a lot of documents on the Web
>> are not created manually that way but instead dynamically generated out of a
>> CMS, and many CMSes that are capable of serving up XML don't always get it
>> right and can produce non-well-formed XML.
>>
>> All that said, I don't know why anybody who's serving a document as
>> text/html would normally care much, at the point where it's being served
>> (as opposed to the point where it's being created and preprocessed or
>> whatever), whether it's XML-well-formed or not.
>>
>> > 3 Finally, one very simple thing: polyglot dummy code! The NU
>> >   validator’s Text Field contains an HTML5 dummy that validates,
>> >   but only as HTML, since the namespace isn't declared. Bug
>> >   20712 proposes to add a dummy for the XHTML5 presets as well.[1]
>> > [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=20712
>>
>> Yeah, I suppose it's worth having the dummy document include the namespace
>> declaration if you've selected one of the XHTML presets. I'll get around
>> to adding it at some point, if Henri doesn't first.
>>
>> >   Such a dummy no doubt serves as a teachable moment for many. And
>> >   as long as you just add the namespace and otherwise keep the
>> >   current dummy document intact, it would also, without banging
>> >   it into anyone’s head, be a polyglot example.
>>
>> True, that simple document would be a conforming polyglot instance, but I
>> doubt most users would realize it as such, or care. The value of it would
>> just be for the simple user convenience of not needing to manually add the
>> namespace declaration in order to avoid the error message you get now.
>>
>>   --Mike
>>
>> --
>> Michael[tm] Smith http://people.w3.org/mike
>>
>
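On Mike's point that the validator builds no DOM and works on a stream of SAX events: below is a rough Python analogue of that style. The NU validator itself is written in Java, and this is not its code. The sketch checks well-formedness incrementally and records whether the root element is in the XHTML namespace, which is the declaration the current HTML5 dummy lacks, per the thread. The chunked inputs and the check itself are hypothetical illustrations.

```python
import xml.sax
import xml.sax.handler

XHTML_NS = "http://www.w3.org/1999/xhtml"


class RootNamespaceCheck(xml.sax.handler.ContentHandler):
    """Records the namespace URI of the root element; builds no tree."""

    def __init__(self) -> None:
        super().__init__()
        self.root_seen = False
        self.root_ns = None

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) pair;
        # uri is None when no default namespace was declared.
        if not self.root_seen:
            self.root_seen = True
            self.root_ns = name[0]


def check_stream(chunks) -> str:
    """Feed byte chunks to an incremental SAX parser and report the result."""
    handler = RootNamespaceCheck()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)  # must precede parsing
    parser.setContentHandler(handler)
    try:
        for chunk in chunks:  # data arrives piecemeal, as from a network stream
            parser.feed(chunk)
        parser.close()
    except xml.sax.SAXParseException as exc:
        return f"not well-formed: {exc.getMessage()}"
    if handler.root_ns != XHTML_NS:
        return "well-formed, but the root element is not in the XHTML namespace"
    return "ok"


with_ns = [b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">',
           b"<head><title>Test</title></head><body></body></html>"]
without_ns = [b"<!DOCTYPE html>\n<html><head><title>Test</title></head><body></body></html>"]
broken = [b"<html><p>unclosed</html>"]

for doc in (with_ns, without_ns, broken):
    print(check_stream(doc))
```

Memory use stays bounded by the handler's own state rather than by document size, which is the usual reason for choosing SAX over building a DOM.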
Received on Thursday, 24 January 2013 23:30:19 UTC