[Bug 11755] New: The introduction should be clearer about use cases best addressed by polyglot markup

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11755

           Summary: The introduction should be clearer about use cases
                    best addressed by polyglot markup
           Product: HTML WG
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot
                    Graff)
        AssignedTo: eliotgra@microsoft.com
        ReportedBy: hsivonen@iki.fi
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org, eliotgra@microsoft.com


The draft says:
"It is often valuable to be able to serve HTML5 documents that are also well
formed XML documents. An author may, for example, use XML tools to generate a
document, and they and others may process the document using XML tools. These
documents are served as text/html."

The quoted part has four problems:

1) It claims "often valuable" in the passive voice without substantiating the
claim beyond what is said in the next sentence, and the next sentence isn't on
very strong ground, as shown below.

2) If an author uses XML tools to generate the document, using a generic XML
serializer is not OK, because a generic serializer might do whatever is OK in
application/xhtml+xml but not necessarily in text/html. As a trivial example, a
generic XML serializer would likely serialize a script element pointing to an
external script as <script src="foo.js"/>, which would be very wrong in
text/html, because the text/html parser ignores the trailing slash and treats
what follows as script content. Thus, the author needs a text/html-aware
serializer anyway to be
able to successfully use the output as text/html: either a polyglot serializer
or a text/html-only serializer. Once a text/html-aware serializer is needed
instead of a generic XML serializer, it isn't necessary to make the serializer
polyglot if the goal is simply to produce text/html content using otherwise XML
tools. Monoglot serializers for either text/html or for XML can serialize the
text content of the style and script elements with relative ease. However, a
strictly polyglot serializer can't support inline scripts and styles in the
general case. (The serializer would have to either relax DOM sameness by
generating /* <![CDATA[ */ at the start of the text content and /* ]]> */ at
the end of the text content, or ban the characters <, > and & in the script or
style sheet, which would be a drastic restriction.) Using a monoglot serializer
avoids this problem, so polyglot isn't a good solution for creating text/html
content from an XML tool (such as an XSLT processor).
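
The difference between a generic XML serializer and a text/html-aware one can
be sketched with Python's standard xml.etree.ElementTree module, which offers
both an "xml" and an "html" output method. This is only an illustration under
stated assumptions: the helper name serialize and the sample script text are
made up for this example, and ElementTree's "html" method is not a complete
HTML5 serializer.

```python
import xml.etree.ElementTree as ET

def serialize(elem, method):
    """Serialize one element with ElementTree's 'xml' or 'html' output method."""
    return ET.tostring(elem, encoding="unicode", method=method)

# External script: an empty element from the XML point of view.
external = ET.Element("script", {"src": "foo.js"})

# Inline script whose text contains characters that XML must escape.
inline = ET.Element("script")
inline.text = "if (a < b && c) f();"

for elem in (external, inline):
    print("xml: ", serialize(elem, "xml"))
    print("html:", serialize(elem, "html"))
```

The "xml" method writes the external script as a self-closing tag and escapes
< and & in the inline script; the "html" method writes an explicit </script>
end tag and leaves the inline script text unescaped. Each output is correct for
its own media type, and neither one is polyglot.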

3) Polyglot isn't a very effective way of allowing others to process the
document using XML tools, either. For someone else to be able to consume
text/html content using an XML parser, every document (s)he wants to consume
has to be polyglot. If the content to be consumed is Web content in general,
there's no way to force all of it to be polyglot. From the point of view of the
content consumer, it is easier to consume text/html content with an HTML parser
that exposes the same APIs to the rest of the app that an XML parser would
expose than to make agreements with document authors to get them to write
polyglot markup. Once the consumer includes an HTML parser in the app, there's
no longer value in any of the consumed docs being polyglot. Thus, from the
point of view of a would-be polyglot author, making a document polyglot won't
be of value if someone else whose document needs to be consumed by the same
consumer makes a monoglot document. The would-be polyglot author might as well
be the first one to make a monoglot document that forces the consumer to deal
with text/html.
Thus, getting authors to use polyglot markup isn't as good a solution to
consuming text/html content with XML tools as putting an HTML parser at the
start of the pipeline is.
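
Putting an HTML parser at the start of the pipeline can be sketched as follows,
again with the Python standard library only. The class HtmlToElementTree and
the function parse_html are hypothetical names for this example. html.parser is
a lenient tokenizer rather than a spec-conforming HTML5 parser, so a real
consumer would more likely plug in a full HTML5 parser; the point here is only
that the rest of the app sees the same ElementTree API it would get from an XML
parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in text/html.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class HtmlToElementTree(HTMLParser):
    """Hypothetical adapter: turns html.parser events into an ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._builder = ET.TreeBuilder()
        self._open = []  # names of elements still awaiting an end tag

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self._builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self._builder.end(tag)
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self._open:
            return  # stray end tag: ignore it
        # Close any elements left open inside the one being ended.
        while True:
            name = self._open.pop()
            self._builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        self._builder.data(data)

    def finish(self):
        """Flush the parser, close whatever is still open, return the root."""
        self.close()
        while self._open:
            self._builder.end(self._open.pop())
        return self._builder.close()

def parse_html(text):
    parser = HtmlToElementTree()
    parser.feed(text)
    return parser.finish()
```

Calling parse_html('<p class=intro>Hello<br>world') returns an ordinary
ElementTree element, so downstream code can use find(), iter() and the other
ElementTree calls even though the input is not well-formed XML.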

4) The last quoted sentence says the documents are served as text/html but
doesn't say why. A polyglot document is by definition a document that also
works as application/xhtml+xml. The main reason not to serve such documents as
application/xhtml+xml only is catering to the userbase of IE versions earlier
than IE9. It would be a shame to end up in a situation where authors keep addressing
a transient problem even when the problem is gone (when IE6 through IE8 users
no longer form a substantial audience). Once the author no longer wishes to
address the IE6 through IE8 audience, the author could use a monoglot XML-only
serializer for point #2 above.

Please either substantiate "often valuable" better or remove the claim. Please
replace the stated use cases with use cases for which using polyglot markup is
indeed the best known solution or, alternatively, please at least mention the
alternative solutions I outlined in points #2 and #3 above. Please mention that
the reason for serving content as text/html when it would work as
application/xhtml+xml is a transient reason.


Received on Friday, 14 January 2011 09:11:31 UTC