[Bug 11909] New: The principles of Polyglot Markup - validity? well-formed? DOM-equality?

From: <bugzilla@jessica.w3.org>
Date: Fri, 28 Jan 2011 13:15:04 +0000
To: public-html-bugzilla@w3.org
Message-ID: <bug-11909-2486@http.www.w3.org/Bugs/Public/>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11909

           Summary: The principles of Polyglot Markup - validity?
                    well-formed? DOM-equality?
           Product: HTML WG
           Version: unspecified
          Platform: PC
               URL: http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot
                    Graff)
        AssignedTo: eliotgra@microsoft.com
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org, eliotgra@microsoft.com


PROPOSAL:

I suggest having a *normative* scope description of Polyglot Markup, and
propose the following:

]] Polyglot Markup describes an HTML5-valid (validity), HTML5-compatible
(well-formedness), XML-well-formed (well-formedness), DOM-equal (DOM equality)
subset of HTML5. It does not, however, occupy itself with XML-validity. It is
XML-compatible when necessary for well-formedness reasons, but always both
HTML-valid and HTML-compatible. [[

This could go into the intro or in a new paragraph. It would be ideal to
establish a vocabulary which could be used throughout the spec. Then one could
say "To use <colgroup> is a DOM-equality issue". Or "<p/> cannot be used
because it isn't HTML5-valid". Or "An @id cannot begin with a number for
XML-validity reasons". (XML 1.0 has a similar section where it defines what
terms such as well-formed and valid mean.)

CURRENT STATUS

Currently, the principles of Polyglot Markup can be gleaned from the Abstract
("identical document trees" etc), from the Introduction ("valuable to be able
to serve HTML5 documents that are also well formed XML documents") and from the
title of the spec ("HTML-compatible XHTML documents").

DISCUSSION

Regarding XML-validity: For example, <div id="999"></div> is valid HTML5, but it
is invalid (though well-formed) XML. If we (as I suggest) do *not* require
polyglot markup to be XML-valid, then this should be said. Or maybe polyglots
should strive to be XML-valid as well? However, since the weight is on being
HTML-compatible rather than XML-compatible, this is an argument in favour of
ignoring XML-validity and instead putting the weight on HTML-compliance. But
then we should be conscious about it and state it in the draft.
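
To make the distinction concrete, here is a minimal Python sketch. It assumes
that the id attribute is declared with type ID (which is what makes the XML
Name rule apply), and it uses a simplified, ASCII-only stand-in for the XML 1.0
Name production. The helper names are hypothetical, and this is not how a
validator would really do it.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of the XML 1.0 Name production.
# The real production also admits many non-ASCII characters.
XML_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")


def is_well_formed(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def id_matches_xml_name(value: str) -> bool:
    """Hypothetical helper: test an id value against the simplified Name pattern."""
    return XML_NAME_ASCII.match(value) is not None


sample = '<div id="999"></div>'
print(is_well_formed(sample))        # the parser accepts it: well-formed
print(id_matches_xml_name("999"))    # starts with a digit: not a Name
print(id_matches_xml_name("n999"))   # starts with a letter: a Name
```

The first check is about well-formedness only; the second is the kind of extra
rule that would come in if polyglots were to strive for XML-validity too.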

According to Henri Sivonen, the Polyglot spec should only describe a subset
of XML 1.0 and HTML5. We should only read the specs and pick what is compatible
with both specs. But which subset?

* Validity subset: The HTML-valid subset? The XML-valid subset? The HTML +
XML-valid subset?
* Well-formed subset?
* Well-formed and valid?
* DOM-equal subset?
* All the above?

The two main problems in this list are DOM equality (this is not described in
a spec that we can look at) and XML-validity (should we care?). But also, to a
degree, HTML-validity/-conformance. It seems that HTML-conformance/-validity
should not count as being as important as HTML-compatibility.

PROBLEM EXAMPLES:

<colgroup>: The draft says that polyglot markup *requires* <colgroup/>, or else
the XML DOM will be different from the HTML DOM. OK. But then we are outside
both validity and well-formedness; we are in "equality" land, which isn't
described in any other standard that we could formulate a subset of. It is
Polyglot Markup's task to describe the DOM-equal subset.
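
As a rough illustration of what a DOM-equality check for this case could look
like, the following Python sketch inspects only the XML side: it lists <col>
elements that have no <colgroup> parent. The function names are hypothetical.
The HTML side (where the parser inserts the <colgroup> itself) is not run
here; it is taken from the HTML5 parsing rules.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def local_name(element) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def cols_without_colgroup(markup: str) -> list:
    """Hypothetical lint helper: <col> elements whose parent is not <colgroup>."""
    root = ET.fromstring(markup)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [
        el
        for el in root.iter()
        if local_name(el) == "col"
        and (el not in parent_of or local_name(parent_of[el]) != "colgroup")
    ]


implicit = f'<table xmlns="{XHTML_NS}"><col/><tbody><tr><td>x</td></tr></tbody></table>'
explicit = (
    f'<table xmlns="{XHTML_NS}"><colgroup><col/></colgroup>'
    "<tbody><tr><td>x</td></tr></tbody></table>"
)

print(len(cols_without_colgroup(implicit)))  # bare <col>: XML DOM would differ
print(len(cols_without_colgroup(explicit)))  # explicit <colgroup>: same tree
```

Both tables are well-formed, so neither an XML parser nor a well-formedness
check would flag the first one; only a rule written for DOM equality does.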

<xmp> and <plaintext>: to discuss those elements inside Polyglot Markup shows
an emphasis on equality, rather than validity (they are HTML5-invalid) or
well-formedness (they have no XML-well-formedness problems). The only problem
is that they work differently in HTML and XHTML.
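
A small Python sketch of the XML half of that difference, for illustration
only: an XML parser treats the content of <xmp> as ordinary markup, whereas
the HTML5 parsing rules treat it as raw text (that HTML half is not run here).

```python
import xml.etree.ElementTree as ET

# In XML, <xmp> is just another element, so its content is parsed as markup.
root = ET.fromstring("<xmp><b>bold?</b></xmp>")
child_tags = [child.tag for child in root]

print(child_tags)   # the <b> became a child element
print(root.text)    # no literal "<b>" text inside <xmp>
```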

attributes - line-feeds, tabs and CR inside attributes: this is neither a
validity issue nor a well-formedness issue. It is purely a DOM-equality issue,
and only sometimes an important one.
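
The following Python sketch shows the effect, under stated assumptions: the
XML side uses the standard library's expat-based parser, which applies XML
attribute-value normalization; the HTML side uses html.parser, which is not an
HTML5-conforming parser and serves here only as a stand-in for a parser that
keeps the characters as written. The AttrCollector class is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# One attribute value containing a literal tab and a literal line feed.
MARKUP = '<p title="a\tb\nc"></p>'


class AttrCollector(HTMLParser):
    """Hypothetical helper that records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs.update(attrs)


# XML: each literal tab or line feed in the value is replaced by a space.
xml_value = ET.fromstring(MARKUP).get("title")

# HTML stand-in: the value is reported with the characters unchanged.
collector = AttrCollector()
collector.feed(MARKUP)
collector.close()
html_value = collector.attrs["title"]

print(repr(xml_value))
print(repr(html_value))
print(xml_value == html_value)
```

The markup is well-formed and the two parsers accept it without complaint;
the only difference is in the attribute value that ends up in the tree.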

@id: XML has some global validity rules for @id. For instance, an @id may not
begin with a number. Should it matter to Polyglot Markup?

Received on Friday, 28 January 2011 13:15:07 GMT
