- From: Norman Walsh <ndw@nwalsh.com>
- Date: Thu, 13 Oct 2011 11:12:11 -0400
- To: public-xml-processing-model-comments@w3.org
- Message-ID: <m2hb3dorhw.fsf@nwalsh.com>
At the 13 Oct telcon we agreed that this was an informative message, not
a comment on the spec.

"Henry S. Thompson" <ht@inf.ed.ac.uk> writes:

> I took an action some time ago to review the discussion in section 9.2
> of the HTML5 spec. [1] in regard to how external entity processing
> during XML DOCTYPE statement parsing is specified.
>
> This section contains the following:
>
>   "This specification provides the following additional information
>   that user agents should use when retrieving an external entity: the
>   public identifiers given in the following list all correspond to _the
>   URL given by this link_.
>
>     -//W3C//DTD XHTML 1.0 Transitional//EN
>     -//W3C//DTD XHTML 1.1//EN
>     -//W3C//DTD XHTML 1.0 Strict//EN
>     -//W3C//DTD XHTML 1.0 Frameset//EN
>     -//W3C//DTD XHTML Basic 1.0//EN
>     -//W3C//DTD XHTML 1.1 plus MathML 2.0//EN
>     -//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN
>     -//W3C//DTD MathML 2.0//EN
>     -//WAPFORUM//DTD XHTML Mobile 1.0//EN
>
>   "Furthermore, user agents should attempt to retrieve the above
>   external entity's content when one of the above public identifiers
>   is used, and should not attempt to retrieve any other external
>   entity's content." [emphasis added]
>
> The "URL given by this link" is a data: URI which resolves to a string
> of 2125 entity declarations.
>
> This amounts, as far as I can see, to suggesting (note the use of
> 'should' throughout (recall that the HTML5 spec. does not
> typographically distinguish RFC2119 language, all uses of 'must',
> 'should' etc. are normative unless explicitly noted to the contrary))
> that the XML parser invoked by a user agent for XHTML documents should
>
>  a) Use a catalog [2] which maps all the above public identifiers to
>     the given fixed string;
>
>  b) Not otherwise process the external subset at all.
>
> In principle, there's a lot to recommend this approach. It would
> evidently solve a bunch of interop problems, and drastically reduce
> the load on W3C servers.
>
> In practice, it leaves open a number of questions, which I think need
> to be addressed:
>
>  1) Why 'should' and not 'must'?
>
>     If ensuring interop is the goal here, surely we want user agents
>     all to just _do_ this. . .
>
>  2) Why not a number of other public identifiers?
>
>     For example, -//W3C//DTD XHTML Basic 1.0//EN
>                  -//W3C//DTD SVG 1.0//EN
>                  -//W3C//DTD SVG 1.1//EN
>                  -//W3C//MathML 1.0//EN
>
>  3) What exactly is that list of entities? How would I know if there
>     was a mistake of omission?
>
>  4) What about the _internal_ subset? Should it be processed
>     (consistent with the catalog story) or not (consistent with what
>     the XML spec. says processors may do, since the external subset is
>     "a special kind of external entity", and non-validating XML
>     processor may stop 'processing' the internal subset once they
>     choose not to read an external entity)?
>
>  5) What if the XML declaration for the document at hand includes
>     "standalone='no'" (or no standalone, which the XML spec. requires
>     to be interpreted as 'no')?
>
>     (Note that as it stands Polyglot [2] does not allow either an XML
>     declaration or an internal subset).
>
> It seems to me the interoperability of existing XHTML toolchains and
> HTML5 user agents is implicated by one or more of the above -- what
> should the TAG say, and to whom? Should the TAG and the XML
> Processing Model WG work together to define a Processor Profile [3]
> which could be referenced normatively in section 9.2 of the HTML5
> spec.?
>
> ht
>
> [1] http://www.w3.org/TR/2011/WD-html5-20110525/the-xhtml-syntax.html#parsing-xhtml-documents [Last Call WD]
> [2] http://www.w3.org/TR/2011/WD-html-polyglot-20110525/
> [3] http://www.w3.org/TR/xml-proc-profiles/
> --
>  Henry S. Thompson, School of Informatics, University of Edinburgh
>  10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
>  Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
>  URL: http://www.ltg.ed.ac.uk/~ht/
> [mail from me _always_ has a .sig like this -- mail without it is forged spam]

Be seeing you,
  norm

--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com
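Points (a) and (b) in the quoted message amount to a catalog lookup whose default is "do not fetch." The Python sketch below is a rough model of that policy only. It does not reproduce the HTML5 spec's text or any XML parser's resolver API. The names `HTML5_PUBLIC_IDS`, `FIXED_ENTITY_DECLS` and `resolve_external_entity` are hypothetical. `FIXED_ENTITY_DECLS` is a one-line stand-in for the 2125 declarations that the data: URI actually carries. Public identifiers are whitespace-normalized before matching, following the rule in XML 1.0 section 4.2.2.

```python
import re
from typing import Optional

# Public identifiers listed in section 9.2 of the HTML5 Last Call WD, as quoted above.
HTML5_PUBLIC_IDS = frozenset({
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML Basic 1.0//EN",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN",
    "-//W3C//DTD MathML 2.0//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.0//EN",
})

# Hypothetical placeholder: NOT the real content of the spec's data: URI.
FIXED_ENTITY_DECLS = '<!ENTITY nbsp "&#160;">'


def normalize_public_id(public_id: str) -> str:
    """Collapse runs of XML white space to one space and trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")


def resolve_external_entity(public_id: Optional[str],
                            system_id: Optional[str]) -> Optional[str]:
    """Return replacement text for an external entity, or None to skip it.

    (a) Any listed public identifier maps to the one fixed string.
    (b) Nothing else is retrieved; system_id is deliberately ignored.
    """
    if public_id is not None and normalize_public_id(public_id) in HTML5_PUBLIC_IDS:
        return FIXED_ENTITY_DECLS
    return None


if __name__ == "__main__":
    strict = "-//W3C//DTD XHTML 1.0 Strict//EN"
    print(resolve_external_entity(strict, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd") is not None)  # True
    # SVG 1.1 is one of the identifiers asked about in point 2; it is not on the list.
    print(resolve_external_entity("-//W3C//DTD SVG 1.1//EN", "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"))  # None
```

The sketch leaves the open questions in the message unanswered. It says nothing about the internal subset (point 4) or about standalone='no' (point 5), and extending coverage (point 2) would mean adding entries to the set.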
Received on Thursday, 13 October 2011 15:12:43 UTC