XHTML External Subset Handling from Alex Milowski on 2011-04-14 (public-xml-processing-model-wg@w3.org from April 2011)

From: Alex Milowski <alex@milowski.org>
Date: Thu, 14 Apr 2011 08:13:21 -0700
To: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <BANLkTikrov4FZrKn0DPqsLLC-OQoggfC_g@mail.gmail.com>

I just re-read the section of the HTML5 spec on the XHTML syntax that
Henry has pointed out to me several times [1].  It contains this
interesting statement:

   "user agents should attempt to retrieve the above external entity's
content when one of the above public identifiers is used, and should
not attempt to retrieve any other external entity's content."

They then give a data URI for the "DTD" to use.  Internally, most
browsers do not actually need to process the external subset as all it
contains is entity definitions.  I'll have to see how this actually
works in the context of WebKit or Firefox.
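
As a rough illustration of the policy quoted above, here is a minimal
sketch using Python's expat binding.  It is not what any browser
actually does.  The set of public identifiers is only an illustrative
subset of the list in [1], and ENTITY_ONLY_SUBSET is a hypothetical
one-entity stand-in for the entity-only "DTD"; the name parse_text is
likewise made up for this example.

```python
import xml.parsers.expat as expat

# Illustrative subset of the public identifiers listed in [1]
# (assumption: the full list lives in the spec, not here).
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}

# Hypothetical stand-in for the entity-only "DTD"; the real one would
# define the full XHTML character entity set.
ENTITY_ONLY_SUBSET = b'<!ENTITY nbsp "&#160;">'


def parse_text(document: bytes):
    """Return (character data, whether the local subset was loaded)."""
    chunks = []
    loaded = []
    parser = expat.ParserCreate()
    # Ask expat to report the external subset to our handler.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_external_entity(context, base, system_id, public_id):
        # Never touch the network: known identifiers get the local
        # entity-only subset, anything else is skipped.
        if public_id in KNOWN_PUBLIC_IDS:
            child = parser.ExternalEntityParserCreate(context)
            child.Parse(ENTITY_ONLY_SUBSET, True)
            loaded.append(public_id)
        return 1  # nonzero tells expat to keep parsing

    parser.ExternalEntityRefHandler = on_external_entity
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks), bool(loaded)
```

With a document that uses one of the known public identifiers, a
reference such as &nbsp; is resolved from the local subset; for any
other identifier nothing is loaded and no retrieval is attempted.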

Maybe the right thing to do here is:

   * ensure that tricks like the one HTML5 uses for its XHTML syntax
work properly with our profiles, and
   * allow standalone='yes' to turn off external subset processing.

While some browser implementations have chosen to turn off external
subset processing, that choice isn't clearly justified by any
demonstrable user or specification requirement; it simply tracks the
desire not to make unnecessary network requests.  If we enable
standalone semantics, then users have control over browser behavior,
and the recommended policy for minimizing network traffic would be to
ensure that your documents *are* standalone and then say so in the
XML declaration.
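
For what it's worth, expat already exposes this kind of behavior, so
the idea can be sketched with Python's binding.  This is only an
illustration under my own assumptions, not a proposal for how our
profiles should be implemented; the function name
external_subset_requests is made up for the example.  The handler
merely records what would have been requested and fetches nothing.

```python
import xml.parsers.expat as expat


def external_subset_requests(document: bytes):
    """List the (public id, system id) pairs the parser asks us to load."""
    requested = []
    parser = expat.ParserCreate()
    # External subset processing is on unless the XML declaration
    # says standalone="yes".
    parser.SetParamEntityParsing(
        expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
    )

    def on_external_entity(context, base, system_id, public_id):
        # Record the request instead of going to the network.
        requested.append((public_id, system_id))
        return 1  # nonzero tells expat to keep parsing

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return requested


DOCTYPE = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
BODY = b'<html xmlns="http://www.w3.org/1999/xhtml"/>'

# Same document, with and without the standalone declaration.
plain = b'<?xml version="1.0"?>' + DOCTYPE + BODY
standalone = b'<?xml version="1.0" standalone="yes"?>' + DOCTYPE + BODY
```

Calling external_subset_requests(plain) reports one request for the
XHTML DTD, while external_subset_requests(standalone) reports none,
which is the user-controlled switch described above.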

[1] http://dev.w3.org/html5/spec/Overview.html#parsing-xhtml-documents

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Thursday, 14 April 2011 15:13:50 UTC