Re: [xmlProfiles-29] TAG recommendation for work on subset of XML 1.1 from Henry S. Thompson on 2003-02-06 (www-tag@w3.org from February 2003)

From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
Date: 06 Feb 2003 17:07:47 +0000
To: Richard Tobin <richard@cogsci.ed.ac.uk>
Cc: www-tag@w3.org
Message-ID: <f5blm0t72xo.fsf@erasmus.inf.ed.ac.uk>

Richard Tobin <richard@cogsci.ed.ac.uk> writes:

> > I am unconvinced of the necessity for subsetting the language, as
> > opposed to identifying a new conformance class alongside the two
> > already provided ('validating' and 'non-validating').
> > 
> > Call such a conformance class 'minimal' -- it can be trivially defined
> > as a further restriction of 'non-validating', as follows (this is an
> > edited copy of text from section 5.1 of XML 1.0 2e [1], changes in
> > bold):
> 
> I think the minimal conformance Henry describes is either too minimal
> or not minimal enough.  It ignores all entity declarations, and
> doesn't provide attribute defaults, but it still requires parsing
> ATTLIST declarations to do attribute normalization.  If we allow
> processors to ignore part of the internal subset, we might as well
> allow them to ignore the whole thing so that they don't even have to
> parse it.  Attribute normalization seems the least valuable of the
> three since it can usually be handled in the application.
> 
> (Skipping the internal subset is just a few lines of code: you have to
> watch out for comments and quoted literals.)

I'm happy with this too -- I made the proposal in the form I did to
catch all and only what I understood the SOAP requirements to be, but
I suspect you're right that just skipping the whole thing is better.
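
To make the "few lines of code" remark concrete, here is a minimal
sketch in Python of skipping the internal subset without parsing it.
It is not from the original exchange: the function name
skip_internal_subset and its arguments are hypothetical, and it assumes
the caller has already located the '[' that opens the subset.  Besides
comments and quoted literals it also steps over processing
instructions, which is an addition to what Richard lists, since a PI
can contain a ']' or a stray quote as well.

```python
def skip_internal_subset(text: str, start: int) -> int:
    """Return the index of the ']' that closes the internal subset.

    `start` is the index just after the '[' that opens the subset.
    Nothing inside the subset is interpreted; it is only scanned.
    """
    i = start
    n = len(text)
    while i < n:
        if text.startswith("<!--", i):
            # Comments may contain ']' and unbalanced quotes.
            end = text.find("-->", i + 4)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 3
        elif text.startswith("<?", i):
            # Assumption beyond the mail: PIs are skipped as a unit too.
            end = text.find("?>", i + 2)
            if end == -1:
                raise ValueError("unterminated processing instruction")
            i = end + 2
        elif text[i] in "\"'":
            # Quoted literals (entity values, system ids, defaults)
            # may contain ']' and '>'.
            end = text.find(text[i], i + 1)
            if end == -1:
                raise ValueError("unterminated literal")
            i = end + 1
        elif text[i] == "]":
            return i
        else:
            i += 1
    raise ValueError("unterminated internal subset")
```

The scan relies on the fact that conditional sections are not allowed
in the internal subset, so the first ']' outside a comment, PI, or
literal is the one that ends it.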

> The choice between defining a subset and defining a conformance level
> is a choice between two reductions in interoperability: we break
> either the rule that all parsers can handle all documents, or the rule
> that all parsers produce the same infoset for a given document.  Of
> course, both these rules are already broken.  The first is broken by
> unsupported encodings and URI schemes, the second by parsers that
> don't read the external subset or external general entities.

I take the first breakage to be quite different in kind from the
second -- unsupported encodings and URI schemes do not in practice
compromise my ability to get my XML through every processor there is,
because the REC provides a minimal bar which all processors must step
up to.

The second is a very different kind of variation -- it's pervasive,
widely experienced, and mostly well understood.  By exploiting this, my
proposal amounts to suggesting that we can meet the requirements which
the proponents of a subset are advancing, while still preserving the
coherence of XML as a type of document.

Meta point:  I'm concerned that the TAG asked the Core WG to explore a
solution, instead of asking that they explore satisfying a
requirement.  As put to Core, the request would on the face of it not
even allow for exploration of the kind of approach I've suggested,
since that approach doesn't define a subset at all.  Nonetheless I
believe it does satisfy the requirement as I understand it, with fewer
negative side-effects than any subset approach would have.

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                      Half-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Thursday, 6 February 2003 12:07:37 UTC