
Re: CR-xml-infoset-20010514: unexpanded entity reference

From: Richard Tobin <richard@cogsci.ed.ac.uk>
Date: Thu, 19 Jul 2001 12:22:55 +0100 (BST)
Message-Id: <200107191122.MAA05278@rhymer.cogsci.ed.ac.uk>
To: James Clark <jjc@jclark.com>, Richard Tobin <richard@cogsci.ed.ac.uk>, www-xml-infoset-comments@w3.org
> (a) The parser has seen a declaration of the entity as a parsed external 
> general entity but, for whatever reason, decides not to include it
> 
> (b) A parser is presented with a document that is not standalone (because 
> it lacks a standalone declaration and references an external parameter 
> entity or external DTD) but processes only the declarations in the internal 
> subset, and subsequently encounters a general entity reference for which it 
> has not seen a declaration.

To (b) you could perhaps add documents which *do* have
standalone="yes" but which are not in fact standalone (so they
are invalid, but validity is not required for the Infoset).

The infoset ("the one true infoset") of a document does not depend on
whether the parser reads the external subset or expands general
entities.  It is defined as being what would be obtained by a parser
that reads and expands.  Unexpanded entity reference items serve to
allow a processor to indicate that it has not fully determined the one
true infoset, either because it doesn't expand external entities
(your case (a)) or because it has not read everything (your case (b)).
I still don't understand why you don't think it should cover both
those cases.

[Actually, there's a third case - the external subset has been read
and there definitely isn't a declaration for the entity.  This is
still just a validity error at least in theory.  Unexpanded entity
references are used for this case as well.  All three cases are
distinguishable by looking at the [system identifier] property of the
unexpanded entity reference: it will have a value in case (a), be
Unknown in case (b), and have No Value in this third case.]
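
As a rough illustration of how a consumer might tell the three cases
apart, here is a minimal Python sketch.  The class and function names
(Special, UnexpandedEntityReference, classify) are invented for this
example and are not defined by the Infoset; only the three possible
states of [system identifier] come from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Special(Enum):
    """Hypothetical stand-ins for the Infoset's 'unknown' and 'no value'."""
    UNKNOWN = "unknown"
    NO_VALUE = "no value"


@dataclass
class UnexpandedEntityReference:
    name: str
    # Either a real system identifier string or one of the two markers.
    system_identifier: Union[str, Special]


def classify(ref: UnexpandedEntityReference) -> str:
    """Say which of the three cases produced this item."""
    if ref.system_identifier is Special.UNKNOWN:
        # Case (b): a declaration may exist, but not everything was read.
        return "b"
    if ref.system_identifier is Special.NO_VALUE:
        # Third case: everything was read and there is no declaration.
        return "undeclared"
    # Case (a): the declaration was seen, the entity was not included.
    return "a"
```

For example, classify(UnexpandedEntityReference("chap1", "chap1.xml"))
falls through to case (a), because the property holds an ordinary
string rather than one of the two markers.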

> The infoset in general cannot handle case (b). Specifically, a reference to 
> an entity declared in an external DTD may occur in an attribute value, but 
> there's no way for the infoset to represent this (without substantial 
> changes to the way attributes are handled).

This is a problem for processors in general, not only the infoset.
Erratum E10 to the second edition adds:

  It is an error if an attribute refers to an entity when there is a
  declaration for that entity which the processor has not read. This
  can happen only when a non-validating processor is being used.

> There's also the related issue 
> of attribute value normalization and of default values: if the processor 
> has not read the declarations, it cannot guarantee the construction of a 
> correct infoset.

Right; the attributes in question will have [attribute type]
"Unknown", and the [all declarations processed] property will be false
to act as a further warning.  In the one true infoset, [all
declarations processed] is always true, and attributes are always
defaulted and normalized and never have type Unknown.
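
To make the warning role of these two properties concrete, here is a
small Python sketch of how an application might use them.  The
Attribute and Document classes and the suspect_attributes function are
hypothetical names made up for this example; the only things taken
from the Infoset are the [all declarations processed] flag and the
possibility that [attribute type] is unknown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNKNOWN = "unknown"  # hypothetical stand-in for the Infoset's 'unknown'


@dataclass
class Attribute:
    name: str
    value: str
    # A declared type such as "CDATA" or "ID", UNKNOWN, or None (no value).
    attribute_type: Optional[str]


@dataclass
class Document:
    all_declarations_processed: bool
    attributes: List[Attribute] = field(default_factory=list)


def suspect_attributes(doc: Document) -> List[str]:
    """Names of attributes whose type and normalization cannot be trusted."""
    if doc.all_declarations_processed:
        # Every declaration was read, so no attribute type is unknown.
        return []
    return [a.name for a in doc.attributes if a.attribute_type == UNKNOWN]
```

Note that a check like this can only flag attributes that are present.
When the flag is false, defaulted attributes may be missing from the
infoset altogether, and nothing in the returned items can reveal that.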

> Your definition of [all declarations processed] is a bit misleading.  If 
> [all declarations processed] is true, then the value of attributes is not 
> always known (because of attribute value normalization), and even the value 
> of the [attributes] property of the element info item is not always known 
> (because of default attributes).

??? Did you mean if [all declarations processed] is *false*?  If so,
then yes, that's the point.  That property tells you that the returned
infoset is inaccurate.

> My suggestion would be to back off on some of the [all declarations 
> processed] stuff you have added relatively recently, and say simply that if 
> a processor hasn't read the declarations, then it cannot in general 
> guarantee the construction of a correct infoset.

Perhaps I'm being dense, but I still can't see the problem.  Saying
that "if a processor hasn't read the declarations, then it cannot in
general guarantee the construction of a correct infoset" is exactly what
we do, and [all declarations processed] is strictly speaking just
some terminology we provide for talking about that.  If a spec wants
to say that it requires all the declarations to have been read, the
Infosetese for that is "[all declarations processed] must be true".

-- Richard
Received on Thursday, 19 July 2001 07:23:01 GMT
