- From: Richard Tobin <richard@cogsci.ed.ac.uk>
- Date: Thu, 19 Jul 2001 12:22:55 +0100 (BST)
- To: James Clark <jjc@jclark.com>, Richard Tobin <richard@cogsci.ed.ac.uk>, www-xml-infoset-comments@w3.org
> (a) The parser has seen a declaration of the entity as a parsed external
> general entity but, for whatever reason, decides not to include it
>
> (b) A parser is presented with a document that is not standalone (because
> it lacks a standalone declaration and references an external parameter
> entity or external DTD) but processes only the declarations in the internal
> subset, and subsequently encounters a general entity reference for which it
> has not seen a declaration.
To (b) you could perhaps add documents which *do* have
standalone="yes" but which are not in fact standalone (so they
are invalid, but validity is not required for the Infoset).
The infoset ("the one true infoset") of a document does not depend on
whether the parser reads the external subset or expands general
entities. It is defined as being what would be obtained by a parser
that reads and expands. Unexpanded entity reference items serve to
allow a processor to indicate that it has not fully determined the one
true infoset, either because it doesn't expand external entities
(your case (a)) or because it has not read everything (your case (b)).
I still don't understand why you don't think it should cover both
those cases.
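Case (b) can be reproduced with a non-validating parser. The following is only a sketch, assuming Python's xml.parsers.expat with its default settings (under which the external DTD is not fetched); the document and the entity name "unseen" are made up for illustration. The parser reports the reference as skipped rather than expanding it or raising an error:

```python
import xml.parsers.expat

def parse_without_external_subset(document: bytes):
    """Parse with expat defaults; the external subset is never read."""
    skipped = []   # names of entity references that were not expanded
    text = []      # character data that was actually delivered
    parser = xml.parsers.expat.ParserCreate()
    parser.SkippedEntityHandler = (
        lambda name, is_parameter_entity: skipped.append(name)
    )
    parser.CharacterDataHandler = text.append
    parser.Parse(document, True)
    return "".join(text), skipped

# Hypothetical document: "unseen" might be declared in doc.dtd, which is
# never read, so the reference cannot be expanded.
DOC = b'<!DOCTYPE doc SYSTEM "doc.dtd">\n<doc>before &unseen; after</doc>'

text, skipped = parse_without_external_subset(DOC)
print(repr(text), skipped)
```

The skipped name is roughly what an unexpanded entity reference item would carry; the delivered text alone is not the one true infoset.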
[Actually, there's a third case: the external subset has been read
and there definitely isn't a declaration for the entity. This is
still just a validity error, at least in theory. Unexpanded entity
references are used for this case as well. All three cases are
distinguishable by looking at the [system identifier] property of the
unexpanded entity reference: it will have a value, be Unknown, or
have No Value, respectively.]
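To make the three-way distinction concrete, here is a minimal sketch. The class and function names are hypothetical, not taken from any Infoset implementation; the only thing borrowed from the draft is that [system identifier] either has a value, is Unknown, or has No Value:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class Special(Enum):
    """Stand-ins for the Infoset's 'unknown' and 'no value' markers."""
    UNKNOWN = "unknown"
    NO_VALUE = "no value"

@dataclass
class UnexpandedEntityReference:
    name: str
    system_identifier: Union[str, Special]

def classify(ref: UnexpandedEntityReference) -> str:
    """Say which of the three cases produced this item."""
    if ref.system_identifier is Special.UNKNOWN:
        # Case (b): a declaration may exist but has not been read.
        return "declaration not read"
    if ref.system_identifier is Special.NO_VALUE:
        # Third case: everything was read and nothing declares the entity.
        return "no declaration exists"
    # Case (a): declaration seen, entity deliberately not included.
    return "declared but not included"

print(classify(UnexpandedEntityReference("chap1", "chap1.xml")))
print(classify(UnexpandedEntityReference("unseen", Special.UNKNOWN)))
print(classify(UnexpandedEntityReference("missing", Special.NO_VALUE)))
```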
> The infoset in general cannot handle case (b). Specifically, a reference to
> an entity declared in an external DTD may occur in an attribute value, but
> there's no way for the infoset to represent this (without substantial
> changes to the way attributes are handled).
This is a problem for processors in general, not only the infoset.
Erratum E10 to the second edition adds:

    It is an error if an attribute refers to an entity when there is a
    declaration for that entity which the processor has not read. This
    can happen only when a non-validating processor is being used.

> There's also the related issue
> of attribute value normalization and of default values: if the processor
> has not read the declarations, it cannot guarantee the construction of a
> correct infoset.
Right; the attributes in question will have [attribute type]
"Unknown", and the [all declarations processed] property will be false
to act as a further warning. In the one true infoset, [all
declarations processed] is always true, and attributes are always
defaulted and normalized and never have type Unknown.
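A consumer could test for this along the following lines. This is a sketch only: the Document and Attribute classes are hypothetical and heavily simplified (attributes are flattened onto the document rather than hanging off element items), and the check covers just the two warning signs mentioned above:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Attribute:
    name: str
    attribute_type: str  # e.g. "CDATA", "ID", or "Unknown"

@dataclass
class Document:
    all_declarations_processed: bool
    attributes: List[Attribute] = field(default_factory=list)

def may_be_inaccurate(doc: Document) -> bool:
    """True if the returned infoset carries either warning sign."""
    if not doc.all_declarations_processed:
        return True
    return any(a.attribute_type == "Unknown" for a in doc.attributes)

partial = Document(False, [Attribute("ref", "Unknown")])
complete = Document(True, [Attribute("ref", "IDREF")])
print(may_be_inaccurate(partial), may_be_inaccurate(complete))
```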
> Your definition of [all declarations processed] is a bit misleading. If
> [all declarations processed] is true, then the value of attributes is not
> always known (because of attribute value normalization), and even the value
> of the [attributes] property of the element info item is not always known
> (because of default attributes).
??? Did you mean if [all declarations processed] is *false*? If so,
then yes, that's the point. That property tells you that the returned
infoset is inaccurate.
> My suggestion would be to back off on some of the [all declarations
> processed] stuff you have added relatively recently, and say simply that if
> a processor hasn't read the declarations, then it cannot in general
> guarantee the construction of a correct infoset.
Perhaps I'm being dense, but I still can't see the problem. Saying
that "if a processor hasn't read the declarations, then it cannot in
general guarantee the construction of a correct infoset" is exactly what
we do, and [all declarations processed] is strictly speaking just
some terminology we provide for talking about that. If a spec wants
to say that it requires all the declarations to have been read, the
Infosetese for that is "[all declarations processed] must be true".
-- Richard
Received on Thursday, 19 July 2001 07:23:01 UTC