- From: Richard Tobin <richard@cogsci.ed.ac.uk>
- Date: Thu, 19 Jul 2001 12:22:55 +0100 (BST)
- To: James Clark <jjc@jclark.com>, Richard Tobin <richard@cogsci.ed.ac.uk>, www-xml-infoset-comments@w3.org
> (a) The parser has seen a declaration of the entity as a parsed external
> general entity but, for whatever reason, decides not to include it
>
> (b) A parser is presented with a document that is not standalone (because
> it lacks a standalone declaration and references an external parameter
> entity or external DTD) but processes only the declarations in the internal
> subset, and subsequently encounters a general entity reference for which it
> has not seen a declaration.

To (b) you could perhaps add documents which *do* have standalone="yes"
but which are not in fact standalone (so they are invalid, but validity
is not required for the Infoset).

The infoset ("the one true infoset") of a document does not depend on
whether the parser reads the external subset or expands general entities.
It is defined as being what would be obtained by a parser that reads and
expands. Unexpanded entity reference items serve to allow a processor to
indicate that it has not fully determined the one true infoset, either
because it doesn't expand external entities - your case (a), or has not
read everything - your case (b). I still don't understand why you don't
think it should cover both those cases.

[Actually, there's a third case - the external subset has been read and
there definitely isn't a declaration for the entity. This is still just
a validity error, at least in theory. Unexpanded entity references are
used for this case as well. All three cases are distinguishable by
looking at the [system identifier] property of the unexpanded entity
reference - it will have a value, be Unknown, or have No Value according
to which case it is.]

> The infoset in general cannot handle case (b). Specifically, a reference to
> an entity declared in an external DTD may occur in an attribute value, but
> there's no way for the infoset to represent this (without substantial
> changes to the way attributes are handled).

This is a problem for processors in general, not only the infoset.
Erratum E10 to the second edition adds:

  It is an error if an attribute refers to an entity when there is a
  declaration for that entity which the processor has not read. This
  can happen only when a non-validating processor is being used.

> There's also the related issue
> of attribute value normalization and of default values: if the processor
> has not read the declarations, it cannot guarantee the construction of a
> correct infoset.

Right; the attributes in question will have [attribute type] "Unknown",
and the [all declarations processed] property will be false to act as a
further warning. In the one true infoset, [all declarations processed]
is always true, and attributes are always defaulted and normalized and
never have type Unknown.

> Your definition of [all declarations processed] is a bit misleading. If
> [all declarations processed] is true, then the value of attributes is not
> always known (because of attribute value normalization), and even the value
> of the [attributes] property of the element info item is not always known
> (because of default attributes).

??? Did you mean if [all declarations processed] is *false*? If so,
then yes, that's the point. That property tells you that the returned
infoset is inaccurate.

> My suggestion would be to back off on some of the [all declarations
> processed] stuff you have added relatively recently, and say simply that if
> a processor hasn't read the declarations, then it cannot in general
> guarantee the construction of a correct infoset.

Perhaps I'm being dense, but I still can't see the problem. Saying that
"if a processor hasn't read the declarations, then it cannot in general
guarantee the construction of a correct infoset" is exactly what we do,
and [all declarations processed] is strictly speaking just some
terminology we provide for talking about that.
If a spec wants to say that it requires all the declarations to have
been read, the Infosetese for that is "[all declarations processed]
must be true".

-- Richard
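
As an illustration of the two mechanisms discussed in this message, here is a minimal Python sketch. It assumes a toy in-memory model: the names `Marker`, `UnexpandedEntityReference`, `classify` and `require_all_declarations` are hypothetical and do not come from the Infoset draft or from any parser API. It shows how the state of [system identifier] separates the three cases, and what a consumer's check for "[all declarations processed] must be true" might look like.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Marker(Enum):
    """Hypothetical stand-ins for the 'Unknown' and 'No Value' states."""
    UNKNOWN = "unknown"
    NO_VALUE = "no value"


@dataclass
class UnexpandedEntityReference:
    """Toy model of an unexpanded entity reference item (assumed shape)."""
    name: str
    # Either a system identifier string or one of the two markers above.
    system_identifier: Union[str, Marker]


def classify(ref: UnexpandedEntityReference) -> str:
    """Map the [system identifier] state onto the three cases in the message."""
    if ref.system_identifier is Marker.UNKNOWN:
        # Case (b): not everything was read, so a declaration may exist.
        return "declaration may exist but was not read"
    if ref.system_identifier is Marker.NO_VALUE:
        # Third case: everything was read and there is no declaration.
        return "all declarations read, entity undeclared"
    # Case (a): the declaration was seen but the entity was not included.
    return "declared external entity, not expanded"


def require_all_declarations(all_declarations_processed: bool) -> None:
    """Reject an infoset whose [all declarations processed] flag is false."""
    if not all_declarations_processed:
        raise ValueError(
            "infoset may be inaccurate: not all declarations were processed"
        )
```

A consumer that needs fully defaulted and normalized attributes would call `require_all_declarations` before trusting any attribute value; one that can tolerate an incomplete infoset would instead inspect each reference with `classify`.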
Received on Thursday, 19 July 2001 07:23:01 UTC