Comments on WD-xml-infoset-19990517

This is great, guys.  It positively cries out to be rolled into the 
XML spec.

>Query: Should comments and the document type declaration be required rather 
>than optional?

No and yes.  No processor is required to transmit comments, yet processors
are required to process the internal subset, as a side-effect of which
they must have parsed the doctype declaration.

As for comments, I can't emphasize enough that in the original
design process for XML 1.0, we rejected things like <!> and so on
specifically to enable conforming processors to discard comments
lexically.  I really think they should be optional. 
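
To illustrate what lexical discarding looks like from the application side, here is a minimal sketch using Python's standard xml.etree.ElementTree, whose default parser configuration drops comments before the tree is built. It is only an example of one processor's behaviour, not anything the draft prescribes; SAMPLE and child_tags are made-up names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: a comment sits between two child elements.
SAMPLE = "<doc><a/><!-- a note to self --><b/></doc>"


def child_tags(xml_text):
    """Parse with ElementTree's default parser and list the child element tags."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


# The comment never reaches the application; only the two elements do.
print(child_tags(SAMPLE))  # ['a', 'b']
```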

>7.An unordered set of attribute declaration information items, one for each 
>attribute declaration read by the processor. 

The processor is required to read & use some attribute declarations,
namely those in the internal subset, which means that making those
required rather than optional is more or less free.  On the other hand,
it wouldn't be reasonable to make external attribute declarations
required parts of the infoset.

>Query: When Namespace processing is being performed, should the original 
>prefix also be available?
>Query: Should attribute starting with "xmlns" be included even when 
>performing Namespace processing?

Seems harmless *as long as it's optional* - but the prefix and xmlns 
stuff shouldn't be available as regular attribute/element names, but 
rather as distinct information items.

>Query: Should xml:lang and xml:space also be excluded and modeled as 
>character properties instead?

No!  They are ordinary first-class citizen attributes, provided as 
places for authors, if they wish, to put messages to applications.

>3.An ordered list of character information items, one for each character 
>appearing in the normalized  attribute value. 

For ENTITIES, IDREFS, and NMTOKENS, it makes sense to offer a higher-level
item breakdown, e.g. an ordered list of "token information items" - cf. 
recent communication from Syntax WG.
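
A minimal sketch of what such a breakdown might look like, assuming the attribute value has already been normalized so that tokens are separated by single spaces. TokenItem and token_items are hypothetical names for illustration, not anything defined in the draft.

```python
from dataclasses import dataclass


@dataclass
class TokenItem:
    """Hypothetical 'token information item': one token of a tokenized attribute."""
    position: int  # zero-based position in the ordered list
    text: str      # the token itself


def token_items(normalized_value):
    """Split a normalized IDREFS/ENTITIES/NMTOKENS value into ordered token items."""
    # Split on the space character only; empty pieces are dropped so that
    # leading, trailing, or doubled spaces in un-normalized input do no harm.
    tokens = [t for t in normalized_value.split(" ") if t]
    return [TokenItem(i, t) for i, t in enumerate(tokens)]


for item in token_items("intro ch1 ch2"):
    print(item.position, item.text)
```

The character-level list could still be offered alongside this; the token list is just the view most applications would actually want for these attribute types.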

>2.A reference to the entity information item for an external parsed entity, 
>if the processor has read the declaration. 

Shouldn't that be required?  *IF* the processor has read the declaration
there's no excuse for not passing it along.  If the processor hasn't,
the absence of this item might be useful information too.

> "... in some form:"

This trailing adverbial phrase, which appears essentially on every
information item, seems more or less completely free of semantics.
If it is actually saying something useful, that thing should be said
in one place at the top of the document, right?

>4.A reference to the entity information item for the entity in which this 
>character appears. 

This is an optional property for just about everything, which makes
sense.  I think you need to be a bit more precise and say that you
actually mean the most immediately enclosing parsed entity - yes, that
should be self-evident, but...

> 2.8.1. Document Type Declaration: Optional Properties

Uh, the root element type (that little name dangling after <!DOCTYPE) is 
there in the syntax; shouldn't it be *at least* an optional property?
And if you're going to parse the damn thing anyhow, why not make it
required?

Could one also make a case for two binary-valued information items
saying whether the external & internal subsets are provided?  Once again,
I see no benefit in making this optional, since the processor sort of
can't help knowing.
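
As a rough illustration that a processor has this information in hand anyway, here is a sketch using the expat bindings in Python's standard library, whose doctype callback reports the root element type name, the system identifier, and an internal-subset flag. The function name, the dictionary keys, and the sample documents are made up; treating "a system identifier is present" as "an external subset is provided" is my assumption for the example.

```python
import xml.parsers.expat


def doctype_facts(xml_text):
    """Collect what the parser learns from the document type declaration."""
    facts = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        facts["root_element_type"] = name
        facts["has_internal_subset"] = bool(has_internal_subset)
        # Assumption: a system identifier means an external subset is provided.
        facts["has_external_subset"] = system_id is not None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return facts


# Hypothetical document with both an external and an internal subset.
DOC = '<!DOCTYPE memo SYSTEM "memo.dtd" [<!ATTLIST memo status CDATA "draft">]><memo/>'
print(doctype_facts(DOC))
```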

>Entity information items are optional, except for information items 
>representing unparsed external (NDATA) entities, which are required to 
>appear in the information set.

Uh, only if the declarations are read by the processor, right?  And
the "(NDATA)" adds nothing; it's a hangover of SGML jargon which we
don't need.  The phrase "unparsed external entities" is very precise,
and in fact you could just say "unparsed entities" with no ambiguity,
since they are by definition external.

>Query: Is it confusing to represent the external DTD subset with an entity 
>information item?  (The XML Recommendation treats the external subset 
>essentially as an external parameter entity, except that it does not have 
>an entity name.)

Yeah, I think there are good arguments for both approaches, and it's
not confusing either way.  On that basis, I'd leave it as an entity in
the interests of creating less apparatus.

>3.The system identifier of the entity. If the information item represents 
>an internal entity, the system identifier is always null, and if it 
>represents the document entity, the value may be null; otherwise, it
>must have a non-null value. 

OK, I get what you mean, but it's an eyebrow-raiser.  Maybe "For the
document entity, the system identifier item may be null, but it may
not be an empty string".  Or some such.

>Query: Should the information from the XML declaration or text declaration 
>also be optionally available?

Yes!  Maybe even required.  After all, the processor is required to not
only read but use it.

>7.A reference to the entity information item for the entity in which the 
>entity was declared. 

This one is important, because it's the information you need to resolve
relative URI references.  I don't suppose it could be made required?
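
A small sketch of why this matters, using urljoin from Python's standard library: the same relative system identifier resolves to different places depending on which entity contained the declaration. The URIs and the function name are invented for the example.

```python
from urllib.parse import urljoin


def resolve_system_id(declaring_entity_uri, system_id):
    """Resolve a (possibly relative) system identifier against the URI of
    the entity in which the declaration appeared."""
    return urljoin(declaring_entity_uri, system_id)


# The same relative system identifier, declared in two different entities.
SYSTEM_ID = "chapters/ch1.xml"

# Declared in the document entity:
print(resolve_system_id("http://example.org/docs/book.xml", SYSTEM_ID))
# http://example.org/docs/chapters/ch1.xml

# Declared in an external DTD subset living elsewhere:
print(resolve_system_id("http://example.org/dtds/book.dtd", SYSTEM_ID))
# http://example.org/dtds/chapters/ch1.xml
```

Without the reference back to the declaring entity, an application has no way to tell which of the two results is the right one.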

>3.The default value of the attribute. If the attribute was declared with 
>the default value #IMPLIED or #REQUIRED, this value will be null. 

There might be a case for having a required binary-value property saying 
whether or not the value is #FIXED...

>Namespace processing [Namespaces] represents a virtual transformation of an 
>XML document,

Oh yeah?  Step outside and say that... hmmm... I see what you're trying
to say, but the phrase "virtual transformation" kind of escapes me.
Would it be better to say that it "provides a mapping of element types
and attribute names to two-part names based on..."?
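
To make the "mapping to two-part names" reading concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree, which reports namespaced names in a "{uri}local" form. The helper two_part_name, the sample document, and its namespace URI are made up for illustration; this shows one library's convention, not the draft's model.

```python
import xml.etree.ElementTree as ET


def two_part_name(name):
    """Split ElementTree's '{uri}local' form into (uri, local part).

    The URI part is None when the name is in no namespace.
    """
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


# Hypothetical document: one prefixed element, one prefixed and one plain attribute.
DOC = '<p:order xmlns:p="http://example.org/po" p:id="7" note="x"/>'

root = ET.fromstring(DOC)
print(two_part_name(root.tag))
# ('http://example.org/po', 'order')
print({two_part_name(k): v for k, v in root.attrib.items()})
# {('http://example.org/po', 'id'): '7', (None, 'note'): 'x'}
```

Note that in this library neither the prefix nor the xmlns declaration shows up among the ordinary names, which is the behaviour argued for earlier.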

>Query: Is it best for the Information Set to explicitly allow for a document 
>without Namespace processing?

Good question.  I'd bite the bullet and say no, but that'd be an issue that
I think you'd have to throw to the Plenary.  In fact, you should probably
throw it either way.

>An XML processor conforms to the XML Information Set if it provides all the 
>required information items and all required associated information.

Could you stop after the word "items" without semantic loss?

>6. XML 1.0 Reporting Requirements

Why is this section here?  (There may be a good reason,
but you should say what it is.)  I don't think it adds value.  Maybe
it's an appendix?

>1.The information in the XML declaration and text declarations. 
>2.Element content models from ELEMENT declarations. 

I think you should have #1 - it's totally compulsory for the
processor anyhow.  The complete absence of anything about element
declarations, when you have all the attribute apparatus, certainly
stands out as an anomaly.  Should you include some justification for 
this choice?

>3.The grouping and ordering of attribute declarations in ATTLIST 
>  declarations. 
...
>11.Any ignored declarations, including those within an IGNORE conditional 
>   section, as well as entity and attribute declarations ignored because 
>   previous entity declarations overrode them. 

I think all these are better left out.

>Furthermore, the XML Infoset does not provide any method of assigning a 
>single series of numbers to all child nodes of an element or of the 
>document

Blecch.  Yecch.  You should probably say "the design of XML makes it
impossible for the Infoset to provide...." rather than this.

========sorry, out of sequence, back to 2.2.1==============

> 1.The URI part, if any, of the element's name. If Namespace processing is 
>not being performed, the URI part will always be null. 
>2.The local part of the element's name. 

I think the phrase "element's name" is infelicitous.  The right 
terminology is "element type".

Should there be an optional item for elements linking back to the 
parent element, and for attributes linking back to the containing
element?

> 8. Other Open Issues

I don't think any of these are very material.

 -Tim

Received on Thursday, 20 May 1999 13:17:28 UTC