Disposition of comments on Infoset WD of 1999-05-17 from John Cowan on 1999-06-16 (www-xml-infoset-comments@w3.org from April to June 1999)

From: John Cowan <cowan@locke.ccil.org>
Date: Wed, 16 Jun 1999 16:48:29 -0400
To: www-xml-infoset-comments@w3.org, XML-infoset-wg Mailing List <w3c-xml-infoset-wg@w3.org>
CC: tbray@textuality.com, connolly@w3.org, mneedlem@dra.com, mrys@microsoft.com, w3c-i18n-ig@w3.org
Message-ID: <37680D9D.39E53EE5@locke.ccil.org>

The following unavoidably lengthy document represents an
interim disposition of comments received on or before
1999-06-16 by the Infoset WG for the Infoset draft of
1999-05-17.

The term "Noted" means that the comment took a position
on an existing issue not yet decided by the WG.

Except as noted, the unquoted language below represents
the Editors' beliefs about the views of the WG, and may
in principle be overridden by the WG at a later date.

================================================================

Tim Bray writes:

> >Query: Should comments and the document type declaration be required rather 
> >than optional?
> 
> [...]  I really think they should be optional. 

Noted.

> The processor is required to read & use some attribute declarations,
> namely those in the internal subset, which means that making those
> required rather than optional is more or less free.  On the other hand,
> it wouldn't be reasonable to make external attribute declarations
> required parts of the infoset.

Noted.

> >Query: When Namespace processing is being performed, should the original 
> >prefix also be available?
> >Query: Should attribute starting with "xmlns" be included even when 
> >performing Namespace processing?
> 
> Seems harmless *as long as it's optional* - but the prefix and xmlns 
> stuff shouldn't be available as a regular attribute/element names, rather 
> as distinct information items.

Noted.

> >Query: Should xml:lang and xml:space also be excluded and modeled as 
> >character properties instead?
> 
> No!  They are ordinary first-class citizen attributes, provided as 
> places for authors, if they wish, to put messages to applications.

Noted.

> For ENTITIES, IDREFS, and NMTOKENS, it makes sense to offer a higher-level
> item breakdown, e.g. an ordered list of "token information items" - cf. 
> recent communication from Syntax WG.

Accepted as new issue.
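
As an illustration only, the "ordered list of token information items"
suggested above might be sketched as follows. The function name is
hypothetical, and no WG decision on the representation is implied; the
sketch merely splits an attribute value on XML white space and keeps
the tokens in document order.

```python
import re
from typing import List


def tokenize_attribute_value(value: str) -> List[str]:
    """Split an ENTITIES, IDREFS, or NMTOKENS value into ordered tokens.

    Hypothetical helper: tokens are separated by XML white space
    (space, tab, CR, LF); empty strings from leading or trailing
    white space are discarded.
    """
    return [token for token in re.split(r"[ \t\r\n]+", value) if token]


if __name__ == "__main__":
    print(tokenize_attribute_value("  chap1 chap2\tappendixA "))
```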
 
> >2.A reference to the entity information item for an external parsed entity, 
> >if the processor has read the declaration. 
> 
> Shouldn't that be required?  *IF* the processor has read the declaration
> there's no excuse for not passing it along.  If the processor hasn't,
> the absence of this item might be useful information too.

Accepted as new issue.

> > "... in some form:"
> 
> This trailing adverbial phrase, which appears essentially on every
> information item, seems more or less completely free of semantics.
> If it is actually saying something useful, that thing should be said
> in one place at the top of the document, right?

Rejected.  We consider it essential to emphasize repeatedly that the
Infoset is not an API or a guideline for one.

> >4.A reference to the entity information item for the entity in which this 
> >character appears. 
> 
> This is an optional property for just about everything, which makes
> sense.  I think you need to be a bit more precise and say that you
> actually mean the most immediately enclosing parsed entity - yes, that
> should be self-evident, but...

Noted.
 
> > 2.8.1. Document Type Declaration: Optional Properties
> 
> Uh, the root element type (that little name dangling after <!DOCTYPE) is 
> there in the syntax, shouldn't it be *at least* an optional property?
> And if you're going to parse the damn thing anyhow, why not make it
> required?

Rejected.  The root element type provides nothing not provided by the
document element info item, and is essentially an SGML-compatibility
holdover.  In SGML, one needed to know the true root element because
of the possibility of "O O" containers with tags omitted wrapping
the lexically apparent root.  Not so in XML.
 
> Could one also make a case for two binary-valued information items
> saying whether the external & internal subsets are provided?  Once again,
> I see no benefit in making this optional, since the processor sort of
> can't help knowing.

Rejected.  Processors can already make the information apparent by
proper use of the "entity source" properties.
 
> Uh, only if the declarations are read by the processor, right?  And
> the "(NDATA)" adds nothing, it's a hangover of SGML jargon which we
> don't need, the phrase "unparsed external entities" is very precise,
> and in fact you could just say "unparsed entities" with no ambiguity,
> they are by definition external.

Accepted as editorial change.
 
> >Query: Is it confusing to represent the external DTD subset with an entity 
> >information item?  (The XML Recommendation treats the external subset 
> >essentially as an external parameter entity, except that it does not have 
> >an entity name.)
> 
> Yeah, I think there are good arguments for both approaches, and it's
> not confusing either way.  Based on which leave it as an entity in the
> interests of creating less apparatus.

Noted.

> >3.The system identifier of the entity. If the information item represents 
> >an internal entity, the system identifier is always null, and if it 
> >represents the document entity, the value may be null; otherwise, it
> >must have a non-null value. 
> 
> OK, I get what you mean, but it's an eyebrow-raiser.  Maybe "For the
> document entity, the system identifier item may be null, but it may
> not be an empty string".  Or some such.

Rejected.  No benefit is gained by confounding a null value with
an empty string value.
 
> >Query: Should the information from the XML declaration or text declaration 
> >also be optionally available?
> 
> Yes!  Maybe even required.  After all, the processor is required to not
> only read but use it.

Noted.
 
> >7.A reference to the entity information item for the entity in which the 
> >entity was declared. 
> 
> This one is important; because it's information you need to resolve
> relative URI references.  I don't suppose it could be made required?

Rejected.  In the editor's opinion (the WG has not yet fully discussed
the issue), resolving relative URLs is an application-sensitive process,
as it may involve the processing of application-specific information
like the XHTML base@href value.  The language of clause 4.2.2 of
the XML Rec supports this.

In addition, the entity may be declared in the internal subset of a
document which has arrived on an anonymous input stream for which
no URI is available ("standard input"), as noted in XML Rec
clause 4.8.
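
A minimal sketch of the point, assuming hypothetical names throughout:
the base against which a relative reference is resolved may come from
the declaring entity, from the application (for example, a base@href
value), or may be unavailable altogether.  The example.org and
example.net URIs are placeholders.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_reference(reference: str,
                      entity_uri: Optional[str],
                      app_base: Optional[str] = None) -> Optional[str]:
    """Resolve a URI reference against whichever base is available.

    Hypothetical policy: an application-supplied base overrides the
    URI of the declaring entity.  If neither is known (the document
    arrived on standard input), only absolute references resolve.
    """
    base = app_base if app_base is not None else entity_uri
    if base is None:
        # No base at all: a relative reference cannot be resolved.
        return reference if urlsplit(reference).scheme else None
    return urljoin(base, reference)


if __name__ == "__main__":
    print(resolve_reference("chap1.xml", "http://example.org/book/main.xml"))
    print(resolve_reference("chap1.xml", None))
```

The same reference yields different results depending on the base the
application chooses, which is why the entity reference alone does not
settle the question.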

> >3.The default value of the attribute. If the attribute was declared with 
> >the default value #IMPLIED or #REQUIRED, this value will be null. 
> 
> There might be a case for having a required binary-value property saying 
> whether or not the value is #FIXED...

Rejected.  This information is provided as optional information for
an attribute-declaration info item.
 
> >Namespace processing [Namespaces] represents a virtual transformation of an 
> >XML document,
> 
> Oh yeah?  Step outside and say that... hmmm... I see what you're trying
> to say, but the particle "virtual transformation" kind of escapes me.
> Would it be better to say that it "provides a mapping of element types
> and attribute names to two-part names based on..."

Accepted in principle.  The WG has already decided to use language
based on "views" rather than "processing".
 
> >Query: Is it best for the Information Set to explicitly allow for a document 
> >without Namespace processing?
> 
> Good question.  I'd bite the bullet and say no, but that'd be an issue that
> I think you'd have to throw to the Plenary.  In fact, you should probably
> throw it either way.

Noted. 

> >An XML processor conforms to the XML Information Set if it provides all the 
> >required information items and all required associated information.
> 
> Could you stop after the word "items" without semantic loss?

Rejected.  Providing all required info items is not enough unless
each info item provides all the information that is required for
that type of info item.
 
> >6. XML 1.0 Reporting Requirements
> 
> Why is this section here?  (There may be a good reason,
> but you should say what it is.)  I don't think it adds value.  Maybe
> it's an appendix?

Accepted as editorial change.
 
> >1.The information in the XML declaration and text declarations. 
> >2.Element content models from ELEMENT declarations. 
> 
> I think you should have #1 - it's totally compulsory for the
> processor anyhow.

Noted.

>  The complete absence of anything about element
> declarations, when you have all the attribute apparatus, certainly
> stands out as an anomaly.  Should you include some justification for 
> this choice?

Accepted as editorial change.  Language will be added explaining that
the Infoset does not include DTD/Schema information except as
absolutely necessary.
 
> >3.The grouping and ordering of attribute declarations in ATTLIST 
> >  declarations. 
> ...
> >11.Any ignored declarations, including those within an IGNORE conditional 
> >   section, as well as entity and attribute declarations ignored because 
> >   previous entity declarations overrode them. 
> 
> I think all these are better left out.

Noted.
 
> >Furthermore, the XML Infoset does not provide any method of assigning a 
> >single series of numbers to all child nodes of an element or of the 
> >document
> 
> Blecch.  Yecch.  You should probably say "the design of XML makes it
> impossible for the Infoset to provide...." rather than this.

Accepted in principle as editorial change.
 
> I think the particle "element's name" is infelicitous.  The right 
> terminology is "element type".

Accepted as editorial change.
 
> Should there be an optional item for elements linking back to the 
> parent element, and for attributes linking back to the containing
> element?

Rejected.  This information is reconstructible from the infoset as
it exists.  An earlier version of the Infoset WD provided this
information, which was believed by the WG to be excessively
confusing.

> > 8. Other Open Issues
> 
> I don't think any of these are very material.

Noted.

================================================================

Dan Connolly writes:

> The spec says:
> 
> "There is one processing instruction information item for
> every processing instruction in the document."
> -- http://www.w3.org/TR/1999/WD-xml-infoset-19990517#infoitem.pi
> 
> but I don't see any specification of how to count how many
> processing instructions there are in an XML document.

Accepted in principle as editorial change.

================================================================

Mark Needleman writes:

> But I'm unclear as to exactly what the purpose of it is -
> what contexts and applications is it meant to support - for what mechanism
> will it be used.

Accepted in principle as editorial change.

================================================================

Michael Rys, speaking for Microsoft, writes:

> Having a namespace information item that is visible during the
> scope of its existence would be a better approach than just  
> encoding the information in the attribute information item.

Accepted as new issue.

> In principle, we think that a namespace-aware processor should
> produce all information that a namespace-unaware processor 
> surfaces plus the additional information about the namespaces.

Accepted as new issue.

>    We believe, and apparently the DOM committee does as well, that 
>    the prefix should be available in an upward compatible name.  

Accepted as new issue.

> The design allows one to get from an attribute to the DTD item  
> defining the attribute.

Rejected.  The design allows one to get from an attribute instance
to the default value assigned to that attribute, giving the
attribute name and element type for disambiguation.  This is done
so that the default value can be maintained in one place,
rather than copying it into every attribute info item.
The default value is part of the infoset at all only
because DOM Level 1 requires it.
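
The arrangement can be sketched as a single table keyed by element
type and attribute name.  The class and method names below are
hypothetical and describe no particular API; the sketch assumes only
the XML 1.0 rule that the first declaration of an attribute for a
given element type is binding.

```python
from typing import Dict, Optional, Tuple

# Key: (element type, attribute name), as used for disambiguation.
AttributeKey = Tuple[str, str]


class DefaultTable:
    """Holds each declared default once, not in every attribute item."""

    def __init__(self) -> None:
        self._defaults: Dict[AttributeKey, Optional[str]] = {}

    def declare(self, element_type: str, attribute_name: str,
                default: Optional[str]) -> None:
        # The first declaration is binding; later ones are ignored.
        self._defaults.setdefault((element_type, attribute_name), default)

    def default_for(self, element_type: str,
                    attribute_name: str) -> Optional[str]:
        # None covers #IMPLIED, #REQUIRED, and undeclared attributes.
        return self._defaults.get((element_type, attribute_name))


if __name__ == "__main__":
    table = DefaultTable()
    table.declare("img", "align", "left")
    table.declare("img", "align", "right")  # ignored: already declared
    print(table.default_for("img", "align"))
```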

The optional information associated with attribute-declaration
info items is provided only because processors may in fact
have it available as a consequence of DTD processing.

>  There is 
> - no corresponding facility for elements (i.e., there is no 
>   element declaration item),

See Bray above.

> and 
> - no facility to get from an attribute to its definition if the 
>   definition is in some form other than DTD.

Accepted in principle as editorial change.  The language will be
loosened so that attribute-declaration info items can be
generated by schema mechanisms other than the DTD.
 
> It is highly likely that the future will bring schema formats other  
> than DTD, and since these schemas will use XML instance syntax, 
> they will be extensible, meaning that they will contain an  
> unpredictable range of attributes and elements not foreseeable in  
> any closed information model.

Rejected.  The Infoset model is declared open, not closed,
in paragraph 3 of the introduction.

>  We need a way to navigate from an 
> attribute or element to its definitional information item, without 
> prejudice on the type of that item (be it based on XML Schema or
> DTD).

Rejected.  "Navigation" is a property of a concrete API, not
an abstract definition of the content of a document.
  
> Note that we understand that the mandate of the WG is to
> describe XML 1.0 plus namespaces and not any upcoming and not yet  
> defined schema definition. However, we strongly feel that
> we should provide for future extension in that area and not lock 
> ourselves in.

Noted.  No "lock in" is proposed.
 
> We strongly believe that having a reference information item that
> provides linking information (either via ID/IDREFs, URIs, 
> XLink/XPointer or any other mechanism) should be part of the
> infoset.

Noted.
 
> 2.1.1 item 1 and 2.1.2 item 4 refer to a "document element."  The  
> XML 1.0 spec defines this as a "root" (providing "document element"  
> only as an appositive).  

Rejected.  The term "root" is felt to be easily confused
with its alternative use to specify the actual root of the
info-item tree, namely the document info item.
  
> 2.3: On the question of xml:lang and xml:space, I believe it would  
> be very good if the information model provided a context to  
> determine the language and whitespace settings on any information  
> element - you shouldn't be required to walk the tree to determine 
> the current language or whitespace setting in scope. 

Noted.

> 2.6: In the second paragraph there is specific language regarding  
> CR and LF. Surely, CR and LF (or a collapsed representation with  
> LF) should be representable when xml:space="preserve", right? Is  
> this the intent?

Rejected.  Changing bare CR and CR/LF to bare LF is a general
XML rule (clause 2.11).  The infoset does not vary with
the value of xml:space.
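
For illustration, the clause 2.11 rule amounts to the following; the
function name is hypothetical, and xml:space plays no part in it.

```python
def normalize_line_ends(text: str) -> str:
    """Apply XML 1.0 end-of-line handling to the text of an entity.

    Each CR LF pair is replaced first, so that it becomes one LF
    rather than two; any CR that remains is then replaced by LF.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    print(repr(normalize_line_ends("one\r\ntwo\rthree\nfour")))
```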
 
> 2.9.1: there is language in bullet 1 that indicates that parameter  
> entities are modeled. This is confusing in light of that <!ELEMENT  
> ...> is not modeled. Modeling parameter entities (which most schema  
> proposals are trying very hard to avoid) hardly seems appropriate  
> without a complete modeling of the DTD. 

Accepted as new issue.
 
> While entity information items are optional, it seems that 
> requiring support for all kinds of entities (such as - yuck -
> external parameter entities) if entity information items are 
> provided is too strong. We would rather see the scope of the 
> provided information to be optional as well (e.g., provide only
> internal general entities but not external parameter entities).

Accepted as editorial improvement.  We believe this is what
the draft says.

> Another question w.r.t. entity info items is, whether an unparsed
> external entity needs to be represented or if this could also be
> made optional. For example, an XML database processor may not
> want to deal with such unparsed data in a special way and just
> treat it as character data.

Clause 4 paragraph 3 of the XML Rec makes this information
mandatory.

> 2.11.1: Include the namespace information item.

Rejected.  DTDs do not provide enough information to
reconstruct this value.  An ATTLIST declaration gives
properties attached to a specific attribute name,
irrespective of the prefix-URI binding of the
attribute declared.

> 3. If a namespace-aware processor just adds information, then
> this section needs to be changed.

This comment is too vague to accept or reject.

================================================================

The W3C I18N WG writes:

> -  In section 2.3 [ISet-sec-2.3], should not exclude xml:lang from attribute 
>    information items.

Noted.

> -  In section 2.6.1 [ISet-sec-2.6], should refer to the permitted character 
>    code values in the XML specification rather than include an incomplete 
>    specification of the permitted values.

Accepted in principle as editorial change.  The wording will make clear
that not every value in the range is legal.
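
As a sketch of the distinction, the check below follows the Char
production of the XML 1.0 specification; the function name is
hypothetical.  The surrogate block and the code points FFFE and FFFF
fall inside the overall range 0 to 10FFFF but are not legal
characters.

```python
def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)          # tab, LF, CR
        or 0x20 <= code_point <= 0xD7FF        # excludes other C0 controls
        or 0xE000 <= code_point <= 0xFFFD      # excludes surrogates, FFFE, FFFF
        or 0x10000 <= code_point <= 0x10FFFF   # supplementary planes
    )


if __name__ == "__main__":
    for cp in (0x41, 0x0, 0xD800, 0xFFFE, 0x10000):
        print(hex(cp), is_xml_char(cp))
```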

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
Received on Wednesday, 16 June 1999 16:49:12 UTC