Re: W3C Working Draft 20-December-1999 from David McKelvie on 2000-01-21 (www-xml-infoset-comments@w3.org from January to March 2000)

From: David McKelvie <dmck@cogsci.ed.ac.uk>
Date: Fri, 21 Jan 2000 12:30:25 GMT
To: www-xml-infoset-comments@w3.org
Message-Id: <23746.200001211230@grant.cogsci.ed.ac.uk>

Comments on 

	 XML Information Set
	 W3C Working Draft 20-December-1999

Attributes

Because there is no concept of an 'Attribute Definition Information
Item' in the data model, it would seem possible to have the same
attribute name on two different occurrences of the same element type
with DIFFERENT attribute types and default values. This sounds
wrong. One solution would be to have a new type of information item,
i.e. an 'Attribute Definition Information Item', which contained the
DTD/Schema info about the attribute, and to have individual
'Attribute Information Items' refer to it.
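
To make the suggestion concrete, here is a minimal sketch in Python.
It is only an illustration: the class and field names
(AttributeDefinition, default_kind and so on) are assumptions and are
not taken from the Working Draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDefinition:
    """Hypothetical 'Attribute Definition Information Item'."""
    element_type: str                    # element type the attribute is declared on
    name: str                            # attribute name
    attr_type: str                       # declared type, e.g. "CDATA", "ID", "NMTOKEN"
    default_kind: str                    # "#REQUIRED", "#IMPLIED", "#FIXED" or "" (plain default)
    default_value: Optional[str] = None  # declared default value, if any


@dataclass
class Attribute:
    """An 'Attribute Information Item' that refers to its definition."""
    definition: AttributeDefinition
    value: str

    @property
    def name(self) -> str:
        return self.definition.name

    @property
    def attr_type(self) -> str:
        return self.definition.attr_type


# One shared definition for the (assumed) 'lang' attribute on 'p' elements.
lang_def = AttributeDefinition("p", "lang", "NMTOKEN", "#IMPLIED")

# Two occurrences of the attribute on different 'p' elements.
first = Attribute(lang_def, "en")
second = Attribute(lang_def, "gd")
```

Since the type and default live only on the shared definition, two
occurrences of the same attribute cannot disagree about them.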

If you want to allow editing of XML info sets, then it is necessary to
know if an attribute is defined as #FIXED, since presumably an
application should be discouraged from changing such attribute values.
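
As a sketch of what an editing application could do with that
knowledge (again Python; the 'fixed' property and the set_value helper
are hypothetical, not part of the draft):

```python
from dataclasses import dataclass


class FixedAttributeError(ValueError):
    """Raised when an edit would change a #FIXED attribute value."""


@dataclass
class Attribute:
    name: str
    value: str
    fixed: bool = False  # hypothetical property: True if declared #FIXED


def set_value(attr: Attribute, new_value: str) -> None:
    """Change an attribute value, refusing to alter #FIXED attributes."""
    if attr.fixed and new_value != attr.value:
        raise FixedAttributeError(
            f"attribute {attr.name!r} is #FIXED to {attr.value!r}"
        )
    attr.value = new_value
```

Without such a property in the info set, the check above has nothing
to test against.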

DTDs

'Document Type Declaration Items' don't appear to contain lists of
element types or attributes defined in the DTD. If they did it would
make it easier for a processor to write out a valid XML file starting
from an XML information set.

If 'Document Type Declaration Items' contained a list of 'Element
Definition Items', this would make outputting DTDs easier. Also if 
'Element Information Items' pointed to these 'Element
Definition Items', this would centralise the element definitions, and
allow one to extend the XML Information Set data model to multiple
(perhaps hyperlinked) XML files, which had different DTDs and maybe
element types with the same name.
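
A rough sketch of why this would help with output, in Python. The
names ElementDefinition, DocumentTypeDeclaration and
write_internal_subset are invented for illustration, and only element
declarations are handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElementDefinition:
    """Hypothetical 'Element Definition Item'."""
    name: str
    content_model: str  # as written in the DTD, e.g. "(#PCDATA|em)*"


@dataclass
class DocumentTypeDeclaration:
    """A 'Document Type Declaration Item' that lists its element definitions."""
    root_name: str
    element_definitions: List[ElementDefinition] = field(default_factory=list)


def write_internal_subset(dtd: DocumentTypeDeclaration) -> str:
    """Serialise the declaration as a DOCTYPE with an internal subset."""
    lines = [f"<!DOCTYPE {dtd.root_name} ["]
    for definition in dtd.element_definitions:
        lines.append(f"  <!ELEMENT {definition.name} {definition.content_model}>")
    lines.append("]>")
    return "\n".join(lines)


dtd = DocumentTypeDeclaration(
    "doc",
    [ElementDefinition("doc", "(p)+"), ElementDefinition("p", "(#PCDATA)")],
)
doctype_text = write_internal_subset(dtd)
```

With the definitions held on the declaration item, writing the DTD is
a simple loop. Without them the processor has to find the
declarations somewhere outside the info set.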

It is a pity that 'Element Information Items' don't contain content
models, as this makes it harder to validate changes to the XML info
set. I would say that it is essential that 'Element Information
Items' say at least whether they have element-only or mixed content,
since applications that write info sets to XML might want to format
the two kinds differently.
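
For example, a writer might indent element-only content but must
leave mixed content alone, because white space is significant there.
A small Python sketch, assuming a hypothetical boolean 'mixed'
property on each element:

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape


@dataclass
class Element:
    name: str
    mixed: bool  # hypothetical property: True for mixed content
    children: List[Union["Element", str]] = field(default_factory=list)


def serialize(el: Element, indent: int = 0, inline: bool = False) -> str:
    """Pretty-print element-only content; write mixed content on one line."""
    pad = "" if inline else "  " * indent
    if not el.children:
        return f"{pad}<{el.name}/>"
    if inline or el.mixed:
        # Inside mixed content nothing may be re-indented, so every
        # descendant is written inline as well.
        inner = "".join(
            escape(c) if isinstance(c, str) else serialize(c, inline=True)
            for c in el.children
        )
        return f"{pad}<{el.name}>{inner}</{el.name}>"
    # Element-only content: children are assumed to be elements only.
    body = "\n".join(serialize(c, indent + 1) for c in el.children)
    return f"{pad}<{el.name}>\n{body}\n{pad}</{el.name}>"


doc = Element("doc", False, [
    Element("p", True, ["a ", Element("em", True, ["b"]), " c"]),
])
text = serialize(doc)
```

Here 'doc' is indented and 'p' is left on one line. A writer that
cannot tell the two cases apart has to treat everything as mixed.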

Character content

Should this proposal say something about what the character content
of #PCDATA that contains unexpanded entity references is?
Maybe this is the wrong level, but if the info set doesn't handle
this, then we are going to hit problems defining a query language
that includes regular expressions over characters, e.g.

does re "ab" match "a&unexpanded;b" ?
does re "a.*b" match it ?
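
To show that the answer depends on a choice the info set would have
to make, here is a Python sketch with two invented policies: "skip"
drops the unexpanded reference from the character content, and
"opaque" stands in one placeholder character for it. Neither policy
comes from the draft, and the placeholder character is arbitrary.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class EntityRef:
    """Hypothetical marker for an unexpanded entity reference."""
    name: str


PLACEHOLDER = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER, an arbitrary choice


def as_text(content: List[Union[str, EntityRef]], policy: str) -> str:
    """Flatten character content to a string under the given policy."""
    if policy not in ("skip", "opaque"):
        raise ValueError(f"unknown policy: {policy!r}")
    parts = []
    for chunk in content:
        if isinstance(chunk, str):
            parts.append(chunk)
        elif policy == "opaque":
            parts.append(PLACEHOLDER)
        # policy == "skip": the reference contributes no characters
    return "".join(parts)


# The content "a&unexpanded;b" from the example above.
content = ["a", EntityRef("unexpanded"), "b"]

results = {
    (policy, pattern): re.search(pattern, as_text(content, policy)) is not None
    for policy in ("skip", "opaque")
    for pattern in ("ab", "a.*b")
}
```

Under "skip" both expressions match. Under "opaque" only "a.*b"
matches. A query language needs the info set to say which it is.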

     David McKelvie

Definition of equality of info sets

Does this standard need to say when two info sets are considered the
same? Do they need to be identical, the same on the core
properties, or is it more complex?
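
As an illustration of the "same on the core properties" reading, here
is a Python sketch that compares items after discarding everything
else. Items are plain dicts, and the set of core property names is
made up for the example rather than taken from the draft.

```python
# Hypothetical set of core property names; the draft's own list differs.
CORE = {"name", "value", "attributes", "children"}


def core_view(item):
    """Return a copy of an item keeping only its core properties."""
    if isinstance(item, dict):
        return {k: core_view(v) for k, v in item.items() if k in CORE}
    if isinstance(item, list):
        return [core_view(x) for x in item]
    return item  # character data and other leaf values


def core_equal(a, b) -> bool:
    """True if two items agree on all core properties, recursively."""
    return core_view(a) == core_view(b)


a = {"name": "p", "attributes": [{"name": "id", "value": "x"}],
     "children": ["hi"], "base_uri": "file:a.xml"}
b = {"name": "p", "attributes": [{"name": "id", "value": "x"}],
     "children": ["hi"], "base_uri": "file:b.xml"}
c = {"name": "p", "attributes": [{"name": "id", "value": "y"}],
     "children": ["hi"], "base_uri": "file:a.xml"}
```

Here a and b differ only in the non-core 'base_uri', so they are core
equal but not identical, while a and c differ on a core property.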

The Document Info Item

Should the 'Document Information Item' say what peripheral info set
properties it includes?

Language Technology Group
Human Communication Research Centre, Edinburgh University,
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND
Tel:(44) 131 650-4630
Fax:(44) 131 650-4587 email: dmck@cogsci.ed.ac.uk

Received on Friday, 21 January 2000 07:30:28 UTC