gloss, index for XML 1.0

Notes on XML 1.0, REC-xml-19980210

Brad Barber, 6 April 1998
Cambridge, MA
bradb@shore.net 617-497-8876

I've spent several weeks with the XML document and have noticed a
few items that may be helpful for the next edition.

Overall I am impressed with the specification and the standard. 
XML will be a winner as it stands.  The grammar specification is
particularly clear.  The validity (VC) and well-formedness (WFC)
constraints work well.

                                                --Brad

---------------
human-readable glosses

One of the strengths of XML is that its documents are independent
of the programs that read XML.  Even without the DTD, a person
can reconstruct a document and what each element means.  Textual
element and attribute names are the key ingredients.  For
example, the element "<zipcode>02138</zipcode>" conveys the
meaning of "02138".

For large DTDs, the semantic link between element name and
meaning may be difficult or inconvenient to achieve.  I noticed
in the XML-data spec the use of '<description>' to provide a
gloss for an element.  I think this belongs in XML itself.

An 'elementdecl' or 'AttlistDecl' could be followed by

        <!GLOSS a short description >

It may be best to allow !GLOSS after any component, and to
encourage its use after every declaration.  With !GLOSS, an XML
browser could display a short description when the cursor is held
over the corresponding component.  If both the declaration and
use have !GLOSSes, the browser could display both.
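
As a rough sketch of how a browser might pair each gloss with its
declaration (Python; '!GLOSS' is only the proposal above and is not
part of XML 1.0, and the function name and sample DTD text are made
up for illustration):

```python
import re

# An element declaration immediately followed by the proposed <!GLOSS ...>.
# This is a simplification: it does not handle comments or '>' inside quotes.
DECL_WITH_GLOSS = re.compile(
    r"<!ELEMENT\s+(?P<name>[^\s>]+)[^>]*>\s*<!GLOSS\s+(?P<gloss>[^>]*?)\s*>"
)

def element_glosses(dtd_text):
    """Map element names to the gloss text that follows their declaration."""
    return {m.group("name"): m.group("gloss")
            for m in DECL_WITH_GLOSS.finditer(dtd_text)}

# Hypothetical DTD fragment; only 'zipcode' carries a gloss.
sample_dtd = """
<!ELEMENT zipcode (#PCDATA)>
<!GLOSS five-digit US postal code >
<!ELEMENT city (#PCDATA)>
"""

glosses = element_glosses(sample_dtd)
```

A browser could look up 'glosses' by element name when the cursor
rests on a component.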

---------------
Index missing.

I used a printed copy of the specification for annotation and the
bulk of my reading.  The specification needs a printed index of
non-terminals and internal links. 

------------------
1. Introduction ... "entity"

I found "entity" confusing as the most general component of an
XML document.  It would be clearer if "entity" were restricted to
'!ENTITY' components.  I never did figure out what "entity"
means.  It is used a lot!

--------------
2.6 Processing instructions ... [17] PITarget ... Name

I think 'Name' should be '( Notation | Name )'.  This formalizes
the use of Notation as a 'PITarget'.  The sentence "The XML
Notation mechanism ..." is not clear.  Also, what happens if a
DTD-defined notation is also an independent processing target?

------------------
2.8 ... External Subset ... [30] extSubset

Shouldn't 'TextDecl' be required in an external DTD?  If it is
missing, a processor cannot determine the XML version of the
DTD.  The version of an XML document is likely to differ from
the version of its DTD.

-----------------
3.1 Start-tags ... [40] STag ... Name

The 'Name' of an element is called the "element type".  I found
this confusing.  It should be "element name".  The (structural)
type of the element is its content spec, not its name.  "Element
type" is used throughout the spec to mean the element's name.

The terminology for attributes is good.  The 'Name' of an
attribute is called the "attribute name" and the "attribute type"
is the type of the attribute's value.

-----------------
3.2.1 Element content ... [49] choice ... cp )* ... cp )*

The repetition for 'choice' should be ')+'; otherwise '( Xxxx )'
is ambiguous (either 'choice' or 'seq').  To meet a "One meaning
one syntax" rule, '( Xxxx )' should be disallowed altogether.
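
The ambiguity can be checked mechanically.  A minimal Python
sketch, assuming a simplified 'cp' (an ASCII name with an optional
'?', '*' or '+', and no nested groups); the regular expressions are
my approximation of productions [49] and [50], not text from the
spec:

```python
import re

NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"  # simplified, ASCII-only Name
CP = NAME + r"[?*+]?"                 # simplified cp: no nested groups

# [49] choice as written, with ')*' on the repeated part
CHOICE_STAR = re.compile(r"\(\s*" + CP + r"(?:\s*\|\s*" + CP + r")*\s*\)")
# Suggested change: ')+' so that a choice needs at least two members
CHOICE_PLUS = re.compile(r"\(\s*" + CP + r"(?:\s*\|\s*" + CP + r")+\s*\)")
# [50] seq
SEQ = re.compile(r"\(\s*" + CP + r"(?:\s*,\s*" + CP + r")*\s*\)")

def classify(group, choice=CHOICE_STAR):
    """Return the productions that accept the given group."""
    found = []
    if choice.fullmatch(group):
        found.append("choice")
    if SEQ.fullmatch(group):
        found.append("seq")
    return found

as_written = classify("( Xxxx )")                # matches both productions
with_plus = classify("( Xxxx )", CHOICE_PLUS)    # matches 'seq' only
```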

----------------
3.2.2 Mixed content ... "types of child elements may be
constrained"

Should be "are constrained" as specified in '[39] ... Validity
Constraint: Element valid ... 3'.

-----------------
3.2.2 Mixed content ... [51] Mixed ... Name)* S?

It should be "Name)+"; otherwise "( #PCDATA )*" is 'Mixed'.  There
should be only one representation for each option.
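
A small Python sketch of the duplicate representation, assuming an
ASCII-only 'Name'; the patterns are my approximation of production
[51] and of the suggested ')+' variant:

```python
import re

NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"  # simplified, ASCII-only Name

# First alternative of [51] as written: zero or more '| Name' parts
MIXED_STAR = re.compile(r"\(\s*#PCDATA(?:\s*\|\s*" + NAME + r")*\s*\)\*")
# Suggested change: at least one '| Name' part
MIXED_PLUS = re.compile(r"\(\s*#PCDATA(?:\s*\|\s*" + NAME + r")+\s*\)\*")
# Second alternative of [51]: character data only
PCDATA_ONLY = re.compile(r"\(\s*#PCDATA\s*\)")

def is_mixed(spec, first_alternative=MIXED_STAR):
    """True if either alternative of the Mixed production accepts spec."""
    return bool(first_alternative.fullmatch(spec)
                or PCDATA_ONLY.fullmatch(spec))

# As written, character-data-only content has two spellings.
two_spellings = is_mixed("( #PCDATA )*") and is_mixed("( #PCDATA )")
# With ')+', only '( #PCDATA )' remains.
starred_rejected = not is_mixed("( #PCDATA )*", MIXED_PLUS)
```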

----------------
3.3.1 Attribute Types ... [56] ... 'ID'

Any document that uses 'ID' becomes invalid if appended to
another document that uses the same 'ID' value.

I believe that sites will use many XML documents and a small
number of DTDs/prologs.  Sites will combine multiple documents
for archive, distribution, etc.  A combination should itself be
an XML document.

Perhaps add '<!IDNEW>' to indicate a new naming universe for
'ID' and 'IDREF'.
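
To illustrate the collision, here is a Python sketch.  The standard
library parser does not validate, so the check is done by hand, and
I assume an attribute named 'id' was declared with type 'ID'; the
element names and the 'archive' wrapper are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, id_attr="id"):
    """Return ID values that occur more than once under root."""
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# Two documents that are each fine on their own.
doc_a = ET.fromstring('<memo><para id="p1">first</para></memo>')
doc_b = ET.fromstring('<memo><para id="p1">second</para></memo>')

# Combining them under one root puts both 'p1' values in one document.
archive = ET.Element("archive")
archive.extend([doc_a, doc_b])

clashes = duplicate_ids(archive)
```

Each memo has no duplicates alone; the combined document does.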

----------------------
3.3.1 Attribute Types ... [58] NotationType

It took me a while to figure out 'NOTATION'.  Some suggestions:

- move '[58] NotationType' from '[57] EnumeratedType' to '[54]
AttType' (a notation is not an enumeration).

- briefly define 'NotationType' in this section, along with an
example.  Notations are unusual for XML because a notation
modifies the meaning of another attribute value instead of the
element.  Perhaps "A notation specifies the contents of ENTITY
attributes in the same element."

- In '4.7 Notation Declaration' rewrite "Notation declarations
provide a name for the notation, for use in entity and attribute-
list declarations and in attribute specifications, and an
external identifier for the notation ..." and "XML processors
must provide applications with the name and external
identifier(s) of any notation declared and referred to in an
attribute value, attribute definition, or entity declaration." 
These sentences do not appear to match the grammar.  My guess is
that an XML processor must provide notation information for 1)
processing instructions with a notation target, 2) notation
attributes in an element, and 3) unparsed entities (ExternalID). 

-----------------------
3.3.1 Attribute Types ... [58] NotationType

The grammar allows multiple notations in one element
declaration.  I think there should be only one per element. 
Otherwise an application must "know" which notation goes with
which 'ENTITY' attribute.  If you want to allow alternate
notations, then '[73] EntityDef' should also allow multiple
notations.

-----------------------
4.3.1 the Text Declaration ... may each begin ... [77] ...
VersionInfo? EncodingDecl

I think this should be "must begin ... VersionInfo
EncodingDecl?".  'EncodingDecl' should be optional and
'VersionInfo' required.  As in section 2.8, I think that every
DTD should start with a version number in the same way that every
XML document starts with a version number.
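
The difference between the two forms, as a Python sketch.  The
patterns approximate 'VersionInfo' and 'EncodingDecl' with ASCII
character classes and are my own simplification of production [77]:

```python
import re

VERSION = r"""\s+version\s*=\s*(?:"[A-Za-z0-9_.:-]+"|'[A-Za-z0-9_.:-]+')"""
ENCODING = (r"""\s+encoding\s*=\s*"""
            r"""(?:"[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')""")

# [77] as written: VersionInfo? EncodingDecl
TEXTDECL_AS_WRITTEN = re.compile(
    r"<\?xml(?:" + VERSION + r")?" + ENCODING + r"\s*\?>")
# Suggested form: VersionInfo EncodingDecl?
TEXTDECL_SUGGESTED = re.compile(
    r"<\?xml" + VERSION + r"(?:" + ENCODING + r")?\s*\?>")

def accepted_by(decl):
    """Return which of the two forms accept the text declaration."""
    found = []
    if TEXTDECL_AS_WRITTEN.fullmatch(decl):
        found.append("as written")
    if TEXTDECL_SUGGESTED.fullmatch(decl):
        found.append("suggested")
    return found

encoding_only = accepted_by('<?xml encoding="UTF-8"?>')
version_only = accepted_by('<?xml version="1.0"?>')
both_parts = accepted_by('<?xml version="1.0" encoding="UTF-8"?>')
```

Only the suggested form guarantees that a processor sees a version
number at the start of the DTD.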

------------------------
EOF

Received on Tuesday, 7 April 1998 10:34:56 UTC