RE: Finding News Taxonomies [was: RE: Towards a TAG consideration of CURIEs]

John,

I'm confused by your point #3 below, as it seems to be implying that a
document without a DTD could legitimately have an attribute of type ID
with value "123456", and after looking at the specs I don't see how it
can.  Did I miss something?  Detailed analysis below.

> From: John Cowan
> 
> Misha Wolf scripsit:
> 
> > This:
> >    http://www.iptc.org/docs/newscodes.html#123456
> > is not legal, as "123456" is an illegal fragment identifier.
> 
> Not exactly.  We can decompose this into three claims, two false
> and one true.
> 
> 1) "123456" is an invalid fragment: false.  If you look at the
> syntax rules in RFC 3986, you see that every character in a fragment
> can be a digit.
> 
> 2) "123456" can't be the value of an XML attribute of type ID: false.
> An XML document may contain attributes of type ID in one of two
> ways: every attribute with the name "xml:id" is of type ID, and so
> is any attribute declared in the DTD (internal or external) to have
> type ID.  Such attributes may contain any value, and the document is
> well-formed.
> 
> 3) "123456" can't be the value of an attribute of type ID in a
> *valid* XML document: true.  However, plenty of documents are not
> valid: in particular, any document without a DTD is not valid, and
> there is nothing wrong with having a DTD without expecting or 
> requiring validity.

I don't see how a document without a DTD can have such an attribute.
According to claim #2 above, there are two ways an attribute can be of
type ID: either by being declared to be type ID in a DTD, or by having
the special name "xml:id".  If the
document does not have a DTD, then the first of these two ways is
eliminated, so claim #3 seems to be implying that a document without a
DTD could have an attribute of type ID if the attribute has the special
name "xml:id".  I presume your intent is that such an attribute would be
considered to be type ID by virtue of conforming to the xml:id Version
1.0 specification:
http://www.w3.org/TR/xml-id/

But the conformance section in that document specifies:
http://www.w3.org/TR/xml-id/#xmlid-conformance
[[
Conformance to xml:id for applications that rely on non-validating XML
processors is defined by the recognition of xml:id attributes as
explained in 4 Processing xml:id Attributes and by conformance to the
constraints of this specification.

Conformance to constraints that "must" be assured is mandatory. It is
recommended that applications assure the other constraints as well. This
specification defines no simply optional constraints.
]]

Section 4 "Processing xml:id Attributes" then explains the processing
requirements:
http://www.w3.org/TR/xml-id/#processing
[[
An xml:id processor must assure that the following constraints hold for
all xml:id attributes:

    * The normalized value of the attribute is an NCName according to
      the Namespaces in XML Recommendation which has the same version
      as the document in which this attribute occurs (NCName for XML
      1.0, or NCName for XML 1.1).
]]

And the relevant portions of NCName for XML 1.0 and NCName for XML 1.1
are:
http://www.w3.org/TR/REC-xml-names/#NT-NCName
[[
[4]   NCName            ::=   NCNameStartChar NCNameChar*
                              /* An XML Name, minus the ":" */
[5]   NCNameChar        ::=   NameChar - ':'
[6]   NCNameStartChar   ::=   Letter | '_'
]]
and
http://www.w3.org/TR/xml-names11/#NT-NCName
[[
[4]   NCName            ::=   NCNameStartChar NCNameChar*
                              /* An XML Name, minus the ":" */
[5]   NCNameChar        ::=   NameChar - ':'
[6]   NCNameStartChar   ::=   NameStartChar - ':'
]]
and
http://www.w3.org/TR/xml11/#NT-NameStartChar
[[
[4]   NameStartChar   ::=   ":" | [A-Z] | "_" | [a-z]
                          | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF]
                          | [#x370-#x37D] | [#x37F-#x1FFF]
                          | [#x200C-#x200D] | [#x2070-#x218F]
                          | [#x2C00-#x2FEF] | [#x3001-#xD7FF]
                          | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                          | [#x10000-#xEFFFF]
]]

Thus, since neither the XML 1.0 nor the XML 1.1 NCName production
permits [0-9] as the first character, and the xml:id spec requires
conformance to those productions even for non-validating XML processors,
I don't see how an attribute value of "123456" could be considered to be
of type ID.
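
To make the check concrete, here is a minimal sketch of the NCName test
in Python.  It is an illustration only, not code from any of the specs:
the helper name is_ncname is hypothetical, the character classes are
transcribed from the XML 1.1 NameStartChar and NameChar productions with
":" removed, and the value is assumed to be already normalized.  The XML
1.0 Letter production differs in detail, but it likewise excludes digits
from the first position.

```python
import re

# NameStartChar from XML 1.1 production [4], with ":" removed
# (i.e. NCNameStartChar from Namespaces in XML 1.1).
_START = (
    r"A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, #xB7 and two combining/joiner ranges.
_REST = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

_NCNAME_RE = re.compile(f"[{_START}][{_REST}]*")


def is_ncname(value: str) -> bool:
    """Return True if value matches the NCName production (hypothetical helper)."""
    return _NCNAME_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("123456", "_123456", "n123456"):
        print(candidate, is_ncname(candidate))
```

Under these assumptions "123456" is rejected because its first character
is a digit, while "_123456" and "n123456" are accepted.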

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software 

Received on Monday, 9 April 2007 20:52:56 UTC