- From: Michael Kay <mike@saxonica.com>
- Date: Sat, 9 Jun 2007 22:20:58 +0100
- To: "'Dan Maharry'" <dan@mcd.coop>, <xmlschema-dev@w3.org>
Personal response.

> All I did was try to write a small set of extension methods
> to validate whether a given string was valid according to the
> built-in schema string types and the editor in me comes out
> and starts nit picking. The W3C Schema docs are very good but
> sometimes annoyingly ambiguous without a degree in lateral thinking.

You are right. In particular, there is a tendency in the schema specifications to use language that looks formal and precise and technical, but actually cannot be understood without reading the mind of the editor.

The use of the adjective "finite-length" is a case in point. I think this adjective is vacuous. It's probably there because the author was struggling to define "character string" in some way other than saying it's a string of characters. I'm surprised to see that there are dictionaries that define "finite" to exclude zero, because in my experience mathematicians have always used "finite" to mean "not infinite", and zero is definitely not infinite. (I pointed out some while ago that it would be hard to write a test case to demonstrate that a processor rejects an infinite-length string).

> Problem #2 : In which string data types is "" invalid?
>
> The problem with the note about sets is that it states a type
> must explicitly rule the empty string as invalid before it
> really is invalid.
> But what about it being implied elsewhere but not in black
> and white as, say the value space of the NMTOKENS type?
>
> NMTOKENS represents the NMTOKENS attribute type from [XML 1.0
> (Second Edition)]. The *value space* of NMTOKENS is the set
> of finite, non-zero-length sequences of *NMTOKEN*s

I can't see your problem here. An NMTOKEN cannot be a zero-length string because the XML 1.0 grammar rules it out, quite explicitly. And an NMTOKENS cannot be a zero-length sequence of NMTOKEN values because the adjective "non-zero-length" rules it out, again quite explicitly.

> Problem #3 : Colons or not?
> ...
>
> IDREF represents the IDREF attribute type from [XML 1.0
> (Second Edition)]. The *value space* of IDREF is the set of
> all strings that
> *match* the NCName production in [Namespaces in XML]. The *lexical
> space* of IDREF is the set of strings that *match* the NCName
> production in [Namespaces in XML].

I think the first sentence is just trying to be a helpful introduction. It doesn't say anything normative. It's qualified by the more precise statements in the second and third sentences.

I agree this isn't good spec writing. It's often useful to explain the background or to give a summary of the purpose of the construct but ideally one should distinguish carefully between that kind of expository material and the formal definition. Many specs fail to achieve this balance between helpfulness and precision, and it's a tough one to get right: editors will get flak on this whatever they do.

Probably one of the particular difficulties with moving the schema specs forward is that there are much bigger problems than these demanding the attention of the WG, and the WG has very limited resources: for a spec that is so widely used and implemented, and of such critical importance to the industry, the actual number of people working on the project is tiny. I recently joined the group because I came to the realization that they simply didn't have the resources to deal with the bugs that I was submitting, and that the only way to get a better spec would be to join in the effort.

However, for minor comments like these, the best approach is to enter a bug report - one per problem - in the bugzilla database.

Michael Kay
http://www.saxonica.com/
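
As an illustration of the NMTOKEN / NMTOKENS point in the reply above, here is a minimal sketch of the two checks in Python. It rests on assumptions that are not in the original message: the function names are hypothetical, and the character class is an ASCII-only approximation of the XML 1.0 NameChar production (the real production also admits many non-ASCII letters, digits, combining characters and extenders). It is meant only to show why "" fails both tests, not to serve as a conforming validator.

```python
import re

# ASCII-only approximation of XML 1.0 NameChar (assumption, see lead-in).
# Nmtoken ::= (NameChar)+ so at least one character is required.
_NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:-]+")

# XML whitespace: space, tab, carriage return, line feed.
_XML_WS = " \t\r\n"


def is_nmtoken(s: str) -> bool:
    """True if s consists of one or more name characters; "" never matches."""
    return _NMTOKEN_RE.fullmatch(s) is not None


def is_nmtokens(s: str) -> bool:
    """True if s is a whitespace-separated list of at least one NMTOKEN."""
    stripped = s.strip(_XML_WS)
    if not stripped:
        # A zero-length sequence of NMTOKENs is outside the value space.
        return False
    return all(is_nmtoken(tok) for tok in re.split(r"[ \t\r\n]+", stripped))
```

For example, `is_nmtokens("foo bar-1")` is accepted, while `is_nmtokens("")` and `is_nmtokens("   ")` are both rejected, the former two cases being exactly the "non-zero-length" condition quoted from the spec.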
Received on Saturday, 9 June 2007 21:21:06 UTC