Re: Specifying Datatype Atoms in Regular Expressions

Without going into details of your proposal, I can tell you that many 
members of the schema workgroup believe that trying to put too much 
"structural smarts" into string-like datatypes (a) is a slippery slope and 
(b) tends to lead to misuse of markup.  No doubt:


        <room width="5 feet" length="8 meters"/>

can be a convenient notation, but:

        <room>
                <width units="feet">5</width>
                <length units="meters">8</length>
        </room>

is arguably much better markup.  Consider, for example, writing a 
stylesheet or integrating these values into a database: it is much easier 
to see how one deals robustly with the latter form.  Correspondingly, the 
slippery slope is that we wind up having to invent a schema language with 
lots of conceptual duplication for managing structure in both the simple 
and complex types.  When you have data with nested structure, you should 
seriously consider using explicit structure (i.e. complex types).   That 
is not a statement particularly about schemas, it is a statement about 
good use of XML itself.
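
To make the stylesheet/database point concrete, here is a minimal sketch in 
Python using the standard xml.etree.ElementTree module.  It is only an 
illustration: the function names and the "<number> <unit>" micro-grammar 
for the compact form are assumptions of mine, not anything defined by the 
schema drafts.  The compact form needs a second, ad hoc parser layered on 
top of the XML parser, while the structured form can be read directly from 
the markup.

```python
import re
import xml.etree.ElementTree as ET

COMPACT = '<room width="5 feet" length="8 meters"/>'
STRUCTURED = """
<room>
    <width units="feet">5</width>
    <length units="meters">8</length>
</room>
"""

# Assumed micro-grammar for the compact form: "<number> <unit>".
# Every consumer of the document has to agree on (and re-implement) this.
MEASURE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\w+)\s*$")


def read_compact(xml_text):
    """Recover (value, units) pairs by re-parsing the attribute strings."""
    room = ET.fromstring(xml_text)
    result = {}
    for name in ("width", "length"):
        raw = room.get(name, "")
        match = MEASURE.match(raw)
        if match is None:
            raise ValueError(f"cannot parse {name}: {raw!r}")
        result[name] = (float(match.group(1)), match.group(2))
    return result


def read_structured(xml_text):
    """Recover (value, units) pairs straight from elements and attributes."""
    room = ET.fromstring(xml_text)
    return {
        child.tag: (float(child.text), child.get("units"))
        for child in room
    }
```

Both functions return the same dictionary for the two sample documents, 
but only read_compact depends on a string convention that lives outside 
the markup.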

Indeed, based on this argument there was resistance in the group to even 
including the "list" types that we do have, but on balance a preponderance 
of the workgroup felt that they were useful and also necessary to model 
existing constructs such as NMTOKENS.  While there is indeed the 
opportunity to add features in future versions, there is also some strong 
sentiment among individual members of the group (I can't say how many) to 
avoid further descent down this particular slippery slope.

The obvious concern with the notation recommended above is that it is more 
verbose and less convenient for manual entry.  The XML Recommendation 
itself is very clear that [1]: "Terseness in XML markup is of minimal 
importance."  While individual cases require judgment, it seems a mistake 
in general to try to use schemas to undo this stylistic decision.  Of 
course, nothing prevents you from creating string types with patterns in 
XML schemas, but I hope the above explains why we have not gone much 
further than that.  Thank you.

[1] http://www.w3.org/TR/REC-xml#sec-origin-goals

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------

Received on Monday, 9 April 2001 16:38:24 UTC