
XHTML Modularization 1.1: Lazy datatype patterns in XML Schema

From: Alexandre Alapetite <alexandre@alapetite.net>
Date: Thu, 6 Jul 2006 15:26:10 +0200
To: <www-html-editor@w3.org>
Message-ID: <000401c6a0ff$bebed580$f9043f50@athlon1100>

Dear HTML editors,

One of the advantages of XML Schema over DTDs is the ability to validate attribute values with regular expressions.

1) In the current working draft of "XHTML Modularization 1.1", chapter 4.3 on "Attribute Types", the MultiLengths
datatype is defined as "A comma separated list of items of type MultiLength".

While "MultiLength" (without the 's') is carefully defined in the "XML Schema datatypes module for XHTML",
the "MultiLengths" type is still defined only as a plain string. This lax validation allows almost anything
and does not check the constraints required by the specification.

Lines 120-123 of xhtml-datatypes-1.xsd:

 <!-- comma-separated list of MultiLength -->
 <xs:simpleType name="MultiLengths">
   <xs:restriction base="xs:string"/>
 </xs:simpleType>
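To illustrate the gap: an unrestricted xs:string accepts any value, whereas even a modest pattern rejects obviously malformed lists. The sketch below is hypothetical and is not the pattern from the proposal linked in this message. It assumes a MultiLength item is a pixel count ("100"), a percentage ("50%"), or a relative length ("3*" or "*"), as in HTML 4, and that optional whitespace around commas is acceptable. The names MULTI_LENGTH, MULTI_LENGTHS and is_multi_lengths are invented for the example.

```python
import re

# Hypothetical item pattern (an assumption, not taken from the schema):
# pixels "100", percentage "50%", or relative length "3*" / "*".
MULTI_LENGTH = r"([0-9]+%?|[0-9]*\*)"

# Comma-separated list of items; whitespace around commas is an assumption.
MULTI_LENGTHS = MULTI_LENGTH + r"(\s*,\s*" + MULTI_LENGTH + r")*"


def is_multi_lengths(value: str) -> bool:
    """Return True if value looks like a comma-separated MultiLength list.

    XML Schema patterns are implicitly anchored at both ends, so
    re.fullmatch is used to approximate that behaviour.
    """
    return re.fullmatch(MULTI_LENGTHS, value) is not None


if __name__ == "__main__":
    for candidate in ["50%,*,2*", "100, 25%, *", "foo", "50%,,*", "10px"]:
        print(repr(candidate), is_multi_lengths(candidate))
```

With the current definition, all five candidates would be schema-valid; a pattern along these lines would keep only the first two.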

A more accurate pattern has been proposed; see [http://lists.w3.org/Archives/Public/www-html/2006Jun/0033.html] and
[http://lists.w3.org/Archives/Public/www-html/2006Jun/0031.html]. The proposed improvement is also reproduced below:

 <xs:simpleType name="MultiLengths">
       comma-separated list of MultiLength
   <xs:restriction base="xs:string">

2) Similarly, the datatypes "ContentType" ("A media type, as per [RFC2045]") and "ContentTypes" ("A comma-separated
list of media types, as per [RFC2045]") are also defined as plain strings, although some basic validation could be
done. RFC2045 does provide a BNF grammar.

The patterns should of course not list all the possible IANA types [http://www.iana.org/assignments/media-types/], but check at
least for some minimal syntax integrity.

A quickly written proposal (untested, intended only as an illustration):

ContentType: "([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+"

ContentTypes: "(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+)(,\s*(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+))*"
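Since the two patterns above are marked as untested, a quick way to exercise them is sketched here in Python. The pattern strings are copied verbatim from this message; the function names and sample values are invented for the example. One caveat: Python's \s matches more whitespace characters than the \s of XML Schema (space, tab, newline, carriage return), so this only approximates what a schema validator would do.

```python
import re

# Patterns copied verbatim from the message.
CONTENT_TYPE = r"([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+"
CONTENT_TYPES = (
    r"(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+)"
    r"(,\s*(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+))*"
)


def is_content_type(value: str) -> bool:
    """Check a single media type against the ContentType pattern.

    XML Schema patterns must match the whole value, hence re.fullmatch.
    """
    return re.fullmatch(CONTENT_TYPE, value) is not None


def is_content_types(value: str) -> bool:
    """Check a comma-separated list against the ContentTypes pattern."""
    return re.fullmatch(CONTENT_TYPES, value) is not None


if __name__ == "__main__":
    for candidate in ["text/html", "application/xhtml+xml", "x-world/x-vrml",
                      "text", "/html", "text/html; charset=utf-8"]:
        print(repr(candidate), is_content_type(candidate))
    for candidate in ["text/html, application/xhtml+xml", "text/html,,text/css"]:
        print(repr(candidate), is_content_types(candidate))
```

Note that, as written, the patterns reject media types carrying parameters (for example "text/html; charset=utf-8"); whether that is desirable would need to be decided before adopting them.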

3) Similarly again, the datatype "Charset" ("A character encoding, as per [RFC2045]") should be more strict than a simple

Received on Thursday, 6 July 2006 13:26:26 UTC
