- From: Alexandre Alapetite <alexandre@alapetite.net>
- Date: Thu, 6 Jul 2006 15:26:10 +0200
- To: <www-html-editor@w3.org>
Dear HTML editors,

One of the advantages of XML Schema over DTDs is the ability to check attribute values against regular expression patterns.

1) In the current working draft of "XHTML Modularization 1.1", chapter 4.3 on "Attribute Types", the MultiLengths datatype is defined as "A comma separated list of items of type MultiLength" [http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/abstraction.html#dt_MultiLengths].

While "MultiLength" (without the 's') is carefully defined in the "XML Schema datatypes module for XHTML" [http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/SCHEMA/xhtml-datatypes-1.xsd], the "MultiLengths" type is still defined only as a plain string. This lax definition accepts almost any value and does not check the constraints required by the specification.

Lines 120-123 of xhtml-datatypes-1.xsd:

    <!-- comma-separated list of MultiLength -->
    <xs:simpleType name="MultiLengths">
      <xs:restriction base="xs:string"/>
    </xs:simpleType>

A more accurate pattern has been proposed; see [http://lists.w3.org/Archives/Public/www-html/2006Jun/0033.html] and [http://lists.w3.org/Archives/Public/www-html/2006Jun/0031.html]. The proposed improvement is reproduced below:

    <xs:simpleType name="MultiLengths">
      <xs:annotation>
        <xs:documentation>
          comma-separated list of MultiLength
        </xs:documentation>
      </xs:annotation>
      <xs:restriction base="xs:string">
        <xs:pattern value="([+-]?(\d+|\d+(\.\d+)?%)|([1-9]\d*)*\*)(,\s*([+-]?(\d+|\d+(\.\d+)?%)|([1-9]\d*)*\*))*"/>
      </xs:restriction>
    </xs:simpleType>

2) Similarly, the datatypes "ContentType" ("A media type, as per [RFC2045]") and "ContentTypes" ("A comma-separated list of media types, as per [RFC2045]") are also defined as plain strings, although some basic validation could be done. RFC2045 does provide a BNF grammar.
The patterns should of course not list all the possible IANA types [http://www.iana.org/assignments/media-types/], but should at least check for minimal syntactic integrity. Here is a quickly written proposal, still to be tested and meant only as an illustration:

    ContentType:
    "([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+"

    ContentTypes:
    "(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+)(,\s*(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+))*"

3) Similarly again, the datatype "Charset" ("A character encoding, as per [RFC2045]") should be stricter than a simple string.

Cordially,
Alexandre
http://alexandre.alapetite.net
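
For illustration, the two quickly written patterns from point 2 could be expressed as schema types in the same form as the MultiLengths proposal. This is only a sketch: the bare xs:schema wrapper is a simplification added here so that the fragment is well-formed on its own (it is not the actual header of xhtml-datatypes-1.xsd), and the patterns are the untested ones from this message, not a full rendering of the RFC2045 grammar.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- single media type: minimal syntax check only -->
  <xs:simpleType name="ContentType">
    <xs:restriction base="xs:string">
      <xs:pattern value="([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- comma-separated list of media types: minimal syntax check only -->
  <xs:simpleType name="ContentTypes">
    <xs:restriction base="xs:string">
      <xs:pattern value="(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+)(,\s*(([xX]-[a-zA-Z0-9_.+-]+|[a-zA-Z]+)/[a-zA-Z0-9_.+-]+))*"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

As written, these patterns accept values such as "text/html" or "text/css, application/xhtml+xml", but reject media type parameters (for example "text/html; charset=utf-8"), since ';' and '=' are not in the allowed character set.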
Received on Thursday, 6 July 2006 13:26:26 UTC