- From: Heiko Studt <studt@fmi.uni-passau.de>
- Date: Tue, 11 Sep 2007 12:03:43 +0200 (CEST)
- To: xmlschema-dev@w3.org
Hi,

XSV seems to fail on XSD-allowed 'negated' regular expressions in patterns. This breaks the support of MPEG7:mimetype (urn:mpeg:mpeg7:schema:2004).

Although XSV documents that it lacks functionality in some parts of RegExp, this particular gap is not documented on its project page.

The failing pattern follows (copied out of MPEG7 V2):

---
<simpleType name="mimeType">
  <restriction base="string">
    <whiteSpace value="collapse"/>
    <pattern value='[!--[\(\)<>@,;:\\"/\[\]\?=]]+/[!--[\(\)<>@,;:\\"/\[\]\?=]]+'/>
  </restriction>
</simpleType>
---

When changed to the following, the pattern seems to work again, but after sleeping on it I am not 100% sure whether it is semantically the same. MIME is defined in RFC 2045 (5.1 - MIME), but I don't see any special handling of "!" there.

---
<simpleType name="mimeType">
  <restriction base="string">
    <whiteSpace value="collapse"/>
    <pattern value='(!|[^\(\)<>@,;:\\"/\[\]\?=])+/(!|[^\(\)<>@,;:\\"/\[\]\?=])+'/>
  </restriction>
</simpleType>
---

RFC 2045 MIME (Part 1):

---
5.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:

  content := "Content-Type" ":" type "/" subtype
             *(";" parameter)
             ; Matching of media type and subtype
             ; is ALWAYS case-insensitive.

  type := discrete-type / composite-type

  discrete-type := "text" / "image" / "audio" / "video" /
                   "application" / extension-token

  composite-type := "message" / "multipart" / extension-token

  extension-token := ietf-token / x-token

  ietf-token := <An extension token defined by a
                 standards-track RFC and registered
                 with IANA.>

  x-token := <The two characters "X-" or "x-" followed, with
              no intervening white space, by any token>

  subtype := extension-token / iana-token

  iana-token := <A publicly-defined extension token. Tokens
                 of this form must be registered with IANA
                 as specified in RFC 2048.>

  parameter := attribute "=" value

  attribute := token
               ; Matching of attributes
               ; is ALWAYS case-insensitive.

  value := token / quoted-string

  token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
              or tspecials>

  tspecials := "(" / ")" / "<" / ">" / "@" /
               "," / ";" / ":" / "\" / <">
               "/" / "[" / "]" / "?" / "="
               ; Must be in quoted-string,
               ; to use within parameter values
---

Judging by the example at http://www.w3.org/TR/xmlschema-2/#rf-pattern, the "-" syntax may work as negation (although that seems unlikely according to http://www.w3.org/TR/xmlschema-2/#charcter-classes).

---
<simpleType name='better-us-zipcode'>
  <restriction base='string'>
    <pattern value='[0-9]{5}(-[0-9]{4})?'/>
  </restriction>
</simpleType>
---

A simple fix for this part (though I don't see whether charClassSub would still work afterwards) may be to replace every "-[" with "[^" when it is preceded by "[" or "(". This will not solve the issue with MPEG7 completely.

Are these things true, known, or perhaps even solved somewhere?

-- 
Best regards

Hopefully I have written down clearly everything needed.

Heiko Studt <studt@fmi.uni-passau.de>
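
One way to probe whether the rewritten pattern is semantically the same as the RFC 2045 grammar is to run both against a few strings. The sketch below uses Python's re module as a stand-in for an XSD regex engine, which is an assumption: XSD patterns are implicitly anchored, so re.fullmatch is used, and the names matches_rewritten and matches_rfc are hypothetical. It reads "token" as printable US-ASCII (0x21 to 0x7E) minus tspecials. Under that reading "!" is not a tspecial, so the negated class already accepts it and the explicit "!|" alternative looks redundant. The negated class also accepts SPACE, control characters and non-ASCII characters, which "token" excludes, so the rewritten pattern appears to be looser than the RFC grammar.

```python
import re

# Rewritten pattern, copied from the message above.
REWRITTEN = r'(!|[^\(\)<>@,;:\\"/\[\]\?=])+/(!|[^\(\)<>@,;:\\"/\[\]\?=])+'

# tspecials as listed in RFC 2045, section 5.1.
TSPECIALS = '()<>@,;:\\"/[]?='

# Assumption: "token" characters are US-ASCII 0x21..0x7E (no SPACE, no CTLs)
# with the tspecials removed.
TOKEN_CHARS = ''.join(
    chr(c) for c in range(0x21, 0x7F) if chr(c) not in TSPECIALS
)
TOKEN = '[' + re.escape(TOKEN_CHARS) + ']+'
RFC_TYPE = TOKEN + '/' + TOKEN


def matches_rewritten(value: str) -> bool:
    """Whole-string match against the rewritten schema pattern."""
    return re.fullmatch(REWRITTEN, value) is not None


def matches_rfc(value: str) -> bool:
    """Whole-string match against token "/" token (parameters ignored)."""
    return re.fullmatch(RFC_TYPE, value) is not None


if __name__ == '__main__':
    for sample in ['text/plain', 'x-a!b/c', 'a b/c', 'te(xt/plain', 'text']:
        print(repr(sample), matches_rewritten(sample), matches_rfc(sample))
```

The two checks agree on ordinary media types and on strings containing tspecials, and differ on strings such as 'a b/c' that contain an inner space.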
Received on Tuesday, 11 September 2007 11:38:09 UTC