- From: Heiko Studt <studt@fmi.uni-passau.de>
- Date: Tue, 11 Sep 2007 12:03:43 +0200 (CEST)
- To: xmlschema-dev@w3.org
Hi,
XSV seems to fail on XSD-allowed 'negated' regular expressions in
patterns. This breaks support for MPEG7:mimetype (urn:mpeg:mpeg7:schema:2004).
Although the XSV documentation admits that some parts of RegExp are not
supported, this particular gap is not documented on its project page.
The failing pattern follows (copied out of MPEG7 V2):
---
<simpleType name="mimeType">
  <restriction base="string">
    <whiteSpace value="collapse"/>
    <pattern
      value='[!--[\(\)<>@,;:\\"/\[\]\?=]]+/[!--[\(\)<>@,;:\\"/\[\]\?=]]+'/>
  </restriction>
</simpleType>
---
Changed to the following, the pattern seems to work again, but
after sleeping on it I am not 100% sure whether it is semantically
the same. MIME is defined in RFC 2045 (section 5.1), but I don't
see any special handling of "!" there.
---
<simpleType name="mimeType">
  <restriction base="string">
    <whiteSpace value="collapse"/>
    <pattern
      value='(!|[^\(\)<>@,;:\\"/\[\]\?=])+/(!|[^\(\)<>@,;:\\"/\[\]\?=])+'/>
  </restriction>
</simpleType>
---
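One way to probe the "!" question is to compare the rewritten pattern with a variant that drops the "(!|...)" alternative. The following is only a sketch: it uses Python's re module, whose syntax is close to but not identical to XSD regular expressions, and the names REWRITTEN, WITHOUT_BANG, SAMPLES and matches are made up for this illustration.

```python
import re

# The rewritten pattern from above, copied as-is into a raw string.
REWRITTEN = r'(!|[^\(\)<>@,;:\\"/\[\]\?=])+/(!|[^\(\)<>@,;:\\"/\[\]\?=])+'

# Hypothetical variant without the explicit "!" alternative.
WITHOUT_BANG = r'[^\(\)<>@,;:\\"/\[\]\?=]+/[^\(\)<>@,;:\\"/\[\]\?=]+'

# Hand-picked sample values (not taken from any test suite).
SAMPLES = [
    "text/plain",
    "application/x-foo!bar",
    "!/!",
    "text",
    "text/pl@in",
    "a/b/c",
    "text/pl ain",
]


def matches(pattern: str, value: str) -> bool:
    """XSD patterns are implicitly anchored, so use fullmatch."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), matches(REWRITTEN, sample), matches(WITHOUT_BANG, sample))
```

On these samples both variants agree, because "!" is not in the excluded set and the negated class already accepts it. Note that the negated class also accepts a space ("text/pl ain"), which the RFC 2045 token definition excludes.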
RFC 2045 MIME (Part 1):
---
5.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:

  content := "Content-Type" ":" type "/" subtype
             *(";" parameter)
             ; Matching of media type and subtype
             ; is ALWAYS case-insensitive.

  type := discrete-type / composite-type

  discrete-type := "text" / "image" / "audio" / "video" /
                   "application" / extension-token

  composite-type := "message" / "multipart" / extension-token

  extension-token := ietf-token / x-token

  ietf-token := <An extension token defined by a
                 standards-track RFC and registered
                 with IANA.>

  x-token := <The two characters "X-" or "x-" followed, with
              no intervening white space, by any token>

  subtype := extension-token / iana-token

  iana-token := <A publicly-defined extension token. Tokens
                 of this form must be registered with IANA
                 as specified in RFC 2048.>

  parameter := attribute "=" value

  attribute := token
               ; Matching of attributes
               ; is ALWAYS case-insensitive.

  value := token / quoted-string

  token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
              or tspecials>

  tspecials := "(" / ")" / "<" / ">" / "@" /
               "," / ";" / ":" / "\" / <">
               "/" / "[" / "]" / "?" / "="
               ; Must be in quoted-string,
               ; to use within parameter values
---
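For comparison with the schema patterns, the token rule quoted above can be written out directly. This is a minimal sketch in Python, not a complete Content-Type parser: it only checks the type "/" subtype shape, ignores parameters and the extension-token constraints, and the names TSPECIALS, TOKEN_CHARS, is_token and is_media_type are invented here.

```python
# tspecials exactly as listed in RFC 2045, section 5.1.
TSPECIALS = set('()<>@,;:\\"/[]?=')

# "any (US-ASCII) CHAR except SPACE, CTLs, or tspecials":
# 0x21..0x7E is printable US-ASCII without SPACE (0x20) and without
# the control characters (0x00-0x1F and 0x7F).
TOKEN_CHARS = {chr(c) for c in range(0x21, 0x7F)} - TSPECIALS


def is_token(value: str) -> bool:
    """True if value is one or more token characters (the 1*<...> rule)."""
    return len(value) > 0 and all(ch in TOKEN_CHARS for ch in value)


def is_media_type(value: str) -> bool:
    """True if value has the shape token "/" token."""
    type_part, slash, subtype_part = value.partition("/")
    return slash == "/" and is_token(type_part) and is_token(subtype_part)


if __name__ == "__main__":
    for sample in ["text/plain", "!/!", "text/pl ain", "text/pl@in", "a/b/c"]:
        print(repr(sample), is_media_type(sample))
```

Under this reading "!" is an ordinary token character, while SPACE and control characters are rejected.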
According to the example at http://www.w3.org/TR/xmlschema-2/#rf-pattern
the "-" syntax may work as negation (although that seems unlikely given
http://www.w3.org/TR/xmlschema-2/#charcter-classes).
---
<simpleType name='better-us-zipcode'>
  <restriction base='string'>
    <pattern value='[0-9]{5}(-[0-9]{4})?'/>
  </restriction>
</simpleType>
---
A simple fix for this part (though I don't see whether charClassSub
will still work afterwards) may be to replace every -[ with [^ if it is
preceded by [ or (. This will not solve the issue with MPEG7 completely.
Are these things true, known, or perhaps even already solved somewhere?
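The replacement described above can be sketched as a plain string rewrite. This is only an illustration of the idea, not a patch against XSV: the function name naive_rewrite and the constants are hypothetical, and the rewrite treats the pattern as text without parsing it.

```python
import re

# The failing MPEG7 pattern and the zip code example, copied from above.
MPEG7_PATTERN = r'[!--[\(\)<>@,;:\\"/\[\]\?=]]+/[!--[\(\)<>@,;:\\"/\[\]\?=]]+'
ZIPCODE_PATTERN = r'[0-9]{5}(-[0-9]{4})?'


def naive_rewrite(pattern: str) -> str:
    """Replace "-[" with "[^" wherever it directly follows "[" or "("."""
    return re.sub(r'(?<=[\[(])-\[', '[^', pattern)


if __name__ == "__main__":
    for name, pattern in [("mpeg7", MPEG7_PATTERN), ("zipcode", ZIPCODE_PATTERN)]:
        print(name, naive_rewrite(pattern) == pattern)
```

Two things show up when this is applied to the patterns in this mail. The MPEG7 pattern is left unchanged, because its "-[" follows another "-" rather than "[" or "(". The zip code pattern is changed, because "(-[" there is a literal hyphen followed by a character class, so such a rewrite would need more context than the preceding character.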
--
Kind regards,
I hope I have written down everything needed clearly.
Heiko Studt <studt@fmi.uni-passau.de>
Received on Tuesday, 11 September 2007 11:38:09 UTC