- From: John Schneider <john.schneider@agiledelta.com>
- Date: Wed, 16 Sep 2009 10:22:09 -0700
- To: "'FABLET Youenn'" <Youenn.Fablet@crf.canon.fr>, <public-exi@w3.org>
- Message-ID: <CB246093A5374862BA3AACF2D123DBB2@jcsdell8600>
Dear Youenn,

Thank you for your comment on the EXI specification. You are right that the EXI restricted character set generation mechanism does not automatically add whitespace characters to the restricted character set. This is intentional. A restricted character set will include whitespace characters only if those characters are specified by the regular expression.

Whitespace characters are not automatically added to a restricted character set because they are not needed to represent schema-valid values. If a schema type derived from xsd:string has a pattern facet and a particular instance value includes leading or trailing whitespace characters that are not matched by that pattern facet, the instance value is schema-invalid. For example, given the following schema declaration:

    <xsd:element name="ws">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:pattern value="abc|def"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>

The following instance value is schema-invalid:

    <ws> abc </ws>

because it contains whitespace characters that are not permitted by the pattern facet. Removing the whitespace characters will fix this validity problem.

I hope this makes sense. Please let me know if you have follow-up questions.

Best wishes,

John

AgileDelta, Inc.
john.schneider@agiledelta.com
http://www.agiledelta.com

_____

From: public-exi-request@w3.org [mailto:public-exi-request@w3.org] On Behalf Of FABLET Youenn
Sent: Monday, August 24, 2009 7:52 AM
To: 'public-exi@w3.org'
Subject: Comments on restriced charset and whitespaces

Dear all,

I have the following comment regarding the EXI specification.

The current regular expression generation mechanism does not seem to add automatically whitespaces in the character set. This means that whitespaces (if not present in the regular expression) will not appear in the character set although they may actually appear in the string.
The solution seems to be to encode these whitespaces using the 'extended' symbol N followed by the actual whitespace symbol. Another approach, used for built-in types with preserveLexical="true", is to add whitespaces to the character set, although these whitespaces could also be encoded using the previous strategy.

I am wondering whether that 'difference in behavior' is a conscious decision from the WG and whether there is a particular rationale behind this.

Regards,

Youenn
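
The two points discussed in this thread can be illustrated with a minimal Python sketch. It is not taken from the EXI specification or from any EXI implementation: the names `is_pattern_valid`, `RESTRICTED_SET` and `encode_symbols` are hypothetical, the restricted character set for `abc|def` is written out by hand rather than derived from the regular expression, and Python's `re.fullmatch` is used only as an approximation of XSD pattern matching for this very simple pattern. The sketch assumes that members of a set of N characters are represented by their index, and that any other character is represented by the escape value N followed by its Unicode code point.

```python
import math
import re

# Pattern facet from the schema example in the reply.
PATTERN = "abc|def"

# Hand-written assumption: the characters this pattern can produce,
# sorted by code point. No whitespace is included.
RESTRICTED_SET = sorted(set("abcdef"))


def is_pattern_valid(value: str) -> bool:
    """Approximate the pattern facet check: the whole value must match."""
    return re.fullmatch(PATTERN, value) is not None


def encode_symbols(value: str):
    """Return (bit width, symbols) for a value.

    A character in the set becomes (index,). Any other character becomes
    (N, code point), where N is the size of the set and acts as an escape.
    """
    n = len(RESTRICTED_SET)
    bits = math.ceil(math.log2(n + 1))  # room for N indices plus the escape
    symbols = []
    for ch in value:
        if ch in RESTRICTED_SET:
            symbols.append((RESTRICTED_SET.index(ch),))
        else:
            symbols.append((n, ord(ch)))
    return bits, symbols


if __name__ == "__main__":
    for candidate in ("abc", " abc "):
        print(repr(candidate), is_pattern_valid(candidate), encode_symbols(candidate))
```

Under these assumptions, the schema-valid value "abc" needs only indices from the set, while " abc " fails the pattern check and its two spaces fall back to the escape form. This matches the rationale given in the reply: whitespace is only needed in the set when the pattern itself permits it.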
Received on Wednesday, 16 September 2009 17:22:51 UTC