RE: Comments on restricted charset and whitespaces

Dear Youenn,
 
Thank you for your comment on the EXI specification. You are right that the
EXI restricted character set generation mechanism does not automatically add
whitespace characters to the restricted character set. This is intentional. 
 
A restricted character set will include whitespace characters only if those
characters are specified by the regular expression. Whitespace characters
are not automatically added to a restricted character set because they are
not needed to represent schema-valid values. 
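
As a rough illustration of this point, here is a minimal Python sketch; it is
not the normative algorithm from the specification. It assumes that the
restricted character set for the pattern "abc|def" is simply the six letters
the pattern mentions, and it follows the escape scheme described in the
original comment, where a character outside the set is written as the value N
followed by its code point. All function names are hypothetical.

```python
def restricted_charset(chars):
    """Return the distinct characters sorted by code point (hypothetical helper)."""
    return sorted(set(chars))

def code_width(charset):
    """Bits per symbol for N in-set codes plus one escape code.

    For N >= 0, N.bit_length() equals ceil(log2(N + 1)).
    """
    return len(charset).bit_length()

def encode_char(ch, charset):
    """Return (index,) for an in-set character, or (N, code point) otherwise."""
    if ch in charset:
        return (charset.index(ch),)
    return (len(charset), ord(ch))

# Assumed character set for the pattern "abc|def": no whitespace included.
charset = restricted_charset("abcdef")
```

Under these assumptions the set has N = 6 entries, so each symbol takes 3
bits, and a space character would have to be escaped as the value 6 followed
by its code point (32).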
 
If a schema type derived from xsd:string has a pattern facet and a
particular instance value includes leading or trailing whitespace characters
that are not matched by that pattern facet, the instance value is
schema-invalid. For example, given the following schema declaration:
 
<xsd:element name="ws">
      <xsd:simpleType>
            <xsd:restriction base="xsd:string">
                  <xsd:pattern value="abc|def"/>
            </xsd:restriction>
      </xsd:simpleType>
</xsd:element>


The following instance value is schema-invalid:
 
    <ws> abc </ws>

    
because it contains whitespace characters that are not permitted by the
pattern facet. Removing the whitespace characters will fix this validity
problem.
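
The same check can be approximated in Python. This is only a sketch: Python's
"re" syntax is not identical to XML Schema regular expressions, but for a
simple pattern such as "abc|def" the two should agree. re.fullmatch is used
because a pattern facet must match the entire value, and is_pattern_valid is a
hypothetical helper name.

```python
import re

PATTERN = "abc|def"  # value of the pattern facet in the schema declaration

def is_pattern_valid(value, pattern=PATTERN):
    """True if the whole value matches the pattern.

    The value is tested as-is because xsd:string preserves whitespace.
    """
    return re.fullmatch(pattern, value) is not None

print(is_pattern_valid(" abc "))  # False
print(is_pattern_valid("abc"))    # True
```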
 
I hope this makes sense. Please let me know if you have follow-up questions.
 
    Best wishes,
 
    John
 
AgileDelta, Inc.
john.schneider@agiledelta.com
http://www.agiledelta.com
 


  _____  

From: public-exi-request@w3.org [mailto:public-exi-request@w3.org] On Behalf
Of FABLET Youenn
Sent: Monday, August 24, 2009 7:52 AM
To: 'public-exi@w3.org'
Subject: Comments on restricted charset and whitespaces



Dear all, 

 

I have the following comment regarding the EXI specification.



The current regular expression generation mechanism does not seem to
automatically add whitespace characters to the character set.
This means that whitespace characters (if not present in the regular
expression) will not appear in the character set, although they may actually
appear in the string.
The solution seems to be to encode these whitespace characters using the
'extended' symbol N followed by the actual whitespace symbol.

Another approach, used for built-in types with preserveLexical="true", is to
add whitespace characters to the character set, although these characters
could also be encoded using the previous strategy.

 

I am wondering whether this 'difference in behavior' is a conscious decision
by the WG and whether there is a particular rationale behind it.

 

Regards,

                Youenn

 

Received on Wednesday, 16 September 2009 17:22:51 UTC