Re: Are all values stored as strings?

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 12 May 2003 13:34:46 -0400
To: "Roger L. Costello" <costello@mitre.org>
Cc: "Costello,Roger L." <costello@mitre.org>, xmlschema-dev@w3.org
Message-ID: <OFAA2EF94C.402CDBAD-ON85256D24.0056AB9B@lotus.com>

Roger Costello:

>> is it correct that num's value (32) is always represented as a
"string", regardless of how num is declared?  That is, are all values
just strings, with a "datatype label" associated with the string?

I think it's fair to say that the Schema recommendation doesn't tell you 
how to "represent" things, any more than the XML recommendation tells you 
whether to use SAX or DOM, or whether to use UTF-8 or UTF-16 for the 
strings in your API.

The schema recommendation defines a relation on schemas and instances:  it 
basically tells you some information that you can discover in the course 
of an assessment.  In your example, some of the things you can discover 
include:

* That the character children of <num> are the characters "3", "2"
* That the element has been validated by the type unsignedByte (Example 1) 
or numType (Example 2) respectively. 
* In the case of Example 1, the recommendation tells you that 
xsd:unsignedByte is ultimately derived from xsd:decimal.  Crucially, it 
tells you that for a lexical form such as "3", "2" there is a 
corresponding abstract decimal value in the value space, which is the 
decimal number 32.  So, the recommendation is very clear that after the 
assessment you know both the characters and the corresponding value.  
Whether you expose either or both in any particular API is up to you.  
Note that the XML Query language (working drafts) lets you deal with 
either or both.
* The story on your numType (Example 2) follows a similar analysis. In 
this case, the base type is xsd:string.  While there is also a value space 
for this type, it is essentially in 1-to-1 correspondence with the lexical 
space.

If we consider the input documents:

<num>32</num>
and
<num>032</num>

they have different values in the value space for the string-like types, 
but the same value in the decimal-based type.
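
To make the lexical/value distinction concrete, here is a rough Python 
sketch.  It is only an illustration, not anything the recommendation 
prescribes: the function names are invented for this example, and Python's 
Decimal is assumed as a stand-in for the decimal value space.

```python
from decimal import Decimal

def decimal_value(lexical: str) -> Decimal:
    """Map a lexical form to a value in a decimal-like value space."""
    return Decimal(lexical)

def string_value(lexical: str) -> str:
    """For string-based types the value is essentially the characters."""
    return lexical

a, b = "32", "032"

# Leading zeros vanish once the lexical form is mapped to a number.
same_as_decimal = decimal_value(a) == decimal_value(b)

# As strings, the two character sequences remain distinct values.
same_as_string = string_value(a) == string_value(b)
```

Here same_as_decimal is true and same_as_string is false, which mirrors 
the two documents above.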

>>  Let me ask it another way, is the value (32) represented by an XML
Schema validator as this:

Again, the recommendation doesn't tell a validator how to optimize its 
representations of anything.  We note that the integer, decimal, float, 
etc. types allow bounds checks such as maxInclusive.  If your integers are 
small enough, and the validator knows this, you can try storing them in a 
32-bit (or 16-bit or whatever) binary integer.  In the case of types like integer 
and decimal, you might also get away with bounds checks on the lexical 
forms.  In the case of float, this is unlikely.  Most validators will use 
IEEE binary notations to implement bounds checks on floats and doubles, 
and lexical forms to implement the pattern facet on floats and doubles. If 
your validator can find a better way, that's fine, as long as your results 
are as described by the recommendation. 
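
As a hedged sketch of the two strategies for an integer bounds check, the 
Python below compares a converted binary value against the bound, and 
separately compares digit strings without ever building a number.  The 
function names and the MAX_INCLUSIVE constant are hypothetical, and the 
lexical forms are assumed to be plain unsigned digit strings with an 
optional leading "+".

```python
import re

MAX_INCLUSIVE = 255  # assumed bound, e.g. the upper limit of unsignedByte

def in_bounds_by_value(lexical: str) -> bool:
    """Convert to a binary integer first, then compare numerically."""
    return 0 <= int(lexical) <= MAX_INCLUSIVE

def in_bounds_by_lexical(lexical: str) -> bool:
    """Compare digit strings directly, never constructing a number."""
    if not re.fullmatch(r"\+?[0-9]+", lexical):
        return False
    # Drop the optional sign and leading zeros; "000" normalizes to "0".
    digits = lexical.lstrip("+").lstrip("0") or "0"
    limit = str(MAX_INCLUSIVE)
    # A shorter digit string is smaller; equal lengths compare as text.
    if len(digits) != len(limit):
        return len(digits) < len(limit)
    return digits <= limit
```

Both functions agree on inputs such as "32", "032", "255" and "256".  The 
lexical version never overflows, which is one reason it can be attractive 
for unbounded types like integer and decimal.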

While we're at it, note that <xsd:enumeration> is on the value space. For 
integer types:

<xsd:enumeration value="32"/>

also matches 032.  If your base type is string, then "032" does not match. 
Same for key/keyref.  In general, the representations a processor needs 
internally are likely to depend on the features actually used.  And so 
on.
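
A small Python sketch of enumeration matching in the value space follows.  
It is an illustration under assumptions rather than how any real validator 
works: matches_enumeration and its to_value parameter are made up for this 
example, with Decimal standing in for an integer-like value space and str 
for a string-based one.

```python
from decimal import Decimal

def matches_enumeration(lexical, enum_lexicals, to_value):
    """True when the instance's value equals one of the enumerated values.

    to_value maps a lexical form into the value space of the base type.
    """
    allowed = {to_value(e) for e in enum_lexicals}
    return to_value(lexical) in allowed

# Integer-like base type: "032" and "32" map to the same value.
integer_like = matches_enumeration("032", ["32"], Decimal)

# String base type: the characters themselves are the value.
string_like = matches_enumeration("032", ["32"], str)
```

With these definitions integer_like is true and string_like is false.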

Hope this helps.

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

"Roger L. Costello" <costello@mitre.org>
Sent by: xmlschema-dev-request@w3.org
05/09/2003 04:50 PM

 
        To:     xmlschema-dev@w3.org, "Costello,Roger L." <costello@mitre.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Are all values stored as strings?



Hi Folks,

Consider these two ways of defining an element called "num":

-----------------------------------------------------------------
Version #1:

<xsd:element name="num" type="xsd:unsignedByte"/>

-----------------------------------------------------------------
Version #2:

<xsd:simpleType name="numType">
    <xsd:restriction base="xsd:string">
        <xsd:pattern 
            value="[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]"/>
    </xsd:restriction>
</xsd:simpleType>

<xsd:element name="num" type="numType"/>

-----------------------------------------------------------------
Now, here is an example of an instance of "num":

<num>32</num>

Question: is it correct that num's value (32) is always represented as a
"string", regardless of how num is declared?  That is, are all values
just strings, with a "datatype label" associated with the string?

Let me ask it another way, is the value (32) represented by an XML
Schema validator as this:

   0010 0000

if "num" is declared using Version #1

and like this: 

   0011 0010

if "num" is declared using Version #2?

/Roger
Received on Monday, 12 May 2003 13:44:15 GMT
