RE: validation of binary from Biron,Paul V on 2000-08-14 (www-xml-schema-comments@w3.org from July to September 2000)

From: Biron,Paul V <Paul.V.Biron@kp.org>
Date: Mon, 14 Aug 2000 14:24:37 -0700
To: www-xml-schema-comments@w3.org
Message-Id: <376E771642C1D2118DC300805FEAAF4386DD99@pars-exch-1.ca.kp.org>

> -----Original Message-----
> From:	David Fallside/Santa Teresa/IBM [SMTP:fallside@us.ibm.com]
> Sent:	Monday, August 14, 2000 10:05 AM
> To:	www-xml-schema-comments@w3.org
> Subject:	validation of binary
> 
> 
> It is unclear to me what is the validation of binary datatypes, both from
> the point of view of how it is currently written in the spec, and from the
> point of view of how the WG intends binary types to be validated. For
> example, is a processor meant to throw an error when it encounters a
> hex-encoded type whose value is something like 2t ? One argument against
> requiring such errors is that large binary objects, such as those
> obtained from a database, will be slow to process. The counter argument
> also seems valid, i.e. that small binary objects (e.g. 128 byte keys)
> should be checked, just like other datatypes. Can someone say definitively
> what is intended by the spec?
> 
When no length-related facets are specified, the intention is that the
literal should be checked against the lexical space (i.e., is it properly
hex or base64 encoded), although this check is not required (see section
4.2.12 [1]).  The reasoning is that in the absence of length-related facets,
there is really nothing that the schema processor is supposed to
"semantically" validate, and as you mention, the check is likely to be time
consuming in many cases.  Schema processors that want to check the
lexicals can issue Warnings if they choose.
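
To make the optional lexical check concrete, here is a minimal sketch in
Python.  It is only an illustration, not anything the draft prescribes: the
names is_lexically_valid and check_lexical are hypothetical, ignoring
whitespace inside the literal is my assumption, and the base64 pattern
assumes the usual alphabet with "=" padding.

```python
import re
import warnings

# Assumed lexical spaces: pairs of hex digits, or padded standard base64.
HEX_RE = re.compile(r"(?:[0-9A-Fa-f]{2})*")
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)
PATTERNS = {"hex": HEX_RE, "base64": BASE64_RE}


def is_lexically_valid(literal, encoding):
    """Return True if the literal is in the lexical space of the encoding."""
    pattern = PATTERNS[encoding]  # KeyError for an unknown encoding name
    text = "".join(literal.split())  # assumption: whitespace is ignored
    return pattern.fullmatch(text) is not None


def check_lexical(literal, encoding):
    """Optional check: issue a warning, not an error, on a bad literal."""
    ok = is_lexically_valid(literal, encoding)
    if not ok:
        warnings.warn(f"literal is not properly {encoding} encoded")
    return ok
```

With this sketch, is_lexically_valid("2t", "hex") returns False, so
check_lexical would warn about the 2t example from the original question
while leaving the document valid.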

When any of the length-related facets are specified (e.g., length [2],
minLength [3] or maxLength [4]), the processors are required to validate
that the raw binary data satisfies the specified facet(s).  Therefore, the
processor must produce the raw binary data by hex or base64 decoding the
literal.  If the literal is not "legal" for the particular lexical space
(i.e., encoding), then the processor is likely to encounter an error when
trying to decode it.  Exactly what error the processor should report in this
case is a good question.  The processor should NOT report an error in the
literal/lexical space, even though that's where the problem is.  So, it should
probably just report an error in the length-related facet.
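
The behavior described above could be sketched roughly as follows, again in
Python.  This is one possible reading, not a definitive implementation:
FacetError and validate_length_facets are hypothetical names, whitespace in
the literal is ignored by assumption, and a decoding failure is reported as a
length-related facet error, as suggested above.

```python
import base64
import binascii


class FacetError(Exception):
    """Hypothetical error type for a violated length-related facet."""


DECODERS = {
    "hex": binascii.unhexlify,
    "base64": lambda text: base64.b64decode(text, validate=True),
}


def validate_length_facets(literal, encoding, length=None,
                           min_length=None, max_length=None):
    """Decode the literal and check facets against the raw byte count."""
    decoder = DECODERS[encoding]  # KeyError for an unknown encoding name
    text = "".join(literal.split())  # assumption: whitespace is ignored
    try:
        raw = decoder(text)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        # The problem is lexical, but it surfaces as a facet error.
        raise FacetError("length-related facet cannot be satisfied: "
                         "literal could not be decoded") from exc
    n = len(raw)
    if length is not None and n != length:
        raise FacetError(f"length is {n}, expected exactly {length}")
    if min_length is not None and n < min_length:
        raise FacetError(f"length is {n}, below minLength {min_length}")
    if max_length is not None and n > max_length:
        raise FacetError(f"length is {n}, above maxLength {max_length}")
    return n
```

For example, validate_length_facets("0FB7", "hex", length=2) returns 2, while
the literal "2t" raises FacetError because it cannot be decoded at all.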

The above is as described in the 2000-04-07 version of the datatypes draft.
There is some possibility that this situation may change between now and
PR, but nothing is certain.

pvb

References
[1] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dc-encoding
[2] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-length
[3] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-minLength
[4] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-maxLength

Received on Monday, 14 August 2000 19:02:28 UTC