- From: Biron,Paul V <Paul.V.Biron@kp.org>
- Date: Mon, 14 Aug 2000 14:24:37 -0700
- To: www-xml-schema-comments@w3.org

> -----Original Message-----
> From: David Fallside/Santa Teresa/IBM [SMTP:fallside@us.ibm.com]
> Sent: Monday, August 14, 2000 10:05 AM
> To: www-xml-schema-comments@w3.org
> Subject: validation of binary
>
> It is unclear to me what is the validation of binary datatypes, both from
> the point of view of how it is currently written in the spec, and from the
> point of view of how the WG intends binary types to be validated. For
> example, is a processor meant to throw an error when it encounters a
> hex-encoded type whose value is something like 2t ? One argument against
> requiring such errors is that large binary objects, such as those
> obtained from a database, will be slow to process. The counter argument
> also seems valid, i.e. that small binary objects (e.g. 128 byte keys)
> should be checked, just like other datatypes. Can someone say definitively
> what is intended by the spec?
>

When no length-related facets are specified, the intention is that the literal should be checked against the lexical space (i.e., is it properly hex or base64 encoded), although this check is not required (see section 4.2.12 [1]). The reasoning is that in the absence of length-related facets, there is really nothing that the schema processor is supposed to "semantically" validate, and as you mention, the check is likely to be time consuming in many cases. Schema processors that want to check the lexicals can issue Warnings if they choose.

When any of the length-related facets are specified (e.g., length [2], minLength [3] or maxLength [4]), processors are required to validate that the raw binary data satisfies the specified facet(s). Therefore, the processor must produce the raw binary data by hex or base64 decoding the literal. If the literal is not "legal" for the particular lexical space (i.e., encoding), then the processor is likely to encounter an error when trying to decode it. Exactly what error the processor should report in this case is a good question.

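As a rough illustration of the two cases above (this is not taken from the draft), here is a minimal Python sketch. The names `decode_literal` and `validate_binary`, the `check_lexical` switch, and the `(status, message)` return shape are all hypothetical. It also assumes that length is counted in bytes of the decoded data, and that a literal which fails to decode is surfaced as a failure to check the facet.

```python
import base64
import binascii


def decode_literal(literal, encoding):
    """Decode a hex or base64 literal to raw bytes.

    Raises ValueError if the literal is not legal for the encoding.
    """
    try:
        if encoding == "hex":
            return bytes.fromhex(literal)
        if encoding == "base64":
            return base64.b64decode(literal, validate=True)
    except (ValueError, binascii.Error) as exc:
        raise ValueError(f"literal is not valid {encoding}") from exc
    raise ValueError(f"unknown encoding: {encoding}")


def validate_binary(literal, encoding, length=None, min_length=None,
                    max_length=None, check_lexical=False):
    """Return (status, message); status is 'ok', 'warning' or 'error'."""
    has_facets = any(f is not None for f in (length, min_length, max_length))

    if not has_facets:
        # No length-related facet: the lexical check is optional, and a
        # processor that chooses to do it only issues a warning.
        if check_lexical:
            try:
                decode_literal(literal, encoding)
            except ValueError as exc:
                return ("warning", str(exc))
        return ("ok", "")

    # A length-related facet is present: the raw data must be produced.
    try:
        raw = decode_literal(literal, encoding)
    except ValueError as exc:
        # Assumption: reported against the facet, not the lexical space.
        return ("error", f"length facet cannot be checked: {exc}")

    n = len(raw)  # assumption: length is measured in decoded bytes
    if length is not None and n != length:
        return ("error", f"length is {n}, expected {length}")
    if min_length is not None and n < min_length:
        return ("error", f"length is {n}, below minLength {min_length}")
    if max_length is not None and n > max_length:
        return ("error", f"length is {n}, above maxLength {max_length}")
    return ("ok", "")
```

With this sketch, `validate_binary("2t", "hex")` passes without looking at the literal at all, while `validate_binary("2t", "hex", length=1)` fails because the literal cannot be decoded.
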
When decoding fails, the processor should NOT report an error in the literal/lexical space, even though that's where the problem is. So, it should probably just report an error in the length-related facet.

The above is as described in the 2000-04-07 version of the datatypes draft. There is some possibility that this situation may change between now and PR, but nothing is certain.

pvb

References

[1] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dc-encoding
[2] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-length
[3] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-minLength
[4] http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#dt-maxLength

Received on Monday, 14 August 2000 19:02:28 UTC