Re: white space in xsd:hexBinary

On 16 Jan 2012, at 20:20, C. M. Sperberg-McQueen wrote:

> 
> On Jan 16, 2012, at 10:15 AM, Henry Story wrote:
> 
>> Dear XML Schema Working Group,
>> 
>> From reading the latest XML Schema spec (which is a big improvement over the previous one!), it seems that it is not possible to put whitespace inside an xsd:hexBinary. I read the text here:
>> 
>> http://www.w3.org/TR/xmlschema11-2/#hexBinary
>> 
>> "[the lexical space of] hexBinary is the same as that recognized by the regular 
>> expression '([0-9a-fA-F]{2})*'."
>> 
>> First of all, I was looking for confirmation that this is the correct reading. There is a whitespace 'collapse' facet, which I suppose is meant to remove leading and trailing spaces, but not spaces inside the number.
> 
> The value 'collapse' in the whitespace facet replaces internal sequences
> of whitespace characters with single blank characters.  So it may reduce
> internal whitespace but does not eliminate it.  You are right then to say
> it won't remove spaces inside the literal.
> 
>> Secondly, I was looking to see whether there were reasons for doing it this way. After all, a hexBinary can be, and usually is, a very long string, so it is likely to be difficult to read if it cannot be broken up a little. It is also very likely that whitespace will creep into such a long number by mistake as people copy and paste information from one system to another, in what could be normal human processing tasks.
>> 
>> I imagine this rule would make sense if it were possible in some XML formats to use the xsd:hexBinary datatype and have it be followed by a set of hexBinaries each separated by a space.
>> 
>> But in formats that use this datatype and are RDF driven, such as RDF/XML, Turtle, RDFa and so on, this is not the case. Those formats require there to be only one binary, so there is really nothing that the spaces can separate.
>> 
>> ...
>> 
>> But it just seems quite likely that people will end up putting white spaces in there somewhere. Should parsers reject those immediately? And if so why?
> 
> 
> Thank you; this is an excellent question.
> 
> Without checking the decision records for XSD 1.0, I do not myself recall whether the
> option of allowing whitespace within the lexical forms of hexBinary was discussed or not,
> and if discussed what reasoning led us to forbid it.  The topic clearly did come up in
> connection with base64, which explicitly includes whitespace in its lexical space, so 
> the WG ought in principle to have considered the question of whitespace for hexBinary.
> 
> One difference is that base64 is defined by an RFC (by several, in fact, by now) which
> discusses the inclusion of whitespace, while hexBinary does not have a similarly
> prominent definition elsewhere.  (Which means:  maybe we just overlooked the problem?)
> 
> For usability, it seems to me (speaking solely for myself) that allowing whitespace in 
> the lexical forms of xsd:hexBinary (or perhaps better, adding a value 'suppress' for
> the whitespace facet, which simply suppresses all whitespace) would be an improvement.  
> Unfortunately, a change seems likely to be very difficult, given that XSD 1.0 appears 
> to be unambiguous in excluding whitespace from the definition of the lexical space 
> for this type, so allowing whitespace now would introduce an incompatibility with 
> version 1.0 of the spec.  The history of the WG's discussions of questions of this 
> kind makes me think it likely that such an incompatibility might lead immediately to 
> formal objections.  Changes are difficult at this point in any case, since a Candidate 
> Recommendation draft of XSD 1.1 was published some time ago and any substantive 
> changes now would mean substantial delays in completing the XSD 1.1 spec.

Ah, that's a pity.

Still, if one could find out what the reasons might be, and if none turned out to be that serious, one could perhaps have more lenient parsers strip that whitespace in the context of RDF/XML.
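
To make the distinction concrete, here is a minimal Python sketch of what such a lenient parser might do. It is my own illustration, not taken from the spec or from any existing RDF parser, and the function names are hypothetical. It assumes the XSD whitespace characters are space, tab, line feed and carriage return, and it uses the regular expression quoted above for the lexical space.

```python
import re

# Assumption: XSD whitespace is space, tab, line feed, carriage return.
XSD_WS = re.compile(r"[ \t\n\r]+")
# Lexical space of hexBinary, as quoted from the spec above.
HEX_BINARY = re.compile(r"(?:[0-9a-fA-F]{2})*")


def collapse(literal: str) -> str:
    """Apply 'collapse': runs of whitespace become one space, ends are trimmed."""
    return XSD_WS.sub(" ", literal).strip(" ")


def is_strict_hex_binary(literal: str) -> bool:
    """Strict check: collapse first, then match the lexical space exactly."""
    return HEX_BINARY.fullmatch(collapse(literal)) is not None


def parse_lenient_hex_binary(literal: str) -> bytes:
    """Lenient (non-conforming) parse: drop all whitespace before checking."""
    stripped = XSD_WS.sub("", literal)
    if HEX_BINARY.fullmatch(stripped) is None:
        raise ValueError("not a hexBinary literal, even ignoring whitespace")
    return bytes.fromhex(stripped)
```

With this sketch, is_strict_hex_binary("cb24 ed85") is false, because collapsing leaves a single internal space, whereas parse_lenient_hex_binary accepts the same literal and returns the four bytes it denotes.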


> 
> Two work-arounds occur to me.
> 
> One is very ugly and probably won't actually help most of the users you are concerned with:
> introduce whitespace by means of comments.  Using this workaround, your example might
> look like this:
> 
> <#me> a foaf:Person;
> foaf:name "Bob";
> foaf:knows <https://example.edu/p/Alois#MSc>;
> foaf:weblog <http://bob.example/blog>;
> cert:key [ a cert:RSAPublicKey;
>  rdfs:label "made on 23 November 2011 on my laptop";
>  cert:modulus "cb24ed85d64d794b69c701c186acc059501e856000f661c9<!--
>   -->3204d8380e07191c5c8b368d2ac32a428acb970398664368<!--
>   -->dc2a867320220f755e99ca2eecdae62e8d15fb58e1b76ae5<!--
>   -->9cb7ace8838394d59e7250b449176e51a494951a1c366c62<!--
>   -->17d8768d682dde78dd4d55e613f8839cf275d4c8403743e7<!--
>   -->862601f3c49a6366e12bb8f498262c3c77de19bce40b32f8<!--
>   -->9ae62c3780f5b6275be337e2b3153ae2ba72a9975ae71ab7<!--
>   -->24649497066b660fcf774b7543d980952d2e8586200eda41<!--
>   -->58b014e75465d91ecf93efc7ac170c11fc7246fc6ded79c3<!--
>   -->7780000ac4e079f671fd4f207ad770809e0e2d7b0ef5493b<!--
>   -->efe73544d8e1be3dddb52455c61391a1"^^xsd:hexBinary;
>  cert:exponent 65537 ;
> ] .

This would only work in the case of RDF/XML, not in the case of Turtle, which is what is being used there :-) But that is not really the type of error I think we are going to see.

The odd thing is that we essentially have xsd:hexBinary, which was defined for XML, found its way into RDF, and so leapt into non-XML languages such as Turtle and RDFa for HTML.

But even so, it seems to me that the parsing is too fragile even for XML. It's too easy to make a mistake there.

> The second workaround is simpler:  use base64Binary, not hexBinary.
> For whatever reason, base64Binary is defined to include whitespace
> in its lexical space.

Yes, base64Binary is available, but since all certificate tools tend to publish these numbers in hexadecimal, it is a bit difficult for humans to do comparisons in base64. One does not gain that much space either when doing that. But semantically we allow it.
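
For a rough sense of the sizes involved, here is a small Python sketch that re-encodes a hex literal as base64 using the standard library. It is an illustration only, and the helper name is hypothetical. Hex uses two characters per byte and base64 uses four characters per three bytes, so the base64 form is about two thirds as long.

```python
import base64


def hex_to_base64(hex_literal: str) -> str:
    """Re-encode the bytes denoted by a hex string as base64 text."""
    return base64.b64encode(bytes.fromhex(hex_literal)).decode("ascii")


# First line of the modulus from the example above: 48 hex digits, i.e. 24 bytes.
first_line = "cb24ed85d64d794b69c701c186acc059501e856000f661c9"
encoded = hex_to_base64(first_line)
print(len(first_line), len(encoded))
```

For that 24-byte line, the script reports 48 characters in hex against 32 in base64.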



> 
> I hope this helps.  
> 
> Michael Sperberg-McQueen
> 
> -- 
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com 
> * http://cmsmcq.com/mib                 
> * http://balisage.net
> ****************************************************************
> 
> 
> 
> 

Social Web Architect
http://bblfish.net/

Received on Monday, 16 January 2012 19:42:38 UTC