
Re: white space in xsd:hexBinary

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 16 Jan 2012 12:20:54 -0700
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-xml-schema-comments@w3.org
Message-Id: <970DC3A6-F8C3-488D-A714-42E6C87D3305@blackmesatech.com>
To: Henry Story <henry.story@bblfish.net>

On Jan 16, 2012, at 10:15 AM, Henry Story wrote:

> Dear XML Schema working Group,
> From reading the latest XML Schema spec (which is a big improvement over the previous one!) it seems that it is not possible to put white spaces inside an xsd:hexBinary. I read the text here 
>  http://www.w3.org/TR/xmlschema11-2/#hexBinary
> "[the lexical space of] hexBinary is the same as that recognized by the regular 
>  expression '([0-9a-fA-F]{2})*'."
> I was looking for confirmation that that is the correct reading first of all. There is a white space collapse facet which I suppose is meant to remove leading and trailing spaces, but not spaces inside the number.

The value 'collapse' of the whiteSpace facet strips leading and trailing
whitespace and replaces each internal sequence of whitespace characters
with a single blank.  So it may reduce internal whitespace but does not
eliminate it.  You are right, then, to say it won't remove spaces inside
the literal.
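To make the distinction concrete, here is a minimal Python sketch (not taken from any schema processor) that applies the collapse rules as XSD Part 2 states them and then checks the result against the hexBinary pattern quoted above. The function names are illustrative only.

```python
import re

# Lexical pattern for xsd:hexBinary, as quoted from XSD 1.1 Part 2.
HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")


def collapse(value: str) -> str:
    """Apply XSD whiteSpace="collapse" normalization to a literal."""
    # 'replace' step: each tab, line feed and carriage return becomes #x20.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # 'collapse' step: runs of #x20 shrink to one, then leading and
    # trailing #x20 are removed (str.strip(" ") touches spaces only).
    return re.sub(r" +", " ", replaced).strip(" ")


def is_hex_binary(literal: str) -> bool:
    """True if the collapsed literal matches the hexBinary pattern."""
    return HEX_BINARY.fullmatch(collapse(literal)) is not None


wrapped = "  cb24ed85\n\t d64d794b  "
print(repr(collapse(wrapped)))  # 'cb24ed85 d64d794b'
print(is_hex_binary(wrapped))  # False: one internal space survives
print(is_hex_binary("  cb24ed85d64d794b\n"))  # True: only outer whitespace
```

The line break and tab in the middle end up as a single space, and that one space is enough to make the literal invalid.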

> Then secondly I was looking to see if there were reasons this was done like this. After all a hexBinary could and usually is a very very long string, and so it is likely to be difficult to read if it cannot be cut up a little bit. It is also very likely that white spaces should enter into such a long number by mistake as people copy and paste information from one system to another, in what could be normal human processing tasks. 
> I imagine this rule would make sense if it were possible in some XML formats to use the xsd:hexBinary datatype and have it be followed by a set of hexBinaries each separated by a space.
> But in formats that use this datatype that are RDF driven, such as RDF/XML, Turtle, RDFa and so on, this is not the case. Those formats require there to only be 1 binary, so there is really nothing that the spaces can separate. 
> ...
> But it just seems quite likely that people will end up putting white spaces in there somewhere. Should parsers reject those immediately? And if so why?

Thank you; this is an excellent question.

Without checking the decision records for XSD 1.0, I do not myself recall whether the
option of allowing whitespace within the lexical forms of hexBinary was discussed,
and, if it was, what reasoning led us to forbid it.  The topic clearly did come up in
connection with base64, which explicitly includes whitespace in its lexical space, so
the WG ought in principle to have considered the question of whitespace for hexBinary.

One difference is that base64 is defined by an RFC (by several, in fact, by now) which
discusses the inclusion of whitespace, while hexBinary does not have a similarly
prominent definition elsewhere.  (Which means:  maybe we just overlooked the problem?)

For usability, it seems to me (speaking solely for myself) that allowing whitespace in 
the lexical forms of xsd:hexBinary (or perhaps better, adding a value 'suppress' for
the whitespace facet, which simply suppresses all whitespace) would be an improvement.  
Unfortunately, a change seems likely to be very difficult, given that XSD 1.0 appears 
to be unambiguous in excluding whitespace from the definition of the lexical space 
for this type, so allowing whitespace now would introduce an incompatibility with 
version 1.0 of the spec.  The history of the WG's discussions of questions of this 
kind makes me think it likely that such an incompatibility might lead immediately to 
formal objections.  Changes are difficult at this point in any case, since a Candidate 
Recommendation draft of XSD 1.1 was published some time ago and any substantive 
changes now would mean substantial delays in completing the XSD 1.1 spec.
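The 'suppress' value is only a suggestion and exists in neither XSD 1.0 nor XSD 1.1. As a hypothetical sketch of what it might mean for a processor, the following Python deletes all four XSD whitespace characters and only then applies the unchanged hexBinary pattern. The names `suppress` and `hex_binary_value` are illustrative, not part of any spec or library.

```python
import re

HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")
# The four characters XSD treats as whitespace: space, tab, LF, CR.
XSD_WHITESPACE = re.compile(r"[ \t\n\r]+")


def suppress(value: str) -> str:
    """Hypothetical whiteSpace="suppress": delete every whitespace character.

    No such facet value exists in XSD 1.0 or 1.1; this models the proposal.
    """
    return XSD_WHITESPACE.sub("", value)


def hex_binary_value(literal: str) -> bytes:
    """Map a literal to its octets under the hypothetical 'suppress' rule."""
    normalized = suppress(literal)
    if HEX_BINARY.fullmatch(normalized) is None:
        raise ValueError(f"not a hexBinary literal: {literal!r}")
    return bytes.fromhex(normalized)


# A modulus pasted with a line break and indentation.
pasted = """cb24ed85 d64d794b
            69c701c1 86acc059"""
print(hex_binary_value(pasted).hex())  # cb24ed85d64d794b69c701c186acc059
```

Because the pattern itself is untouched, this accepts every literal that hexBinary accepts today, plus literals that differ from them only in whitespace. Those extra literals are exactly the ones XSD 1.0 rejects, which is the incompatibility described above.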

Two work-arounds occur to me.

One is very ugly and probably won't actually help most of the users you are concerned with:
introduce whitespace by means of comments.  Using this workaround, your example might
look like this:

<#me> a foaf:Person;
   foaf:name "Bob";
   foaf:knows <https://example.edu/p/Alois#MSc>;
   foaf:weblog <http://bob.example/blog>;
   cert:key [ a cert:RSAPublicKey;
      rdfs:label "made on 23 November 2011 on my laptop";
      cert:modulus "cb24ed85d64d794b69c701c186acc059501e856000f661c9<!--
      cert:exponent 65537 ;
   ] .
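The comment trick relies on the XML parser discarding comments before the literal reaches the datatype. That holds for XML serializations such as RDF/XML, but not for Turtle, where `<!--` inside a string is just ordinary characters. A minimal Python sketch using the standard library's ElementTree shows the effect; the `cert` namespace URI below is a placeholder assumption, and `element_text` is an illustrative helper.

```python
import re
import xml.etree.ElementTree as ET

HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")

# RDF/XML-style property element; the namespace URI is a placeholder.
DOC = """<cert:modulus xmlns:cert="http://example.org/cert#">cb24ed85d64d794b<!--
    -->69c701c186acc059</cert:modulus>"""


def element_text(xml: str) -> str:
    """Concatenate an element's character data, skipping comments."""
    root = ET.fromstring(xml)
    # itertext() yields text and tails but never the text of comments,
    # and ElementTree drops comments from the tree by default anyway.
    return "".join(root.itertext())


text = element_text(DOC)
print(text)  # cb24ed85d64d794b69c701c186acc059
print(HEX_BINARY.fullmatch(text) is not None)  # True
```

The line break lives entirely inside the comment, so the value the datatype sees contains no whitespace at all.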

The second workaround is simpler:  use base64Binary, not hexBinary.
For whatever reason, base64Binary is defined to include whitespace
in its lexical space.
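For comparison, here is a short Python sketch that writes the first 24 octets of the example modulus as base64 with a line break. Under XSD, the whiteSpace facet of base64Binary collapses the line break to a single space, which the base64Binary lexical space permits between characters. Python's own decoder is more lenient than the XSD rules (it silently discards any character outside the base64 alphabet), so the sketch illustrates the round trip rather than acting as a validator.

```python
import base64

# First 24 octets of the modulus from the example above.
octets = bytes.fromhex("cb24ed85d64d794b69c701c186acc059501e856000f661c9")

# Encode as base64 and break the text into 16-character lines.
encoded = base64.b64encode(octets).decode("ascii")
wrapped = "\n".join(encoded[i:i + 16] for i in range(0, len(encoded), 16))
print(wrapped)

# With validate=False (the default), b64decode discards characters outside
# the base64 alphabet, so the line breaks do not affect the decoded octets.
assert base64.b64decode(wrapped) == octets
```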

I hope this helps.  

Michael Sperberg-McQueen

* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
Received on Monday, 16 January 2012 19:21:20 GMT
