- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 16 Jan 2012 12:20:54 -0700
- To: Henry Story <henry.story@bblfish.net>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-xml-schema-comments@w3.org
On Jan 16, 2012, at 10:15 AM, Henry Story wrote:

> Dear XML Schema working Group,
>
> From reading the latest XML Schema spec (which is a big improvement over the previous one!) it seems that it is not possible to put white spaces inside an xsd:hexBinary. I read the text here
>
> http://www.w3.org/TR/xmlschema11-2/#hexBinary
>
> "[the lexical space of] hexBinary is the same as that recognized by the regular
> expression '([0-9a-fA-F]{2})*'."
>
> I was looking for confirmation that that is the correct reading first of all. There is a white space collapse facet which I suppose is meant to remove leading and trailing spaces, but not spaces inside the number.

The value 'collapse' in the whitespace facet replaces internal sequences of whitespace characters with single blank characters. So it may reduce internal whitespace but does not eliminate it. You are right then to say it won't remove spaces inside the literal.

> Then secondly I was looking to see if there were reasons this was done like this. After all a hexBinary could and usually is a very very long string, and so it is likely to be difficult to read if it cannot be cut up a little bit. It is also very likely that white spaces should enter into such a long number by mistake as people copy and paste information from one system to another, in what could be normal human processing tasks.
>
> I imagine this rule would make sense if it were possible in some XML formats to use the xsd:hexBinary datatype and have it be followed by a set of hexBinaries each separated by a space.
>
> But in formats that use this datatype that are RDF driven, such as RDF/XML, Turtle, RDFa and so on, this is not the case. Those formats require there to only be 1 binary, so there is really nothing that the spaces can separate.
>
> ...
>
> But it just seems quite likely that people will end up putting white spaces in there somewhere. Should parsers reject those immediately? And if so why?

Thank you; this is an excellent question.
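To make the point about 'collapse' concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the regular expression is the one quoted from the spec, but the helper names `collapse` and `is_hex_binary` are hypothetical, and Python's `re` module stands in for a real schema processor.

```python
import re

# Lexical space of xsd:hexBinary, as quoted from the spec above.
HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")


def collapse(literal: str) -> str:
    """Approximate whiteSpace='collapse' (hypothetical helper).

    Tab, line feed and carriage return become spaces, runs of spaces
    shrink to a single space, and leading/trailing spaces are dropped.
    """
    replaced = re.sub(r"[\t\n\r]", " ", literal)
    return re.sub(r" +", " ", replaced).strip(" ")


def is_hex_binary(literal: str) -> bool:
    """Check the collapsed literal against the hexBinary pattern."""
    return HEX_BINARY.fullmatch(collapse(literal)) is not None


# Leading and trailing whitespace disappears during collapsing.
print(is_hex_binary("  cb24ed85\n"))
# An internal run of whitespace survives as one space, so the match fails.
print(is_hex_binary("cb24 \n  ed85"))
```

Under these assumptions the first literal is accepted and the second is rejected, because collapsing leaves a single blank between the two halves and the pattern admits only hex digits.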
Without checking the decision records for XSD 1.0, I do not myself recall whether the option of allowing whitespace within the lexical forms of hexBinary was discussed or not, and if discussed what reasoning led us to forbid it.

The topic clearly did come up in connection with base64, which explicitly includes whitespace in its lexical space, so the WG ought in principle to have considered the question of whitespace for hexBinary. One difference is that base64 is defined by an RFC (by several, in fact, by now) which discusses the inclusion of whitespace, while hexBinary does not have a similarly prominent definition elsewhere. (Which means: maybe we just overlooked the problem?)

For usability, it seems to me (speaking solely for myself) that allowing whitespace in the lexical forms of xsd:hexBinary (or perhaps better, adding a value 'suppress' for the whitespace facet, which simply suppresses all whitespace) would be an improvement.

Unfortunately, a change seems likely to be very difficult, given that XSD 1.0 appears to be unambiguous in excluding whitespace from the definition of the lexical space for this type, so allowing whitespace now would introduce an incompatibility with version 1.0 of the spec. The history of the WG's discussions of questions of this kind makes me think it likely that such an incompatibility might lead immediately to formal objections.

Changes are difficult at this point in any case, since a Candidate Recommendation draft of XSD 1.1 was published some time ago and any substantive changes now would mean substantial delays in completing the XSD 1.1 spec.

Two work-arounds occur to me. One is very ugly and probably won't actually help most of the users you are concerned with: introduce whitespace by means of comments.
Using this workaround, your example might look like this:

  <#me> a foaf:Person;
    foaf:name "Bob";
    foaf:knows <https://example.edu/p/Alois#MSc>;
    foaf:weblog <http://bob.example/blog>;
    cert:key [
      a cert:RSAPublicKey;
      rdfs:label "made on 23 November 2011 on my laptop";
      cert:modulus "cb24ed85d64d794b69c701c186acc059501e856000f661c9<!--
        -->3204d8380e07191c5c8b368d2ac32a428acb970398664368<!--
        -->dc2a867320220f755e99ca2eecdae62e8d15fb58e1b76ae5<!--
        -->9cb7ace8838394d59e7250b449176e51a494951a1c366c62<!--
        -->17d8768d682dde78dd4d55e613f8839cf275d4c8403743e7<!--
        -->862601f3c49a6366e12bb8f498262c3c77de19bce40b32f8<!--
        -->9ae62c3780f5b6275be337e2b3153ae2ba72a9975ae71ab7<!--
        -->24649497066b660fcf774b7543d980952d2e8586200eda41<!--
        -->58b014e75465d91ecf93efc7ac170c11fc7246fc6ded79c3<!--
        -->7780000ac4e079f671fd4f207ad770809e0e2d7b0ef5493b<!--
        -->efe73544d8e1be3dddb52455c61391a1"^^xsd:hexBinary;
      cert:exponent 65537 ;
    ] .

The second workaround is simpler: use base64Binary, not hexBinary. For whatever reason, base64Binary is defined to include whitespace in its lexical space.

I hope this helps.

Michael Sperberg-McQueen

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Monday, 16 January 2012 19:21:20 UTC