
Re: White Spaces in xsd:hexBinary

From: Henry Story <henry.story@bblfish.net>
Date: Sat, 21 Jan 2012 22:06:21 +0100
Message-Id: <AAE0C69B-3CBD-4ABA-A93C-E89462F0018B@bblfish.net>
To: WebID XG <public-xg-webid@w3.org>, Liste SW-W3C <semantic-web@w3.org>

To summarise the discussion on the XML Schema mailing list about 
white space in xsd:hexBinary, which I think we are likely to end up 
finding in the real world, and which we do currently find in 
Tim Berners-Lee's foaf file:

$ curl http://www.w3.org/People/Berners-Lee/card
@prefix cert:  <http://www.w3.org/ns/auth/cert#> .
<#i> cert:key  [ a cert:RSAPublicKey;
    cert:modulus "d7 a0 e9 1e ... cc d1 e4 12 ab..."^^xsd:hexBinary

(This is probably just a simple oversight by timbl, as he quickly 
switched from the cert:hex datatype, which did allow the notation above.)

(1) There were a few calls to start an investigation to see if 
anything really dramatically bad would happen if white space were
allowed:
  + Henry Thompson argued for this [1]
  + Noah Mendelsohn asks that one should ask around to see if changes 
     would have bad consequences [2] 
  + Michael Kay is against [3]
  + Noah Mendelsohn seems to think it's too much work given the stage 
     the spec is at [4]

(2) It is indeed not currently legal to put white spaces in the 
    hexBinary  BUT...  there is a bit of wiggle room.
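
    As a rough illustration of the strict reading, here is a minimal
    Python sketch. It assumes the lexical form is simply an even number
    of hex digits with nothing between them; the function name
    is_strict_hex_binary is made up for this example and is not taken
    from any spec or library.

```python
import re

# Assumption: the lexical form is zero or more pairs of hex digits,
# with no white space allowed between the pairs.
HEX_BINARY = re.compile(r"(?:[0-9a-fA-F]{2})*")


def is_strict_hex_binary(lexical: str) -> bool:
    """Return True if the whole string is pairs of hex digits only."""
    return HEX_BINARY.fullmatch(lexical) is not None


if __name__ == "__main__":
    print(is_strict_hex_binary("d7a0e91e"))     # compact form
    print(is_strict_hex_binary("d7 a0 e9 1e"))  # spaced form, as in the card
```

    Under that assumption the compact form is accepted and the spaced
    form is rejected.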

  How to interpret such strings if one sticks closely to the current 
specification of xml-schema-2 [6] was laid out very clearly by C. M. 
Sperberg-McQueen [7]. What he says requires a closer look, but 
perhaps the following gives a reason to look more carefully.

 Depending on what type of processing you do on your XML, earlier 
layers of your XML processing could remove the white space. Such 
processing does happen, as for example the following is legal:

         <rdfs:label>made on 23 November 2011 on my
         <cert:modulus>0F<!--* hi, mom! *-->B7</cert:modulus>
         <cert:exponent> 65537 </cert:exponent>

    The above is legal XML, but that type of processing does not happen 
in Turtle. So one could argue that there is a difference, and that 
Turtle, for example, should be thought of as incorporating certain 
steps that are not in XML. For example, it could do the normalisation 
and remove all white space. Sperberg-McQueen was thinking that this had 
been ruled out in the RDF camp, but perhaps that was only the rdf/xml 
camp, and perhaps things have changed since then.
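
    The XML half of that contrast can be checked with a small Python
    sketch using the standard library's xml.etree.ElementTree. It
    assumes the default parser settings, under which comments are
    dropped, and it adds a namespace declaration (taken from the cert
    prefix in the Turtle above) so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the cert prefix in the Turtle snippet.
CERT_NS = "http://www.w3.org/ns/auth/cert#"

# The modulus element from the example, made standalone by declaring
# the cert prefix on it.
doc = (
    f'<cert:modulus xmlns:cert="{CERT_NS}">'
    "0F<!--* hi, mom! *-->B7"
    "</cert:modulus>"
)

elem = ET.fromstring(doc)

# The default parser discards the comment, so the text on either side
# of it is all the character data that is left in the element.
modulus_text = "".join(elem.itertext())

if __name__ == "__main__":
    print(modulus_text)
```

    By the time a datatype check sees the value, the comment is gone.
    A Turtle literal goes through no comparable layer.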

   Michael Kay explains that the Saxon parser could even do such a 
preprocessing step to remove those white spaces [8].
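
   A preprocessing step of that kind could be sketched as follows. This
is only an illustration of the idea in Python, not Saxon's behaviour or
anything a spec mandates; the function name normalise_hex_binary is
hypothetical.

```python
def normalise_hex_binary(lexical: str) -> bytes:
    """Strip all white space, then decode the remaining hex digits.

    Hypothetical lenient reading: the spaced groups are concatenated
    into one binary value. Raises ValueError if what is left is not an
    even number of hex digits.
    """
    compact = "".join(lexical.split())  # removes spaces, tabs, newlines
    return bytes.fromhex(compact)


if __name__ == "__main__":
    value = normalise_hex_binary("d7 a0 e9 1e")
    # Interpret the bytes as a big-endian integer, as one would for a modulus.
    print(hex(int.from_bytes(value, "big")))
```

   With this reading, "d7 a0 e9 1e" and "d7a0e91e" denote the same 
value, which is the only interpretation someone entering the spaces 
could plausibly have meant.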

[1]  Henry Thompson:
[2]  Noah Mendelsohn
[3]  Michael Kay 
[4] http://lists.w3.org/Archives/Public/www-xml-schema-comments/2012JanMar/0025.html

[6]  http://www.w3.org/TR/xmlschema-2/
[7]  C. M. Sperberg-McQueen's answer to how one should interpret such a binary
[8] http://lists.w3.org/Archives/Public/www-xml-schema-comments/2012JanMar/0017.html

On 16 Jan 2012, at 18:26, Henry Story wrote:

> On 16 Jan 2012, at 17:40, Dave Reynolds wrote:
>> That regex and the associated EBNF seems unambiguous to me, no spaces
>> between hexOctets. I see no wriggle room :)
> yes, I'd like to know why it is defined like that, and if that needs to be constraining on formats such as RDF that have other restrictions such as only allowing one binary to appear in an xsd:hexBinary string. After all the RDF version could say: we concatenate all binaries into one big binary, since that is the only interpretation that could be meant by someone who had entered white spaces.
> I wrote their group an e-mail to check
> http://lists.w3.org/Archives/Public/www-xml-schema-comments/2012JanMar/0011.html
> Henry
>> Dave
> Social Web Architect
> http://bblfish.net/

Social Web Architect
Received on Saturday, 21 January 2012 21:07:07 UTC
