
White Spaces in xsd:hexBinary

From: Henry Story <henry.story@bblfish.net>
Date: Mon, 16 Jan 2012 13:22:07 +0100
Message-Id: <8965C809-F33B-4FFF-B6FC-3542382BD5BD@bblfish.net>
Cc: WebID XG <public-xg-webid@w3.org>
To: Liste SW-W3C <semantic-web@w3.org>, RDFa WG <public-rdfa-wg@w3.org>, Linked Data community <public-lod@w3.org>
The WebID incubator group has encountered a subtle RDF problem with xsd:hexBinary, and we would like some feedback on it. It is not yet clear who we should be asking, so I have sent this out rather widely.

The WebID Protocol requires users who need a global login to publish their public key at their WebID Profile. The profile is described here

  http://webid.info/spec
or
  http://www.w3.org/2005/Incubator/webid/spec/#turtle
or the latest editor's draft
  https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/index-respec.html#turtle

The editor's draft gives the following Turtle example:

----------8<----------------------------------------------
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#me> a foaf:Person;
  foaf:name "Bob";
  foaf:knows <https://example.edu/p/Alois#MSc>;
  foaf:weblog <http://bob.example/blog>;
  cert:key [ a cert:RSAPublicKey;
    rdfs:label "made on 23 November 2011 on my laptop";
    cert:modulus "cb24ed85d64d794b69c701c186acc059501e856000f661c93204d8380e07191c5c8b368d2ac32a428acb970398664368dc2a867320220f755e99ca2eecdae62e8d15fb58e1b76ae59cb7ace8838394d59e7250b449176e51a494951a1c366c6217d8768d682dde78dd4d55e613f8839cf275d4c8403743e7862601f3c49a6366e12bb8f498262c3c77de19bce40b32f89ae62c3780f5b6275be337e2b3153ae2ba72a9975ae71ab724649497066b660fcf774b7543d980952d2e8586200eda4158b014e75465d91ecf93efc7ac170c11fc7246fc6ded79c37780000ac4e079f671fd4f207ad770809e0e2d7b0ef5493befe73544d8e1be3dddb52455c61391a1"^^xsd:hexBinary;
    cert:exponent 65537 ;
   ] .
----------8<----------------------------------------------

As you receive this e-mail you may immediately notice a problem: the various e-mail services will have cut that long line up into a number of 70-character lines. So imagine that in the future I send someone a mail with a profile to paste onto a server, and they follow my advice. Is what they paste semantically the same as what I sent them?

Currently a profile with internal white spaces will pass the OpenLink verifier

   http://id.myopenlink.net/ods/webid_demo.vsp

but it won't pass the foafssl one

   https://foafssl.org/test/WebId

Which is right?
The XML Schema definition of hexBinary explains in English how it works

    http://www.w3.org/TR/xmlschema-2/#hexBinary

and if one wants a few more details one has to look at the following 
(at the end of the document)

<xs:simpleType name="hexBinary" id="hexBinary">
   <xs:annotation>
     <xs:appinfo>
       <hfp:hasFacet name="length"/>
       <hfp:hasFacet name="minLength"/>
       <hfp:hasFacet name="maxLength"/>
       <hfp:hasFacet name="pattern"/>
       <hfp:hasFacet name="enumeration"/>
       <hfp:hasFacet name="whiteSpace"/>
       <hfp:hasProperty name="ordered" value="false"/>
       <hfp:hasProperty name="bounded" value="false"/>
       <hfp:hasProperty name="cardinality" value="countably infinite"/>
       <hfp:hasProperty name="numeric" value="false"/>
     </xs:appinfo>
     <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#binary"/>
   </xs:annotation>
   <xs:restriction base="xs:anySimpleType">
     <xs:whiteSpace fixed="true" value="collapse" id="hexBinary.whiteSpace"/>
   </xs:restriction>
 </xs:simpleType>

which I interpret as saying that hexBinary has its whiteSpace facet fixed to collapse.

This is defined in
  http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace


+ replace: All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
+ collapse: After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and leading and trailing #x20's are removed

As I see it, this means that every run of white space inside the value is reduced to a single space, but not removed. This means that


 """cb 24            ed
    85"""^^xsd:hexBinary

will be converted to 

  "cb 24 ed 85"^^xsd:hexBinary

but never to 

  "cb24ed85"^^xsd:hexBinary

which is what it would have to become in order to compare equal to the version written without white space.
(But I may have misread something!)
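
A minimal Python sketch of the reading above, assuming the replace and collapse rules are applied literally as quoted. The function names are mine, and the pattern used for the check (pairs of hex digits and nothing else) is my paraphrase of the hexBinary lexical form, not text taken from the spec:

```python
import re

def xsd_replace(value: str) -> str:
    """whiteSpace=replace: tab, line feed and carriage return become a space."""
    return re.sub(r"[\t\n\r]", " ", value)

def xsd_collapse(value: str) -> str:
    """whiteSpace=collapse: replace, squeeze runs of spaces, trim both ends."""
    return re.sub(r" +", " ", xsd_replace(value)).strip(" ")

def is_hex_binary(value: str) -> bool:
    """Assumed lexical check: zero or more pairs of hex digits, nothing else."""
    return re.fullmatch(r"(?:[0-9a-fA-F]{2})*", value) is not None

lexical = "cb 24            ed\n    85"
collapsed = xsd_collapse(lexical)

print(collapsed)                  # → cb 24 ed 85
print(is_hex_binary(collapsed))   # → False
print(is_hex_binary("cb24ed85"))  # → True
```

Under this reading the collapsed value still contains interior spaces, so it is neither a valid lexical form nor equal to the unbroken string.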


Even if this were something that RDF/XML had to abide by, it is not clear whether precisely the same rules apply to Turtle or to RDFa, especially RDFa in HTML.
For example, wherever an xsd:hexBinary literal appears in any RDF notation, it cannot hold more than one number. It therefore seems that one is forced to interpret the string as a single number, and so one should remove all white space.

If one did that, one would have a much more resilient format. One could also simply remove white space on the principle
of "be lenient in what you accept and strict in what you produce".
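
As a sketch of what the lenient approach could look like on the receiving side (this is not what either verifier actually does; the function name and the choice to strip only space, tab, line feed and carriage return are my assumptions):

```python
import re

def lenient_hex_to_int(lexical: str) -> int:
    """Drop all XML white space, then read the rest as one big-endian number.

    Raises ValueError if what remains is not an even number of hex digits.
    """
    compact = re.sub(r"[ \t\n\r]+", "", lexical)
    return int.from_bytes(bytes.fromhex(compact), "big")

# A value broken up by a mail client, even in the middle of a byte,
# yields the same number as the unbroken one.
wrapped = "cb2\n   4ed 85"
print(lenient_hex_to_int(wrapped) == lenient_hex_to_int("cb24ed85"))  # → True
```

A producer following the same principle would always write the compact form, so strict consumers are not affected.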



	Henry



Social Web Architect
http://bblfish.net/
Received on Monday, 16 January 2012 12:24:29 GMT
