Re: White Spaces in xsd:hexBinary

On 16 Jan 2012, at 14:06, Ivan Herman wrote:

> Hi Henry,
> 
> RDFa is pretty much silent on this. 
> 
> In general, RDFa does not consider semantic or syntactic checking of literals. I.e., the spec does not say anything about "abcd"^^xsd:float, for example; after all, RDFa is just a serialization format. I guess the same holds for Turtle. The question is whether RDFa processors (i.e., parsers) or Turtle parsers would look at this and raise an error or a warning if the literal is not well formed. AFAIK, neither Turtle nor RDFa processors/parsers are required to do that.
> 
> We may discuss whether this is good or bad; I am just stating what I believe the facts are.
> 
> (I am not sure that was very helpful, though...)

Well, knowing that RDFa and Turtle don't make any special rules is useful. I had not checked that myself.

Next, it would be useful to know whether the following statement in Tim Berners-Lee's profile is clearly broken RDF, or whether it is acceptable because parsers should remove the white space anyway:

$ curl http://www.w3.org/People/Berners-Lee/card

...
 <#i> cert:key  [ a cert:RSAPublicKey;
    cert:modulus """d7 a0 e9 1e ed dd cc 90 5d 5e cc d1 e4 12 ab 0c 
5b db e1 18 fa 99 b7 13 2d 91 54 52 f0 b0 9a f5 
eb c0 09 6c a1 db de ec 32 72 3f 5d dd 2b 05 56 
4e 2c e6 7e ff ba 8e 86 77 8e 11 4a 02 a3 90 7c 
2e 6c 6b 28 cf 16 fe e7 7d 0e f0 c4 4d 2e 3c cd 
3e 0b 6e 8c fd d1 97 e3 aa 86 ec 19 99 80 72 9a 
f4 45 1f 79 99 bc e5 5e b3 4b d5 a5 35 04 70 46 
37 00 f7 30 8e 37 2b db 6e 07 5e 0b b8 a8 db a9 
36 86 fa 4a e5 13 17 a4 43 82 bb 09 d0 92 94 c1 
68 5b 10 97 ff d5 9c 44 6a e5 67 fa ec e6 b6 aa 
27 89 79 06 b5 24 a6 49 89 bd 48 cf ea ec 61 d1 
2c c0 b6 3d db 88 5d 2d ad b0 b3 58 c6 66 aa 93 
f5 a4 43 fb 91 fc 2a 3d c6 99 eb 46 15 9b 05 c5 
75 8c 9f 13 ed 28 44 09 4c c5 39 e5 82 e1 1d e3 
6c 67 33 a6 7b 51 25 ef 40 7b 32 9e f5 e9 22 ca 
57 46 a5 ff c6 7b 65 0b 4a e3 66 10 fc a0 cd 7b"""^^xsd:hexBinary ;
        cert:exponent "65537"^^xsd:integer ] .
...
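
To make the question concrete, here is a minimal Python sketch (mine, not taken from any spec text or existing parser) of what a strict reading would do with such a value: apply the XML Schema whiteSpace "collapse" rule, then test the result against the hexBinary lexical form of two hex digits per octet. The names ws_collapse and is_valid_hex_binary are hypothetical helpers, and the short sample string is just the start of the modulus above.

```python
import re

# Lexical form of xsd:hexBinary: zero or more pairs of hex digits.
HEX_BINARY = re.compile(r"(?:[0-9a-fA-F]{2})*")


def ws_collapse(lexical: str) -> str:
    """Apply the XSD whiteSpace 'collapse' rule to a lexical form."""
    # "replace" step: tab, line feed and carriage return become spaces.
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    # "collapse" step: squeeze runs of spaces, then trim both ends.
    return re.sub(r" +", " ", replaced).strip(" ")


def is_valid_hex_binary(lexical: str) -> bool:
    """Strict check: collapse white space, then match the lexical form."""
    return HEX_BINARY.fullmatch(ws_collapse(lexical)) is not None


spaced = "d7 a0 e9 1e \n5b db e1 18 "
print(repr(ws_collapse(spaced)))                # interior spaces survive
print(is_valid_hex_binary(spaced))              # False under this reading
print(is_valid_hex_binary("d7a0e91e5bdbe118"))  # True
```

Under this reading only leading and trailing white space is harmless; any white space between octets leaves the literal outside the lexical form.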

I had originally created a cert:hex datatype that was a lot more flexible. It is still described in the spec here: http://www.w3.org/ns/auth/cert#hex
And indeed that is what Tim Berners-Lee was using until recently.

The reason I created cert:hex originally was that it seemed clear that people would make a lot of mistakes writing out xsd:hexBinary values. We recently switched to xsd:hexBinary because we wanted better support from SPARQL and other tools, in the hope that the verification process could now be expressed in one simple ASK query, as explained here:

  http://www.w3.org/2005/Incubator/webid/spec/#verifying-the-webid-claim

So perhaps we need some feedback from the semantic side on the interpretation of xsd:hexBinary.
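
For comparison, the lenient interpretation could look something like the following Python sketch. It is only an illustration under my own assumptions: lenient_hex_to_int is a hypothetical helper, not part of any WebID implementation, and the short modulus is the toy value from my earlier mail rather than a real key. It drops all XML white space, checks that what remains is an even number of hex digits, and compares the result as an integer.

```python
import re

# What must remain once white space is removed: pairs of hex digits.
_HEX_PAIRS = re.compile(r"(?:[0-9a-fA-F]{2})*")


def lenient_hex_to_int(lexical: str) -> int:
    """Read a hex string as one number, ignoring any XML white space."""
    # Remove space, tab, line feed and carriage return wherever they occur.
    compact = re.sub(r"[ \t\n\r]", "", lexical)
    if _HEX_PAIRS.fullmatch(compact) is None:
        raise ValueError("not a sequence of hex-encoded octets")
    # Interpret the octets as one big-endian unsigned integer.
    return int.from_bytes(bytes.fromhex(compact), "big")


profile_value = """cb 24            ed
  85"""
certificate_modulus = 0xCB24ED85  # toy value standing in for a real modulus

print(lenient_hex_to_int(profile_value) == certificate_modulus)
```

Comparing integers rather than strings also makes a leading 00 octet irrelevant, which may or may not be what one wants.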



> 
> Ivan
> 
> On Jan 16, 2012, at 13:22 , Henry Story wrote:
> 
>> The WebID incubator group has encountered a subtle RDF problem with xsd:hexBinary,  and we would like some feedback on this. It is not clear yet who we should be asking here, so I have sent this out a bit widely. 
>> 
>> The WebID Protocol requires users who need a global login to publish their public key at their WebID Profile. The profile is described here
>> 
>> http://webid.info/spec
>> or
>> http://www.w3.org/2005/Incubator/webid/spec/#turtle
>> or the latest editor's draft
>> https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/index-respec.html#turtle
>> 
>> The editor's draft gives the following Turtle example:
>> 
>> ----------8<----------------------------------------------
>> @prefix cert: <http://www.w3.org/ns/auth/cert#> .
>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> 
>> <#me> a foaf:Person;
>> foaf:name "Bob";
>> foaf:knows <https://example.edu/p/Alois#MSc>;
>> foaf:weblog <http://bob.example/blog>;
>> cert:key [ a cert:RSAPublicKey;
>>   rdfs:label "made on 23 November 2011 on my laptop";
>>   cert:modulus "cb24ed85d64d794b69c701c186acc059501e856000f661c93204d8380e07191c5c8b368d2ac32a428acb970398664368dc2a867320220f755e99ca2eecdae62e8d15fb58e1b76ae59cb7ace8838394d59e7250b449176e51a494951a1c366c6217d8768d682dde78dd4d55e613f8839cf275d4c8403743e7862601f3c49a6366e12bb8f498262c3c77de19bce40b32f89ae62c3780f5b6275be337e2b3153ae2ba72a9975ae71ab724649497066b660fcf774b7543d980952d2e8586200eda4158b014e75465d91ecf93efc7ac170c11fc7246fc6ded79c37780000ac4e079f671fd4f207ad770809e0e2d7b0ef5493befe73544d8e1be3dddb52455c61391a1"^^xsd:hexBinary;
>>   cert:exponent 65537 ;
>>  ] .
>> ----------8<----------------------------------------------
>> 
>> As you receive this e-mail you may immediately notice a problem: the various e-mail services will have cut that long line up into a number of 70-character lines. So imagine that in the future I send someone a mail to paste this onto a server, and they follow my advice. Is what they paste semantically the same as what I sent them?
>> 
>> Currently a profile with internal white spaces will pass the OpenLink verifier
>> 
>>  http://id.myopenlink.net/ods/webid_demo.vsp
>> 
>> but it won't pass the foafssl one
>> 
>>  https://foafssl.org/test/WebId
>> 
>> Which is right? 
>> The XML Schema definition of hexBinary explains in English how it works
>> 
>>   http://www.w3.org/TR/xmlschema-2/#hexBinary
>> 
>> and if one wants a few more details one has to look at the following 
>> (at the end of the document)
>> 
>> <xs:simpleType name="hexBinary" id="hexBinary">
>>  <xs:annotation>
>>    <xs:appinfo>
>>      <hfp:hasFacet name="length"/>
>>      <hfp:hasFacet name="minLength"/>
>>      <hfp:hasFacet name="maxLength"/>
>>      <hfp:hasFacet name="pattern"/>
>>      <hfp:hasFacet name="enumeration"/>
>>      <hfp:hasFacet name="whiteSpace"/>
>>      <hfp:hasProperty name="ordered" value="false"/>
>>      <hfp:hasProperty name="bounded" value="false"/>
>>      <hfp:hasProperty name="cardinality" value="countably infinite"/>
>>      <hfp:hasProperty name="numeric" value="false"/>
>>    </xs:appinfo>
>>    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#binary"/>
>>  </xs:annotation>
>>  <xs:restriction base="xs:anySimpleType">
>>    <xs:whiteSpace fixed="true" value="collapse" id="hexBinary.whiteSpace"/>
>>  </xs:restriction>
>> </xs:simpleType>
>> 
>> which I interpret to say that hexBinary has the whiteSpace facet fixed to "collapse".
>> 
>> This is defined in
>> http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace
>> 
>> 
>> + replace: All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
>> + collapse: After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and leading and trailing #x20's are removed
>> 
>> As I see it, this means that every run of white space in the sequence is reduced to a single space. This means that
>> 
>> 
>> """cb 24            ed
>>   85"""^^xsd:hexBinary
>> 
>> will be converted to 
>> 
>> "cb 24 ed 85"^^xsd:hexBinary
>> 
>> but never to 
>> 
>> "cb24ed85"^^xsd:hexBinary
>> 
>> which is what it would need to be in order to compare equal to the unspaced literal.
>> (but I may have misread something!)
>> 
>> 
>> Even if this were a rule that RDF/XML had to abide by, it is not clear that precisely the same rules apply to Turtle or RDFa, especially RDFa in HTML.
>> For example, wherever an xsd:hexBinary appears in any RDF notation, it will not be possible to put more than one number. It seems therefore that one is forced to interpret the string as one number, and so one should remove all white spaces.
>> 
>> If one did that one would have a much more resilient format. One could also just remove white spaces on the principle
>> of "be lenient in what you accept and strict in what you produce"
>> 
>> 
>> 
>> 	Henry
>> 
>> 
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
> 

Social Web Architect
http://bblfish.net/

Received on Monday, 16 January 2012 14:01:49 UTC