Re: White Spaces in xsd:hexBinary

On Mon, 2012-01-16 at 15:51 +0100, Henry Story wrote: 
> On 16 Jan 2012, at 15:27, Dave Reynolds wrote:
> 
> > [Trimmed cross-posting]
> > 
> > On Mon, 2012-01-16 at 15:05 +0100, Ivan Herman wrote: 
> >> Henry, 
> >> 
> >> I think that it would be worth asking a Schema expert. It may well be that the lexical space of the datatype is liberal enough to make Tim's profile valid, provided that the lexical-to-value-space conversion does the right thing with the spaces...
> > 
> > I'm not a schema expert but my reading of [1] is that spaces are not
> > permitted in the lexical form of hexBinary, however they are (scroll
> > down a bit on that page) permitted in base64Binary which may thus be a
> > better option for WebID.
> 
> xsd:hexBinary and xsd:base64Binary both map to the same value space, which is part of the
> advantage of using them. So, in fact it should be possible to use both. 
> 
> But it makes more sense to publish those in hex format because that is the way most crypto tools such as OpenSSO and browsers show them, which makes it easier for humans to see them.
> 
> Your reading of [1] is, I think, similar to mine. The question then is: given the saying "be lenient in what you accept and strict in what you produce", should servers also parse values that contain white space? Or is there a way something can go wrong if one does?

I can't offhand think of a problem that would prevent a server
*choosing* to ignore white spaces in hexBinary.

However, most existing toolkits will (and should) follow the specs, and so are
likely to at least give warnings for non-conformant data. So any server
developer would need to be aware of this deliberate "leniency" in WebID
usage.
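
To make that leniency concrete, here is a minimal Python sketch of a server-side parser that accepts white space in hexBinary but still flags it. It is not taken from any WebID toolkit: the name parse_hex_lenient and the choice to emit a warning rather than fail are assumptions made purely for illustration.

```python
import re
import warnings

# Strict lexical form per XML Schema Part 2: pairs of hex digits, nothing else.
_STRICT_HEX = re.compile(r"(?:[0-9a-fA-F]{2})*")


def parse_hex_lenient(lexical: str) -> bytes:
    """Hypothetical helper: decode a hexBinary lexical form, tolerating white space.

    Conformant input is decoded silently. Input that only becomes
    conformant after removing spaces, tabs, and line breaks is decoded
    with a warning. Anything else raises ValueError.
    """
    if _STRICT_HEX.fullmatch(lexical):
        return bytes.fromhex(lexical)

    # Deliberate leniency: drop all white space, then re-check strictly.
    stripped = re.sub(r"[ \t\r\n]+", "", lexical)
    if not _STRICT_HEX.fullmatch(stripped):
        raise ValueError("not hexBinary even after removing white space")

    warnings.warn(
        "non-conformant xsd:hexBinary: white space removed", stacklevel=2
    )
    return bytes.fromhex(stripped)
```

With this sketch, "cb24ed85" and "cb 24 ed 85" decode to the same four octets, but only the second triggers a warning, so the leniency stays visible to whoever runs the server.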

Dave

> 
> Henry
> 
> > 
> > Dave
> > 
> > [1] http://www.w3.org/TR/xmlschema-2/#hexBinary
> > 
> >> 
> >> Ivan
> >> 
> >> On Jan 16, 2012, at 14:59 , Henry Story wrote:
> >> 
> >>> 
> >>> On 16 Jan 2012, at 14:06, Ivan Herman wrote:
> >>> 
> >>>> Hi Henry,
> >>>> 
> >>>> RDFa is pretty much silent on this. 
> >>>> 
> >>>> In general, RDFa does not consider semantic or syntactic checking of literals. I.e., the spec does not say anything about "abcd"^^xsd:float, for example; after all, RDFa is just a serialization format. I guess the same holds for Turtle. The question is whether RDFa processors (i.e., parsers) or Turtle parsers would look at this and raise an error or a warning if the literal is not well formed. AFAIK, neither Turtle nor RDFa processors/parsers are required to do that.
> >>>> 
> >>>> We may discuss whether this is good or bad; I am just stating what I believe the facts are.
> >>>> 
> >>>> (I am not sure that was very helpful, though...)
> >>> 
> >>> Well, knowing that RDFa and Turtle don't make any special rules is useful. I had not checked that myself.
> >>> 
> >>> Next it would be useful to know whether the following statement in Tim Berners-Lee's profile is clearly broken RDF, or whether parsers should remove the white space anyway:
> >>> 
> >>> $ curl http://www.w3.org/People/Berners-Lee/card
> >>> 
> >>> ...
> >>> <#i> cert:key  [ a cert:RSAPublicKey;
> >>>   cert:modulus """d7 a0 e9 1e ed dd cc 90 5d 5e cc d1 e4 12 ab 0c 
> >>> 5b db e1 18 fa 99 b7 13 2d 91 54 52 f0 b0 9a f5 
> >>> eb c0 09 6c a1 db de ec 32 72 3f 5d dd 2b 05 56 
> >>> 4e 2c e6 7e ff ba 8e 86 77 8e 11 4a 02 a3 90 7c 
> >>> 2e 6c 6b 28 cf 16 fe e7 7d 0e f0 c4 4d 2e 3c cd 
> >>> 3e 0b 6e 8c fd d1 97 e3 aa 86 ec 19 99 80 72 9a 
> >>> f4 45 1f 79 99 bc e5 5e b3 4b d5 a5 35 04 70 46 
> >>> 37 00 f7 30 8e 37 2b db 6e 07 5e 0b b8 a8 db a9 
> >>> 36 86 fa 4a e5 13 17 a4 43 82 bb 09 d0 92 94 c1 
> >>> 68 5b 10 97 ff d5 9c 44 6a e5 67 fa ec e6 b6 aa 
> >>> 27 89 79 06 b5 24 a6 49 89 bd 48 cf ea ec 61 d1 
> >>> 2c c0 b6 3d db 88 5d 2d ad b0 b3 58 c6 66 aa 93 
> >>> f5 a4 43 fb 91 fc 2a 3d c6 99 eb 46 15 9b 05 c5 
> >>> 75 8c 9f 13 ed 28 44 09 4c c5 39 e5 82 e1 1d e3 
> >>> 6c 67 33 a6 7b 51 25 ef 40 7b 32 9e f5 e9 22 ca 
> >>> 57 46 a5 ff c6 7b 65 0b 4a e3 66 10 fc a0 cd 7b"""^^xsd:hexBinary ;
> >>>       cert:exponent "65537"^^xsd:integer ] .
> >>> ...
> >>> 
> >>> I had originally created a cert:hex datatype that was a lot more flexible. It is still described in the spec here: http://www.w3.org/ns/auth/cert#hex
> >>> And indeed that is what Tim Berners-Lee was using until recently.
> >>> 
> >>> I originally created cert:hex because it seemed clear that people would make a lot of mistakes writing out xsd:hexBinary values. We recently switched to xsd:hexBinary because we wanted better support from SPARQL and other tools, in the hope that the verification process could now be expressed in one simple ASK query, as explained here:
> >>> 
> >>> http://www.w3.org/2005/Incubator/webid/spec/#verifying-the-webid-claim
> >>> 
> >>> So perhaps we need some feedback from the semantic side on the interpretation of xsd:hexBinary.
> >>> 
> >>> 
> >>> 
> >>>> 
> >>>> Ivan
> >>>> 
> >>>> On Jan 16, 2012, at 13:22 , Henry Story wrote:
> >>>> 
> >>>>> The WebID incubator group has encountered a subtle RDF problem with xsd:hexBinary,  and we would like some feedback on this. It is not clear yet who we should be asking here, so I have sent this out a bit widely. 
> >>>>> 
> >>>>> The WebID Protocol requires users who need a global login to publish their public key at their WebID Profile. The profile is described here
> >>>>> 
> >>>>> http://webid.info/spec
> >>>>> or
> >>>>> http://www.w3.org/2005/Incubator/webid/spec/#turtle
> >>>>> or the latest editor's draft
> >>>>> https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/index-respec.html#turtle
> >>>>> 
> >>>>> The editor's draft gives the following Turtle example:
> >>>>> 
> >>>>> ----------8<----------------------------------------------
> >>>>> @prefix cert: <http://www.w3.org/ns/auth/cert#> .
> >>>>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> >>>>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
> >>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >>>>> 
> >>>>> <#me> a foaf:Person;
> >>>>> foaf:name "Bob";
> >>>>> foaf:knows <https://example.edu/p/Alois#MSc>;
> >>>>> foaf:weblog <http://bob.example/blog>;
> >>>>> cert:key [ a cert:RSAPublicKey;
> >>>>> rdfs:label "made on 23 November 2011 on my laptop";
> >>>>> cert:modulus "cb24ed85d64d794b69c701c186acc059501e856000f661c93204d8380e07191c5c8b368d2ac32a428acb970398664368dc2a867320220f755e99ca2eecdae62e8d15fb58e1b76ae59cb7ace8838394d59e7250b449176e51a494951a1c366c6217d8768d682dde78dd4d55e613f8839cf275d4c8403743e7862601f3c49a6366e12bb8f498262c3c77de19bce40b32f89ae62c3780f5b6275be337e2b3153ae2ba72a9975ae71ab724649497066b660fcf774b7543d980952d2e8586200eda4158b014e75465d91ecf93efc7ac170c11fc7246fc6ded79c37780000ac4e079f671fd4f207ad770809e0e2d7b0ef5493befe73544d8e1be3dddb52455c61391a1"^^xsd:hexBinary;
> >>>>> cert:exponent 65537 ;
> >>>>> ] .
> >>>>> ----------8<----------------------------------------------
> >>>>> 
> >>>>> So as you receive this e-mail you may immediately notice a problem: the various e-mail services will have cut up that long line into a number of 70-character lines. So imagine that in the future I mail this to someone to paste onto a server, and they follow my advice. Is what they paste semantically the same as what I sent them?
> >>>>> 
> >>>>> Currently a profile with internal white spaces will pass the OpenLink verifier
> >>>>> 
> >>>>> http://id.myopenlink.net/ods/webid_demo.vsp
> >>>>> 
> >>>>> but it won't pass the foafssl one
> >>>>> 
> >>>>> https://foafssl.org/test/WebId
> >>>>> 
> >>>>> Which is right? 
> >>>>> The XML Schema definition of hexBinary explains in English how it works
> >>>>> 
> >>>>> http://www.w3.org/TR/xmlschema-2/#hexBinary
> >>>>> 
> >>>>> and if one wants a few more details one has to look at the following 
> >>>>> (at the end of the document)
> >>>>> 
> >>>>> <xs:simpleType name="hexBinary" id="hexBinary">
> >>>>> <xs:annotation>
> >>>>>  <xs:appinfo>
> >>>>>    <hfp:hasFacet name="length"/>
> >>>>>    <hfp:hasFacet name="minLength"/>
> >>>>>    <hfp:hasFacet name="maxLength"/>
> >>>>>    <hfp:hasFacet name="pattern"/>
> >>>>>    <hfp:hasFacet name="enumeration"/>
> >>>>>    <hfp:hasFacet name="whiteSpace"/>
> >>>>>    <hfp:hasProperty name="ordered" value="false"/>
> >>>>>    <hfp:hasProperty name="bounded" value="false"/>
> >>>>>    <hfp:hasProperty name="cardinality" value="countably infinite"/>
> >>>>>    <hfp:hasProperty name="numeric" value="false"/>
> >>>>>  </xs:appinfo>
> >>>>>  <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#binary"/>
> >>>>> </xs:annotation>
> >>>>> <xs:restriction base="xs:anySimpleType">
> >>>>>  <xs:whiteSpace fixed="true" value="collapse" id="hexBinary.whiteSpace"/>
> >>>>> </xs:restriction>
> >>>>> </xs:simpleType>
> >>>>> 
> >>>>> which I interpret to say that hexBinary has the whiteSpace facet fixed to "collapse".
> >>>>> 
> >>>>> This is defined in
> >>>>> http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace
> >>>>> 
> >>>>> 
> >>>>> + replace: All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
> >>>>> + collapse: After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and leading and trailing #x20's are removed
> >>>>> 
> >>>>> As I see it, this means that every run of white space in the sequence is reduced to a single space. This means that
> >>>>> 
> >>>>> 
> >>>>> """cb 24            ed
> >>>>> 85"""^^xsd:hexBinary
> >>>>> 
> >>>>> will be converted to 
> >>>>> 
> >>>>> "cb 24 ed 85"^^xsd:hexBinary
> >>>>> 
> >>>>> but never to 
> >>>>> 
> >>>>> "cb24ed85"^^xsd:hexBinary
> >>>>> 
> >>>>> which is what it would need to become in order to compare equal to that literal.
> >>>>> (but I may have misread something!)
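
The two facet values quoted above can be sketched in Python as follows. This is only an illustration of the quoted definitions of "replace" and "collapse"; the function names ws_replace and ws_collapse are made up here, and how a given XSD or RDF toolkit actually applies the facet is not something this sketch claims to show.

```python
import re


def ws_replace(s: str) -> str:
    """'replace': tab, line feed, and carriage return each become a space."""
    return re.sub(r"[\t\n\r]", " ", s)


def ws_collapse(s: str) -> str:
    """'collapse': after 'replace', squeeze runs of spaces and trim both ends."""
    return re.sub(r" +", " ", ws_replace(s)).strip(" ")


literal = "cb 24            ed\n85"
collapsed = ws_collapse(literal)

print(collapsed)                # cb 24 ed 85
print(collapsed == "cb24ed85")  # False: the interior spaces survive
```

Under this reading, collapsing normalises the spacing but never removes the single spaces between the octets, which is exactly the mismatch described above.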
> >>>>> 
> >>>>> 
> >>>>> Even if RDF/XML had to abide by this, it is not clear that precisely the same rules apply to Turtle or RDFa, especially RDFa in HTML.
> >>>>> For example, wherever an xsd:hexBinary literal appears in any RDF notation, it cannot hold more than one number. It therefore seems that one is forced to interpret the string as a single number, and so one should remove all white space.
> >>>>> 
> >>>>> If one did that one would have a much more resilient format. One could also just remove white spaces on the principle
> >>>>> of "be lenient in what you accept and strict in what you produce"
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>>  Henry
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> Social Web Architect
> >>>>> http://bblfish.net/
> >>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> ----
> >>>> Ivan Herman, W3C Semantic Web Activity Lead
> >>>> Home: http://www.w3.org/People/Ivan/
> >>>> mobile: +31-641044153
> >>>> FOAF: http://www.ivan-herman.net/foaf.rdf
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 

Received on Monday, 16 January 2012 15:13:40 UTC