xsd:hexBinary and xsd:base64Binary entailments

Dear SPARQL group,

The WebID XG is developing an authentication mechanism which relies on SPARQL without requiring it.
The spec relies on SPARQL because that makes it so much easier to explain what is needed: one simple ASK 
query, as described in section 3.2.4.2 of the spec

  http://www.w3.org/2005/Incubator/webid/spec/#verifying-the-webid-claim
  http://webid.info/spec (to tweet)

We are using a number of technologies - from crypto to SPARQL, making use of linked data and https - 
in as simple a manner as we can whilst balancing the requirements of real deployments.

In doing this we have come across two issues.

1. xsd:hexBinary and white space
================================

We are hoping to have WebID deployed very widely, to authenticate people across a vast number of resources, so we do need
the SPARQL query to be flexible. Using Jena ARQ I found that an xsd:hexBinary query on an RDF document containing 
white space in the data does not give the right result:

----------------------------------------------------------------- 
hjs@bblfish[0]$ cat q1.sparql 
PREFIX : <http://me.example/p#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 

SELECT ?S WHERE { 
  ?S :related "AAAA"^^xsd:hexBinary . 
} 


hjs@bblfish[0]$ cat c1.rdf 

<rdf:RDF xmlns="http://me.example/p#" 
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> 

    <rdf:Description rdf:about="http://me.example/p#me"> 
        <related rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary"> 
AAAA 
</related> 
    </rdf:Description> 
</rdf:RDF> 

hjs@bblfish[0]$ arq --query=q1.sparql --data=c1.rdf 
----- 
| S | 
===== 
------------------------------------------------------------

I filed a bug report on the list and the conclusion seems to be that Jena
is working according to spec.

 https://issues.apache.org/jira/browse/JENA-170

So it seems to me that the spec is creating behaviour that is too 
fragile. If someone makes a white space mistake in publishing their data and the query then does
not give the correct result, it is going to create a lot of customer confusion, especially
if it is possible to write the same query using the graph API and get the right results. Also, the
hexBinary definition does say that hexBinary has a whiteSpace facet, so this is just going to add to
the number of difficult-to-defend bug reports.

   http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#hexBinary 

I suggest that this be defined more clearly in the SPARQL D-entailment section of the spec.
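
To make the expected behaviour concrete, here is a minimal sketch in Python (standard
library only) of comparing two xsd:hexBinary literals by value rather than by lexical form.
It assumes the whiteSpace facet of hexBinary is "collapse"; the function names are made up
for illustration and are not part of Jena or of any spec.

```python
import binascii
import re

# XML Schema "collapse": tab, LF and CR count as spaces, runs of spaces are
# squeezed to one, and leading/trailing spaces are dropped.
_XSD_WS = re.compile(r"[ \t\n\r]+")


def collapse_whitespace(lexical: str) -> str:
    """Apply whiteSpace=collapse to a lexical form (hypothetical helper)."""
    return _XSD_WS.sub(" ", lexical).strip(" ")


def hexbinary_value(lexical: str) -> bytes:
    """Map an xsd:hexBinary lexical form to the octets it denotes."""
    # unhexlify only accepts pairs of hex digits, so white space that is
    # still inside the literal after collapsing raises binascii.Error.
    return binascii.unhexlify(collapse_whitespace(lexical))


def same_hexbinary(a: str, b: str) -> bool:
    """True if both lexical forms denote the same octets."""
    return hexbinary_value(a) == hexbinary_value(b)


from_query = "AAAA"        # the literal in q1.sparql
from_data = " \nAAAA \n"   # roughly the element content in c1.rdf
print(same_hexbinary(from_query, from_data))  # True
```

Under that assumption the two literals denote the same value, which is the match one would
expect the query above to find.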


2. xsd:hexBinary and xsd:base64Binary
=====================================

We are using xsd:hexBinary because there is no xsd:hexInteger, which would have suited us better, and because hex is what most tools use. The XML D-SIG spec uses what seems to be the equivalent of xsd:base64Binary, named there ds:CryptoBinary 

   http://www.w3.org/TR/xmldsig-core/#sec-CoreSyntax

So thinking a bit longer term it would be very useful if a simple GRDDL of such a document could produce a graph that can be queried with the same ASK query defined in the WebID spec. For that there needs to be a D-entailment relation between xsd:hexBinary and xsd:base64Binary. There is no reason there should not be such an entailment, since obviously both are binaries - they are just encoded differently.
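
As an illustration of "encoded differently", the following Python sketch (standard library
only) maps a hexBinary and a base64Binary lexical form to the octets they denote and compares
those. The function names are hypothetical, and handling of white space in the lexical forms
is left out to keep it short.

```python
import base64
import binascii


def hexbinary_value(lexical: str) -> bytes:
    """Octets denoted by an xsd:hexBinary lexical form."""
    return binascii.unhexlify(lexical)


def base64binary_value(lexical: str) -> bytes:
    """Octets denoted by an xsd:base64Binary lexical form."""
    # validate=True makes b64decode reject characters outside the alphabet
    # instead of silently skipping them.
    return base64.b64decode(lexical, validate=True)


def same_octets(hex_lexical: str, b64_lexical: str) -> bool:
    """True if the two literals denote the same sequence of octets."""
    return hexbinary_value(hex_lexical) == base64binary_value(b64_lexical)


# "AAAA" in hex is the two octets 0xAA 0xAA; "qqo=" is their base64 form.
print(same_octets("AAAA", "qqo="))  # True
```

An entailment between the two datatypes would amount to a comparison of this kind in value
space.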

All the best,

	Henry Story




Social Web Architect
http://bblfish.net/

Received on Tuesday, 29 November 2011 23:50:13 UTC