Re: WebID-ISSUE-61 (xsd): xsd datatypes [ontologies]

On 16 Nov 2011, at 22:57, bergi wrote:

>> Currently we use the cert:hex datatype, which was especially
>> invented to be easy for humans to read: it is nearly possible to copy
>> and paste a hex value from a certificate viewer or from the openssl
>> tools and get it right. It is extremely lenient. But it is less standard
>> than using the xsd datatypes that are discussed in the RDF semantics
>> document http://www.w3.org/TR/rdf-mt/#dtype_interp
>> 
>> - xsd:base64Binary
>> - xsd:hexBinary
> 
> -1
> 
> What the modulus really contains is an integer. The xsd:*Binary types
> don't define how to convert the data into an integer. There are many
> different ways to store integer values (big/little endian or even binary
> coded decimal...). If we use one of the *Binary types we must add a
> description of their usage to the rsa:modulus property. I don't think
> that's the idea of data types: they should be self-describing.

There is a contradiction between your position here and your position below. Below, you take a string, which is a sequence of characters, to be self-explanatory, something that can evidently take the place of a number; but above, you see a sequence of bytes as problematically unable to do so.

I imagine the endianness has been taken care of in these bytes. Do you know that it has not?
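To make the byte-order question concrete, here is a minimal Python sketch (an illustration only, not WebID code) that reads one octet sequence, the hex value used in the test data below, both as a big-endian and as a little-endian unsigned integer. It assumes unsigned magnitudes; the extra leading 0x00 mimics what ASN.1 DER adds in front of a positive INTEGER whose top bit is set.

```python
# The same octets as in the hexBinary test data further down.
octets = bytes.fromhex("E7E92EB9E08692CB8EB9072222B7")

# Identical bytes, two byte-order conventions, two different integers.
big = int.from_bytes(octets, byteorder="big")
little = int.from_bytes(octets, byteorder="little")
print(big == little)  # False

# A leading zero octet changes the octet sequence...
padded = b"\x00" + octets
print(padded == octets)  # False
# ...but not the big-endian integer it encodes.
print(int.from_bytes(padded, byteorder="big") == big)  # True
```

So the bytes alone do not fix the integer: some convention (big-endian, as DER uses) has to be stated somewhere, whether in the datatype or in the rsa:modulus description.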

> 
> 
> xsd:string
> 
> +1
> 
> With xsd:string everything stays as it is now, except that it's no
> longer necessary to provide a data type in the RDF document and the
> SPARQL query. One could argue that the rsa:modulus property requires a
> description of how the contained data is encoded, but in my opinion
> that's OK for a literal. The ASK SPARQL query should also work with
> "FILTER regex()".

You have exactly the same issue you have with cert:hex, except that you can then
no longer switch between types as you can now. Currently, because both modulus and
exponent are defined as numbers, you can use any representation of an integer,
including xsd:integer or cert:hex. If at some point, for example, the xsd crowd
came up with nicer definitions of hexadecimal or base64 numbers, those could be
used immediately.
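A small Python sketch of the point about switching representations: three spellings of one modulus (an openssl-style dump, a compact hex string, a decimal xsd:integer form) map to a single integer, while as plain strings they are three different values. `parse_lenient_hex` is a hypothetical helper; the rule it applies (ignore every non-hex character) is my assumption for illustration, not the normative definition of cert:hex.

```python
import re


def parse_lenient_hex(text: str) -> int:
    """Hypothetical lenient hex reader (an assumption, not the cert:hex spec):
    drop colons, spaces and newlines, i.e. anything that is not a hex digit,
    then parse what remains as a base-16 integer."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", text)
    return int(digits, 16)


# openssl-style dump: colons, a line break, lower case, a leading 00 octet
openssl_style = """00:e7:e9:2e:b9:e0:86:92:
    cb:8e:b9:07:22:22:b7"""
compact_upper = "E7E92EB9E08692CB8EB9072222B7"
decimal_form = str(int(compact_upper, 16))  # xsd:integer lexical forms are decimal

# Compared as numbers, all three denote the same value.
values = {
    parse_lenient_hex(openssl_style),
    parse_lenient_hex(compact_upper),
    int(decimal_form),
}
print(len(values))  # 1

# Compared as plain strings (the xsd:string option), all three differ.
print(len({openssl_style, compact_upper, decimal_form}))  # 3
```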

With xsd:string, in comparison to base64Binary, you don't get any automatic support from RDF tools.
With base64Binary or hexBinary, as with xsd:integer or xsd:decimal, the lexical form is mapped
directly to a value (an octet sequence for the binary types, a number for the numeric ones) that
tools can compare for equality.
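For the two binary types the value is a sequence of octets. A quick Python sketch of that lexical-to-value mapping (it models only the decoding, not how any particular RDF store compares literals): hexBinary accepts upper- or lower-case digits, and base64Binary can spell the same octets differently again. Whether a store also equates a hexBinary literal with a base64Binary literal is a separate question; XSD 1.1 treats the value spaces of distinct primitive datatypes as disjoint.

```python
import base64

# xsd:hexBinary: two hex digits per octet, case does not matter.
hex_upper = bytes.fromhex("0FB7")
hex_lower = bytes.fromhex("0fb7")

# xsd:base64Binary: the same two octets in base64 (with "=" padding).
b64 = base64.b64decode("D7c=")

# Three lexical forms, one octet sequence.
print(hex_upper == hex_lower == b64)  # True
```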

Perhaps we can all try the following out. 

I created two data files that I think contain the same information:

$ cat hexdata.n3 
@prefix : <http://localhost/test#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .


:key :mod "E7E92EB9E08692CB8EB9072222B7"^^xsd:hexBinary .


$ cat base64data.n3 
@prefix : <http://localhost/test#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .


:key :mod """AOfpLrnghpLLjrkHIiK3"""^^xsd:base64Binary .

And then I wrote a SPARQL query

$ cat test.sparql 
PREFIX : <http://localhost/test#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 

ASK {
  :key :mod "E7E92EB9E08692CB8EB9072222B7"^^xsd:hexBinary .
}

Finally, I ran the SPARQL query against both files:

$ arq --graph base64data.n3 --query test.sparql 
Ask => No


$ arq --graph hexdata.n3 --query test.sparql 
Ask => Yes

So, unless I made a mistake in my conversion from hex to base64 (which is quite possible), it looks like Jena ARQ does not give me the right result out of the box.
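The conversion itself can be checked independently of Jena with a few lines of Python. The base64 literal decodes to 15 octets and the hex literal to 14: the base64 form carries one extra leading 0x00 octet, like the one DER puts in front of a positive integer whose top bit is set. As octet sequences the two files therefore hold different values, even though as big-endian integers they agree, so the "No" may come from the data rather than from ARQ.

```python
import base64

hex_literal = "E7E92EB9E08692CB8EB9072222B7"  # from hexdata.n3
b64_literal = "AOfpLrnghpLLjrkHIiK3"          # from base64data.n3

from_hex = bytes.fromhex(hex_literal)
from_b64 = base64.b64decode(b64_literal, validate=True)  # reject stray characters

print(len(from_hex), len(from_b64))  # 14 15
print(from_b64[:1])                  # b'\x00': one extra leading zero octet
print(from_b64[1:] == from_hex)      # True: the remaining octets match

# Read as big-endian unsigned integers, both encode the same number.
print(int.from_bytes(from_b64, "big") == int.from_bytes(from_hex, "big"))  # True
```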

Also, I notice that base64 reduces the length of the string by only about a quarter to a third. I was expecting more, but it had been a long time since I had looked into that space.
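The arithmetic behind that: hex spends two characters per octet, while base64 spends four characters per three octets, plus padding up to a multiple of four. For long values base64 is therefore about two thirds the length of hex, a saving of roughly one third; for short values the padding makes the saving smaller. A quick Python check, using the 14- and 15-octet values above and, as an assumed example, a 256-octet (2048-bit) modulus:

```python
import base64


def encoded_lengths(n_octets: int):
    """Return (hex length, base64 length) in characters for n_octets bytes."""
    data = bytes(n_octets)                   # n zero octets; only the length matters
    hex_chars = len(data.hex())              # 2 characters per octet
    b64_chars = len(base64.b64encode(data))  # 4 characters per 3 octets, padded
    return hex_chars, b64_chars


for n in (14, 15, 256):
    h, b = encoded_lengths(n)
    print(f"{n} octets: hex {h} chars, base64 {b} chars, ratio {b / h:.3f}")
```

For the 14-octet test value this gives 28 versus 20 characters, a saving of about 29%, which falls in the "quarter to a third" range observed.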

Anyway, this is what I mean by empirical evidence. A +1 or -1 without any discussion is not worth much. We are trying to come to conclusions through a reasoned process here.

Henry



Social Web Architect
http://bblfish.net/

Received on Wednesday, 16 November 2011 23:36:22 UTC