Re: WebID-ISSUE-61 (xsd): xsd datatypes [ontologies] from bergi on 2011-11-17 (public-xg-webid@w3.org from November 2011)

From: bergi <bergi@axolotlfarm.org>
Date: Thu, 17 Nov 2011 12:04:03 +0100
To: Henry Story <henry.story@bblfish.net>
CC: WebID Incubator Group WG <public-xg-webid@w3.org>
Message-ID: <4EC4EA23.9010100@axolotlfarm.org>
On 17.11.2011 00:35, Henry Story wrote:
> 
> On 16 Nov 2011, at 22:57, bergi wrote:
> 
>>> Currently we use the cert:hex datatype, which was especially
>>> invented to be easy to read for humans: it is nearly possible to copy
>>> and paste a hex from a certificate viewer or from openssl tools and
>>> get it right. It is extremely lenient. But it is less standard than
>>> using the xsd datatypes that are discussed in the RDF semantics
>>> document http://www.w3.org/TR/rdf-mt/#dtype_interp
>>>
>>> - xsd:base64Binary
>>> - xsd:hexBinary
>>
>> -1
>>
>> What the modulus really contains is an integer. The xsd:*Binary types
>> don't define how to convert the data. There are many different ways to
>> store integer values (big/little endian or even binary coded
>> decimal...). If we use one of the *Binary types we must add a
>> description of their usage to the rsa:modulus property. I think that's
>> not the idea of data types. They should be self-describing.
> 
> There is a contradiction between your position here and your position
> below. Below you take a string, which is a sequence of characters, to be
> something self-explanatory, something which evidently can take the
> place of a number, but above you have a sequence of bytes which you see
> as problematically unable to do so.

The *Binary data types are just strings, but with very specific rules
for how to fill them and how to extract the contained data. That's also
true for plain strings if they are used for anything other than labels.
The difference between *Binary and string is the first place where that
description can be attached: the data type for *Binary, the property
for a plain string.

string->data types: *Binary
string->property: rsa:modulus

This isn't based on any spec; it's just my point of view.

> 
> I imagine the endianess has been taken care of in these bytes. Do you
> know that they have not?

Maybe it's clearer if we look at a real-world example. This is how it
would look in Java:

byte[] binary = {1, 2, 3, 4};
BigInteger modulus = new BigInteger(binary);


The first line represents the *Binary data types. The second line
converts the binary data to an integer. That conversion is not defined
by the *Binary data types; the byte order is defined by the BigInteger
constructor, which expects big-endian input. Numbers bigger than CPU
registers are usually stored big-endian, but we shouldn't rely on that.
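
To make the byte-order point concrete, here is a small sketch of how the
same four bytes yield different integers depending on the assumed byte
order. The class name ByteOrderDemo and its helper methods are made up
for this illustration. The only library behaviour relied on is that
java.math.BigInteger reads a byte array as big-endian, and that the
one-argument constructor additionally treats it as signed two's
complement.

```java
import java.math.BigInteger;

public class ByteOrderDemo {

    // Interpret the bytes as an unsigned big-endian integer.
    // The signum argument 1 forces a non-negative result.
    static BigInteger fromBigEndian(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    // Interpret the bytes as an unsigned little-endian integer by
    // reversing a copy before handing it to BigInteger.
    static BigInteger fromLittleEndian(byte[] bytes) {
        byte[] reversed = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            reversed[i] = bytes[bytes.length - 1 - i];
        }
        return new BigInteger(1, reversed);
    }

    public static void main(String[] args) {
        byte[] binary = {1, 2, 3, 4};
        System.out.println(fromBigEndian(binary));    // 16909060 (0x01020304)
        System.out.println(fromLittleEndian(binary)); // 67305985 (0x04030201)

        // A set high bit makes the one-argument constructor negative.
        byte[] highBit = {(byte) 0xE7};
        System.out.println(new BigInteger(highBit));    // -25
        System.out.println(new BigInteger(1, highBit)); // 231
    }
}
```

So a reader of a *Binary literal has to be told both the byte order and
whether the bytes are signed before it can recover the modulus.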

> 
>>
>>
>> xsd:string
>>
>> +1
>>
>> With xsd:string everything stays as it is now, except it's no longer
>> necessary to provide a data type in the RDF document and SPARQL query.
>> One could argue that the rsa:modulus property requires a description of
>> how the contained data is encoded, but in my opinion that's OK for a
>> literal. The ASK SPARQL query should also work with "FILTER regex()".
> 
> You have exactly the same issue you have with cert:hex except that you
> can then no longer switch between types as you can now. Currently
> because both modulus and exponent are defined as numbers you can use
> any way to represent an integer, including xsd:integer or cert:hex. If
> at some point for example the xsd crowd came up with some nicer
> definitions of hexadecimal or base64 numbers those could immediately
> be used.

Couldn't we still use these types in the future by adding the specific
data type to the triple object?

> 
> With xsd:string in comparison to base64Binary you don't get any
> automatic RDF tools support. That is, with base64Binary or hexBinary,
> as with xsd:integer or xsd:decimal, the values are converted directly
> into numbers that can be compared by the tools for equality.

I think support means the frameworks already have types/classes for the
xsd:* data types. In Jena, for example, there are getHashCode and
isEqual methods [1][2].

> 
> Perhaps we can all try the following out. 
> 
> I created 2 data files which I think contain the same information
> 
> $ cat hexdata.n3 
> @prefix : <http://localhost/test#> .
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> 
> 
> :key :mod "E7E92EB9E08692CB8EB9072222B7"^^xsd:hexBinary .
> 
> 
> $ cat base64data.n3  
> @prefix : <http://localhost/test#> .
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> 
> 
> :key :mod """AOfpLrnghpLLjrkHIiK3"""^^xsd:base64Binary .
> 
> And then I wrote a SPARQL query
> 
> $ cat test.sparql 
> PREFIX : <http://localhost/test#> 
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
> 
> ASK {
>   :key :mod "E7E92EB9E08692CB8EB9072222B7"^^xsd:hexBinary .
> }
> 
> Finally I ran the sparql query against both files
> 
> $ arq --graph base64data.n3 --query test.sparql  
> Ask => No
> 
> 
> $ arq --graph hexdata.n3 --query test.sparql 
> Ask => Yes
> 
> So unless I made a mistake in my conversion from hex to base64 - which
> is quite possible - then it looks like Jena ARQ does not give me the
> right results out of the box.

You added a leading zero byte before the conversion to Base64. The right
value would be "5+kuueCGksuOuQciIrc=", but that doesn't work either. The
SPARQL spec contains a list of data types [3] which are treated in a
special way. *Binary is not in that list. Unless an implementation has
additional features for the *Binary data types, we don't get any benefit
in our SPARQL queries.
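
The two Base64 strings can be reproduced with a short sketch. It assumes
Java 8 or later for java.util.Base64; the class name HexToBase64 and its
helpers are invented for this example. It also shows one plausible
source of the extra byte (an assumption, not something confirmed in this
thread): BigInteger.toByteArray() prepends a zero sign byte when the
highest bit of the value is set.

```java
import java.math.BigInteger;
import java.util.Base64;

public class HexToBase64 {

    // Decode a hex string (even length, no prefix) into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        String hex = "E7E92EB9E08692CB8EB9072222B7";

        // The 14 bytes exactly as written in the hexBinary literal.
        byte[] raw = hexToBytes(hex);
        System.out.println(toBase64(raw)); // 5+kuueCGksuOuQciIrc=

        // toByteArray() returns 15 bytes here: a 0x00 sign byte comes first.
        byte[] signed = new BigInteger(hex, 16).toByteArray();
        System.out.println(toBase64(signed)); // AOfpLrnghpLLjrkHIiK3

        // Read as unsigned integers, both byte sequences are the same number.
        System.out.println(new BigInteger(1, raw).equals(new BigInteger(1, signed))); // true
    }
}
```

Because the byte sequences differ, a tool that compares the binary
values byte by byte would still see two distinct literals; only an
integer interpretation makes them equal.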

> 
> Also what I notice is that base64 only reduces the length of the string
> by approximately one third or one quarter. I was expecting more, but
> it has been a long time since I last looked into that area.
> 
> Anyway, this is what I mean by empirical evidence. +1 or -1 without any
> discussion are not worth much. We are trying to come to conclusions
> through some reasoned process here.
> 
> Henry
> 
> 
> 
> Social Web Architect
> http://bblfish.net/
> 


[1] http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/datatypes/xsd/XSDbase64Binary.html
[2] http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/datatypes/xsd/XSDhexBinary.html
[3] http://www.w3.org/TR/rdf-sparql-query/#operandDataTypes
Received on Thursday, 17 November 2011 11:04:41 UTC