Re: WebID-ISSUE-61 (xsd): xsd datatypes [ontologies]

From: Henry Story <henry.story@bblfish.net>
Date: Thu, 17 Nov 2011 16:10:23 +0100
Cc: WebID Incubator Group WG <public-xg-webid@w3.org>
Message-Id: <95DE2344-8AF8-4CE3-BC21-F0564DEB39C0@bblfish.net>
To: bergi <bergi@axolotlfarm.org>
Great responses Bergi. Thanks for the input.


On 17 Nov 2011, at 12:04, bergi wrote:

> On 17.11.2011 00:35, Henry Story wrote:
>> 
>> On 16 Nov 2011, at 22:57, bergi wrote:
>> 
>>>> Currently we use the cert:hex datatype, which was invented
>>>> specifically to be easy for humans to read: one can almost copy
>>>> and paste a hex string from a certificate viewer or from openssl tools
>>>> and get it right. It is extremely lenient. But it is less standard than
>>>> using the xsd datatypes that are discussed in the RDF semantics
>>>> document http://www.w3.org/TR/rdf-mt/#dtype_interp
>>>> 
>>>> - xsd:base64Binary
>>>> - xsd:hexBinary
>>> 
>>> -1
>>> 
>>> What the modulus really contains is an integer. The xsd:*Binary types
>>> don't define how to convert the data. There are many different ways to
>>> store integer values (big/little endian or even binary coded
>>> decimal...). If we use one of the *Binary types we must add a description
>>> of their usage to the rsa:modulus property. I think that's not the idea of
>>> data types: they should be self-describing.
>> 
>> There is a contradiction between your position here and your position
>> below. Below you take a string, which is a sequence of characters, to be
>> something self-explanatory, something which evidently can take the
>> place of a number, but above you have a sequence of bytes which you see
>> as problematically unable to do so.
> 
> The *Binary data types are just strings, but with very specific rules
> for how to fill them and how to extract the contained data. That's also
> true for pure strings if they are used for anything other than labels.
> The difference between *Binary and string is the first place where the
> description of the encoding can be attached:
> 
> string->data types: *Binary
> string->property: rsa:modulus
> 
> This isn't based on any spec; it's just my point of view.
> 
>> 
>> I imagine the endianness has been taken care of in these bytes. Do you
>> know that it has not?
> 
> Maybe it's clearer if we talk about a real-world example. This is
> how it would look in Java:
> 
> byte[] binary = {1, 2, 3, 4};
> BigInteger modulus = new BigInteger(binary);
> 
> 
> The first line represents the *Binary data types. The second line
> converts the binary to an integer. That's not defined by the *Binary
> data types. The endianness is defined in the BigInteger constructor.
> Usually numbers bigger than CPU registers are big endian, but we
> shouldn't rely on that.
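
To make the endianness point concrete, here is a minimal sketch in plain Java. The class and method names (EndianDemo, bigEndian, littleEndian) are made up for illustration; only java.math.BigInteger is a real API. It assumes the bytes are meant as an unsigned value, and shows that the same four bytes give two different integers depending on the byte order chosen by the code that does the conversion.

```java
import java.math.BigInteger;

class EndianDemo {
    // Read the bytes as an unsigned integer, most significant byte first.
    // The signum argument 1 keeps the result non-negative.
    static BigInteger bigEndian(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    // Read the same bytes least significant byte first by reversing a copy.
    static BigInteger littleEndian(byte[] bytes) {
        byte[] reversed = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            reversed[i] = bytes[bytes.length - 1 - i];
        }
        return new BigInteger(1, reversed);
    }

    public static void main(String[] args) {
        byte[] binary = {1, 2, 3, 4};
        System.out.println(bigEndian(binary));    // 16909060 (0x01020304)
        System.out.println(littleEndian(binary)); // 67305985 (0x04030201)
    }
}
```

Neither reading is implied by the byte sequence itself; the choice lives in the conversion code.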

Yes, it's just a binary sequence of bytes, which is not a number.
Just as a string is a sequence of characters and is not a number.
At this point what applies to one applies to the other too.

I.e. if you have

:key stringModulus "af de 23"^^xsd:string .

or

:key byteModulus "afde23"^^xsd:hexBinary .

in neither case do you have a relation from a key to a number. Rather, you have a relation from a key to a character string in one case, and to a byte string in the other.

What makes it possible to use these constructs as relations to numbers is the notion that behind those relations one can find the hidden relation

:key rsa:public_exponent [ :string "af de 23"^^xsd:string ] .

or 

:key rsa:public_exponent [ :binary "afde23"^^xsd:hexBinary ] .

That is why, in the original text of the issue I raised, I mentioned the following rule:

{ ?key rsa:modulus ?mod .
  ?mod :binary ?modBin } <=> { ?key :byteModulus ?modBin } 

http://www.w3.org/2005/Incubator/webid/track/issues/61

If you look at what the RDF Semantics spec says about literals, that is exactly what datatyped literals do: a datatype is a function from lexical strings to objects such as numbers or byte sequences. That is what cert:hex is: a function from strings to numbers. So by using a literal like cert:hex we are in fact mapping directly to the correct kind of object, namely an integer. That's what the mathematical definition of the modulus is.
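
The idea of a datatype as a function from lexical strings to values can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, and the lenient treatment of cert:hex (ignore every character that is not a hex digit) is an assumption based on the description of the datatype earlier in this thread, not a quote from its definition.

```java
import java.math.BigInteger;

class LiteralToInteger {
    // Assumed lenient reading of cert:hex: drop everything that is not a
    // hex digit (spaces, colons, line breaks), then read the rest in base 16.
    // An input without any hex digit makes BigInteger throw NumberFormatException.
    static BigInteger fromCertHex(String lexical) {
        String digits = lexical.replaceAll("[^0-9a-fA-F]", "");
        return new BigInteger(digits, 16);
    }

    // xsd:integer lexical forms are decimal digits with an optional sign.
    static BigInteger fromXsdInteger(String lexical) {
        return new BigInteger(lexical.trim());
    }

    public static void main(String[] args) {
        BigInteger a = fromCertHex("af de 23");
        BigInteger b = fromCertHex("AF:DE:23");
        BigInteger c = fromXsdInteger("11525667");
        // Three different lexical forms, one and the same integer.
        System.out.println(a.equals(b) && b.equals(c)); // true
    }
}
```

Once both datatypes map into integers, equality is decided on the value and not on the spelling, which is what lets different representations of the same modulus be interchanged.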

> 
>> 
>>> 
>>> 
>>> xsd:string
>>> 
>>> +1
>>> 
>>> With xsd:string everything stays as it is now, except that it's no longer
>>> necessary to provide a data type in the RDF document and SPARQL query.
>>> One could argue that the rsa:modulus property requires a description of how
>>> the contained data is coded, but in my opinion that's OK for a literal.
>>> The ASK SPARQL query should also work with "FILTER regex()".
>> 
>> You have exactly the same issue you have with cert:hex, except that you
>> can then no longer switch between types as you can now. Currently, because
>> both modulus and exponent are defined as numbers, you can use any way to
>> represent an integer, including xsd:integer or cert:hex. If at some point,
>> for example, the xsd crowd came up with some nicer definitions of
>> hexadecimal or base64 numbers, those could immediately be used.
> 
> Couldn't we still use these types in the future by adding the specific
> data type to the triple object?

Not if we have a relation from a key to a string, because a string is not an integer, just like a byte array is not a number.

Currently you can in fact use xsd:integer to define your modulus! It's just that it would feel very odd to people working in this field to see the modulus as an integer, and it would bear little relation to the presentations most people see it in. But we should probably mention that in the spec (if we have not already done so).


>> 
>> With xsd:string, in comparison to base64Binary, you don't get any
>> automatic RDF tools support. That is, with base64Binary or hexBinary,
>> as with xsd:integer or xsd:decimal, the values are converted directly
>> into numbers that can be compared by the tools for equality.
> 
> I think support means the frameworks already have types/classes for the
> xsd:* data types. In Jena for example there are getHashCode and isEqual
> methods [1][2].
> 
>> 
>> Perhaps we can all try the following out. 
>> 
>> I created 2 data files which I think contain the same information
>> 
>> $ cat hexdata.n3 
>> @prefix : <http://localhost/test#> .
>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>> 
>> 
>> :key :mod "E7E92EB9E08692CB8EB9072222B7"^^xsd:hexBinary .
>> 
>> 
>> $ cat base64data.n3  
>> @prefix : <http://localhost/test#> .
>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>> 
>> 
>> :key :mod """AOfpLrnghpLLjrkHIiK3"""^^xsd:base64Binary .
>> 
>> And then I wrote a SPARQL query
>> 
>> $ cat test.sparql 
>> PREFIX : <http://localhost/test#> 
>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
>> 
>> ASK {
>>  :key :mod "E7E92EB9E08692CB8EB9072222B7"^^xsd:hexBinary .
>> }
>> 
>> Finally I ran the sparql query against both files
>> 
>> $ arq --graph base64data.n3 --query test.sparql  
>> Ask => No
>> 
>> 
>> $ arq --graph hexdata.n3 --query test.sparql 
>> Ask => Yes
>> 
>> So unless I made a mistake in my conversion from hex to base64 - which
>> is quite possible - it looks like Jena's arq does not give me the
>> right results out of the box.
> 
> You added leading zeros before the conversion to Base64. The right value
> would be "5+kuueCGksuOuQciIrc=", but that also doesn't work. The SPARQL
> spec contains a list of data types [3] which are treated in a special
> way. *Binary is not in that list. Unless an implementation has
> additional features for the *Binary data types we don't get any benefit
> in our SPARQL queries.

Thanks for noticing that mistake. 
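
For anyone who wants to check the two base64 values, here is a small sketch using only the Java standard library (java.util.Base64, Java 8 or later). The class and method names are made up for illustration. It decodes both strings and compares them, as unsigned big-endian integers, with the hex value from hexdata.n3.

```java
import java.math.BigInteger;
import java.util.Base64;

class Base64ModulusCheck {
    // Decode a base64 lexical form into its raw bytes.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // Read bytes as an unsigned big-endian integer; leading zero bytes
    // do not change the resulting value.
    static BigInteger asUnsigned(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    public static void main(String[] args) {
        byte[] padded = decode("AOfpLrnghpLLjrkHIiK3"); // value in base64data.n3
        byte[] exact = decode("5+kuueCGksuOuQciIrc=");  // bergi's corrected value
        BigInteger fromHex = new BigInteger("E7E92EB9E08692CB8EB9072222B7", 16);

        System.out.println(padded.length + " vs " + exact.length); // 15 vs 14
        System.out.println(padded[0]);                              // 0
        System.out.println(asUnsigned(padded).equals(fromHex));     // true
        System.out.println(asUnsigned(exact).equals(fromHex));      // true
    }
}
```

The byte sequences differ by one leading zero byte, so they are different binary values, yet both denote the same integer once a byte order is fixed.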

The next question would then be: how easy or difficult is it to add to the
core of Jena, or other such tools, new transformation functions that know how
to convert new datatypes (or existing xsd ones) to the underlying objects, so that one
gets equality across different presentations of the same object?



> 
>> 
>> Also, what I notice is that base64 only reduces the length of the string
>> by approximately one third or one quarter. I was expecting more - but
>> it has been a long time since I last looked at that space.
>> 
>> Anyway, this is what I mean by empirical evidence. A +1 or -1 without any
>> discussion is not worth much. We are trying to come to conclusions
>> through some reasoned process here.
>> 
>> Henry
>> 
>> 
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
> 
> 
> [1]
> http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/datatypes/xsd/XSDbase64Binary.html
> [2]
> http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/datatypes/xsd/XSDhexBinary.html
> [3] http://www.w3.org/TR/rdf-sparql-query/#operandDataTypes

Social Web Architect
http://bblfish.net/
Received on Thursday, 17 November 2011 15:11:04 GMT