WebID-ISSUE-61 (xsd): xsd datatypes [ontologies]

http://www.w3.org/2005/Incubator/webid/track/issues/61

Raised by: Henry Story
On product: ontologies

Currently we use the cert:hex datatype, which was specially invented to be easy for humans to read: one can copy and paste a hex value almost directly from a certificate viewer or from the openssl tools and get it right. It is extremely lenient. But it is less standard than the xsd datatypes discussed in the RDF Semantics document http://www.w3.org/TR/rdf-mt/#dtype_interp

 - xsd:base64Binary
 - xsd:hexBinary

which are continuously developed and documented in the latest XML Schema specification:
   http://www.w3.org/TR/xmlschema11-2/#base64Binary

Here are the definitions of those types from that spec:

base64Binary:
-----------

  Definition:   base64Binary represents arbitrary Base64-encoded binary data.  For base64Binary data the entire binary stream is encoded using the Base64 Encoding defined in [RFC 3548], which is derived from the encoding described in [RFC 2045].
  Value Space: The ·value space· of base64Binary is the set of finite-length sequences of zero or more binary octets.  The length of a value is the number of octets.

 This allows a value to be encoded over multiple lines.
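A minimal Python sketch of how a multi-line base64Binary literal could be mapped to its value, assuming that dropping all white space before strict decoding is a fair approximation of the XSD lexical rules (it is not a full implementation of the spec's grammar). The function name base64binary_value is hypothetical.

```python
import base64


def base64binary_value(lexical: str) -> bytes:
    """Map an xsd:base64Binary lexical form to its value, a sequence of octets.

    Assumption: removing every white space character (spaces, tabs,
    newlines) before strict decoding approximates the XSD white space rules.
    """
    compact = "".join(lexical.split())
    # validate=True rejects any character outside the Base64 alphabet
    return base64.b64decode(compact, validate=True)


one_line = "SGVsbG8="
wrapped = """SGVs
             bG8="""

# Both lexical forms denote the same five octets
print(base64binary_value(one_line) == base64binary_value(wrapped))  # True
print(base64binary_value(wrapped))  # b'Hello'
```

The point is that the line breaks belong to the lexical form only; the value, and so the length in octets, is unchanged.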

hexBinary
--------

http://www.w3.org/TR/xmlschema11-2/#hexBinary
 Definition:  hexBinary represents arbitrary hex-encoded binary data. 
 Value Space: The ·value space· of hexBinary is the set of finite-length sequences of zero or more binary octets.  The length of a value is the number of octets.
The set recognized by hexBinary is the same as that recognized by the regular expression '([0-9a-fA-F]{2})*'.

For hexBinary, white space is collapsed, which means:

 After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and any #x20 at the start or end of the string is then removed.

So hexBinary values cannot have white space between their digits; only leading and trailing white space is tolerated, because collapsing removes it (but bblfish may be wrong).
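The collapse step and the regular expression quoted above can be checked mechanically. This is a hedged Python sketch using the standard re module; collapse, is_hexbinary_lexical and hexbinary_value are illustrative names, not part of any RDF library. It shows outer white space being stripped while white space between digits fails the pattern:

```python
import re

# Lexical pattern for xsd:hexBinary, taken from the spec text above
HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")


def collapse(s: str) -> str:
    """whiteSpace="collapse": tab, newline and carriage return become #x20,
    runs of #x20 shrink to one, and #x20 at either end is removed."""
    replaced = re.sub(r"[\t\n\r]", " ", s)
    return re.sub(r" +", " ", replaced).strip(" ")


def is_hexbinary_lexical(literal: str) -> bool:
    """True if the collapsed literal matches ([0-9a-fA-F]{2})* in full."""
    return HEX_BINARY.fullmatch(collapse(literal)) is not None


def hexbinary_value(literal: str) -> bytes:
    """Map a valid xsd:hexBinary literal to its octets."""
    if not is_hexbinary_lexical(literal):
        raise ValueError(f"not an xsd:hexBinary literal: {literal!r}")
    return bytes.fromhex(collapse(literal))


print(is_hexbinary_lexical("  E7E92E\n"))     # True: outer white space is removed
print(is_hexbinary_lexical("E7 E9 2E"))       # False: inner spaces survive collapsing
print(is_hexbinary_lexical("E7E9\n2E"))       # False: the newline becomes a space
print(hexbinary_value("e7E92e").hex())        # e7e92e
```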

Proposal
=====

Since the objects are binaries and not numbers, one cannot use the relations used currently: rsa:modulus is a relation from an RSA key to a number, but what we need is a relation from an RSA key to the binary of its modulus. I.e., if we call the relation from a number to its binary :binary, then the following rule applies:
 
 { ?key rsa:modulus ?mod .
   ?mod :binary ?modBin } <=> { ?key :mod ?modBin } 

where the :mod relation would be the relation from a key to the binary of its modulus. 
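To make the hypothetical :binary relation concrete, here is a Python sketch assuming one particular convention: the shortest unsigned big-endian octet sequence, with no leading zero octets (so zero maps to the empty sequence). to_binary and from_binary are illustrative names; other conventions are possible, and choosing one is exactly what such a relation would have to pin down.

```python
def to_binary(n: int) -> bytes:
    """Hypothetical :binary relation: a non-negative integer to its shortest
    unsigned big-endian octet sequence (0 maps to the empty sequence)."""
    if n < 0:
        raise ValueError("moduli and exponents are non-negative")
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def from_binary(octets: bytes) -> int:
    """The converse: read the octets as an unsigned big-endian integer."""
    return int.from_bytes(octets, "big")


exponent = 65537
print(to_binary(exponent).hex().upper())       # 010001
print(from_binary(bytes.fromhex("010001")))    # 65537
```

Under this convention the rule above can be read in both directions, since from_binary(to_binary(n)) == n for every non-negative n.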


PROS AND CONS
==========

(at this stage of the analysis)

Assume the following FOAF profile:

  @prefix cert: <http://www.w3.org/ns/auth/cert#> .
  @prefix rsa:  <http://www.w3.org/ns/auth/rsa#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

  <http://bblfish.net/#hjs> cert:key [ rsa:mod """
                              5+kuueCGksuOuQciIrf7hjSRiahB8c3hd8hPjTH/6k+N
                              BKN+H0MRHPiSVCVwvvhstF2zmE6Ms0NwzSDWHuSO
                              qjEwu6+CKE8tvL0Y0OHkbkhVDhenLPQagKIWjXe0k4
                              CDIcizyNj1L8zRwsN0TaxrYZZPlaTx2/VpMI3ApaVKyb
                              /4+mJ4UZDBol9TMkTfyBbPq3iISMz6rt3vsNgksXar0D
                              CftGag2V2E1L/t8HvuDe0UaqKajsIlVtu/iUMSYKu41dZ
                              JCVCYm/DrqcX0m1aUwHAYWKtSap9Z5p7PnJVowqp2
                              /3jnsf7h6WlUN9yQtm/FeEeMp+3Mx7DokAYYTElTaQ==
               """^^xsd:base64Binary ;
                                       rsa:exponent 65537 ] .



 + it would make a simple SPARQL ASK query possible, which could be a lot more efficient. Since RDF stores should know the equivalences between the various ways of writing out binaries, one could query the store with

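   PREFIX cert: <http://www.w3.org/ns/auth/cert#>
   PREFIX rsa:  <http://www.w3.org/ns/auth/rsa#>
   PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
   ASK {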
      <http://bblfish.net/#hjs> cert:key [ rsa:mod "E7E92EB9E08692CB8EB9072222B7FB86349189A841F1CDE177C84F8D31FFEA4F8D04A37E1F43111CF892542570BEF86CB45DB3984E8CB34370CD20D61EE48EAA3130BBAF82284F2DBCBD18D0E1E46E48550E17A72CF41A80A2168D77B493808321C8B3C8D8F52FCCD1C2C3744DAC6B61964F95A4F1DBF569308DC0A5A54AC9BFF8FA62785190C1A25F533244DFC816CFAB788848CCFAAEDDEFB0D824B176ABD0309FB466A0D95D84D4BFEDF07BEE0DED146AA29A8EC22556DBBF89431260ABB8D5D6490950989BF0EBA9C5F49B5694C0701858AB526A9F59E69ECF9C9568C2AA76FF78E7B1FEE1E9695437DC90B66FC578478CA7EDCCC7B0E89006184C495369"^^xsd:hexBinary;
                                           rsa:exponent 65537 ] .
   }

(notice the 512 characters)

   which would be the same query as the following one, which uses the base64Binary form:

   PREFIX cert: <http://www.w3.org/ns/auth/cert#>
   PREFIX rsa:  <http://www.w3.org/ns/auth/rsa#>
   PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
   ASK {
      <http://bblfish.net/#hjs> cert:key [ rsa:mod "5+kuueCGksuOuQciIrf7hjSRiahB8c3hd8hPjTH/6k+NBKN+H0MRHPiSVCVwvvhstF2zmE6Ms0NwzSDWHuSOqjEwu6+CKE8tvL0Y0OHkbkhVDhenLPQagKIWjXe0k4CDIcizyNj1L8zRwsN0TaxrYZZPlaTx2/VpMI3ApaVKyb/4+mJ4UZDBol9TMkTfyBbPq3iISMz6rt3vsNgksXar0DCftGag2V2E1L/t8HvuDe0UaqKajsIlVtu/iUMSYKu41dZJCVCYm/DrqcX0m1aUwHAYWKtSap9Z5p7PnJVowqp2/3jnsf7h6WlUN9yQtm/FeEeMp+3Mx7DokAYYTElTaQ=="^^xsd:base64Binary ;
                                           rsa:exponent 65537 ] .
   }

(notice the 344 characters)

  I.e., the two ASK queries should in theory return the same result for any profile containing the same numbers, however those numbers end up being written out.

  [ but we need to verify whether this is really the case ]
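If a store compares these literals by value rather than by lexical form, both queries reduce to a comparison of octet sequences. A Python sketch of that check, assuming only the two datatypes discussed here (literal_value, same_value, hex_length and base64_length are hypothetical helpers). It also recomputes the 512 and 344 character counts from the decoded profile modulus, whose 256 octets make it a 2048-bit number:

```python
import base64
import math
import re


def literal_value(lexical: str, datatype: str) -> bytes:
    """Map a literal to its value (octets); only two datatypes are handled."""
    if datatype == "xsd:base64Binary":
        # assumption: all white space inside base64Binary can be dropped
        return base64.b64decode("".join(lexical.split()), validate=True)
    if datatype == "xsd:hexBinary":
        compact = lexical.strip()  # only leading/trailing white space is allowed
        if re.fullmatch(r"([0-9a-fA-F]{2})*", compact) is None:
            raise ValueError(f"invalid xsd:hexBinary literal: {lexical!r}")
        return bytes.fromhex(compact)
    raise ValueError(f"unsupported datatype: {datatype}")


def same_value(a, b) -> bool:
    """Compare two (lexical form, datatype) pairs by value, not by spelling."""
    return literal_value(*a) == literal_value(*b)


def hex_length(n_octets: int) -> int:
    return 2 * n_octets


def base64_length(n_octets: int) -> int:
    return 4 * math.ceil(n_octets / 3)  # includes '=' padding


# The modulus from the profile above
PROFILE_MOD = """
    5+kuueCGksuOuQciIrf7hjSRiahB8c3hd8hPjTH/6k+N
    BKN+H0MRHPiSVCVwvvhstF2zmE6Ms0NwzSDWHuSO
    qjEwu6+CKE8tvL0Y0OHkbkhVDhenLPQagKIWjXe0k4
    CDIcizyNj1L8zRwsN0TaxrYZZPlaTx2/VpMI3ApaVKyb
    /4+mJ4UZDBol9TMkTfyBbPq3iISMz6rt3vsNgksXar0D
    CftGag2V2E1L/t8HvuDe0UaqKajsIlVtu/iUMSYKu41dZ
    JCVCYm/DrqcX0m1aUwHAYWKtSap9Z5p7PnJVowqp2
    /3jnsf7h6WlUN9yQtm/FeEeMp+3Mx7DokAYYTElTaQ==
"""

mod = literal_value(PROFILE_MOD, "xsd:base64Binary")
print(len(mod), len(mod) * 8)                          # 256 2048
print(hex_length(len(mod)), base64_length(len(mod)))   # 512 344
print(mod[:6].hex().upper())                           # E7E92EB9E086
print(same_value(("SGVsbG8=", "xsd:base64Binary"),
                 ("48656C6C6F", "xsd:hexBinary")))     # True
```

The first octets match the start of the hexBinary literal in the first query, which is what a value-based store would need in order to answer both ASK queries identically.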

 
  - hexBinary does not allow white space between digits, which makes long values harder to read. base64Binary does seem to allow newlines, and perhaps other white space.
  + it would be standard and supported by all tools ==> to be verified
  - xsd:base64Binary and xsd:hexBinary are binary encodings, but they do not say what the encoded data is. The spec only says "hexBinary represents arbitrary hex-encoded binary data." http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#base64Binary
A modulus and an exponent are numbers (positive natural numbers, to be precise), whereas arbitrary data could be anything. So some research is needed to see how these types can be used when the data is meant to be a number, how that should be specified, and what binary number ordering problems come with it.
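The ordering problems can be made concrete with a short Python sketch (illustrative only; as_unsigned is a hypothetical helper). It relies on one known case: DER-encoded ASN.1 INTEGERs, as found in X.509 certificates, are two's-complement, so a positive number whose top bit is set gets an extra leading 0x00 octet. As binaries, "E7" and "00E7" are different values, since their lengths differ, even though they denote the same unsigned number; and reading the same octets in a different byte order gives a different number altogether:

```python
def as_unsigned(octets: bytes, byteorder: str = "big") -> int:
    """Read octets as an unsigned integer in the given byte order."""
    return int.from_bytes(octets, byteorder)


minimal = bytes.fromhex("E7")    # shortest unsigned form of 231
der_like = bytes.fromhex("00E7") # same number with a DER-style leading zero

# As binaries they differ: the value lengths are 1 and 2 octets
print(minimal == der_like)                                # False
# As unsigned big-endian numbers they are equal
print(as_unsigned(minimal) == as_unsigned(der_like))      # True

# Byte order changes the number that the same octets denote
two_octets = bytes.fromhex("0100")
print(as_unsigned(two_octets, "big"), as_unsigned(two_octets, "little"))  # 256 1
```

So a profile that publishes a modulus with a leading zero octet would not match a query written with the minimal form, unless the comparison is done on numbers rather than on binaries.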

Received on Wednesday, 16 November 2011 16:19:25 UTC