Re: WebID-ISSUE-61 (xsd): xsd datatypes [ontologies]

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Wed, 16 Nov 2011 18:35:44 +0100
Message-ID: <CAKaEYhJtX6hRbDQjB5_rWcp_yL3vmt8vST+42iexUfxAv81pfA@mail.gmail.com>
To: WebID Incubator Group WG <public-xg-webid@w3.org>
On 16 November 2011 17:19, WebID Incubator Group Issue Tracker
<sysbot+tracker@w3.org> wrote:
>
> WebID-ISSUE-61 (xsd): xsd datatypes [ontologies]
>
> http://www.w3.org/2005/Incubator/webid/track/issues/61
>
> Raised by: Henry Story
> On product: ontologies
>
> Currently we use the cert:hex datatype, which was invented specifically to be easy for humans to read: one can more or less copy and paste a hex string from a certificate viewer or from the openssl tools and get it right. It is extremely lenient. But it is less standard than the xsd datatypes that are discussed in the RDF Semantics document http://www.w3.org/TR/rdf-mt/#dtype_interp
>
>  - xsd:base64Binary
>  - xsd:hexBinary
>
> and that are continuously developed and documented in the latest XML Schema document
>   http://www.w3.org/TR/xmlschema11-2/#base64Binary
>
> Here are the definitions of those types from that spec:
>
> base64Binary:
> -----------
>
>  Definition:   base64Binary represents arbitrary Base64-encoded binary data.  For base64Binary data the entire binary stream is encoded using the Base64 Encoding defined in [RFC 3548], which is derived from the encoding described in [RFC 2045].
>  Value Space: The ·value space· of base64Binary is the set of finite-length sequences of zero or more binary octets.  The length of a value is the number of octets.
>
>  This allows for multiple line encodings.
>
> hexBinary
> --------
>
> http://www.w3.org/TR/xmlschema11-2/#hexBinary
>  Definition:  hexBinary represents arbitrary hex-encoded binary data.
>  Value Space: The ·value space· of hexBinary is the set of finite-length sequences of zero or more binary octets.  The length of a value is the number of octets.
> The set recognized by hexBinary is the same as that recognized by the regular expression '([0-9a-fA-F]{2})*'.
>
> The white space of both types is collapsed, which means:
>
>  After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and any #x20 at the start or end of the string is then removed.
>
> So a hexBinary literal cannot have white space between its hex digits, only at the start or end, where it is removed (but bblfish may be wrong)
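
A minimal Python sketch of the check described above, assuming the collapse rule and the regular expression quoted from the spec are the whole story. The names collapse and is_hex_binary are made up for this illustration and do not come from any library.

```python
import re

# Lexical pattern quoted above from the XML Schema datatypes spec.
HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")


def collapse(lexical: str) -> str:
    """Replace tab/LF/CR with a space, collapse runs of spaces, trim both ends."""
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    return re.sub(r" +", " ", replaced).strip(" ")


def is_hex_binary(lexical: str) -> bool:
    """True if the collapsed literal matches the hexBinary pattern in full."""
    return HEX_BINARY.fullmatch(collapse(lexical)) is not None


print(is_hex_binary("  E7E92EB9 \n"))  # surrounding white space is removed
print(is_hex_binary("E7 E9 2E B9"))    # an inner space survives the collapse
```

Under these assumptions the first literal is accepted and the second is rejected, which matches the reading that white space is tolerated only around the digits, not between them.
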
>
> Proposal
> =====
>
> Since the objects are binaries and not numbers, one cannot use the relations used currently: rsa:modulus is a relation from an RSA key to a number, but what we need is a relation from an RSA key to the binary of its modulus. I.e., if we call the relation of a number to its binary :binary, then the following rule applies:
>
>  { ?key rsa:modulus ?mod .
>   ?mod :binary ?modBin } <=> { ?key :mod ?modBin }
>
> where the :mod relation would be the relation from a key to the binary of its modulus.
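
One way to picture the :binary relation is as a conversion between a non-negative integer and an octet string. The sketch below assumes big-endian order and the shortest possible encoding; neither is stated in the issue, and the function names are hypothetical.

```python
def int_to_octets(n: int) -> bytes:
    """Shortest big-endian octet string for a non-negative integer (assumed convention)."""
    if n < 0:
        raise ValueError("a modulus or exponent is never negative")
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


def octets_to_int(data: bytes) -> int:
    """Inverse direction: read the octets as a big-endian unsigned integer."""
    return int.from_bytes(data, "big")


exponent = 65537
octets = int_to_octets(exponent)
print(octets.hex().upper())             # 010001
print(octets_to_int(octets) == exponent)  # True
```

With such a pair of functions, { ?key rsa:modulus ?mod . ?mod :binary ?modBin } and { ?key :mod ?modBin } describe the same key, provided everyone agrees on the byte order and on how leading zero octets are treated.
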
>
>
> PROS AND CONS
> ==========
>
> (at this stage of the analysis)
>
> assuming the following foaf profile
>
>  <http://bblfish.net/#hjs> cert:key [ rsa:mod """
>                              5+kuueCGksuOuQciIrf7hjSRiahB8c3hd8hPjTH/6k+N
>                              BKN+H0MRHPiSVCVwvvhstF2zmE6Ms0NwzSDWHuSO
>                              qjEwu6+CKE8tvL0Y0OHkbkhVDhenLPQagKIWjXe0k4
>                              CDIcizyNj1L8zRwsN0TaxrYZZPlaTx2/VpMI3ApaVKyb
>                              /4+mJ4UZDBol9TMkTfyBbPq3iISMz6rt3vsNgksXar0D
>                              CftGag2V2E1L/t8HvuDe0UaqKajsIlVtu/iUMSYKu41dZ
>                              JCVCYm/DrqcX0m1aUwHAYWKtSap9Z5p7PnJVowqp2
>                             /3jnsf7h6WlUN9yQtm/FeEeMp+3Mx7DokAYYTElTaQ==
>               """^^xsd:base64Binary
>
>
>
>  + it could make a simple SPARQL ASK query possible, which would be a lot more efficient. Since RDF stores should know the equivalences between the various ways of writing out binaries, one could query the DB with
>
>   SPARQL ASK {
>      <http://bblfish.net/#hjs> cert:key [ rsa:mod "E7E92EB9E08692CB8EB9072222B7FB86349189A841F1CDE177C84F8D31FFEA4F8D04A37E1F43111CF892542570BEF86CB45DB3984E8CB34370CD20D61EE48EAA3130BBAF82284F2DBCBD18D0E1E46E48550E17A72CF41A80A2168D77B493808321C8B3C8D8F52FCCD1C2C3744DAC6B61964F95A4F1DBF569308DC0A5A54AC9BFF8FA62785190C1A25F533244DFC816CFAB788848CCFAAEDDEFB0D824B176ABD0309FB466A0D95D84D4BFEDF07BEE0DED146AA29A8EC22556DBBF89431260ABB8D5D6490950989BF0EBA9C5F49B5694C0701858AB526A9F59E69ECF9C9568C2AA76FF78E7B1FEE1E9695437DC90B66FC578478CA7EDCCC7B0E89006184C495369"^^xsd:hexBinary;
>                                           rsa:exponent 65537 ] .
>   }
>
> (notice the 512 characters)
>
>   which would be the same query as
>
>   SPARQL ASK {
>      <http://bblfish.net/#hjs> cert:key [ rsa:mod "5+kuueCGksuOuQciIrf7hjSRiahB8c3hd8hPjTH/6k+NBKN+H0MRHPiSVCVwvvhstF2zmE6Ms0NwzSDWHuSOqjEwu6+CKE8tvL0Y0OHkbkhVDhenLPQagKIWjXe0k4CDIcizyNj1L8zRwsN0TaxrYZZPlaTx2/VpMI3ApaVKyb/4+mJ4UZDBol9TMkTfyBbPq3iISMz6rt3vsNgksXar0DCftGag2V2E1L/t8HvuDe0UaqKajsIlVtu/iUMSYKu41dZJCVCYm/DrqcX0m1aUwHAYWKtSap9Z5p7PnJVowqp2/3jnsf7h6WlUN9yQtm/FeEeMp+3Mx7DokAYYTElTaQ=="^^xsd:base64Binary;
>                            rsa:exponent 65537 ] .
>   }
>
> (notice 344 characters)
>
>  I.e., the two ASK queries should theoretically be able to return the same result for any profile containing the same numbers, however they end up being written out.
>
>  [ but we need to verify if this is really the case ]
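
Outside of any RDF store, the claimed equivalence can at least be checked at the octet level. The Python sketch below decodes a hexBinary and a base64Binary lexical form and compares the resulting octets. It uses only the first three octets of the modulus above (E7E92E in hex, 5+ku in base64) to stay short, and the helper names are invented for this example. Whether a given store performs this comparison in an ASK query is exactly the open question.

```python
import base64
import binascii


def from_hex_binary(lexical: str) -> bytes:
    """Decode a hexBinary lexical form after trimming surrounding white space."""
    return binascii.unhexlify(lexical.strip())


def from_base64_binary(lexical: str) -> bytes:
    """Decode a base64Binary lexical form, dropping any white space first."""
    compact = "".join(lexical.split())
    return base64.b64decode(compact, validate=True)


hex_prefix = "E7E92E"
b64_prefix = """
    5+
    ku
"""

# Different lexical forms, same octets.
print(from_hex_binary(hex_prefix) == from_base64_binary(b64_prefix))  # True
```

A plain string comparison of the two literals would of course fail, so the ASK query only works if the store compares values rather than lexical forms.
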
>
>
>  - hex does not allow white space, making it more difficult to read. base64 does seem to allow newlines, and perhaps other white space
>  + it would be standard and supported by all tools ==> verify
>  - xsd:base64Binary or xsd:hexBinary is a binary encoding, but we don't really know what it is an encoding of. The spec says "hexBinary represents arbitrary hex-encoded binary data.". http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#base64Binary
> A modulus and an exponent are numbers - positive natural numbers to be precise - but arbitrary data could be anything. So there would be some research to be done here to see how one can use such a datatype if one wants the value to be a number, how one should specify it, and what binary number ordering problems come with it.
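
One concrete form of this problem, sketched in Python under the assumption that the octets are read as a big-endian unsigned integer: two octet strings that differ only by a leading zero octet are different binary values but denote the same number. DER, for instance, prefixes a 00 octet to a positive integer whose top bit is set, which would apply to a modulus starting with E7. The normalize helper is hypothetical.

```python
short = bytes.fromhex("E7E92E")
padded = bytes.fromhex("00E7E92E")  # same number, one extra leading zero octet

# As binaries the two values differ ...
print(short == padded)  # False

# ... but read as big-endian unsigned integers they are the same number.
print(int.from_bytes(short, "big") == int.from_bytes(padded, "big"))  # True


def normalize(data: bytes) -> bytes:
    """Strip leading zero octets, keeping a single 00 for the number zero."""
    return data.lstrip(b"\x00") or b"\x00"


print(normalize(padded) == normalize(short))  # True
```

So if a binary datatype is used for the modulus, profiles and verifiers would need to agree on a canonical form, or a value comparison in the store could miss a key that is numerically identical.
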

-1 xsd:base64Binary -- henry is this your proposal?

+1 xsd:hexBinary

+1 xsd:string

Basic reasoning: in general I prefer simplicity, and I'd rather reuse an
existing term, if that's practical, than create a new one.

Happy to go with whatever is decided, including keeping cert:hex.  But
if we are changing cert:int, it would be logical to package a change
of cert:hex at the same time.

Received on Wednesday, 16 November 2011 17:36:23 GMT