Re: Input needed from RDF group on JSON-LD skolemization from David Booth on 2013-06-17 (public-rdf-comments@w3.org from June 2013)

From: David Booth <david@dbooth.org>
Date: Mon, 17 Jun 2013 12:07:19 -0400
To: Markus Lanthaler <markus.lanthaler@gmx.net>
CC: 'public-rdf-comments' <public-rdf-comments@w3.org>
Message-ID: <51BF3437.40602@dbooth.org>
On 06/17/2013 05:31 AM, Markus Lanthaler wrote:
> On Monday, June 17, 2013 2:21 AM, David Booth wrote:
>> On 06/16/2013 01:45 PM, Pat Hayes wrote:
>>> Well, I guess that I agree with David that if you call it JSON-to-RDF
>>> and say it converts to RDF datasets, then it ought to actually
>>> convert to RDF datasets. You could fix this by just changing the
>>> first sentence above, which is not strictly true at present. The
>>> algorithm changes JSON-LD to something like an RDF dataset that is
>>> sometimes an RDF dataset.
>>
>> that would make the claim true, but at the expense of larger goals,
>> because it would make JSON-LD significantly less useful as an RDF
>> serialization.
>
> Why would it make "JSON-LD significantly less useful as an RDF
> serialization"? It may make the to-RDF algorithm "less useful" because in
> your specific implementation you would have to wrap it in another algorithm
> which does nothing else than replacing bnode ids with skolem IRIs. Such
> layering is a normal engineering process and we even give guidance how to do
> it.
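The layering described here, wrapping the to-RDF algorithm in a post-processing step that replaces blank node identifiers with skolem IRIs, might look roughly like the sketch below. It assumes quads are plain 4-tuples of strings, with blank nodes written as `_:`-prefixed ids and literals in quoted N-Quads form; `skolemize`, the `mint` callback and the `example.org` prefix are hypothetical names chosen for illustration, not part of the JSON-LD API.

```python
from typing import Callable, Iterable, Optional

# Assumed representation (hypothetical): (subject, predicate, object, graph)
# as strings, blank nodes as "_:"-prefixed ids, literals in quoted N-Quads
# form, and graph None for the default graph.
Quad = tuple[str, str, str, Optional[str]]


def skolemize(quads: Iterable[Quad], mint: Callable[[str], str]) -> list[Quad]:
    """Post-process the output of a to-RDF step, replacing every blank
    node id with a skolem IRI obtained from `mint`.

    Each distinct blank node id is minted exactly once, so all
    occurrences of the same node map to the same IRI.
    """
    mapping: dict[str, str] = {}

    def replace(term: Optional[str]) -> Optional[str]:
        if term is not None and term.startswith("_:"):
            if term not in mapping:
                mapping[term] = mint(term)
            return mapping[term]
        return term

    return [tuple(replace(t) for t in quad) for quad in quads]


# Example: mint IRIs under a hypothetical /.well-known/genid/ prefix.
quads = [
    ("_:b0", "http://example.org/knows", "_:b1", None),
    ("_:b1", "http://example.org/name", '"Bob"', None),
]
skolemized = skolemize(quads, lambda b: "http://example.org/.well-known/genid/" + b[2:])
```

Because the mapping is shared across the whole dataset, `_:b1` becomes the same IRI in both quads, which keeps the two statements connected.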

it would make JSON-LD significantly less useful as an RDF serialization 
because there would be more information loss in interpreting JSON-LD as 
RDF.  remember that we are talking about standards here -- not private 
agreements.  if two independent parties following the JSON-LD standard 
de-serialize the same JSON-LD document to RDF, they should arrive at 
essentially the same RDF Dataset.  some slight differences are 
unavoidable due to datatype mismatches and choice of skolem IRIs.  but 
otherwise the RDF Datasets should be identical: they should be 
isomorphic, with the same URIs in the same places, and the exact same 
number of triples.

if JSON-LD does not require skolemization of JSON-LD bnodes (in 
positions where they would otherwise become illegal RDF bnodes) then 
either: (a) there would be significant and unnecessary information loss 
by discarding those triples that would be illegal RDF; or (b) the RDF 
interpretation of JSON-LD would be unpredictable in unnecessarily 
significant ways.
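To make option (a) concrete, the sketch below contrasts discarding the offending triples with skolemizing only the blank nodes that would be illegal in RDF. It considers just the blank-node-predicate case and assumes quads are 4-tuples of strings with `_:`-prefixed blank node ids; the function names and the `example.org` prefix are hypothetical.

```python
from typing import Optional

# Assumed representation (hypothetical): (subject, predicate, object, graph)
# strings, blank nodes as "_:"-prefixed ids, graph None for the default graph.
Quad = tuple[str, str, str, Optional[str]]


def is_blank(term: Optional[str]) -> bool:
    return term is not None and term.startswith("_:")


def drop_illegal(quads: list[Quad]) -> list[Quad]:
    """Option (a): discard every quad whose predicate is a blank node."""
    return [q for q in quads if not is_blank(q[1])]


def skolemize_illegal(quads: list[Quad], prefix: str) -> list[Quad]:
    """Skolemize only blank nodes used as predicates, replacing them in
    every position so statements about the property stay connected."""
    illegal = {q[1] for q in quads if is_blank(q[1])}

    def replace(term: Optional[str]) -> Optional[str]:
        return prefix + term[2:] if term in illegal else term

    return [tuple(replace(t) for t in q) for q in quads]


# A JSON-LD document that uses a blank node as a property may yield quads like these.
quads = [
    ("http://example.org/alice", "_:p0", '"42"', None),
    ("_:p0", "http://www.w3.org/2000/01/rdf-schema#label", '"age"', None),
]
dropped = drop_illegal(quads)  # 1 quad survives
kept = skolemize_illegal(quads, "http://example.org/.well-known/genid/")  # 2 quads
```

Option (a) keeps only the label quad, which now describes a property that no longer appears anywhere in the data; skolemization keeps both statements.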

does that explanation help?

>
>
>>> But I don't see why you would need to go to RFC2119 to do this, just
>>> either change the claim for the algorithm, or put the skolemization
>>> step into the algorithm itself.
>>
>> the point is to require skolemization in the algorithm, rather than
>> making it optional as it currently stands.  whether that is
>> achieved through careful prose or standard 2119 terminology is an
>> editorial matter.
>
> The problem I have with requiring skolemization is that it isn't
> implementable in some cases. For instance, if you are a client you can't
> guarantee that you replace bnode ids with *new, globally unique* IRIs
> because you don't have an IRI space you control. Using an arbitrary base or
> UUIDs will generate clashes (even if the likelihood might be small).

if the client is not going to re-publish the resulting RDF then I do not 
see why it would matter.  the client would only need to ensure that the
skolem IRIs are unique within its data, and this could be easily done 
using a counter.  the point of the requirement that skolem IRIs be 
"globally unique" is not that they really do need to be globally unique, 
it is to: (a) suggest how to avoid clashes; and (b) make it clear that if 
there is a clash then you have done it wrong.   in other words, as long 
as there are no *observable* clashes then there is no problem if the 
skolem IRIs cannot be guaranteed to be unique.
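A counter-based minter of the kind suggested here could be sketched as follows. It only guarantees uniqueness within the client's own data: any candidate IRI that already appears there is skipped, so no clash is observable locally. The function name and the `example.org` prefix are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def local_skolem_iris(prefix: str, existing: Iterable[str]) -> Iterator[str]:
    """Yield skolem IRIs as prefix + counter, skipping any IRI already
    present in the client's data. Uniqueness is local, not global."""
    taken = set(existing)
    for n in itertools.count():
        iri = f"{prefix}{n}"
        if iri not in taken:
            yield iri


# Hypothetical prefix; under the argument above, a client that never
# republishes the resulting RDF does not need to own it.
minter = local_skolem_iris("http://example.org/.well-known/genid/",
                           ["http://example.org/.well-known/genid/1"])
first, second = next(minter), next(minter)  # .../genid/0, then .../genid/2
```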

also, even if a client does not own a domain name, there are other ways 
that "globally unique" IRIs can be minted.  any standard set of 
properties, whose combined values are unique, can be used to create 
globally unique IRIs.  for example, the combination of an email address, 
a time, and a sequence number can be used, such as

 
http://w3.org/.well-known/genid/delegated/emailtime/{ENCODED_EMAIL}/{UTC_DATETIME}/{SEQUENCE}

(if w3.org chose to delegate that portion of its URI space for this 
purpose).
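Filling in that template might look like the sketch below. The message leaves the encodings open, so two choices here are assumptions: the email address is percent-encoded as a single path segment, and the time is written in basic ISO 8601 UTC form (`YYYYMMDDTHHMMSSZ`) to keep colons out of the path. The URI space itself is hypothetical unless w3.org delegates it.

```python
from datetime import datetime, timezone
from urllib.parse import quote

# Template from the message; this URI space is hypothetical (not delegated).
TEMPLATE = ("http://w3.org/.well-known/genid/delegated/emailtime/"
            "{email}/{utc}/{seq}")


def email_time_iri(email: str, when: datetime, seq: int) -> str:
    """Mint an IRI from an email address, a UTC timestamp and a sequence
    number, assuming the encodings described above."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # safe="" makes quote() percent-encode "@" and "/" as well.
    return TEMPLATE.format(email=quote(email, safe=""), utc=utc, seq=seq)


iri = email_time_iri("alice@example.org",
                     datetime(2013, 6, 17, 16, 7, 19, tzinfo=timezone.utc), 0)
```

Uniqueness then rests on the holder of the email address never reusing the same time and sequence number pair.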

does that address your concerns?  and if not, can you further explain 
the use case you have in mind?

thanks,
David
Received on Monday, 17 June 2013 16:07:50 UTC