Re: Input needed from RDF group on JSON-LD skolemization

On 07/02/2013 01:30 PM, Markus Lanthaler wrote:
> On Tuesday, July 02, 2013 7:01 PM, David Booth wrote:
> [...]
>>> I don't think so. It may be the author of the document who decides
>>> to just expose parts of a JSON-LD document as "RDF". I anticipate
>>> that there will be quite some APIs that will gradually transform the
>>> JSON APIs to JSON-LD APIs. Without allowing bnode-predicates this
>>> becomes considerably harder to do as the example illustrates.
>>
>> Okay, I think I now see what you mean.  You are talking about a
>> situation in which an author is incrementally migrating from JSON to
>> JSON-LD.  If that is correct, then I can see why the author may not
>> want
>> to take the time to look up and specify an appropriate context URI for
>> each property.
>>
>> But why couldn't the author just treat the properties as relative URIs
>> instead of blank nodes?  I.e., why not do something like this instead:
>>
>>   >>> {
>>   >>>     "@context": {
>>   >>>       "@vocab": "http://example/",
>>   >>>       "name": "http://xmlns.com/foaf/0.1/name",
>>   >>>       ...
>>   >>>     }
>>   >>> }
>>
>> or even something like this, making use of the base URI:
>>
>>   >>> {
>>   >>>     "@context": {
>>   >>>       "@vocab": "",
>>   >>>       "name": "http://xmlns.com/foaf/0.1/name",
>>   >>>       ...
>>   >>>     }
>>   >>> }
>>
>> Wouldn't something like that work?  I don't yet see why specifically
>> blank nodes would be needed for this.
>
> Sure, it would work... but it would also set an expectation on consumers of
> such data to be able to dereference the resulting IRIs to get the definition
> of those properties. JSON-LD is all about Linked Data.
>
> Yes, advocating bnodes in the context of Linked Data is strange but I find
> it better to use identifiers which are explicitly marked as being only
> locally valid if you can neither guarantee their stability nor provide
> dereferenceable IRIs.
>
> Is there a reason why you don't like bnodes-as-predicates apart from the
> fact that standard RDF doesn't allow them?

Yes: blank nodes are evil.  :)  There is a whole history of discussion 
about the evils and benefits of blank nodes, and I don't think it would 
make sense to delve deeply into that discussion, but the pros/cons 
basically boil down to this: blank nodes are convenient for RDF authors 
but a pain for downstream RDF consumers.  (Sandro Hawke gets credit for 
so succinctly summarizing the pros/cons that way.)

It is true that IRIs generated this way would not be dereferenceable, 
but this seems to me like a perfect example of why dereferenceable IRIs 
are a "SHOULD" instead of a "MUST".  And a benefit of using IRIs is that 
they could potentially be made dereferenceable later on, which is never 
possible with blank nodes.

Regarding stability, AFAICT relative IRIs would be nearly as stable as 
any versioned IRI: the IRI may change if the author decides to version 
it, but aside from that it is exactly the same every time the data is 
generated, even if other data elements are added, etc.  That is far 
better than blank nodes, which have no stability at all.  (That's one of 
the reasons they are such a pain for downstream RDF consumers.)

In summary, it seems to me that in comparing blank nodes with relative 
IRIs: (a) blank nodes are far less friendly to downstream RDF 
consumption; (b) neither would likely be dereferenceable initially, but 
relative IRIs could later be made dereferenceable, whereas blank nodes 
cannot; and (c) relative IRIs would be far more stable than blank nodes 
-- comparable stability to other versioned IRIs.

The only significant downside I see to relative IRIs is that they create 
an expectation of being dereferenceable, and that expectation 
(presumably) would not initially be met.  That seems to me like a small 
price to pay for the concrete benefits that are obtained from having 
IRIs instead of blank nodes.
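
To make the comparison concrete, here is a sketch of what the first 
context above would produce (the relative-IRI variant behaves the same 
way, just with the predicate resolved against the document's base URI). 
The "shoeSize" property is a hypothetical placeholder:

   {
     "@context": {
       "@vocab": "http://example/",
       "name": "http://xmlns.com/foaf/0.1/name"
     },
     "name": "Alice",
     "shoeSize": 9
   }

Every compliant parser would produce something like:

   _:b0 <http://xmlns.com/foaf/0.1/name> "Alice" .
   _:b0 <http://example/shoeSize> "9"^^<http://www.w3.org/2001/XMLSchema#integer> .

The <http://example/shoeSize> predicate is identical on every run; a 
bnode predicate would instead come out as _:p0, _:b37, or whatever 
label the parser happens to mint that day.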

>
>
>>>> Any client may ignore any information it
>>>> wants, but it is important that different JSON-LD standards-compliant
>>>> parsers, both parsing the same JSON-LD document in an attempt to obtain
>>>> the JSON-LD standards-compliant RDF interpretation of that JSON-LD
>>>> document, should obtain the same set of RDF triples (except for blank
>>>> node labels and possibly data type conversion).
>>>
>>> And that's the case right now. Every compliant JSON-LD parser is
>>> required to produce exactly the same generalized RDF dataset.
>>
>> It is also good to have JSON-LD parsers produce the same *extended* RDF
>> datasets if the user chooses to get extended RDF.  But the case that I
>> am trying to address is the case where the user expects *standard* RDF
>> -- ensuring that the mapping is deterministic with minimal information
>> loss.
>
> OK. Why do you believe a consumer expecting standard RDF isn't able to
> transform the extended RDF to standard RDF according to his needs? Why
> do we need to prescribe how to do this?

Because we are defining a *standard*.  Look, by analogy, suppose JSON-LD 
were defined to be a non-standard *extension* of JSON, such that people 
using *standard* JSON tools could not process JSON-LD without somehow 
converting their JSON-LD documents to JSON, but the conversion process 
was not standardized.  Different parties using standard JSON tools 
would then interpret the same JSON-LD document differently.  To my mind, 
that would be an obviously undesirable situation, as it defeats the 
purpose of defining standards.

If a vendor wants to support value-added extensions then that is fine. 
But I would expect *standard* JSON-LD parsers by *default* to produce 
*standard* RDF -- not extended RDF -- although it is fine and good for 
them to have an option for producing extended RDF.
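
To illustrate what is at stake, consider a document that maps a term to 
a bnode identifier (the "internalFlag" term is hypothetical):

   {
     "@context": {
       "internalFlag": "_:internalFlag"
     },
     "internalFlag": true
   }

A parser producing extended RDF would emit a generalized triple along 
the lines of:

   _:b0 _:b1 "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .

A parser asked for *standard* RDF has to do something else with that 
triple -- drop it, skolemize the predicate, or reject the document -- 
and my point is that the spec, not each implementation, should say which.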

>
> All it would buy us is that some implementations may not be able to be
> called conformant anymore (those who decide not to implement
> skolemization). There's no way to enforce what consumers do with the
> data anyway.
>
> The easiest way out of this would be to define some additional product
> classes:
>    a) an "extended RDF to standard RDF converter using skolemization"
>    b) an "extended RDF to standard RDF converter discarding the extensions"
>
> Then we could say that class a) implementations MUST transform bnodes used
> in predicates to skolem IRIs.

Actually, this discussion has convinced me that prohibiting blank node 
properties would be a better solution than skolemizing.
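
(For comparison, skolemization as described in the quoted proposal 
would mean replacing a bnode predicate such as _:b1 in my earlier 
example with a freshly minted IRI, e.g. using the ".well-known/genid" 
pattern the RDF Working Group has discussed for Skolem IRIs.  The IRI 
below is purely illustrative:

   _:b0 <http://example/.well-known/genid/ae52b1a0> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .

That keeps the data in standard RDF, but only by minting an IRI that 
carries no more information than the bnode it replaced.)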

>
> Unfortunately, I still can't see what the advantage of doing so would be.
> Why does this need to be in the JSON-LD spec?

It needs to be in the JSON-LD spec so that any two JSON-LD-compliant 
parsers will produce the exact same RDF triples when parsing the same 
JSON-LD document to standard RDF (except for blank node labels and 
possibly datatype conversions).  That kind of predictability across 
implementations is the reason we define standards.

David
