Re: Blank nodes as predicates [was Re: Input needed from RDF group on JSON-LD skolemization] from David Booth on 2013-07-12 (public-linked-json@w3.org from July 2013)

From: David Booth <david@dbooth.org>
Date: Thu, 11 Jul 2013 22:59:40 -0400
To: Markus Lanthaler <markus.lanthaler@gmx.net>
CC: public-linked-json@w3.org
Message-ID: <51DF711C.1020506@dbooth.org>
On 07/10/2013 10:18 AM, Markus Lanthaler wrote:
> On Wednesday, July 10, 2013 4:47 AM, David Booth wrote:
>> Hold on, let's back up a moment and make sure that we are on the same
>> page about the overall objective.  Suppose I slightly extend Dave
>> Longley's example to add one more blank node property, such as:
>>
>> {
>>    ...
>>     "_:website_status": {
>>       "editor": {
>>         "id": "1",
>>         "changes": 4
>>       },
>>       "_:ad636ee3fb": true,
>>       "_:ee3fbad636": false
>>     }
>> }
>>
>> Surely, as a design goal, it should be possible for the client to
>> process this JSON-LD document either as JSON or as extended RDF.  So
>> suppose the document is interpreted as extended RDF by a client that is
>> *intended* to fully understand it.  And if you wish, you can even assume
>> that the client has out-of-band information to understand the
>> meanings of the "private" data properties _:ad636ee3fb and _:ee3fbad636.
>> But how is an RDF client expected to obtain the values of the
>> _:ad636ee3fb and _:ee3fbad636 properties?   Those two boolean statements
>> would effectively become (in Turtle):
>>
>>     _:website_status _:ad636ee3fb true .
>>     _:website_status _:ee3fbad636 false .
>>
>> But in RDF, blank node labels are merely syntactic devices, so the above
>> is exactly the same as saying:
>>
>>     _:b1 _:b2 true .
>>     _:b1 _:b3 false .
>
> What if I would have some (out-of-band) knowledge that tells me that
>
>    _:b2 rdfs:subPropertyOf <http://example.com/someTheClientUnderstands1> .
>    _:b2 rdfs:subPropertyOf <http://example.com/someTheClientUnderstands2> .

It is not possible in RDF to do that, because the blank node label _:b2 
has no meaning outside of the original graph.  There is no way, from 
outside of that graph, to refer to _:b2 by name.  It has no name outside 
of the original graph.

As Nathan Rixham aptly put it,  "The problem with blank nodes is that a 
blank node has a name that is not a name".  (Paraphrased, as I couldn't 
find his exact quote.)

To make this more evident, write it in this form:

      [] [] true .
      [] [] false .

It means *exactly* the same thing in RDF.
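
A rough sketch of the same point in Python, in case it helps. It assumes triples are modelled as plain tuples and blank nodes as strings starting with "_:"; the helper names (bnodes, relabel, isomorphic) are made up for illustration, and the brute-force search is only sensible for tiny graphs like this one.

```python
from itertools import permutations

def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings that start with "_:".
    return isinstance(term, str) and term.startswith("_:")

def bnodes(graph):
    # All blank node labels used anywhere in the graph, in a stable order.
    return sorted({t for triple in graph for t in triple if is_bnode(t)})

def relabel(graph, mapping):
    # Rewrite every blank node label through the mapping; leave other terms alone.
    return {tuple(mapping[t] if is_bnode(t) else t for t in triple)
            for triple in graph}

def isomorphic(g1, g2):
    # Two graphs are equivalent if some one-to-one renaming of blank nodes
    # turns one into the other. Brute force over all renamings.
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(b1) != len(b2) or len(g1) != len(g2):
        return False
    return any(relabel(g1, dict(zip(b1, perm))) == g2
               for perm in permutations(b2))

labelled = {("_:website_status", "_:ad636ee3fb", True),
            ("_:website_status", "_:ee3fbad636", False)}
renamed = {("_:b1", "_:b2", True), ("_:b1", "_:b3", False)}
swapped = {("_:b1", "_:b3", True), ("_:b1", "_:b2", False)}

print(isomorphic(labelled, renamed))
print(isomorphic(labelled, swapped))
```

Both checks succeed, including the one where the two predicate labels are swapped. So the labels carry no information about which private property is the true one.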

>
> This would then entail
>
>    _:b2 <http://example.com/someTheClientUnderstands1> true .
>    _:b2 <http://example.com/someTheClientUnderstands2> true .
>
> I would argue that this might be something very useful in a number of cases.
>
>
>> So how on earth can the RDF client figure out which of those private
>> properties is supposed to be true and which is supposed to be false?
>> It can't.  All it can determine is that there exists a property with a
>> true value and there exists a property with a false value.
>
> Right, without context it wouldn't be able to figure that out. Exactly the
> same happens if a client encounters a URL that doesn't resolve to anything
> useful, e.g., a skolem IRI.

No, it has nothing to do with context.  It is because a blank node has 
no name.  Even if a URL does not resolve, it is still a name that can be 
used, in conjunction with out-of-band information, to refer to that 
resource.
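
To illustrate the difference, here is a small Python sketch. It assumes a toy loader that hands out fresh blank node labels on load, which RDF parsers are free to do; the IRI http://example.com/status#verified and the names load, understood and out_of_band are hypothetical, invented only for this example.

```python
from itertools import count

def load(triples):
    # Toy loader (an assumption, not a real parser): IRIs and literals are
    # kept as-is, but every blank node label is replaced with a fresh one.
    fresh = count()
    renamed = {}

    def term(t):
        if isinstance(t, str) and t.startswith("_:"):
            if t not in renamed:
                renamed[t] = f"_:genid{next(fresh)}"
            return renamed[t]
        return t

    return [tuple(term(t) for t in triple) for triple in triples]

# Out-of-band knowledge, keyed by the name the publisher used.
out_of_band = {
    "http://example.com/status#verified": "site ownership verified",
    "_:ad636ee3fb": "site ownership verified",
}

def understood(graph, knowledge):
    # Keep only the statements whose predicate the client can look up.
    return [(p, knowledge[p], o) for s, p, o in graph if p in knowledge]

with_iri = load([("_:website_status", "http://example.com/status#verified", True)])
with_bnode = load([("_:website_status", "_:ad636ee3fb", True)])

print(understood(with_iri, out_of_band))    # the IRI is still a usable name
print(understood(with_bnode, out_of_band))  # the blank node label is gone
```

The IRI survives loading even though it does not resolve, so the out-of-band lookup still works. The blank node label does not survive, so there is nothing left to look up.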

>
>
>> This use of blank nodes looks to me like a hack to intentionally make it
>> harder for an *RDF* downstream consumer -- even an *extended* *RDF*
>> downstream consumer that can handle blank node predicates -- to make use
>> of the data than for a pure JSON downstream consumer.  This seems to me
>> like an *anti*-design goal.  To my mind, the design goal should be the
>> opposite: to make it as easy for *both* JSON and RDF consumers to make
>> equivalent use of the document.
>
> OK, a different example:
>
>    {
>      "some_data": "I don't care about",
>      "maybe": {
>        "I": {
>          "just": {
>            "care_about_deeply_nested_data": [
>              {
>                "id": "markus",
>                "name": "Markus Lanthaler",
>                "authorOf": "http://www.w3.org/TR/json-ld/"
>              }
>            ]
>          }
>        }
>      }
>    }
>
> Now I can convert the pieces I'm interested in to some meaningful RDF with
> the following context:
>
>    {
>      "@context": {
>        "@vocab": "_:",
>        "id": "@id",
>        "name": "http://example.com/vocab#name",
>        "authorOf": { "@id": "http://example.com/vocab#authorOf", "@type": "@id" }
>      }
>    }
>
> This would result in the following triples:
>
>    _:b0 _:b1 _:b2 .
>    _:b0 _:b8 "I don't care about" .
>    _:b2 _:b3 _:b4 .
>    _:b4 _:b5 _:b6 .
>    _:b6 _:b7 <markus> .
>    <markus> <http://example.com/vocab#authorOf> <http://www.w3.org/TR/json-ld/> .
>    <markus> <http://example.com/vocab#name> "Markus Lanthaler" .
>
>
> In this case, most triples are useless for me but I do care about the last
> two and such use cases are very valuable. I'm sorry, but I can't see how it
> would help anyone if those blank node predicates would be URLs. What would you
> gain?

I was assuming that the information that the author chose to include in 
the JSON was potentially important.  If it is important for a JSON 
processor to be able to access, then presumably it is potentially 
important for an RDF processor to access it.

> The danger is that other people start relying on them or start
> complaining that you use a plethora of different URLs for which they can't
> find any definition.

But we have the same risk in JSON already!  If the information is 
included in the JSON, the author *already* runs the risk of someone 
relying on it (even though they were told not to) and complaining that 
they cannot find a definition.

> Blank nodes by their very nature on the other hand make
> it clear that there's some relationship, the details however are unclear.

But as I pointed out, the RDF data becomes unusable -- *even* for an 
extended RDF processor that can handle blank nodes as predicates.

> Nothing of that requires any out-of-band information or contract between the
> publisher and a consumer.

The intent of this blank-nodes-as-properties feature seems to be to 
allow certain data to be available to *JSON* processors -- presumably 
because it is important in some way -- but *not* available to *RDF* 
processors.  That's quite a double standard.  In essence, it is trying 
to use blank nodes to comment out certain information that is carried in 
the JSON.

If you really don't want the information to be available to clients then 
it should not be included in the JSON **at all**.  And if the problem is 
that you have some obsolete information in the JSON that you want to 
remove but cannot remove, because clients would break if it were 
removed, then that is a JSON problem or an API design problem -- not an 
RDF problem.

The use of blank nodes as properties is just plain bad design.  If you 
really want a feature that allows certain JSON information to be 
commented out then map those properties to NIL or /dev/null or "" or 
such, and eliminate them entirely from the generated JSON-LD information 
model.  Don't try to make them available to some processors but not 
others, and don't ask for RDF to be extended for such a dubious use case 
that really has nothing to do with RDF.
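
As a rough sketch of what I mean by mapping properties to nothing: the Python below is a toy, not a JSON-LD processor. It assumes a context in which a key mapped to None (null) is simply dropped, and the function name apply_context, the "legacy_flag" key and the example.com vocabulary IRIs are all made up for illustration.

```python
def apply_context(node, context):
    # Toy key mapping: rename keys through the context and drop any key
    # (with its whole subtree) that the context maps to None.
    if isinstance(node, list):
        return [apply_context(item, context) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key in context and context[key] is None:
            continue  # explicitly "commented out": never reaches the data model
        out[context.get(key, key)] = apply_context(value, context)
    return out

context = {
    "editor": "http://example.com/vocab#editor",
    "changes": "http://example.com/vocab#changes",
    "legacy_flag": None,  # hypothetical property the publisher wants ignored
}
doc = {"editor": {"changes": 4}, "legacy_flag": True}

result = apply_context(doc, context)
print(result)
```

The dropped property is gone for every consumer of the resulting data model, with no blank node predicates needed.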

David
Received on Friday, 12 July 2013 03:00:09 UTC