Re: Input needed from RDF group on JSON-LD skolemization from David Booth on 2013-07-09 (public-linked-json@w3.org from July 2013)

From: David Booth <david@dbooth.org>
Date: Tue, 09 Jul 2013 16:27:43 -0400
To: Dave Longley <dlongley@digitalbazaar.com>
CC: Markus Lanthaler <markus.lanthaler@gmx.net>, public-linked-json@w3.org
Message-ID: <51DC723F.3010502@dbooth.org>

Hi Dave,

Thanks for the example.  Responses below . . .

On 07/09/2013 02:20 PM, Dave Longley wrote:
> On 07/04/2013 12:28 PM, David Booth wrote:
>> [ . .  ] the other option is for JSON-LD to prohibit blank nodes as
>> properties.  Authors could simply use relative IRIs instead.
>
> So I don't consider this situation to be all that different from the one
> where an author elects not to provide any mappings at all for certain
> keys in their JSON. We currently allow this to happen -- and it's an
> important use case for at least two reasons:
>
> 1. It allows authors to slowly transition over to using JSON-LD --
> mapping only those keys in their data that they are ready to, that they
> are confident will be mapped to the correct URL. Also note that JSON
> developers know nothing about owl:sameAs and we don't need to introduce
> them to another level of complexity right out of the gate.

Right, but you are assuming that they understand the notion of 
"unstable" or "private" (not in a security sense).  Thus it seems 
perfectly reasonable to generate URIs that are explicitly marked as 
unstable, so that downstream consumers will not complain if they change:

   ...
   "@context": {
     "@vocab": "UNSTABLE/"
   }
   ...
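
To make the effect of that @vocab entry concrete, here is a minimal Python sketch of how a key might become a predicate. It is a deliberate simplification, not the JSON-LD expansion algorithm: expand_term is a hypothetical helper, and it handles only explicit string mappings plus the @vocab fallback.

```python
def expand_term(term, context):
    """Toy term expansion: explicit mapping first, then the @vocab fallback.

    Hypothetical helper for illustration only; it ignores keywords,
    compact IRIs, and everything else a real JSON-LD processor handles.
    """
    if term in context and not term.startswith("@"):
        return context[term]
    vocab = context.get("@vocab")
    if vocab is None:
        return None  # no mapping and no @vocab: the key would be dropped
    return vocab + term


# Context modeled on the snippet above; the "foo" mapping is taken from
# the fuller example later in this message.
context = {
    "foo": "http://example/stable/foo",
    "@vocab": "UNSTABLE/",
}

stable = expand_term("foo", context)
unstable = expand_term("website_status", context)
```

Under these assumptions, every key without an explicit mapping ends up with a predicate that visibly carries the UNSTABLE/ marker, rather than being dropped.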

>
> 2. It allows authors to use their APIs both as JSON and as JSON-LD. This
> covers two main uses: preventing existing consumers of JSON APIs from
> being messed with whilst allowing servers to upgrade and consolidate
> code paths, and allowing servers to include data that is intended to be
> "private" (not in a security sense) to one particular use of their API
> (eg: for an HTML interface to their data) without exposing it as valid
> data otherwise.
>
> The point of all this is that sometimes authors would prefer data to be
> "lost" in some scenarios, and not in others.

If the author wants the data to be lost, it should be omitted entirely 
or encrypted -- not included using blank nodes.

There is an important difference between stating that an identifier or 
data element is unstable (and hence downstream consumers should not rely 
on it to remain the same) and intentionally making it difficult for 
downstream consumers to use the data.  If the data is included in the 
JSON-LD document, and marked as UNSTABLE, it is the downstream 
consumer's business what they try to do with that data -- not the 
author's business.  The author *should* make clear that unstable data is 
not supported, but the author should not make it gratuitously more 
difficult for downstream consumers to use that data.

Blank nodes are *not* the right mechanism to use to prevent downstream 
consumers from using the data.  They do not prevent it from being used, 
they just make it harder.

> If the above option were
> available, it would allow authors to continue this useful practice
> whilst having the default behavior produce fully compliant RDF.
>
> For a more concrete example:
>
> Suppose a server has been serving this JSON for a while:
>
> {
>    "foo": "bar",
>    "about": {
>      "id": "1",
>      "name": "Phillip J. Fry"
>    },
>    "website_status": {
>      "editor": {
>        "id": "1",
>        "changes": 4
>      },
>      "ad636ee3fb": true
>    }
> }
>
> Clients that are consuming this data as JSON really only look at "foo"
> and maybe "about", except for the particular website client WC, which
> also makes use of "website_status". The author has communicated,
> out-of-band, that anything starting with "website_" is unstable data
> that should be ignored by consumers of the API.
>
> Now, the author of this data would like to make it consumable as RDF, so
> a change is made to include a @context that appropriately maps "foo",
> "about", "id", and "name" to URLs/aliases. Now any RDF clients (that
> understand JSON-LD) can understand the meaning of those keys. However,
> the author still only uses "website_status" on their local website and
> doesn't want to have to deal with keeping it stable for any clients.
> JSON clients are aware of this but so are RDF clients, as
> "website_status" has no meaning to them; it is dropped by JSON-LD
> processors. No out-of-band communication is necessary for the RDF clients.
>
> Now, suppose the author would like to make the "changes" data found in
> "website_status" available to RDF clients without changing their
> existing JSON structure. They would prefer not to leak indexed hashes of
> private information (that appear as hex JSON keys above) as stable
> predicates in their data. The meaning of those hash predicates or their
> range may change in the future. They'd also prefer not to leak
> "website_status". They may decide to update WC so it can consume RDF, at
> which time perhaps they'd want access to that information, but that's
> not in the plan right now. For now, they'd simply like RDF clients to
> take advantage of the "changes" data.
>
> Can they do this with minimal work on their end?

Sure, and they can do it by adding a context, without the use of blank 
node properties and without changing the JSON content:

   {
     "@context": {
       "foo":  "http://example/stable/foo",
       "about":  "http://example/stable/about",
       "changes":  "http://example/stable/changes",
       "id":  "@id",
       "@vocab": "UNSTABLE/"
     },
     "foo": "bar",
     "about": {
       "id": "1",
       "name": "Phillip J. Fry"
     },
     "website_status": {
       "editor": {
         "id": "1",
         "changes": 4
       },
       "ad636ee3fb": true
     }
   }

(Or, with an absolute IRI:  "@vocab": "http://example/UNSTABLE/")

How would this be done if blank nodes were permitted as properties? 
(You neglected to show that.)

I was unable to determine from the JSON-LD spec whether @vocab could be 
used to specify a blank node prefix.  (Can it?)  If not, then it seems 
to me that to achieve this with blank nodes, the JSON content would have 
to be changed, whereas it does not have to be changed if URIs are used.  
That would be a *major* advantage of URIs over blank nodes in this 
example.

SIDE NOTE: I also did not see anything in the JSON-LD spec that would 
prohibit @vocab from specifying a relative IRI such as

   "@vocab": "UNSTABLE/"

Section B.7 says "If the context definition has a @vocab key, its value 
MUST be a absolute IRI, a compact IRI, a term, or null.":
http://json-ld.org/spec/latest/json-ld/#context-definitions

However, I notice that the playground expects absolute IRIs, so I don't 
know if I missed something or the playground is wrong:
http://json-ld.org/playground/
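
For illustration, one way a consumer could interpret a relative value such as "UNSTABLE/" is ordinary relative-reference resolution against the document's base IRI. The Python sketch below shows only that generic resolution step. The base IRI is a made-up example, and whether a JSON-LD processor is meant to do this for @vocab is exactly the open question here.

```python
from urllib.parse import urljoin

# Hypothetical base IRI of the JSON-LD document (an assumption for illustration).
base = "http://example/data/doc.jsonld"

# Generic relative-reference resolution, as implemented by urljoin.
vocab = urljoin(base, "UNSTABLE/")

# An unmapped key would then be appended to the resolved vocabulary IRI.
predicate = vocab + "website_status"
```

With an absolute @vocab such as "http://example/UNSTABLE/", no resolution step is needed at all, which may be why the playground insists on one.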

>
> If the author could map any non-specifically-mapped predicate to a blank
> node, then the author could easily achieve most of the above goals. This
> would allow the deeply-embedded "changes" data to be seen and output by
> a JSON-LD processor. If a JSON-LD processor, by default, dropped blank
> node predicates, they could achieve even more -- as most RDF clients
> would ignore the data that the author would prefer to be ignored. But if
> it can't be ignored, that's not so bad because at least it is only blank
> node data -- there are not mappings to URLs that the author really
> doesn't want. If a JSON-LD processor had an option for keeping those
> blank nodes, then their potential future plans (updating X to an RDF
> client) could also work out, as they'd know to set the special option to
> keep the data they want -- just for their website.

That's getting pretty contrived, to say that you want some clients to 
drop the information and others to retain the information.  I think it 
is the clients' business to decide what they wish to do with the 
information -- whether to keep it or drop it.
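
As a rough sketch of what "the clients' business" could look like in practice, a consumer that does not want unstable data can filter it out itself by checking the predicate IRI. The Python below assumes the triples are already available as plain tuples; split_by_stability, the subject identifiers, and the marker check are all hypothetical choices for illustration, not part of any JSON-LD API.

```python
UNSTABLE_MARKER = "UNSTABLE/"  # matches the @vocab value used in this thread


def split_by_stability(triples):
    """Partition (subject, predicate, object) tuples by predicate marker."""
    stable, unstable = [], []
    for triple in triples:
        _subject, predicate, _obj = triple
        if UNSTABLE_MARKER in predicate:
            unstable.append(triple)
        else:
            stable.append(triple)
    return stable, unstable


# Hypothetical triples loosely based on the example document above.
triples = [
    ("http://example/1", "http://example/stable/changes", 4),
    ("_:b0", "http://example/UNSTABLE/ad636ee3fb", True),
]

stable, unstable = split_by_stability(triples)
```

A client that does care about the unstable data, such as WC, simply keeps the second list instead of discarding it; no special processor option is required.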

>
> If there is no way to map predicates to blank nodes, then the author has
> to consider other options. If the author uses relative URLs, they'd
> expose predicates that were never intended to be exposed and that have
> semantics that may change. The author wants to be able to innovate and
> play with that particular data before (if ever) it is linked to a stable
> URL.

Yes, but that is the whole point of marking certain APIs, names or data 
elements as "unstable" or "private".  Developers already understand that 
concept.  Blank nodes are not needed or intended for that.

> Instead of engaging in what they would consider data pollution, the
> author may instead elect to go through a costly API upgrade path that
> may break existing JSON clients.

Any developer knows that if they rely on an API that is explicitly 
marked as "private" or "unstable" they do so at their own risk.

>
> I think there are use cases where authors simply aren't "ready" to
> publish *all* their data or would like to reuse the same APIs for
> different purposes. By disallowing blank node predicates we make their
> lives more difficult.

Based on the examples shown, I do not see it as being significantly more 
difficult.  AFAICT there would not be much difference in what the author 
would have to do, whether converting properties to blank nodes or 
converting them to relative or unstable URIs.

> Perhaps some of these practices can be described
> as "anti-web" (hiding/siloing information), but I think that there are
> practical uses for them and that a blind opposition to "anti-web"
> practices is not a good policy. This is particularly true for cases
> where an author is actually trying to become less "anti-web", but they
> can't easily get there because it's all or nothing.
>

But as I showed above, I don't see it as all or nothing, as the author 
can achieve a similar result with URIs.

I have yet to see an example in which blank node properties really 
seem to be needed.  AFAICT the use of URIs has the net benefits 
of: (a) being friendlier to downstream RDF processing; (b) resulting in 
standard RDF; and (c) avoiding information loss.

David
Received on Tuesday, 9 July 2013 20:28:11 UTC