RE: Input needed from RDF group on JSON-LD skolemization from Markus Lanthaler on 2013-07-09 (public-linked-json@w3.org from July 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Tue, 9 Jul 2013 23:27:24 +0200
To: <public-linked-json@w3.org>
Message-ID: <027201ce7ceb$1bb6b2c0$53241840$@lanthaler@gmx.net>
On Tuesday, July 09, 2013 10:28 PM, David Booth wrote:
> Right, but you are assuming that they understand the notion of
> "unstable" or "private" (not in a security sense).  Thus it seems
> perfectly reasonable to generate URIs that are explicitly marked as
> unstable, so that downstream consumers will not complain if they
> change:
> 
>    ...
>    "@context": {
>      "@vocab": "UNSTABLE/"
>    }
>    ...

That would require out-of-band knowledge. The whole reason to use JSON-LD
instead of JSON is to *not* require out-of-band knowledge.


> > The point of all this is that sometimes authors would prefer data to
> > be "lost" in some scenarios, and not in others.
> 
> If the author wants the data to be lost, it should be omitted entirely
> or encrypted -- not included using blank nodes.

The key here is "in some scenarios". If we encounter a property which isn't
mapped to an IRI (or keyword) we drop the whole subtree when processing a
document. In some cases you may be interested in some parts of such a
subtree. Mapping some intermediary properties to bnodes allows you to do
that.
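To make the subtree-dropping behaviour concrete, here is a minimal Python sketch. It is a hypothetical, heavily simplified stand-in for JSON-LD expansion, not the actual Expansion Algorithm: there is no @id/@type handling, no lists, and no compact IRIs. The helper names expand_key and expand are invented for illustration. In this sketch an unmapped key discards its whole value. A "@vocab" of "_:" instead turns the intermediary keys into blank node properties, so nested properties that are explicitly mapped survive.

```python
from typing import Any, Optional

def expand_key(key: str, context: dict) -> Optional[str]:
    """Map a JSON key to an IRI or blank node identifier; None means unmapped."""
    if key.startswith("@"):
        return key                  # keywords pass through unchanged
    if key in context:
        return context[key]         # explicit term definition
    vocab = context.get("@vocab")
    if vocab is not None:
        return vocab + key          # fall back to @vocab (which may be "_:")
    return None                     # unmapped: the caller drops the subtree

def expand(node: Any, context: dict) -> Any:
    """Recursively expand keys, dropping each unmapped key together with its value."""
    if isinstance(node, list):
        return [expand(item, context) for item in node]
    if not isinstance(node, dict):
        return node                 # scalar values are kept as-is
    out = {}
    for key, value in node.items():
        iri = expand_key(key, context)
        if iri is None:
            continue                # everything nested below this key is lost
        out[iri] = expand(value, context)
    return out
```

Suppose a context maps only "foo" and "changes". In that case "website_status" is dropped entirely and takes the nested "changes" value with it. Adding "@vocab": "_:" keeps that value, reachable through blank node properties.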


> There is an important difference between stating that an identifier or
> data element is unstable (and hence downstream consumers should not rely
> on it to remain the same) and intentionally making it difficult for
> downstream consumers to use the data.  If the data is included in the
> JSON-LD document, and marked as UNSTABLE, it is the downstream
> consumer's business what they try to do with that data -- not the
> author's business.  The author *should* make clear that unstable data is
> not supported, but the author should not make it gratuitously more
> difficult for downstream consumers to use that data.

Again, this would require out-of-band information.


> Blank nodes are *not* the right mechanism to use to prevent downstream
> consumers from using the data.  They do not prevent it from being used,
> they just make it harder.

Right. Thus I would still prefer to expose them in the to RDF algorithm by
default (even though I could be convinced otherwise) and add an option to
discard them on the user's behalf if the user can't make any use of them.
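A sketch of such an option follows. It assumes triples are represented as plain strings, with blank nodes prefixed by "_:". The function emit_triples and its keep_blank_predicates flag are hypothetical names for illustration, not part of any published JSON-LD API. The default keeps blank node predicates, matching the preference stated above.

```python
from typing import Iterable, List, NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

def is_blank(term: str) -> bool:
    """In this sketch, blank node identifiers are strings starting with '_:'."""
    return term.startswith("_:")

def emit_triples(triples: Iterable[Triple], keep_blank_predicates: bool = True) -> List[Triple]:
    """Return all triples, or only those whose predicate is not a blank node."""
    if keep_blank_predicates:
        return list(triples)        # expose generalized triples by default
    return [t for t in triples if not is_blank(t.predicate)]
```

With this shape the producer never has to decide. A consumer that cannot interpret blank node predicates passes keep_blank_predicates=False and gets standard RDF triples only.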


> Sure, and they can do it by adding a context, without the use of blank
> node properties and without changing the JSON content:
> 
>    {
>      "@context": {
>        "foo":  "http://example/stable/foo",
>        "about":  "http://example/stable/about",
>        "changes":  "http://example/stable/changes",
>        "@vocab": "UNSTABLE/"
>        # Or:  "@vocab": "http://example/UNSTABLE/"
>      },
>       "foo": "bar",
>       "about": {
>         "@id": "1",
>         "name": "Phillip J. Fry"
>       },
>       "website_status": {
>         "editor": {
>           "@id": "1",
>           "changes": 4
>         },
>         "ad636ee3fb": true
>       }
>    }
> 
> How would this be done if blank nodes were permitted as properties?
> (You neglected to show that.)

I showed that in an earlier email. Just use "@vocab": "_:".


> I was unable to determine from the JSON-LD spec whether @vocab could be
> used to specify a blank node prefix.  (Can it?)  If not, then it seems
> to me that to achieve this with blank nodes, the JSON content would have
> to be changed, whereas it does not have to be changed if URIs are used.
>   That would be a *major* advantage of URIs over blank nodes in this
> example.
>
> SIDE NOTE: I also did not see anything in the JSON-LD spec that would
> prohibit @vocab from specifying a relative IRI such as
> 
>    "@vocab": "UNSTABLE/"
> 
> Section B.7 says "If the context definition has a @vocab key, its value
> MUST be a absolute IRI, a compact IRI, a term, or null.":
> http://json-ld.org/spec/latest/json-ld/#context-definitions
> 
> However, I notice that the playground expects absolute IRIs, so I don't
> know if I missed something or the playground is wrong:
> http://json-ld.org/playground/

The spec and the playground are right. Relative IRIs are not supported in
@vocab, but blank node identifiers are (they are interpreted as absolute
IRIs with a "_" scheme; I will clarify that in the spec and add a test case).
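The @vocab rule described above can be sketched as a validation step. This is an illustration under assumptions: validate_vocab is a hypothetical helper, and it does not resolve terms or compact IRIs. Values with a URI scheme, or with the "_:" prefix, are accepted. Anything else is rejected as a relative IRI. Because RFC 3986 requires a scheme to start with a letter, "_:" has to be special-cased rather than matched by the scheme pattern.

```python
import re
from typing import Optional

# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed here by ":"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def validate_vocab(value: Optional[str]) -> Optional[str]:
    """Return value if this sketch accepts it as @vocab; raise ValueError otherwise."""
    if value is None:
        return None                 # null clears the vocabulary mapping
    if value.startswith("_:"):
        return value                # blank node prefix, treated like an absolute IRI
    if _SCHEME.match(value):
        return value                # has a scheme, so treated as absolute
    raise ValueError(f"invalid @vocab value (relative IRI?): {value!r}")
```

Under this sketch "UNSTABLE/" is rejected, while "http://example/UNSTABLE/" and "_:" are both accepted.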

> > If the author could map any non-specifically-mapped predicate to a blank
> > node, then the author could easily achieve most of the above goals. This
> > would allow the deeply-embedded "changes" data to be seen and output by
> > a JSON-LD processor. If a JSON-LD processor, by default, dropped blank
> > node predicates, they could achieve even more -- as most RDF clients
> > would ignore the data that the author would prefer to be ignored. But if
> > it can't be ignored, that's not so bad because at least it is only blank
> > node data -- there are no mappings to URLs that the author really
> > doesn't want. If a JSON-LD processor had an option for keeping those
> > blank nodes, then their potential future plans (updating X to an RDF
> > client) could also work out, as they'd know to set the special option to
> > keep the data they want -- just for their website.
> 
> That's getting pretty contrived, to say that you want some clients to
> drop the information and others to retain the information.  I think it
> is the clients' business to decide what they wish to do with the
> information -- whether to keep it or drop it.

Fully agree. That's why I proposed to add a flag to the to RDF algorithm to
let clients decide.


> > If there is no way to map predicates to blank nodes, then the author has
> > to consider other options. If the author uses relative URLs, they'd
> > expose predicates that were never intended to be exposed and that have
> > semantics that may change. The author wants to be able to innovate and
> > play with that particular data before (if ever) it is linked to a stable
> > URL.
> 
> Yes, but that is the whole point of marking certain APIs, names or data
> elements as "unstable" or "private".  Developers already understand that
> concept.  Blank nodes are not needed or intended for that.

Yes, they understand that and that's exactly the reason some have started to
use blank node properties when converting legacy data to JSON-LD. Sometimes
they are just interested in mapping some deeply nested data.


> > Instead of engaging in what they would consider data pollution, the
> > author may instead elect to go through a costly API upgrade path that
> > may break existing JSON clients.
> 
> Any developer knows that if they rely on an API that is explicitly
> marked as "private" or "unstable" they do so at their own risk.

How is it explicitly marked as unstable? Somewhere in the API documentation?
If I don't have to, I won't go and read that.


> > I think there are use cases where authors simply aren't "ready" to
> > publish *all* their data or would like to reuse the same APIs for
> > different purposes. By disallowing blank node predicates we make their
> > lives more difficult.
> 
> Based on the examples shown, I do not see it as being significantly more
> difficult.  AFAICT there would not be much difference in what the author
> would have to do, whether converting properties to blank nodes or
> converting them to relative or unstable URIs.

Of course not. The same is true for blank node subjects. Does that mean that
blank node subjects are useless? No, in some cases you can't guarantee a
stable identifier and thus you use a blank node instead. The same is true
for predicates. You just need some context to be able to interpret/integrate
those blank nodes.


> > Perhaps some of these practices can be described
> > as "anti-web" (hiding/siloing information), but I think that there are
> > practical uses for them and that a blind opposition to "anti-web"
> > practices is not a good policy. This is particularly true for cases
> > where an author is actually trying to become less "anti-web", but they
> > can't easily get there because it's all or nothing.
> 
> But as I showed above, I don't see it as all or nothing, as the author
> can achieve a similar result with URIs.

I disagree. This is not a similar result. URIs are global identifiers. They
can be reused by anyone else without coordination. Blank nodes cannot. I
think that's a substantial difference. Producing heaps of useless URIs
doesn't help at all. The only argument for disallowing blank node predicates
is that RDF has that arbitrary limitation because RDF/XML wasn't able to
support them. There have been requests to allow them for over a decade. We
are in the unfortunate situation that we can't change RDF in that regard but
we don't have to introduce the same limitation in JSON-LD. If you want URLs,
skolemize the blank node properties. If you don't want them (because you
can't interpret them anyway), throw them away. It's up to you.
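Skolemization here means replacing each blank node, including one in predicate position, with a fresh, globally unique IRI. Below is a minimal sketch that follows the "/.well-known/genid/" convention from RDF 1.1 Concepts. The example.org authority, the uuid-based naming and the string-only term model (literals are not distinguished from IRIs) are all assumptions for illustration.

```python
import uuid
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
GENID_BASE = "http://example.org/.well-known/genid/"   # hypothetical authority

def skolemize(triples: List[Triple], base: str = GENID_BASE) -> List[Triple]:
    """Replace every '_:' term (subject, predicate or object) with a skolem IRI."""
    mapping: Dict[str, str] = {}

    def term(t: str) -> str:
        if not t.startswith("_:"):
            return t                # IRIs and (in this sketch) literals are untouched
        if t not in mapping:
            mapping[t] = base + uuid.uuid4().hex   # fresh, globally unique IRI
        return mapping[t]

    return [(term(s), term(p), term(o)) for s, p, o in triples]
```

Within one call, the same label always maps to the same generated IRI, so links between triples are preserved.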


> I still have not yet seen an example in which blank node properties
> really seem to be needed.  AFAICT the use of URIs has the net benefits
> of: (a) being friendlier to downstream RDF processing; (b) resulting in
> standard RDF; and (c) avoiding information loss.

As soon as you use inference they also start to make sense in the "RDF
world". Just think, e.g., of an owl:inverseOf defined on the fly (yes, I know
that JSON-LD has @reverse).

I can't really see how you can claim that information is lost when we expose
those generalized RDF triples knowing that you can skolemize those bnodes if
you need to. They have exactly the same meaning as the meaningless relative
IRIs that you suggest as an alternative.


--
Markus Lanthaler
@markuslanthaler
Received on Tuesday, 9 July 2013 21:27:56 UTC