RE: Blank nodes as predicates [was Re: Input needed from RDF group on JSON-LD skolemization] from Markus Lanthaler on 2013-07-12 (public-linked-json@w3.org from July 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Fri, 12 Jul 2013 12:18:49 +0200
To: <public-linked-json@w3.org>
Message-ID: <00d801ce7ee9$35051b90$9f0f52b0$@lanthaler@gmx.net>
On Friday, July 12, 2013 5:00 AM, David Booth wrote:
> On 07/10/2013 10:18 AM, Markus Lanthaler wrote:
> > What if I would have some (out-of-band) knowledge that tells me that
> >
> >    _:b2 rdfs:subPropertyOf <http://example.com/someTheClientUnderstands1> .
> >    _:b2 rdfs:subPropertyOf <http://example.com/someTheClientUnderstands2> .
> 
> It is not possible in RDF to do that, because the blank node label _:b2
> has no meaning outside of the original graph.  There is no way, from
> outside of that graph, to refer to _:b2 by name.  It has no name
> outside of the original graph.

Since I am the client, have the out-of-band knowledge, and am the one
processing the graph, I can simply inject that knowledge into the graph
before processing it. So it's certainly possible. How could anyone possibly
prevent that?
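
To make the injection concrete, here is a minimal sketch in Python. It assumes
triples are plain (subject, predicate, object) tuples, that the client works
on the blank node labels exactly as its parser produced them, and that the
subject label "_:s1" and the helper name inject_and_entail are purely
illustrative. It applies the rdfs:subPropertyOf rule once, without computing
a transitive closure, and it is not meant to describe any particular RDF
library.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"


def inject_and_entail(graph, out_of_band):
    """Merge client-side knowledge into a parsed graph and apply the
    subPropertyOf rule once: (s p o) and (p subPropertyOf q) give (s q o)."""
    merged = set(graph) | set(out_of_band)

    # Index every declared super-property by its sub-property.
    super_props = {}
    for s, p, o in merged:
        if p == RDFS_SUBPROPERTY_OF:
            super_props.setdefault(s, set()).add(o)

    # Add one entailed triple per matching super-property.
    entailed = set(merged)
    for s, p, o in merged:
        for sup in super_props.get(p, ()):
            entailed.add((s, sup, o))
    return entailed


# Hypothetical parsed graph: one triple whose predicate is a blank node.
graph = {("_:s1", "_:b2", True)}

# Knowledge the client holds out-of-band, expressed against the same label.
out_of_band = {
    ("_:b2", RDFS_SUBPROPERTY_OF, "http://example.com/someTheClientUnderstands1"),
    ("_:b2", RDFS_SUBPROPERTY_OF, "http://example.com/someTheClientUnderstands2"),
}

result = inject_and_entail(graph, out_of_band)
```

The sketch only works because the merge happens inside the client, after
parsing, where the label "_:b2" is still stable. It does not claim that the
label means anything to a third party.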


> As Nathan Rixham aptly put it,  "The problem with blank nodes is that a
> blank node has a name that is not a name".  (Paraphrased, as I couldn't
> find his exact quote.)

I don't fully agree but that would surely end in a philosophical discussion
I don't have time to have right now.

 
> To make this more evident, write it in this form:
> 
>       [] [] true .
>       [] [] false .
> 
> It means *exactly* the same thing in RDF.
> 
> >
> > This would then entail
> >
> >    _:b2 <http://example.com/someTheClientUnderstands1> true .
> >    _:b2 <http://example.com/someTheClientUnderstands2> true .
> >
> > I would argue that this might be something very useful in a number of
> > cases.
> >
> >
> >> So how on earth can the RDF client figure out which of those private
> >> properties is supposed to be true and which is supposed to be false?
> >> It can't.  All it can determine is that there exists a property with a
> >> true value and there exists a property with a false value.
> >
> > Right, without context it wouldn't be able to figure that out. Exactly the
> > same happens if a client encounters a URL that doesn't resolve to anything
> > useful, e.g., a skolem IRI.
> 
> No, it has nothing to do with context.  It is because a blank node has
> no name.  Even if a URL does not resolve, it is still a name that can be
> used, in conjunction with out-of-band information, to refer to that
> resource.

Strictly and naively speaking you are right. In practice, however, it
couldn't matter less.

There are thousands of JSON APIs out there that work quite well in practice.
You could say that every property in every JSON document is a blank node. So
how does such a system work? Well, people simply rely on out-of-band
knowledge to interpret the data nevertheless. Nothing prevents my "prop"
property from meaning something entirely different from your "prop". If the
consumer has enough context, it will understand them; if not, it will ignore
them.

The goal of JSON-LD is to eliminate that out-of-band knowledge, but we can't
do so overnight. Blank node properties give us an elegant vehicle to gradually
migrate those existing infrastructures to fully linked data. If we make such
migrations too difficult, people just won't migrate. So you can either use
the sledgehammer to enforce theoretical purity, risking that people won't
use it, or take a more pragmatic approach that simplifies the adoption of such
a technology in practice.
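
As a rough illustration of such a gradual migration, the Python sketch below
maps the keys of a flat JSON object to predicates through a context-like
dictionary. It is not the JSON-LD expansion algorithm. The helper to_triples,
the subject label "_:doc", the blank node "_:prop" and the URL
http://example.com/vocab#prop are all assumptions made up for this example.
In a first step "prop" is mapped to a blank node, and in a later step the
publisher swaps in a real IRI without touching the JSON payload itself.

```python
def to_triples(doc, context, subject="_:doc"):
    """Turn a flat JSON object into triples, using `context` to map keys
    to predicates. Keys without a mapping are dropped in this sketch."""
    triples = []
    for key, value in doc.items():
        predicate = context.get(key)
        if predicate is None:
            continue  # no mapping: the key stays plain JSON only
        triples.append((subject, predicate, value))
    return triples


# The payload an existing JSON API already returns (hypothetical).
doc = {"name": "Example", "prop": 42, "debug": "x"}

# Step 1: "name" gets a real IRI, "prop" is only given a blank node.
context_step1 = {"name": "http://schema.org/name", "prop": "_:prop"}

# Step 2: later on, "prop" is upgraded to a dereferenceable IRI.
context_step2 = {
    "name": "http://schema.org/name",
    "prop": "http://example.com/vocab#prop",
}

triples_step1 = to_triples(doc, context_step1)
triples_step2 = to_triples(doc, context_step2)
```

Clients that only read the JSON see the same document in both steps. Only
the mapping changes, which is the kind of upgrade path I have in mind.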


> > In this case, most triples are useless for me but I do care about the last
> > two and such use cases are very valuable. I'm sorry, but I can't see how it
> > would help anyone if those blank node predicates would be URLs. What would
> > you gain?
> 
> I was assuming that the information that the author chose to include in
> the JSON was potentially important.  If it is important for a JSON
> processor to be able to access, then presumably it is potentially
> important for an RDF processor to access it.

It's the author's job to decide that. In practice, APIs often include debug
information or, e.g., data that is in there just to work around specific
practical problems in some specific clients. Should we prevent authors from
using JSON-LD tool chains just because of that? I don't think so... and within
a tool chain you control, you can also ensure that bnode labels don't change,
for that matter.


> > The danger is that other people start relying on them or start
> > complaining that you use a plethora of different URLs for which they can't
> > find any definition.
> 
> But we have the same risk in JSON already!  If the information is
> included in the JSON, the author *already* runs the risk of someone
> relying on it (even though they were told not to) and complaining that
> they cannot find a definition.

In JSON, you have to rely completely on the document's structure. It's dead
simple but becomes problematic if you have to use dozens of different APIs.
In JSON-LD we have better mechanisms; nevertheless, people will need some
time to adopt such a radically different approach. Blank node predicates
allow us to make that upgrade path much smoother.


> > Blank nodes by their very nature on the other hand make
> > it clear that there's some relationship, the details however are
> > unclear.
> 
> But as I pointed out, the RDF data becomes unusable -- *even* for an
> extended RDF processor that can handle blank nodes as predicates.

Not if you can rely on out-of-band information, and it definitely also
depends on what you are trying to do. Google built a billion-dollar business
out of following links they didn't understand (at least in the beginning).


> > Nothing of that requires any out-of-band information or contract between
> > the publisher and a consumer.
> 
> The intent of this blank-nodes-as-properties feature seems to be to
> allow certain data to be available to *JSON* processors -- presumably
> because it is important in some way -- but *not* available to *RDF*
> processors.  That's quite a double standard.  In essence, it is trying
> to use blank nodes to comment out certain information that is carried
> in the JSON.

I don't see any problem with that. The same is true for RDFa in HTML.


> If you really don't want the information to be available to clients then
> it should not be included in the JSON **at all**.  And if the problem is
> that you have some obsolete information in the JSON that you want to
> remove but cannot remove, because clients would break if it were
> removed, then that is a JSON problem or an API design problem -- not an
> RDF problem.

So? Who cares whose problem it is? The fact is that it will be a problem
many potential adopters will face. We have a solution for it and I'm not
willing to take it out just for the sake of complying with an arbitrary
limitation.


> The use of blank nodes as properties is just plain bad design.  If you
> really want a feature that allows certain JSON information to be
> commented out then map those properties to NIL or /dev/null or "" or
> such, and eliminate them entirely from the generated JSON-LD information
> model.  Don't try to make them available to some processors but not
> others, and don't ask for RDF to be extended for such a dubious use
> case that really has nothing to do with RDF.

Well, that's your opinion. I care more about developers' problems in their
day-to-day jobs than about theoretical purity - especially if one gets in the
way of the other. In a perfect world you wouldn't need blank nodes at all.



--
Markus Lanthaler
@markuslanthaler
Received on Friday, 12 July 2013 10:19:24 UTC