Re: Input needed from RDF group on JSON-LD skolemization from David Booth on 2013-07-04 (public-linked-json@w3.org from July 2013)

From: David Booth <david@dbooth.org>
Date: Wed, 03 Jul 2013 21:53:56 -0400
To: Markus Lanthaler <markus.lanthaler@gmx.net>
CC: public-linked-json@w3.org
Message-ID: <51D4D5B4.3010802@dbooth.org>

On 07/03/2013 05:06 AM, Markus Lanthaler wrote:
> On Wednesday, July 03, 2013 6:08 AM, David Booth wrote:
>> It is true that IRIs generated this way would not be dereferenceable,
>> but this seems to me like a perfect example of why dereferenceable IRIs
>> are a "SHOULD" instead of a "MUST".  And a benefit of using IRIs is that
>> later on, those IRIs could potentially be made dereferenceable, and that
>> is not possible with blank nodes, as blank nodes are never
>> dereferenceable.
>
> I think that's the whole reason for using a blank node. I don't want it to
> become dereferenceable ever. If I wish to do so, I replace it with a
> concrete IRI.

No, that is not the whole point or even the main point of using a blank 
node, though it may be *your* purpose in using them.

>
>> Regarding stability, AFAICT relative IRIs would be nearly as stable as
>> any versioned IRI: the IRI may change if the author decides to version
>> it, but aside from that it is exactly the same every time the data is
>> generated, even if other data elements are added, etc.  That is far
>
> I completely disagree. While technically you are right, the whole point of
> using a bnode is to convey it is in fact *not stable* and is not intended to
> be.

Again, you may think of blank nodes that way if you wish, but that is 
not why they were invented.

>
>> better than blank nodes, which have no stability at all.  (That's one
>> of the reasons they are such a pain for downstream RDF consumers.)
>
> That's a feature, not a bug IMO. I can create properties on the fly, perhaps
> even describe what they mean in the current context, but consumers should
> not start to rely on those properties. Simplest example:
>
>    _:knownBy owl:inverseOf <http://xmlns.com/foaf/0.1/> .
>
> (yes, we do support inverse properties out of the box in JSON-LD)
>
>
>> In summary, it seems to me that in comparing blank nodes with relative
>> IRIs: (a) blank nodes are far less friendly to downstream RDF
>> consumption; (b) neither would likely be dereferenceable initially, but
>> relative IRIs could later be made dereferenceable, whereas blank nodes
>> cannot; and (c) relative IRIs would be far more stable than blank nodes
>> -- comparable stability to other versioned IRIs.
>
> The point is that I don't want them to be stable. I explicitly want to
> prevent that people start to rely on them.

I suppose that would make sense if your goal is to annoy downstream 
consumers of your data, but that's rather anti-social.   Making it hard 
for others to refer to resources mentioned in your data is widely viewed 
as a *negative* -- not a positive -- and it goes against the philosophy 
of the web.  As stated in the W3C Architecture of the World Wide Web:
http://www.w3.org/TR/webarch/#uri-benefits
[[
A resource should have an associated URI if another party might 
reasonably want to create a hypertext link to it, make or refute 
assertions about it, retrieve or cache a representation of it, include 
all or part of it by reference into another representation, annotate it, 
or perform other operations on it.
]]

You might also read the W3C TAG's document on Publishing and Linking:
http://www.w3.org/TR/publishing-linking/

>
> This is similar to a private member in OO programming. Nothing would break
> if everything were made public. Most of the time however I want control over
> what is made public. I do mark things for which I cannot guarantee stability
> as private to prevent that people start relying on them.

No, it is very different from a private member in OO programming.  The 
purpose of a private member is to prevent someone else from breaking 
your code by *modifying* that private member.  But in this case, no 
injury would be done to *your* code.  All you would be achieving in 
making those properties unstable is to make it harder for downstream 
*consumers* of your data to refer to that property.  And that is not a 
goal that a W3C working group should be supporting.

>
>
>> The only significant downside I see to relative IRIs is that they
>> create an expectation of being dereferenceable, and that expectation
>
> and stability..
>
>
>> (presumably) would not initially be met.  That seems to me like a small
>> price to pay for the concrete benefits that are obtained from having
>> IRIs instead of blank nodes.
>
> I think we just have to agree to disagree :-)

Apparently so.

>
>
>> If a vendor wants to support value-added extensions then that is fine.
>> But I would expect *standard* JSON-LD parsers by *default* to produce
>> *standard* RDF -- not extended RDF -- although it is fine and good for
>> them to have an option for producing extended RDF.
>>
>>> All it would buy us is that some implementations may not be able to be
>>> called conformant anymore (those who decide to not implement
>>> skolemization).
>>> There's no way to enforce what consumers do with the data anyway.
>>>
>>> The easiest way out of this would be to define some additional product
>>> classes:
>>>     a) an "extended RDF to standard RDF converter using skolemization"
>>>     b) an "extended RDF to standard RDF converter discarding the
>>>        extensions"
>>>
>>> Then we could say that class a) implementations MUST transform bnodes
>>> used in predicates to skolem IRIs.
>>
>> Actually, this discussion has convinced me that prohibiting blank node
>> properties would be a better solution than skolemizing.
>
> OK, so what if we would add a "generalizedRDF" flag to the toRDF algorithm
> which, when set to false would filter all quads where a bnode is in
> predicate position? I would prefer the default value to be set to true but
> could, if there's a good argument, also live with a false.
>
> Would that address your concerns?

Well, no.  An option for extended RDF would be fine (defaulting to 
standard RDF), but discarding triples would not be fine, because it 
would involve unnecessary information loss.  That would bring us back to 
figuring out how to avoid that information loss.  Skolemization would be 
one way to do it, but the use of relative URIs seems like a better 
option because it is so much simpler and it gives the additional 
benefits (which I understand you do not see as benefits) of more stable 
identifiers that could eventually be made dereferenceable.
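
To make the three options concrete, here is a rough sketch in Python.  It is 
only an illustration, not the JSON-LD toRDF algorithm: the Quad type, the 
function names, the example.org IRIs and the "#name" fragment form for the 
relative IRIs are all assumptions made up for this example.  The 
/.well-known/genid/ path is the one suggested for skolem IRIs in the RDF 1.1 
Concepts draft.

```python
import uuid
from typing import Dict, List, NamedTuple


class Quad(NamedTuple):
    """Hypothetical minimal statement type; terms are plain strings."""
    subject: str
    predicate: str
    object: str


def is_blank(term: str) -> bool:
    # Blank node labels are written "_:label", as in N-Quads.
    return term.startswith("_:")


def drop_bnode_predicates(quads: List[Quad]) -> List[Quad]:
    """Option 1: discard statements whose predicate is a blank node.

    The output is standard RDF, but the dropped statements are lost.
    """
    return [q for q in quads if not is_blank(q.predicate)]


def skolemize_predicates(quads: List[Quad], authority: str) -> List[Quad]:
    """Option 2: replace each blank node predicate with a fresh skolem IRI.

    Nothing is lost, but the minted IRIs differ on every run.
    """
    minted: Dict[str, str] = {}
    out = []
    for q in quads:
        p = q.predicate
        if is_blank(p):
            if p not in minted:
                minted[p] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
            p = minted[p]
        out.append(q._replace(predicate=p))
    return out


def relative_iri_predicates(quads: List[Quad], base: str) -> List[Quad]:
    """Option 3: treat the label as a fragment relative to the document base.

    Nothing is lost, and the same input yields the same IRIs every time.
    """
    out = []
    for q in quads:
        p = q.predicate
        if is_blank(p):
            p = f"{base}#{p[2:]}"  # assumed form: "_:knownBy" -> "<base>#knownBy"
        out.append(q._replace(predicate=p))
    return out


# Made-up sample data: one blank node property, one ordinary property.
QUADS = [
    Quad("http://example.org/alice", "_:knownBy", "http://example.org/bob"),
    Quad("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
]
```

With this sample data, the first function returns one statement instead of 
two, while the other two keep both.  Only the third produces the same 
predicate IRI each time it is run on the same input.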

David
Received on Thursday, 4 July 2013 01:54:23 UTC