Re: Blank Node Identifiers and RDF Dataset Normalization from Dave Longley on 2013-02-26 (public-linked-json@w3.org from February 2013)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Tue, 26 Feb 2013 10:01:49 -0500
To: Steve Harris <steve.harris@garlik.com>
CC: Linked JSON <public-linked-json@w3.org>
Message-ID: <512CCE5D.8060409@digitalbazaar.com>

On 02/26/2013 06:35 AM, Steve Harris wrote:
> On 2013-02-25, at 18:30, Dave Longley <dlongley@digitalbazaar.com> wrote:
>
>> On 02/25/2013 12:09 PM, Steve Harris wrote:
>>> There is categorically no valid argument that something along these lines is essential for such-and-such usecase, frankly that's nonsense as those usecases are already addressed by production systems in much more demanding environments, without those features.
>> I'm not aware of production systems functioning in demanding environments that are using RDF datasets expressed in idiomatic JSON; the JSON-LD specification is about making that possible. I do believe the use case where developers would strongly prefer to refer to a graph without having to create and maintain a global identifier is a valid one. I would also argue that denying developers the ability to do this because it changes the way certain optimizations are implemented in existing systems isn't a very strong argument.
> JSON-LD is just a surface syntax, it shouldn't have any bearing on the semantics unless something is wrong - RDF/XML is (supposed to be) idiomatic XML, and is also a natively tree format. I don't see how that's any different. If it is then there's likely some issue.
>
> As a company we use a lot of JSON documents, both internally and externally, so I have a reasonable amount of indirect experience of it.
>
>> I'm in a similar position to you with respect to another related use case: where an author would like to use a graph label to do something other than denote a graph. I don't really understand that or why it should be supported. If it's because the author doesn't or can't mint a new URL, then I would suggest that using a blank node identifier would solve the problem nicely. If that doesn't actually solve the problem, though, and they do really need to use a graph label that doesn't really denote the graph, I'd like to understand why. I don't think dismissing it as a solved problem by lots of other systems in production is necessarily helpful.
> The consequences of the graph label always denoting a graph are quite severe, it rules out some common situations (e.g. naming the graph after the URI that we dereferenced, and making changes to the graph over time). It's also totally unenforceable, so what will happen to systems when they encounter data that doesn't follow the rule? This will inevitably happen given how difficult it is to ensure that it's always the case.

The argument that rules may be difficult to enforce -- therefore they 
are bad rules -- sounds to me like a way to rationalize away the very 
need for any rule at all in a decentralized system. I'm ok with 
accepting that some systems will abuse the rules; it's a reality. That 
doesn't mean that the rules then serve no useful purpose at all. That 
being said, I'm not entirely opposed to the idea that there's a use case 
for graph labels that do not denote the graph just because I can see 
other ways to solve the problem. That was my point. I could 
tell you that all you need to do is just mint a new URL that you control 
and copy all the data over there if you want to make edits to a graph. 
It will "work," it just won't do exactly what you'd prefer. In fact, 
someone else (perhaps in a production system with high demand) may have 
done just that.

>
>> If we are to "move way beyond the time where RDF is an 'emerging tech' only suitable for early-stage startups and academics", as you say, then I believe that we must embrace more common practices that occur outside the walls of its current use. Saying that developers should simply do something unnatural and/or prohibitive to solve their use cases will only continue to restrict the adoption of the technology by wider audiences.
> My point was that RDF has already moved beyond that point, but not enough people in the WG acknowledge that. It's not /widely/ used in financial services, but we're certainly not the only users. We're also using RDF in a much more security sensitive, heavily regulated, and provenance critical product. The WG should respect the fact that there are deployed systems, and not randomly change things without a very good reason. Supporting one company's modelling decisions is not a very good reason, IMHO.
>
> I personally don't believe that any of the other solutions (which don't require changing RDF semantics) that have been proposed are unnatural or prohibitive, though obviously unnatural is in the eye of the beholder.
>
> There are other ways you could have modelled the data you're attempting to express which wouldn't require wholesale changes to RDF.

This isn't about the way we've modeled the data in our company, which 
currently works with RDF. It's that the way we've modeled the data is a 
suboptimal and non-intuitive way of doing it. The use of JSON makes this 
a little more apparent -- and people who author idiomatic JSON will 
notice and wonder why things were done differently. We can live with 
what some might consider an inferior design ... we're just trying to 
improve it where it is lacking.

> The RDF community has a poor history of including random features like this (without enough understanding of the consequences) which have far reaching consequences on implementations. e.g. rdf:Bag/Alt/etc., rdf:List, XMLLiterals, reification, plain literals, and some would say bNodes. Those badly thought out features have all cost the community dearly.

That's all well and good, but it simply begs the question. Of course if an 
idea is "random" or "badly thought out" it is likely to be to the 
detriment of any community to adopt it. The point of contention, 
however, is whether or not this idea is "random" or "badly thought out" 
-- we clearly disagree. The opposing viewpoint to yours is that the 
current state of RDF with respect to restricting blank node identifiers 
to special cases is "badly thought out" and "random".

Anyway, at this point I don't think we're likely to convince each other 
of the value of our respective positions, but thank you for listening.

-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 26 February 2013 15:01:49 UTC