Re: Blank Node Identifiers and RDF Dataset Normalization

On 2013-02-26, at 15:01, Dave Longley <dlongley@digitalbazaar.com> wrote:

> On 02/26/2013 06:35 AM, Steve Harris wrote:
>> On 2013-02-25, at 18:30, Dave Longley <dlongley@digitalbazaar.com> wrote:
>> 
>>> On 02/25/2013 12:09 PM, Steve Harris wrote:
>>>> There is categorically no valid argument that something along these lines is essential for such-and-such use case; frankly, that's nonsense, as those use cases are already addressed by production systems in much more demanding environments, without those features.
>>> I'm not aware of production systems functioning in demanding environments that are using RDF datasets expressed in idiomatic JSON; the JSON-LD specification is about making that possible. I do believe the use case where developers would strongly prefer to refer to a graph without having to create and maintain a global identifier is a valid one. I would also argue that denying developers the ability to do this because it changes the way certain optimizations are implemented in existing systems isn't a very strong argument.
>> JSON-LD is just a surface syntax; it shouldn't have any bearing on the semantics unless something is wrong. RDF/XML is (supposed to be) idiomatic XML, and is also natively a tree format. I don't see how that's any different. If it is, then there's likely some issue.
>> 
>> As a company we use a lot of JSON documents, both internally and externally, so I have a reasonable amount of indirect experience of it.
>> 
>>> I'm in a similar position to you with respect to another related use case: where an author would like to use a graph label to do something other than denote a graph. I don't really understand that or why it should be supported. If it's because the author doesn't or can't mint a new URL, then I would suggest that using a blank node identifier would solve the problem nicely. If that doesn't actually solve the problem, though, and they do really need to use a graph label that doesn't really denote the graph, I'd like to understand why. I don't think dismissing it as a solved problem by lots of other systems in production is necessarily helpful.
>> The consequences of the graph label always denoting a graph are quite severe: it rules out some common situations (e.g. naming the graph after the URI that we dereferenced, and making changes to the graph over time). It's also totally unenforceable, so what will happen to systems when they encounter data that doesn't follow the rule? This will inevitably happen given how difficult it is to ensure that it's always the case.
> 
> The argument that rules may be difficult to enforce -- therefore they are bad rules -- sounds to me like a way to rationalize away the very need for any rule at all in a decentralized system. I'm ok with accepting that some systems will abuse the rules; it's a reality. That doesn't mean that the rules then serve no useful purpose at all. That being said, I'm not entirely opposed to the idea that there's a use case for graph labels that do not denote the graph just because I can see other ways to solve the problem. Which is what my point was. I could tell you that all you need to do is just mint a new URL that you control and copy all the data over there if you want to make edits to a graph. It will "work," it just won't do exactly what you'd prefer. In fact, someone else (perhaps in a production system with high demand) may have done just that.
> 
>> 
>>> If we are to "move way beyond the time where RDF is an 'emerging tech' only suitable for early-stage startups and academics", as you say, then I believe that we must embrace more common practices that occur outside the walls of its current use. Saying that developers should simply do something unnatural and/or prohibitive to solve their use cases will only continue to restrict the adoption of the technology by wider audiences.
>> My point was that RDF has already moved beyond that point, but not enough people in the WG acknowledge that. It's not /widely/ used in financial services, but we're certainly not the only users. We're also using RDF in a much more security-sensitive, heavily regulated, and provenance-critical product. The WG should respect the fact that there are deployed systems, and not randomly change things without a very good reason. Supporting one company's modelling decisions is not a very good reason, IMHO.
>> 
>> I personally don't believe that any of the other solutions (which don't require changing RDF semantics) that have been proposed are unnatural or prohibitive, though obviously unnatural is in the eye of the beholder.
>> 
>> There are other ways you could have modelled the data you're attempting to express which wouldn't require wholesale changes to RDF.
> 
> This isn't about the way we've modeled the data in our company, which currently works with RDF. It's that the way we've modeled the data is a suboptimal and non-intuitive way of doing it. The use of JSON makes this a little more apparent -- and people who author idiomatic JSON will notice and wonder why things were done differently. We can live with what some might consider an inferior design ... we're just trying to improve it where it is lacking.

Sure, but you recognise that trying to rationalise your design by making significant changes to a widely used spec isn't exactly going to make your organisation very popular?

>> The RDF community has a poor history of including random features like this (without enough understanding of the consequences) which have far-reaching consequences on implementations, e.g. rdf:Bag/Alt/etc., rdf:List, XMLLiterals, reification, plain literals, and some would say bNodes. Those badly thought out features have all cost the community dearly.
> 
> That's all well and good, but simply begs the question. Of course if an idea is "random" or "badly thought out" it is likely to be to the detriment of any community to adopt it. The point of contention, however, is whether or not this idea is "random" or "badly thought out" -- we clearly disagree. The opposing viewpoint to yours is that the current state of RDF with respect to restricting blank node identifiers to special cases is "badly thought out" and "random".

Well, for completeness, there are three positions:

1) bNodes should be deprecated in RDF 1.1
2) We should stick with what we have
3) We should allow them in more places in RDF 1.1

People expressed support for 1) and 2) at the start of the WG, but 3) is a new position.

> Anyway, at this point I don't think we're likely to convince each other of the value of our respective positions, but thank you for listening.

Indeed. I think we at least understand each other's points of view.

- Steve

-- 
Steve Harris
Experian
+44 20 3042 4132
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL

Received on Tuesday, 26 February 2013 15:46:29 UTC