Re: Blank Node Identifiers and RDF Dataset Normalization from Steve Harris on 2013-02-27 (public-linked-json@w3.org from February 2013)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 27 Feb 2013 15:37:09 +0000
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDF WG <public-rdf-wg@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-Id: <6DC9ECEC-7FC4-47ED-B662-C2C7E71BC2A2@garlik.com>

On 2013-02-27, at 04:27, Manu Sporny <msporny@digitalbazaar.com> wrote:

> On 02/26/2013 10:45 AM, Steve Harris wrote:
>>>> There are other ways you could have modelled the data you're 
>>>> attempting to express which wouldn't require wholesale changes to
>>>> RDF.
>>> 
>>> This isn't about the way we've modeled the data in our company; 
>>> which currently works with RDF. It's that the way we've modeled the
>>> data is a suboptimal and non-intuitive way of doing it. The use of
>>> JSON makes this a little more apparent -- and people who author 
>>> idiomatic JSON will notice and wonder why things were done 
>>> differently. We can live with what some might consider an inferior
>>> design ... we're just trying to improve it where it is lacking.
>> 
>> Sure, but you recognise that trying to rationalise your design by 
>> making significant changes to a widely used spec isn't exactly going
>> to make your organisation very popular?
> 
> I'm going to deviate temporarily from the technical arguments, those
> that want to skip to the technical stuff can skip this section.
> 
> Steve, making statements like these is not helpful. I can't keep
> continuing to allow statements like this to slide, which you are
> peppering into your responses, without saying something about your tone.
> 
> First, this discussion isn't about rationalizing a design decision, it's
> about making developers do something that they don't normally have to do
> - both in RDF and in JSON. Developers don't /have/ to label every
> subject with an IRI in RDF, but they have to label every Graph with an
> IRI in RDF. This isn't just about Web Payments, or JSON-LD. This is
> going to be confusing to developers in general. You disagree. Don't then
> attempt to make it seem like we are trying to do something sinister and
> then play good cop by waving your finger at us and telling us that we're
> going to be shunned by doing something that you planted on us.

I didn't mean rationalise in the sense of "make seem sane", I meant in the business sense of simplify (as in manufacturing). Probably a poor choice of word, but the other meaning isn't common in the UK (I had to look it up in a dictionary to see why you were upset about it BTW). Rationalise in the British sense still has a slightly negative connotation, a bit like "downsize" maybe - more efficient, but often carrying some cost.

I still hold the opinion that your preference for bNodes labelling graphs stems from the modelling decision you made in how you represent your data. I've seen nothing to counter that; all your examples are around reducing the byte count in JSON-LD web payments fragments.

As evidence I hold up that many other people have modelled exactly the same kind of data, and they've done it in other ways. We use objectification (sort of similar to how the Provenance group did it), FWIW - I wouldn't recommend that for your use case because it will make the syntax of your structures much more complex. Andy (at least) has proposed alternatives that don't, however.

Of course, you're free to represent your data however you like, but asking the WG to make changes to an established spec to support that is going to be controversial. There may be other use cases that can only be addressed with bNodes labelling graphs, but I've not heard any, and no-one has spoken out in support of this change with real-world examples from any other domain.

The perceived attitude of: "we have this very specific need, in a very specific (and currently not widely used) dialect of RDF, so everyone else should change what they do to make it slightly easier", is not a community-spirited one. I'm sure you don't think of it that way, but that's how it comes across to me.

I'm sorry if I come across as rude - I'm sure I do, I'm a very impatient, plain-spoken, standards-loving Brit geek/entrepreneur. I've got a strong attachment to RDFish technologies, and I've seen things like this go bad too many times before.

I don't think you comprehend quite how unreasonable your request is though.

> Second, framing our technical comments as "trying to rationalise your
> design by making significant changes to a widely used spec" is a fairly
> nasty accusation to make as that sort of behavior is pretty vile. I've
> noticed this undercurrent in your responses and I'd like it to stop as
> it's borderline ad hominem.

I don't think it's particularly vile; it's more or less natural.

I have no problem being abrasive and direct if it gets the job done, but I always try to avoid anything that might be an ad hominem attack.

You think you've got a very good way of modelling your data, but it requires changing the semantics of RDF. In a nutshell, I think that's too high a cost.

> We wouldn't be making the request unless we thought it was for the good
> of the Web.

I believe you do.

> Third, this isn't a popularity contest. We raised technical issues, if
> people then take those issues and blame the messenger, then those people
> aren't using logic and reason to guide their responses. The idea that
> popularity would even enter into this discussion makes my skin crawl and
> makes it seem like we're in high school all over again.

I'm glad, because I'm certainly not going to win one of those :)

> Pat disagreed vehemently with us at first, but respectfully, and worked
> through it until he came to agree with our viewpoint. Andy continues to
> respectfully disagree with our position, but is making a very concerted
> effort to try and understand where we're coming from. Both of their
> responses are appreciated. I'm not seeing the same sort of respectful
> disagreement coming from you.

Yes, but Pat's concerns are very different from mine - I mainly care about pragmatic issues, and Pat mainly cares about logical soundness etc.

I'm no expert, but I expect you could use existential variables all over RDF in theory - that doesn't make it a good idea though.

>> Well, for completeness, there's three positions:
>> 
>> 1) bNodes should be deprecated in RDF 1.1
>> 2) We should stick with what we have
>> 3) We should allow them in more places in RDF 1.1
>> 
>> People expressed support for 1) and 2) at the start of the WG, but 3)
>> is a new position.
> 
> I've been involved in discussions about allowing blank nodes in all
> positions with various members of the Semantic Web community since
> 2008... Kingsley's been talking about this stuff for even longer than
> that. We've all been talking about it for a while in various sub-communities:
> 
> http://www.w3.org/2010/02/rdfa/sources/rdf-interfaces/#triples
> 
> I do realize that some of the folks in the RDF WG may be unaware that
> these discussions were happening, but even you admit that your company
> had allowed blank nodes in the graph position far in advance of the
> creation of this Working Group. I don't think it's a new position, or
> rather, I don't think folks should be surprised that the concept exists
> and found its way into the group that decides these sorts of things.

For the record, it wasn't my company - it was in a research project 10+ years ago.

>>> Anyway, at this point I don't think we're likely to convince each 
>>> other of the value of our respective positions, but thank you for 
>>> listening.
>> 
>> Indeed. I think we at least understand each other's points of view.
> 
> I don't think that's true. I'm not certain I understand your point of
> view, because:
> 
> 1) it seems veiled in a general dismissal of the problem space, as Dave
> Longley effectively argued in his responses to you,
> 2) the specific solutions that you refer to (skolemization for one) are
> not actually solutions to the problem space as far as we can see,
> 3) the rest of the solutions you allude to are vague and don't consist
> of enough details to apply it to the problem.

You can't say I'm dismissive of the problem space - I spent 6 years of my life building a company that works in the same area. Non-repudiation, security, and unambiguity are legal and contractual obligations on us. We use lots of RDF (Turtle) and lots of idiomatic JSON, though not combined in one document. We don't care so much about terseness, sure. We do have one or two API calls with a very similar approach, and we just mint URIs for external developers to ID objects in the future (if they need to).

RDF+SPARQL isn't a perfect fit, but we got the job done, and the specs are mostly better for not pandering to our requirements wholesale. In some cases we stepped beyond the spec at the time (e.g. Skolem URIs), and in other cases we deliberately left features out because it wasn't possible for us to implement them securely (e.g. FROM). Better that than pushing all our requirements onto the spec, but that doesn't mean we didn't ask!

I don't see how skolemisation can fail to be a solution - you appear to me to be fixated on a very specific solution to your data modelling. I appreciate that you've invested some, maybe relatively significant, technical effort in your current solution. But so have we, and so has everyone else that's implemented RDF 1.0.
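
For illustration only, a minimal sketch of what skolemising a blank-node graph label could look like. Everything here is an assumption made for the example: the `skolemize` function, the plain-tuple quad representation, the `mint` parameter, and the example.org / example.com IRIs are hypothetical and not taken from any real system. The `/.well-known/genid/` path follows the convention suggested for Skolem IRIs in RDF 1.1 Concepts.

```python
import uuid
from typing import Callable, Dict, Iterable, List, Tuple

# A quad is (subject, predicate, object, graph label), each term a string.
# Assumption for this sketch: blank nodes are written "_:label", and
# literals keep their N-Quads quotes (e.g. '"10.00"'), so they never
# start with "_:".
Quad = Tuple[str, str, str, str]


def skolemize(
    quads: Iterable[Quad],
    base: str = "http://example.org",
    mint: Callable[[], str] = lambda: uuid.uuid4().hex,
) -> List[Quad]:
    """Replace every blank node, in any position, with a minted Skolem IRI.

    The same blank node label always maps to the same IRI within one call,
    so a graph labelled by a blank node can still be referred to from
    other quads after the rewrite.
    """
    mapping: Dict[str, str] = {}

    def term(t: str) -> str:
        if not t.startswith("_:"):
            return t  # IRIs and literals pass through unchanged
        if t not in mapping:
            mapping[t] = f"{base}/.well-known/genid/{mint()}"
        return mapping[t]

    return [(term(s), term(p), term(o), term(g)) for (s, p, o, g) in quads]


if __name__ == "__main__":
    data = [
        # An offer stored in a graph that was labelled with a blank node.
        ("_:offer", "http://example.com/vocab#price", '"10.00"', "_:g1"),
        # A statement about that graph, held in a separate metadata graph.
        ("_:g1", "http://example.com/vocab#signedBy",
         "http://example.com/people/alice", "http://example.com/graphs/meta"),
    ]
    for quad in skolemize(data):
        print(" ".join(quad))
```

After the rewrite the dataset contains only IRIs and literals, so it can be stored and queried by an unmodified RDF 1.0 quad store; the cost is that the minted IRIs appear in the serialised data.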

I don't want to throw numbers about, but for us anything that significantly decreases the efficiency of our RDF storage carries a huge monetary cost - we couldn't justify it without a significant upside.

We gain a pretty small direct benefit from complying with the RDF specs, it's really only the availability of tutorials, and occasionally access to experienced developers, but I like standards. There's a big benefit to the industry as a whole, but only if they're curated correctly and sensitively.

If the majority of the WG thought this was a great idea, then it would be more of a decision about whether we (Experian) carry on using RDF qua RDF internally, or just start thinking of it as RDF-flavoured - but I don't see that level of support, so luckily I'm not in that position.

- Steve

> I also don't think you understand our point of view. I say this for two
> reasons:
> 
> 1) Your responses gloss over specifics and demonstrate a basic
> mis-understanding of the proposals that we've put forward.
> 2) There is a theme among your responses where you believe that there
> isn't a problem, or the problem space is a solved one. When you approach
> a problem with that viewpoint, you tend to miss things or misunderstand
> the actual problem.
> 
> That said, I'm happy to leave things where they truly are: with you
> dismissing the problem and the class of people it affects and some of us
> not understanding why you've chosen to argue your point in the way that
> you have. I think that's where we truly are in this discussion.
> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: Aaron Swartz, PaySwarm, and Academic Journals
> http://manu.sporny.org/2013/payswarm-journals/
> 

-- 
Steve Harris
Experian
+44 20 3042 4132
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL
Received on Wednesday, 27 February 2013 15:37:42 UTC