Re: Three solution designs to the first three Graphs use cases

On 2012-02-01, at 00:23, Sandro Hawke wrote:
> On Fri, 2012-01-27 at 12:27 +0000, Steve Harris wrote:
>> On 2012-01-27, at 10:35, Ivan Herman wrote:
>>> On Jan 27, 2012, at 10:33 , Andy Seaborne wrote:
>>>> On 27/01/12 03:45, Sandro Hawke wrote:
>>>>> On Thu, 2012-01-05 at 11:09 +0000, Andy Seaborne wrote:
>>>>>> On 04/01/12 19:23, David Wood wrote:
>>>>>>> Thanks, Sandro.  That's very helpful.
>>>>>>> 
>>>>>>> It might be useful to consider augmenting TriG syntax to support your third solution (explicitly naming relations). I'd be quite happy with that.
>>>>>> 
>>>>>> What would the data model be?
>>>>> 
>>>>> I think: an RDF graph which can have other RDF graphs as values of its
>>>>> triples.  All these graphs would be subgraphs of some greater graph, so
>>>>> they can share b-nodes.
>>>>> 
>>>>> (This is what cwm has had implemented since 2001, I think.)
>>>> 
>>>> I thought this WG wasn't going there (graph literals).
>>>> 
>>>> Personally, I see graph literals as the clean answer, but it is RDF 2 (+).  RDF 1.1 is, to me, incremental improvements within the current abstract data model.  Datatyped literals (e.g. "<s> <p> <o>"^^rdf:graphNTriples) are unwieldy and might block doing graph literals properly in RDF 2+.
>>>> 
>>> 
>>> I am not convinced it is such a huge jump and, if this is the only way to have a clean way forward, we may have to do this. The datatyped literals may be a way forward and, after all, the TriG version of using '{' may be considered syntactic sugar for a datatyped literal…
>> 
>> This makes me /extremely/ nervous.
>> 
>> From the perspective of the indexing/query engine, this is an enormous difference, and I'm not aware of any commonly used systems that currently follow this model. So there's a lack of experience in the community of how to deal with these structures efficiently.
>> 
>> I bought this kind of argument with RDF Lists (collections) and accessor functions: storing the lists natively, and also reflecting them into triples. Coming up with an implementation that was both correct and efficient turned out to be so hard that we gave up and just elected not to use Lists in production.
> 
> I'm sad to hear about this experience with lists.  Sometime I'd like to
> hear more about why that was so hard.   (Have you folks
> written/presented about it?)

No, but I think I mentioned it at the last F2F.

In essence, to get anything like decent performance you have to maintain a parallel copy of the list structure in a vector (of some kind). Tracking changes to the triples and updating the vector appropriately (and vice versa, if you allow useful list-manipulation functions) is /extremely/ difficult and computationally expensive, especially at scale. Quite simply, it's just not worth the effort.

I believe Andy said something similar too.
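
To make the "parallel copy" point concrete, here is a minimal sketch, assuming a toy in-memory set of triples with CURIE-style strings; the class name ListCachingStore and the invalidation policy are hypothetical and not how Garlik's store works. It reflects rdf:first/rdf:rest chains into Python lists and, naively, throws away every cached vector whenever any list-structure triple changes. Doing that update incrementally (and at scale) is the part that turns out to be hard.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def collection_to_vector(triples: Set[Triple], head: str) -> List[str]:
    """Walk an rdf:first/rdf:rest chain starting at `head` and return its items."""
    # Index first/rest by subject (a full scan each time, which is part of the cost).
    first: Dict[str, str] = {}
    rest: Dict[str, str] = {}
    for s, p, o in triples:
        if p == RDF_FIRST:
            first[s] = o
        elif p == RDF_REST:
            rest[s] = o
    items: List[str] = []
    seen: Set[str] = set()
    node = head
    while node != RDF_NIL:
        # Reject cycles and half-built cells rather than looping forever.
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed collection at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


class ListCachingStore:
    """Hypothetical triple store keeping a vector copy of each collection it has read."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self._vectors: Dict[str, List[str]] = {}

    def add(self, t: Triple) -> None:
        self.triples.add(t)
        self._invalidate(t)

    def remove(self, t: Triple) -> None:
        self.triples.discard(t)
        self._invalidate(t)

    def _invalidate(self, t: Triple) -> None:
        # Naive policy: one rdf:first/rdf:rest edit can change lists reachable
        # from many heads, so drop every cached vector.
        if t[1] in (RDF_FIRST, RDF_REST):
            self._vectors.clear()

    def vector(self, head: str) -> List[str]:
        if head not in self._vectors:
            self._vectors[head] = collection_to_vector(self.triples, head)
        return list(self._vectors[head])  # copy so callers can't corrupt the cache
```

Appending a single item means removing one rdf:rest triple and adding three more, and every one of those writes invalidates the cache. That is exactly the churn described above.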

>> If we had a critical mass of systems that worked this way I would be enthusiastic about it, but we don't.
> 
> I think it's possible to implement graph literals (like in N3, or my
> third proposed solution) using a quad store, like the ones you already
> use.  That's how at least one version of cwm did it.   The technique is
> to map it to TriG/SameAs with minted identifiers:
> 
> So, to represent:
> 
>  <s> <p> { <a> <b> <c> }
> 
> you mint an identifier ( <g1> ) then store these quads:
> 
>  <s> <p> <g1> DEFAULT
>  <a> <b> <c> <g1>
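
The minted-identifier mapping above can be sketched as a small recursive flattening. This is a minimal illustration, assuming graph literals are modelled as a Python GraphLiteral value and that minted names follow a <g1>, <g2>, … pattern; GraphLiteral, Minter, to_quads and the "DEFAULT" label are all hypothetical names, not cwm's actual code. Nested literals get their own minted graph, and because every graph lands in the same store, a blank-node label such as "_:x" used both inside and outside a literal ends up as the same term, which is how the subgraphs share b-nodes.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple, Union

# Hypothetical label for the default graph, matching the quads above.
DEFAULT = "DEFAULT"

Quad = Tuple[str, str, str, str]


@dataclass(frozen=True)
class GraphLiteral:
    """A graph used as an object value, like { <a> <b> <c> } in N3."""
    triples: Tuple[Tuple[str, str, Union[str, "GraphLiteral"]], ...]


class Minter:
    """Hands out fresh graph identifiers <g1>, <g2>, ... (naming is an assumption)."""

    def __init__(self, prefix: str = "g") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)

    def mint(self) -> str:
        return f"<{self._prefix}{next(self._counter)}>"


def to_quads(
    triples: Iterable[Tuple[str, str, Union[str, GraphLiteral]]],
    graph: str,
    minter: Minter,
) -> Iterator[Quad]:
    """Flatten triples whose objects may be graph literals into plain quads."""
    for s, p, o in triples:
        if isinstance(o, GraphLiteral):
            # Replace the literal with a minted name, then store its triples
            # in the named graph of that name (recursing for nested literals).
            name = minter.mint()
            yield (s, p, name, graph)
            yield from to_quads(o.triples, name, minter)
        else:
            yield (s, p, o, graph)


if __name__ == "__main__":
    doc = [("<s>", "<p>", GraphLiteral((("<a>", "<b>", "<c>"),)))]
    for quad in to_quads(doc, DEFAULT, Minter()):
        print(*quad)
    # <s> <p> <g1> DEFAULT
    # <a> <b> <c> <g1>
```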

Sure, it's possible, but it's novel (for scalable systems), and no-one understands the performance implications.

We use SPARQL-style GRAPHs a lot for holding provenance information; currently it's just a query across the quad to find the provenance identifier, but this would move that to a single-column join.
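
One reading of the provenance cost, sketched over a toy list of quads (an interpretation, not a description of any real store): in the quad model the graph name of the matching quad is itself the provenance identifier, while under the graph-literal mapping the triple sits in a minted graph and the provenance identifier is whatever default-graph triple points at that minted name, so an extra join on the object column is needed. The function names, the <source> predicate and the "DEFAULT" label are hypothetical.

```python
from typing import Iterable, Set, Tuple

Quad = Tuple[str, str, str, str]
DEFAULT = "DEFAULT"  # hypothetical label for the default graph


def provenance_from_quads(store: Iterable[Quad], triple: Tuple[str, str, str]) -> Set[str]:
    """Quad model: the graph name of each matching quad is the provenance id."""
    return {g for (s, p, o, g) in store if (s, p, o) == triple}


def provenance_from_graph_literals(store: Iterable[Quad], triple: Tuple[str, str, str]) -> Set[str]:
    """Graph-literal mapping: provenance hangs off whatever points at the minted graph."""
    quads = list(store)  # scanned twice below
    # Step 1: find the minted graph names that contain the triple.
    minted = {g for (s, p, o, g) in quads if (s, p, o) == triple and g != DEFAULT}
    # Step 2: join those names against the object column of the default graph.
    return {s for (s, p, o, g) in quads if g == DEFAULT and o in minted}


quad_store = [("<a>", "<b>", "<c>", "<doc1>")]
literal_store = [("<doc1>", "<source>", "<g1>", DEFAULT), ("<a>", "<b>", "<c>", "<g1>")]
```

Both calls answer {"<doc1>"} for the triple <a> <b> <c>, but the second needs the additional join step, which is the kind of change an index layout tuned for quads would have to absorb.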

> In this proposal, such a use of quads is a purely internal decision of
> the implementer -- what's standard for interchange is the N3-like syntax
> with the graph literals.  It's just those documents are stored for easy
> access/manipulation in quads using a SameAs relation.  Elsewhere, people
> remain free to use quads, internally, however they want.
> 
> Wouldn't that solve the implementation burden?

No.

In general, I have a serious issue with the way this group is chartered. It seems to take no account of the fact that FTSE-100 (and similar) companies are spending serious effort and money on deployments of these technologies. IMHO it's far too late to run around messing with the underlying real-world data model (quads) when it has this many deployments. Ten years ago, when RDF was mostly just an academic plaything, it might have been OK, but quads were already in common use then. I strongly believe this group should have been chartered to standardise what real implementations actually do, not to invent random new stuff backed by no significant implementation experience.

We spend an eye-watering amount of money every year on power, cooling, and hardware to store quads. If RDF 1.1 makes that noticeably less efficient, then frankly we'll just ignore it.

- Steve [picking up toys and putting them back in the pram :)]

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 0535 7233 VAT # 849 0517 11
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ

Received on Wednesday, 1 February 2012 10:43:47 UTC