Re: Graphs Design 6.2 from Richard Cyganiak on 2012-04-27 (public-rdf-wg@w3.org from April 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 27 Apr 2012 09:03:43 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: public-rdf-wg <public-rdf-wg@w3.org>
Message-Id: <40ACBE85-343B-43CD-AD6D-F12EA91C3178@cyganiak.de>

Hi Sandro,

On 26 Apr 2012, at 18:28, Sandro Hawke wrote:
>> Let's say I find a couple of TriG files on the Web. This being the Web, I don't trust them fully.
>> 
>> I want to load all of them into my SPARQL store so that I can query them with SPARQL. But I want to load them in a way so that I can still change my mind about what to trust or distrust after having loaded them. So I need to keep track of who said what.
>> 
>> Assuming you consider it reasonable, then please consider the following sub-cases:
>> 
>> 1) I may not fully trust everything that's said in some of the named graphs in these TriG files. (That is, I trust the source of the TriG file and the metadata in the default graphs, but don't trust some of the other sources quoted in the named graphs.)
>> 
>> 2) I may not fully trust everything that's said in the default graphs of these TriG files. (For example, the metadata in the default graphs might be horribly outdated, or the TriG files use @union and I don't trust some of the NGs.)
>> 
>> 3) I may not fully trust the association between graph IRIs and graphs in some of the TriG files (That is, I suspect they might be lying or mistaken when ascribing statements to certain source IRIs in their named graphs).
> 
> I think 6.2 can handle them all.  

Okay, I see below that you intend to handle them by using a custom import mechanism that renames the graphs, and a custom metadata vocabulary, and I assume defining these wouldn't be part of this WG's work. Yes, the scheme as described would be able to handle all three cases.

If the WG describes what it means to somehow “combine” or “merge” RDF datasets, and I got the impression we are headed in that direction, then we should be well aware of whether our mechanism supports the scenarios above.

And if we define a syntax for RDF datasets, then people *will* put these on the web, and they *will* want to load them into their existing RDF stores, and they *will* expect that there's a way to do the things above, and we had better be able to explain how it can be done.

So I was hoping more for a discussion of whether the proposed standard merging mechanisms would still allow us to keep track of who said what in these three scenarios.

This might also require some thinking about what it means to put a TriG file on the web, about authority and so on.

I observe that 1) seems to be easy: if I trust the default graphs, then I can just merge them. I can just load all the named graphs; there should be no contradictions if the sources of the TriG files are trustworthy.

2) would probably just require keeping the default graphs separate. So if I load the first dataset (D1) from <http://example.com/alice> then I might stick the default graph of D1 into a named graph with IRI <http://example.com/alice>. Would that be a reasonable way of looking at a web-published TriG file? That the triples in its default graph can be seen as being inside a named graph that coincides with the TriG file's URL?
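
To make case 2) concrete, here is a minimal Python sketch. It assumes a purely hypothetical in-memory model in which a dataset is a dict from graph name to a set of triples, with None standing for the default graph; none of these names come from any spec or library.

```python
# Hypothetical in-memory model: a dataset maps a graph name to a set of
# triples. The key None stands for the default graph.

def load_with_separate_default(store, source_url, dataset):
    """Copy dataset into store, filing its default graph under source_url."""
    for name, triples in dataset.items():
        # The default graph is filed under the URL the file was fetched from;
        # named graphs keep the names the file gave them.
        target = source_url if name is None else name
        store.setdefault(target, set()).update(triples)
    return store


EX = "http://example.com/"

# D1 as published at http://example.com/alice
d1 = {
    None: {(EX + "a", EX + "b", 1)},
    EX + "g1": {(EX + "a", EX + "b", 2)},
}

store = {}
load_with_separate_default(store, EX + "alice", d1)
```

The store's own default graph stays empty here, so nothing from the file is asserted as trusted until I decide to trust the graph named after its URL.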

3) is more difficult and probably cannot be done safely without renaming the graphs, that is, “standardizing them apart”.
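
A rough sketch of what standardizing apart could look like, again using a hypothetical dict-based model (None is the default graph). The crawler namespace, the helper names and the shape of the provenance records are all assumptions made for illustration only.

```python
from itertools import count

CRAWLER_NS = "http://example.com/crawler/"  # assumed namespace for fresh names


def make_renamer(start=1):
    """Return a function that yields a new, unused graph name on each call."""
    counter = count(start)

    def fresh():
        return f"{CRAWLER_NS}g{next(counter)}"

    return fresh


def load_renamed(store, provenance, source_url, dataset, fresh):
    """Store every graph under a fresh name; record what the source called it."""
    for name, triples in dataset.items():
        new_name = fresh()
        store[new_name] = set(triples)
        # (source, name used by the source or None for its default graph, our name)
        provenance.append((source_url, name, new_name))


EX = "http://example.com/"
d1 = {None: {(EX + "a", EX + "b", 1)}, EX + "g1": {(EX + "a", EX + "b", 2)}}
d2 = {None: {(EX + "a", EX + "b", 3)}, EX + "g1": {(EX + "a", EX + "b", 4)}}

store, provenance = {}, []
fresh = make_renamer()
load_renamed(store, provenance, EX + "alice", d1, fresh)
load_renamed(store, provenance, EX + "bob", d2, fresh)
```

Both sources use the name http://example.com/g1, but after loading, their claims about it live in two distinct graphs, and the provenance list says who made which claim.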

There are some interesting effects around authority. For example, I suppose that a TriG file at <http://example.com/alice.trig> would be authoritative with respect to a graph named <http://example.com/alice.trig#graph1>, because the latter IRI, per the IRI spec, identifies a part of the resource identified by the former. So I suppose it would be safe in all three cases to load the graph <http://example.com/alice.trig#graph1> from <http://example.com/alice.trig> without renaming.
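
The authority test suggested here can be sketched in a few lines of Python. The function name is made up, and the comparison is a plain string match after stripping the fragment; a real implementation would probably want IRI normalization first.

```python
from urllib.parse import urldefrag


def is_authoritative(document_url, graph_iri):
    """True if graph_iri is document_url itself or a fragment of it.

    Plain string comparison after removing the fragment; no normalization
    (case, percent-encoding, default ports) is attempted in this sketch.
    """
    graph_base, _fragment = urldefrag(graph_iri)
    document_base, _ = urldefrag(document_url)
    return graph_base == document_base


doc = "http://example.com/alice.trig"
own = is_authoritative(doc, "http://example.com/alice.trig#graph1")
other = is_authoritative(doc, "http://example.com/bob.trig#graph1")
```

Under this check, only the first graph name would be loaded without renaming.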

I suppose in case 3) I could always just go and dereference the graph names to see if the result matches what's reported in the TriG files. This of course only works for dereferenceable graph names; I'd be out of luck with URNs or if blank nodes are allowed.
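
As a sketch of that dereferencing check: the fetch argument below is a hypothetical caller-supplied function that retrieves an IRI and returns the triples found there. Comparing triple sets for equality is a simplification, since it ignores blank node renaming, and error handling for failed retrievals is left out.

```python
from urllib.parse import urlsplit


def is_dereferenceable(graph_name):
    """Only http(s) IRIs count as dereferenceable in this sketch."""
    if graph_name is None:
        return False
    return urlsplit(graph_name).scheme in ("http", "https")


def check_claim(graph_name, claimed_triples, fetch):
    """Return 'confirmed', 'mismatch' or 'unverifiable' for one named graph.

    fetch is an assumed callable: IRI in, set of triples out.
    """
    if not is_dereferenceable(graph_name):
        # URNs and blank node labels cannot be looked up.
        return "unverifiable"
    if fetch(graph_name) == set(claimed_triples):
        return "confirmed"
    return "mismatch"


# Stand-in for the web, so the sketch runs without network access.
fake_web = {"http://example.com/g1": {("a", "b", 2)}}


def fake_fetch(iri):
    return fake_web.get(iri, set())


result = check_claim("http://example.com/g1", {("a", "b", 4)}, fake_fetch)
```

With the stand-in data, the claim that g1 contains the value 4 comes back as a mismatch, while a claim about a URN-named graph cannot be checked at all.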

I guess that's what could be done with 6.2 out of the box, in the absence of further work outside of the WG.

Best,
Richard


> I'm now thinking in terms of 6.3
> which I'm still writing up.   The main difference is that I've come up
> with a name for the class of things denoted by graph labels (namely,
> "Graph Resources") and I'm letting go of rdf:Graph, rdf:GraphContainer,
> and rdf:hasGraph.    I think those kinds of things can be defined later,
> as the product of research.   The main thing we'd be adding that's not
> in SPARQL is a story about what it means when you use a graph label in
> your RDF.   When we start to look at inference, change-over-time, etc, I
> think that will matter.
> 
> Since I'm hoping for an A+, here's an answer to the reader exercise:
> 
> You said:
>> I want to load all of them into my SPARQL store so that I can query
>> them with SPARQL. But I want to load them in a way so that I can still
>> change my mind about what to trust or distrust after having loaded
>> them. So I need to keep track of who said what.
> 
> Okay, so maybe we have:
> 
> === D1, from http://example.com/alice ===
> @prefix : <http://example.com/> .
> { :a :b 1 }
> :g1 { :a :b 2 }
> ==========
> 
> and
> 
> === D2, from http://example.com/bob ===
> @prefix : <http://example.com/> .
> { :a :b 3 }
> :g1 { :a :b 4 }
> :g2 { :a :b 5,6 }
> ==========
> 
> Since we're not sure we want to trust them, we can't just merge them
> into our SPARQL store; we have to quote them in some way.  Perhaps we'll
> end up with something like this (leaving out some data we'd probably
> want for cache management, for now):
> 
> === D3, our store after the harvest ===
> @prefix : <http://example.com/> .
> @prefix cr: <http://example.com/crawler/> .
> # all the graphs we encounter, given new names
> cr:g9971 { :a :b 1 }
> cr:g9972 { :a :b 2 }
> cr:g9973 { :a :b 3 }
> cr:g9974 { :a :b 4 }
> cr:g9975 { :a :b 5,6 }
> # all the stuff we figured out for ourselves, and thus trust
> { [] a :DatasetRead;
>     :from :alice;
>     :defaultGraph cr:g9971;
>     :entry [  :name :g1; :graph cr:g9972 ].    
>  [] a :DatasetRead;
>     :from :bob;
>     :defaultGraph cr:g9973;
>     :entry [  :name :g1; :graph cr:g9974 ];
>     :entry [  :name :g2; :graph cr:g9975 ].
> }
> ============
> 
> Hopefully, this is about what you'd expect.  It's a little ugly, but
> offhand I can't think of a simpler way to make it work within the
> confines of SPARQL.   (It would be quite different in N3, but that's
> probably not relevant.)
> 
> A demo for this would be to include a lot of foaf files in the crawl,
> and then I would query for something (e.g. people whose names match a
> regexp), but ask for only results I "trust".  I'd define trusted
> sources as anyone who is within my 3rd degree circle of foaf:knows.  I
> think that's doable with this structure, although probably not in a
> single SPARQL query.   
> 
> The demo would show what happens when someone inside the circle lies vs.
> someone outside the circle.   The advanced demo would have people
> including owl:sameAs arcs in their foaf file, and those arcs being used
> properly.  It would also show what happens as different people change
> their foaf files, and old data and no-longer-supported inferences go
> away.   We might find the object of :name has to be a string.
> 
>   -- Sandro
> 
Received on Friday, 27 April 2012 08:04:11 UTC