- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Mon, 10 Oct 2011 16:07:09 +0100
- To: Sandro Hawke <sandro@w3.org>
- CC: public-rdf-wg@w3.org
On 10/10/11 03:23, Sandro Hawke wrote:
> On Sat, 2011-10-08 at 17:31 +0100, Andy Seaborne wrote:
>>
>> On 07/10/11 15:35, Sandro Hawke wrote:
>>> On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
>>>>> Okay, that's enough for now. Give me a +1 if you think this is headed
>>>>> in a useful direction.
>>>>
>>>> I like something like this as a pattern of good practice (well, 2
>>>> patterns). I don't agree with forcing the 4th column to have a specific
>>>> meaning given all the other deployed uses we have now collected.
>>>
>>> Yeah.... There is a middle ground where some datasets use Web
>>> semantics and some don't. I see your point that we can't just force
>>> people to change -- we can't say the things they've been saying now mean
>>> something else.
>>>
>>> Maybe we can have a way to flag which datasets are using Web semantics,
>>> and allow market pressures to work? Like, where we do a new mime type
>>> for a multigraph syntax, we could add this. And maybe it's something
>>> we can flag in SPARQL service description.
>>>
>>>> On one point:
>>>>
>>>> I don't see why
>>>>
>>>> <http://example.org> {<s> <p> <o> . }
>>>>
>>>> should mean it is ONLY that triple rather than CONTAINS that triple. If
>>>> the data publisher wants to say "and that's all" then they should say so
>>>> as an additional fact. The converse of "it's closed by default" is
>>>> harder to see how to allow it to be open sometimes.
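The two readings under discussion (ONLY versus CONTAINS) can be compared in a minimal sketch. This is not from the thread: it assumes graphs are modelled as plain sets of (subject, predicate, object) string tuples, ignores blank nodes, and uses the hypothetical function names holds_closed and holds_open.

```python
# Triples are modelled as plain (subject, predicate, object) string tuples.
# This is an illustrative assumption; blank-node isomorphism is ignored.
Triple = tuple[str, str, str]


def holds_closed(listed: frozenset, actual: frozenset) -> bool:
    """'ONLY' reading: the named graph is exactly the listed triples."""
    return actual == listed


def holds_open(listed: frozenset, actual: frozenset) -> bool:
    """'CONTAINS' reading: the listed triples are a subgraph of the named graph."""
    return listed <= actual


# What the document lists for the graph name.
listed = frozenset({("s", "p", "o")})
# What the named graph really holds (hypothetical extra triple).
actual = frozenset({("s", "p", "o"), ("s", "p", "o2")})

print(holds_open(listed, actual))    # True
print(holds_closed(listed, actual))  # False
```

Under the open reading a publisher can describe a small part of a large graph and still be truthful; under the closed reading the same document is false as soon as the graph has one more triple.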
>>>>
>>>> For a large graph, where you only need to talk about a small subset, there
>>>> are deployment issues. Consider dbpedia.
>>>>
>>>> (I also want to see the same change in TriG for concatenation of files)
>>>
>>> It seems to me that it's easy to go from complete to incomplete, just
>>> using a subgraph predicate. Let's say we want to say G1 is the graph
>>> with only <s> <p> <o> and G2 is a graph with that triple and maybe other
>>> stuff. I'd say:
>>>
>>> G1 {<s> <p> <o>. }
>>> { G1 r:subgraphOf G2. }
>>>
>>> But I don't see how to communicate G1 the way you're talking about. How
>>> do you say "and that's all"?
>>
>>
>>
>> G1 {<s> <p> <o>. }
>> { G1 r:representationOf G2. }
>
> I don't understand. Can you expand those out in English? I read
> that, with your proposed subgraph semantics as:
>
> G1 is a graph which contains at least the triple <s> <p> <o>.
>
> G1 is a representation of G2.
>
> I don't know what "representation" means in this sense, but in any case,
> how can we know from those statements that G2 contains only that one
> triple? The only connection between that triple and G2 is via G1, and
> we've made that connection so loose that it can't serve this purpose.
You were asking how to say "and that's all" if G1 { } is some triples from.
I am suggesting a statement that the {} corresponds to the AWWW
representation.
>
>> Mindful of DanBri's comments on dereference being non-global
>
> Which I disposed of.
Only by saying "don't do it" and coming up with the idea of "true
datasets". But the reality is that it can happen without the client
knowing.
If a "true dataset" is one particular case, then fine - it's a pattern
which is, to me, the best way forward.
If "true dataset" is the only case then I disagree with the approach and
think not reflecting current usage is a pointless exercise in spec-ery.
> The cases of dereference being non-global are not
> useful to this purpose, so we're not using them (in my proposal).
> (Note that non-global dereference, while useful for some things, is
> anathema to much of the Web. Search engines can't really handle it,
> etc.)
>
>> maybe all that
>> can ever be said is "subgraph" so making the
>>
>> G1 {<s> <p> <o>. }
>>
>> case the subgraph case may be where AWWW leads us.
>
> I'm quite confident it doesn't.
>
>> Stronger statements
>> need additional triples to make them and this reflects the fact that
>> additional knowledge over and above AWWW deref is being used.
>
> No. When I load a Web page over the web, it's very clear to me, at a
> protocol level, when I've gotten the full
> "representation" (serialization) of the page, of an image, of a video,
> of a stylesheet, etc.
>
> Also, when I download an RDF/XML or Turtle file, I can tell when I'm
> done. We want to be able to support merging of graphs but that doesn't
> mean we have to pretend the boundaries between the graphs, pre-merging,
> don't exist. And the whole utility of "named graphs" for some folks
> (eg Tim Lebo) is that it lets you draw boundaries around graphs and
> point to them.
>
> -- Sandro
>
>
>> Andy
>>
>>
>>>
>>> -- Sandro
>>>
>>>
>>>> Andy
>>>>
>>>> On 07/10/11 03:04, Sandro Hawke wrote:
>>>>> Here's a proposal for what the fourth column should mean. It's kind of
>>>>> obvious, and I think it's how many of us just assumed Named Graphs were
>>>>> supposed to work. But I don't think it's been written down in a form
>>>>> we can use, so here it is, in a first draft.
>>>>>
>>>>> I haven't really tried to motivate this, but one thing it does is allow
>>>>> folks to refer to a graph using just one URI. As [1] points out rather
>>>>> painfully, as things stand now, you need multiple URIs just to identify
>>>>> each g-box (and thus g-snap). (That is, you need to say which sparql
>>>>> endpoint you're talking about, and then which graph within its
>>>>> dataset.)
>>>>>
>>>>> My starting question was: what is the relationship between the IRI (the
>>>>> "graph name") and its associated g-snap in an RDF Dataset? This
>>>>> applies to the dataset backing any SPARQL end point, as well as the
>>>>> dataset serialized in any multigraph syntax, like TriG or N-Quads.
>>>>> Another way to look at it: what does it mean to assert a TriG
>>>>> document? If you send me the TriG Document "<a> {<s> <p> <o> }", and
>>>>> I trust you, what do I now know?
>>>>>
>>>>> Richard, I think, has been arguing for a minimalist position,
>>>>> answering "nothing", or "it depends on out-of-band agreements". This
>>>>> "Web Semantics" proposal is an alternative.
>>>>>
>>>>> === Proposal
>>>>>
>>>>> The idea here is to make the relationship between the URI and the
>>>>> graph be the standard Web naming relationship, similar to what we all
>>>>> use for Web pages. When you dereference the URI, you get the graph.
>>>>>
>>>>> This has the feature of being, to some extent, observable. Just like
>>>>> triples are claims about some domain of discourse, quads become claims
>>>>> about idealized Web dereference behavior.
>>>>>
>>>>> Specifically: Consider a "graph naming" to be the association of a
>>>>> graph name N with a graph G. For the graph naming to hold, every
>>>>> successful dereference of N yielding an RDF graph must yield G. For a
>>>>> dataset D to hold, its default graph must hold (as normal in RDF) and
>>>>> every graph naming pair in D must hold.
>>>>>
>>>>> Example 1: This dataset
>>>>>
>>>>> <http://example.org> {<s> <p> <o>. }
>>>>>
>>>>> means that if anyone is able to dereference "http://example.org"
>>>>> and obtain an RDF graph serialization, the serialized graph will
>>>>> consist of the single triple, <s> <p> <o>. Failure to dereference
>>>>> does not make the graph naming untrue, but a successful dereference
>>>>> yielding a different graph does.
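The graph-naming condition and Example 1 can be sketched roughly as follows. This is an illustration only, not part of the proposal: graphs are assumed to be frozensets of string triples, the default graph is not modelled, and `dereference` is a hypothetical stand-in for an HTTP fetch plus RDF parse that returns None when dereferencing fails. A single observation can only refute a naming, never prove it, hence the name `not_refuted`.

```python
from typing import Callable, Optional

Graph = frozenset  # assumed model: a frozenset of (s, p, o) string tuples
Deref = Callable[[str], Optional[Graph]]  # hypothetical fetch-and-parse hook


def not_refuted(name: str, graph: Graph, dereference: Deref) -> bool:
    """Check one graph naming against one dereference attempt."""
    result = dereference(name)
    if result is None:
        # Failure to dereference does not make the naming untrue.
        return True
    # A successful dereference must yield exactly the named graph.
    return result == graph


def dataset_not_refuted(namings: list, dereference: Deref) -> bool:
    """Every (name, graph) pair in the dataset must survive the check."""
    return all(not_refuted(n, g, dereference) for n, g in namings)


# A fake "web" used in place of real network access (an assumption).
fake_web = {"http://example.org": frozenset({("s", "p", "o")})}
dataset = [
    ("http://example.org", frozenset({("s", "p", "o")})),
    ("http://example.org/missing", frozenset({("a", "b", "c")})),
]
wrong = [("http://example.org", frozenset({("s", "p", "other")}))]

print(dataset_not_refuted(dataset, fake_web.get))  # True
print(dataset_not_refuted(wrong, fake_web.get))    # False
```

The second naming in `dataset` survives because its name does not dereference at all, which matches the treatment of failed dereference in Example 1.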
>>>>>
>>>>> Example 2: This dataset can never be true:
>>>>>
>>>>> <http://example.org> {<s> <p> 1. }
>>>>> <HTTP://example.org> {<s> <p> 2. }
>>>>>
>>>>> ... since one cannot get different results dereferencing URIs that
>>>>> differ only in the case of the scheme component (as per RFC 3986).
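Example 2 can be checked mechanically. The sketch below is an illustration under stated assumptions: it uses Python's standard urllib.parse, normalizes only the scheme component (not the full RFC 3986 normalization), models graphs as frozensets of string triples, and the helper names `normalize_scheme` and `conflicting_names` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_scheme(uri: str) -> str:
    """Lower-case the scheme component only; other components are untouched."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(scheme=parts.scheme.lower()))


def conflicting_names(namings: list) -> set:
    """Return normalized names that are paired with more than one graph."""
    seen = {}
    conflicts = set()
    for name, graph in namings:
        key = normalize_scheme(name)
        if key in seen and seen[key] != graph:
            conflicts.add(key)
        seen.setdefault(key, graph)
    return conflicts


# The dataset from Example 2, with graphs as sets of string triples.
example2 = [
    ("http://example.org", frozenset({("s", "p", "1")})),
    ("HTTP://example.org", frozenset({("s", "p", "2")})),
]

print(normalize_scheme("HTTP://example.org"))  # http://example.org
print(conflicting_names(example2))             # {'http://example.org'}
```

A non-empty result means the dataset cannot hold under the proposal, because both names identify the same resource yet are paired with different graphs.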
>>>>>
>>>>> Example 3: This dataset:
>>>>>
>>>>> <tag:hawke.org,2010-10-06:eg1> {<s> <p> <o>. }
>>>>>
>>>>> cannot be tested using Web protocols, since the "tag" URI scheme is
>>>>> (by design) not dereferenceable. Whether it is true or false cannot
>>>>> be determined experimentally.
>>>>>
>>>>> ==== Temporal Context
>>>>>
>>>>> How can we say:
>>>>>
>>>>> <http://example.org> {<s> <p> <o>. }
>>>>>
>>>>> if we suspect that "http://example.org" might serve some other content
>>>>> tomorrow?
>>>>>
>>>>> The answer is that datasets often need temporal qualification just
>>>>> like RDF graphs do. It's just like saying in RDF:
>>>>>
>>>>> <http://example.org/Alice> foaf:age 25.
>>>>>
>>>>> One solution for foaf:age triples is to include triples like:
>>>>> <> dc:temporal "2011-10-06"^^xs:dateTime.
>>>>>
>>>>> and that can be done in datasets as well, using the default graph.
>>>>> More work is needed on this, but I'm pretty sure this proposal can use
>>>>> whatever solution people come up with for RDF and doesn't make matters
>>>>> much worse than they are already.
>>>>>
>>>>> ==== Practical Deployment Choices
>>>>>
>>>>> Any system which maintains a dataset (eg a sparql endpoint) or
>>>>> generates multigraph documents like TriG has to do one (or more) of
>>>>> the following:
>>>>>
>>>>> 1. Use new non-dereferenceable graph names. These could be tag or
>>>>> uuid URIs, or http URIs in your own name space which you choose to
>>>>> leave 404.
>>>>>
>>>>> 2. Use your own dereferenceable graph names, perhaps relative to the
>>>>> endpoint or TriG document URI. If you do serve RDF content at
>>>>> those URIs, it MUST be the same content (give or take stated time
>>>>> lag).
>>>>>
>>>>> 3. Use someone else's graph names. Here, the key thing is temporal
>>>>> metadata. You have to decide what you want (copy once vs
>>>>> synchronize with what accuracy) and (somehow) share that temporal
>>>>> metadata.
>>>>>
>>>>>
>>>>> ...
>>>>>
>>>>> Okay, that's enough for now. Give me a +1 if you think this is headed
>>>>> in a useful direction.
>>>>>
>>>>> -- Sandro
>>>>>
>>>>> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
Received on Monday, 10 October 2011 15:07:48 UTC