Re: Web Semantics for Datasets

On 10/10/11 03:23, Sandro Hawke wrote:
> On Sat, 2011-10-08 at 17:31 +0100, Andy Seaborne wrote:
>>
>> On 07/10/11 15:35, Sandro Hawke wrote:
>>> On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
>>>>> Okay, that's enough for now.  Give me a +1 if you think this is headed
>>>>> in a useful direction.
>>>>
>>>> I like something like this as a pattern of good practice (well, 2
>>>> patterns).  I don't agree with forcing the 4th column to have a specific
>>>> meaning given all the other deployed uses we have now collected.
>>>
>>> Yeah....   There is a middle ground where some datasets use Web
>>> semantics and some don't.  I see your point that we can't just force
>>> people to change -- we can't say the things they've been saying now mean
>>> something else.
>>>
>>> Maybe we can have a way to flag which datasets are using Web semantics,
>>> and allow market pressures to work?    Like, where we do a new mime type
>>> for a multigraph syntax, we could add this.   And maybe it's something
>>> we can flag in SPARQL service description.
>>>
>>>> On one point:
>>>>
>>>> I don't see why
>>>>
>>>> <http://example.org>    {<s>    <p>    <o>   . }
>>>>
>>>> should mean it is ONLY that triple rather than CONTAINS that triple.  If
>>>> the data publisher wants to say "and that's all" then they should say so
>>>> as an additional fact.  With the converse, "it's closed by default", it is
>>>> harder to see how to allow it to be open sometimes.
>>>>
>>>> For a large graph where you only need to talk about a small subset, there
>>>> are deployment issues.  Consider dbpedia.
>>>>
>>>> (I also want to see the same change in TriG for concatenation of files)
>>>
>>> It seems to me that it's easy to go from complete to incomplete, just
>>> using a subgraph predicate.   Let's say we want to say G1 is the graph
>>> with only <s> <p> <o> and G2 is a graph with that triple and maybe other
>>> stuff.   I'd say:
>>>
>>>       G1 {<s>   <p>   <o>. }
>>>       { G1 r:subgraphOf G2. }
>>>
>>> But I don't see how to communicate G1 the way you're talking about. How
>>> do you say "and that's all"?
>>
>>
>>
>>         G1 {<s>   <p>   <o>. }
>>         { G1 r:representationOf G2. }
>
> I don't understand.  Can you expand those out in English?    I read
> that, with your proposed subgraph semantics as:
>
>      G1 is a graph which contains at least the triple <s> <p> <o>.
>
>      G1 is a representation of G2.
>
> I don't know what "representation" means in this sense, but in any case,
> how can we know from those statements that G2 contains only that one
> triple?   The only connection between that triple and G2 is via G1, and
> we've made that connection so loose that it can't serve this purpose.

You were asking how to say "and that's all" if G1 { } is just some of the triples.

I am suggesting a statement that the {} corresponds to the AWWW
representation.

>
>> Mindful of DanBri's comments on dereference being non-global
>
> Which I disposed of.

Only by saying "don't do it" and coming up with the idea of "true 
datasets".  But the reality is that it can happen without the client 
knowing.

If a "true dataset" is one particular case, then fine - it's a pattern 
which is, to me, the best way forward.

If "true dataset" is the only case then I disagree with the approach and
think not reflecting current usage is a pointless exercise in spec-ery.

> The cases of dereference being non-global are not
> useful to this purpose, so we're not using them (in my proposal).
> (Note that non-global dereference, while useful for some things, is
> anathema to much of the Web.   Search engines can't really handle it,
> etc.)
>
>>   maybe all that
>> can ever be said is "subgraph" so making the
>>
>>       G1 {<s>   <p>   <o>. }
>>
>> case the subgraph case may be where AWWW leads us.
>
> I'm quite confident it doesn't.
>
>>   Stronger statements
>> need additional triples to make them and this reflects the fact that
>> additional knowledge over and above AWWW deref is being used.
>
> No.  When I load a Web page over the Web, it's very clear to me, at a
> protocol level, when I've gotten the full
> "representation" (serialization) of the page, of an image, of a video,
> of a stylesheet, etc.
>
> Also, when I download an RDF/XML or Turtle file, I can tell when I'm
> done.   We want to be able to support merging of graphs but that doesn't
> mean we have to pretend the boundaries between the graphs, pre-merging,
> don't exist.    And the whole utility of "named graphs" for some folks
> (eg Tim Lebo) is that it lets you draw boundaries around graphs and
> point to them.
>
>      -- Sandro
>
>
>>  Andy
>>
>>
>>>
>>>       -- Sandro
>>>
>>>
>>>>  Andy
>>>>
>>>> On 07/10/11 03:04, Sandro Hawke wrote:
>>>>> Here's a proposal for what the fourth column should mean.  It's kind of
>>>>> obvious, and I think it's how many of us just assumed Named Graphs were
>>>>> supposed to work.    But I don't think it's been written down in a form
>>>>> we can use, so here it is, in a first draft.
>>>>>
>>>>> I haven't really tried to motivate this, but one thing it does is allow
>>>>> folks to refer to a graph using just one URI.  As [1] points out rather
>>>>> painfully, as things stand now, you need multiple URIs just to identify
>>>>> each g-box (and thus g-snap).  (That is, you need to say which sparql
>>>>> endpoint you're talking about, and then which graph within its
>>>>> dataset.)
>>>>>
>>>>> My starting question was: what is the relationship between the IRI (the
>>>>> "graph name") and its associated g-snap in an RDF Dataset?  This
>>>>> applies to the dataset backing any SPARQL end point, as well as the
>>>>> dataset serialized in any multigraph syntax, like TriG or N-Quads.
>>>>> Another way to look at it: what does it mean to assert a TriG
>>>>> document?  If you send me the TriG Document "<a>    {<s>    <p>    <o>    }", and
>>>>> I trust you, what do I now know?
>>>>>
>>>>> Richard, I think, has been arguing for a minimalist position,
>>>>> answering "nothing", or "it depends on out-of-band agreements".  This
>>>>> "Web Semantics" proposal is an alternative.
>>>>>
>>>>> === Proposal
>>>>>
>>>>> The idea here is to make the relationship between the URI and the
>>>>> graph be the standard Web naming relationship, similar to what we all
>>>>> use for Web pages.  When you dereference the URI, you get the graph.
>>>>>
>>>>> This has the feature of being, to some extent, observable.  Just like
>>>>> triples are claims about some domain of discourse, quads become claims
>>>>> about idealized Web dereference behavior.
>>>>>
>>>>> Specifically: Consider a "graph naming" to be the association of a
>>>>> graph name N with a graph G.  For the graph naming to hold, every
>>>>> successful dereference of N yielding an RDF graph must yield G.  For a
>>>>> dataset D to hold, its default graph must hold (as normal in RDF) and
>>>>> every graph naming pair in D must hold.
>>>>>
>>>>> Example 1:  This dataset
>>>>>
>>>>>       <http://example.org>    {<s>    <p>    <o>. }
>>>>>
>>>>> means that if anyone is able to dereference "http://example.org"
>>>>> and obtain an RDF graph serialization, the serialized graph will
>>>>> consist of the single triple, <s> <p> <o>.  Failure to dereference
>>>>> does not make the graph naming untrue, but a successful dereference
>>>>> yielding a different graph does.
>>>>>
>>>>> Example 2:  This dataset can never be true:
>>>>>
>>>>>       <http://example.org>    {<s>    <p>    1. }
>>>>>       <HTTP://example.org>    {<s>    <p>    2. }
>>>>>
>>>>> ... since one cannot get different results dereferencing URIs that
>>>>> differ only in the case of the scheme component (as per RFC 3986).
>>>>>
>>>>> Example 3:  This dataset:
>>>>>
>>>>>      <tag:hawke.org,2010-10-06:eg1>    {<s>    <p>    <o>. }
>>>>>
>>>>> cannot be tested using Web protocols, since the "tag" URI scheme is
>>>>> (by design) not dereferenceable.  Whether it is true or false cannot
>>>>> be determined experimentally.
>>>>>
>>>>> ==== Temporal Context
>>>>>
>>>>> How can we say:
>>>>>
>>>>>       <http://example.org>    {<s>    <p>    <o>. }
>>>>>
>>>>> if we suspect that "http://example.org" might serve some other content
>>>>> tomorrow?
>>>>>
>>>>> The answer is that datasets often need temporal qualification just
>>>>> like RDF graphs do.  It's just like saying in RDF:
>>>>>
>>>>>       <http://example.org/Alice>    foaf:age 25.
>>>>>
>>>>> One solution for foaf:age triples is to include triples like:
>>>>>       <>    dc:temporal "2011-10-06"^^xs:dateTime.
>>>>>
>>>>> and that can be done in datasets as well, using the default graph.
>>>>> More work is needed on this, but I'm pretty sure this proposal can use
>>>>> whatever solution people come up with for RDF and doesn't make matters
>>>>> much worse than they are already.
>>>>>
>>>>> ==== Practical Deployment Choices
>>>>>
>>>>> Any system which maintains a dataset (eg a sparql endpoint) or
>>>>> generates multigraph documents like TriG has to do one (or more) of
>>>>> the following:
>>>>>
>>>>> 1.  Use new non-dereferenceable graph names.  These could be tag or
>>>>>        uuid URIs, or http URIs in your own name space which you choose to
>>>>>        leave 404.
>>>>>
>>>>> 2.  Use your own dereferenceable graph names, perhaps relative to the
>>>>>        endpoint or TriG document URI.  If you do serve RDF content at
>>>>>        those URIs, it MUST be the same content (give or take stated time
>>>>>        lag).
>>>>>
>>>>> 3.  Use someone else's graph names.  Here, the key thing is temporal
>>>>>        metadata.  You have to decide what you want (copy once vs
>>>>>        synchronize with what accuracy) and (somehow) share that temporal
>>>>>        metadata.
>>>>>
>>>>>
>>>>> ...
>>>>>
>>>>> Okay, that's enough for now.  Give me a +1 if you think this is headed
>>>>> in a useful direction.
>>>>>
>>>>>        -- Sandro
>>>>>
>>>>> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
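
For illustration only, the two readings argued over in this thread can be
sketched in a few lines of Python.  This is not part of either proposal: it
assumes graphs are modelled as frozensets of (s, p, o) tuples, that a failed
dereference is represented by None, and the function names (normalize_name,
naming_holds, can_ever_hold) are made up for this sketch.  The "closed"
reading is the quoted Web Semantics proposal (a dereference must yield exactly
G); the "open" reading is the subgraph one (a dereference must contain G).

```python
def normalize_name(iri):
    """Lower-case the scheme, which RFC 3986 treats as case-insensitive."""
    scheme, sep, rest = iri.partition(":")
    return scheme.lower() + sep + rest if sep else iri


def naming_holds(claimed, dereferenced, closed=True):
    """Check one graph naming against one observed dereference.

    claimed      -- the graph the dataset pairs with the name
    dereferenced -- the graph obtained from the Web, or None on failure
    closed       -- True: must be exactly the claimed graph;
                    False: must merely contain the claimed graph
    """
    if dereferenced is None:
        # A failed dereference does not make the naming untrue.
        return True
    if closed:
        return claimed == dereferenced
    return claimed <= dereferenced  # subset test on frozensets


def can_ever_hold(namings, closed=True):
    """Return False if a list of (name, graph) pairs contradicts itself."""
    if not closed:
        # Under the open reading the real graph may be the union of the claims.
        return True
    seen = {}
    for name, graph in namings:
        key = normalize_name(name)
        if seen.setdefault(key, graph) != graph:
            return False
    return True


g1 = frozenset({("s", "p", "1")})
g2 = frozenset({("s", "p", "2")})
example2 = [("http://example.org", g1), ("HTTP://example.org", g2)]
closed_ok = can_ever_hold(example2)              # the quoted Example 2
open_ok = can_ever_hold(example2, closed=False)  # same data, subgraph reading
```

Under these assumptions the quoted Example 2 is contradictory only in the
closed reading; in the open reading both statements can hold at once, which
is the difference being argued above.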

Received on Monday, 10 October 2011 15:07:48 UTC