Re: Web Semantics for Datasets

On Fri, 2011-10-07 at 16:57 +0100, Richard Cyganiak wrote:
> On 7 Oct 2011, at 15:07, Sandro Hawke wrote:
> >> The way I see it, the proposal is backwards. We want to specify an abstract, idealised information space – a collection of RDF graphs. How to access that information space (dereferencing, SPARQL, N-Quads dumps, etc) is an implementation detail that's subject to pragmatic decision, unreliable networks, fads and fashions, and so on. When specifying the abstract information space, you cannot define it in terms of its access implementation. Just define the abstract information space, and leave it to the market to decide on how to access it.
> > 
> > You may well be right about this, but I'd like to see how far we can
> > push it and what we'd get from doing so.
> 
> What's your reason for wanting to make this normative rather than just a declared good practice?

Because I want systems to be able to rely on it.   I want people to be
able to write apps which refer to graphs (really g-boxes) by a single
URI, etc.    When those apps are dealing with datasets -- via
TriG, SPARQL, whatever -- I want them to be able to assume that
the graph name is still talking about the same g-box.

I agree we need a transition plan -- we can't just make datasets out
there have a different meaning by fiat.   And I'm not sure exactly
where the line is between a "good practice" and a "W3C
Recommendation".    But if there's something we think folks should be
doing, in certain circumstances, let's specify that behavior, and give
people a way to signal they are doing it, so others can start to build
on it. 

> I take it that a TriG file would be non-conforming if it contains a named graph that doesn't match what you get by dereferencing?
> 
> Let's say I have a TriG file <x.trig>. Now let's assume a scenario A where it is conforming (the web matches its contents) and a scenario B where it's non-conforming (the web doesn't match its contents). What observable difference in the behaviour of software would you like to see?

If folks were using Web semantics for datasets, and if we can tame the
temporal validity issues, then consumers could use data that came in via
datasets.   For instance, if sig.ma fetched this TriG document from
source <t>:
 
    <u> { <s> <p> <o>. }

then it wouldn't have to dereference <u>.   It could just add <s> <p>
<o>, tagged for trust/provenance as coming from the combination of
sources <u> and <t>.
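
As a rough sketch of that consumer logic in Python (the function name,
the dict-based store, and the plain strings standing in for URIs are all
made up for illustration; this is not how sig.ma actually works):

```python
from collections import defaultdict


def ingest_dataset(store, dataset, source):
    """Add every triple from a fetched dataset, tagged with its provenance.

    `dataset` maps a graph name to an iterable of (s, p, o) triples.
    Assumption: the publisher uses Web semantics, so the graph name is
    trusted to identify the same g-box a dereference would reach, and
    no extra fetch of the graph name is made.
    """
    for graph_name, triples in dataset.items():
        for triple in triples:
            # Provenance is the combination of the graph name and the
            # document the triple arrived in.
            store[triple].add((graph_name, source))
    return store


# The TriG example above, with strings standing in for <u>, <s>, <p>, <o>, <t>.
store = defaultdict(set)
ingest_dataset(store, {"u": [("s", "p", "o")]}, source="t")
```

The store ends up holding the one triple, tagged with the pair (u, t).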

This isn't the most compelling use case -- it's just pre-loading a web
cache -- but hopefully it answers your question.  If the site giving
sig.ma that TriG document is conforming, then sig.ma presents its users
with the right data; if the site providing the TriG document has some
other notion of what datasets mean, or is buggy or otherwise
non-conforming, it could well result in sig.ma giving its users bad data
(even though everyone is acting in good faith).

So, a transition plan might be that we have two media types for TriG,
one for when you're using Web semantics and one for when you're not.
Sig.ma would only consume datasets like this when Web semantics were
flagged as being used.   This is pretty clumsy, but it would technically
work.
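
A minimal sketch of that gating in Python, assuming two media type names
that are purely hypothetical placeholders (neither is proposed or
registered as meaning this):

```python
# Hypothetical media type names, used only for illustration.
WEB_SEMANTICS_TRIG = "application/x-trig-web"  # publisher claims Web semantics
PLAIN_TRIG = "application/x-trig"              # no such claim


def graphs_usable_without_dereference(content_type, dataset):
    """Return the named graphs a consumer may use as a cache pre-load.

    `dataset` maps a graph name to its triples. Graphs are only returned
    when the Content-Type flags Web semantics; otherwise the consumer
    would still have to dereference each graph name itself.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == WEB_SEMANTICS_TRIG:
        return dict(dataset)
    return {}


dataset = {"u": [("s", "p", "o")]}
flagged = graphs_usable_without_dereference(WEB_SEMANTICS_TRIG + "; charset=utf-8", dataset)
unflagged = graphs_usable_without_dereference(PLAIN_TRIG, dataset)
```

Here `flagged` contains the graph named u, while `unflagged` is empty.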

> For example, would a TriG parser generate the same RDF dataset in both cases or not?
> 
> Would a SPARQL processor answer all queries in the same way or not?
> 
> Would it entail the same additional triples/quads under the various levels of RDF/S and OWL entailment or not?

This issue sits at a higher level than all that, and doesn't affect any
of it: parsing, SPARQL answers, and entailment would be the same in both
scenarios.

> > I think we can make it a lot more crisp than AWWW.
> 
> That sounds like TAG business to me.

I don't think anyone outside the RDF community cares how the names in
named graphs work.  The TAG isn't going to solve this for us and isn't
going to mind if we solve it in a sensible way.

> >> The relationship between <u,G> in a named graph shouldn't be “dereferencing u yields G”. It should be “owner of u gets to say what's in G”, which already *is* the case per AWWW, so we don't actually need to say anything about that when specifying <u,G>.
> > 
> > Can you say more about this?  I don't understand.   That seems even more
> > abstract than dereference.    
> 
> It says: “Good practice: don't squat in other people's namespaces.”
> 
> > In practice, the owner of u gets to
> > control what happens when folks dereference it, but without dereference
> > I'm not sure the world cares who the "owner" is or really gives them any
> > special rights.
> 
> That sounds completely wrong to me. In practice, the social convention of URI ownership is relied on in many places without assuming (or sometimes while forbidding) dereferencing: XML namespaces, URNs, microdata vocabulary URIs, …

Okay, I see what you mean.   Socially, for human interpretation, yes, the
URI owner is granted some rights.   I think there is a community of
developers that scoffs at that idea, but that's a big digression.

So, when you said this:

        The relationship between <u,G> in a named graph shouldn't be
        “dereferencing u yields G”. It should be “owner of u gets to say
        what's in G”, which already *is* the case per AWWW, so we don't
        actually need to say anything about that when specifying <u,G>.
        
were you (1) arguing for a different way to frame Web Semantics for
Datasets or (2) arguing about what the Semantics for Datasets in RDF
should be?    At first I thought it was (2), which seemed like a big
change for you, so now I think it was (1).

   -- Sandro


> Best,
> Richard

Received on Friday, 7 October 2011 16:52:22 UTC