Re: Web Semantics for Datasets

On Fri, 2011-10-07 at 18:59 +0100, Richard Cyganiak wrote:
> Hi Sandro,
> 
> On 7 Oct 2011, at 17:52, Sandro Hawke wrote:
> >> What's your reason for wanting to make this normative rather than just a declared good practice?
> > 
> > Because I want systems to be able to rely on it.  I want people to be
> > able to write apps which refer to graphs (really g-boxes) by a single
> > URI, etc.  When those apps are dealing with datasets -- via
> > TriG, SPARQL, whatever -- I want them to be able to assume that
> > the graph name is still talking about the same g-box.
> 
> You want to stop people from lying by specifying that lying makes their dataset invalid.

No.  I want people to have some shared understanding of what it means to
tell the truth, so that if they want to tell the truth they can.  I want
to make it so people (or systems) acting in good faith can communicate,
if they want to (about graphs).    

> >> Let's say I have a TriG file <x.trig>. Now let's assume a scenario A where it is conforming (the web matches its contents) and a scenario B where it's non-conforming (the web doesn't match its contents). What observable difference in the behaviour of software would you like to see?
> > 
> > If folks were using Web semantics for datasets, and if we can tame the
> > temporal validity issues, then consumers could use data that came in via
> > datasets.   For instance, if sig.ma fetched that TriG document from
> > source <t>:
> > 
> >    <u> { <s> <p> <o>. }
> > 
> > then it wouldn't have to dereference <u>.
> 
> Sig.ma can't do that unless it validates the TriG file, and the only way to validate it is by dereferencing <u>.
> 
> >   It could just add <s> <p>
> > <o> tagged for trust/provenance as coming from the combination of
> > sources <u> and <t>.
> 
> But it doesn't come from <u> at all. <t> claimed that it came from <u>, which is entirely different.

Not entirely, no.  If <t> is telling the truth, which I think will be
the normal state of affairs, then it's the same.
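
To make the two readings concrete, here is a minimal sketch of how a
consumer might tag the triple from the example above. The function name,
the dict-based dataset, and the web_semantics flag are all hypothetical
and only for illustration; this is not how Sig.ma actually works.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]  # graph name -> triples in that named graph


def tag_provenance(
    dataset: Dataset, source: str, web_semantics: bool
) -> List[Tuple[Triple, FrozenSet[str]]]:
    """Attach a set of provenance sources to every triple in the dataset."""
    tagged = []
    for graph_name, triples in dataset.items():
        if web_semantics:
            # Assumed Web semantics: the graph name identifies the g-box the
            # triples came from, so it counts as a source next to the document.
            sources = frozenset({source, graph_name})
        else:
            # Otherwise the graph name is only a claim made by the document.
            sources = frozenset({source})
        for triple in sorted(triples):
            tagged.append((triple, sources))
    return tagged


# The TriG document fetched from <t>:  <u> { <s> <p> <o>. }
dataset = {"u": {("s", "p", "o")}}
with_web_semantics = tag_provenance(dataset, "t", web_semantics=True)
without_web_semantics = tag_provenance(dataset, "t", web_semantics=False)
```

In the first case the triple is attributed to both <u> and <t> without
dereferencing <u>; in the second it is attributed to <t> alone.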

> > So, a transition plan might be that we have two media types for TriG,
> > one for when you're using Web semantics and one for when you're not.
> 
> That's like having one media type for valid HTML and one for invalid HTML.

I think it's a lot more like having two media types for HTML that have
different semantics for the same syntax.  For instance, we might have
text/html-1 which requires that <strong> text be rendered in bold, while
text/html-2 might say <strong> text can be rendered however the browser
likes (italic, bold, underline) or even omitted.

I think you're advocating something like text/html-2 for datasets, and
I'm advocating something like text/html-1.   (I don't know which would
be better for HTML.  It's a question about how market/social forces will
work.)

> > Sig.ma would only consume the datasets like this when Web semantics were
> > flagged as being used.   This is pretty clumsy, but it would technically
> > work.
> 
> No, it wouldn't. Because Sig.ma can't trust that a dataset is conforming just because its media type says so. Either Sig.ma knows beforehand that a dataset follows the expected practice, or it can't use it. Nothing is won by bringing a notion of conformance into play.
> 
> >>> I think we can make it a lot more crisp than AWWW.
> >> 
> >> That sounds like TAG business to me.
> > 
> > I don't think anyone outside the RDF community cares how the names in
> > named graphs work.  
> 
> The TAG and the RDF community intersect.

Sure.   I'm just saying I think we're in a much better position to
address this than the TAG is.   On paper it may be more like TAG
business, but in practice, I think we're the only group that could
possibly address this.

> > So, when you said this:
> > 
> >        The relationship between <u,G> in a named graph shouldn't be
> >        “dereferencing u yields G”. It should be “owner of u gets to say
> >        what's in G”, which already *is* the case per AWWW, so we don't
> >        actually need to say anything about that when specifying <u,G>.
> > 
> > were you (1) arguing for a different way to frame Web Semantics for
> > Datasets or (2) arguing what the Semantics for Datasets in RDF should
> > be?    I first thought it was 2, which seemed like a big change for you,
> > so now I think it was 1.
> 
> I'm not quite sure what the difference between (1) and (2) is, but I guess I did mean (1). If it's framed as a best practice, then there's not much difference. I like to explain that “URI owners should make their URIs dereferenceable because that's a good way of communicating to the world what they intend their URIs to denote.”
> 
> Thinking more about it, your proposal would mean that any conforming dataset would become non-conforming the moment something on the web changes. This means that conformance is logically impossible in many completely reasonable use cases if one actually uses dereferenceable URIs as graph names. For example, datasets that work as caches and keep graph snapshots around for some time would be non-conforming by definition unless they assign new URIs. Whether a dataset conforms or not can literally depend on the weather in Ireland. That is no way to define a data model.

No, I don't think so.  I'm working on how to explain/show that.  At
the very least, it works fine if we use static graphs, which you and I
have discussed elsewhere as being a good practice in many situations.
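
As a rough sketch of the cache objection and the static-graph case,
assume one strict reading in which a dataset conforms only if each named
graph equals what its name dereferences to right now. The conforms
function, the dict standing in for the Web, and the dated graph name are
hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graphs = Dict[str, Set[Triple]]  # URI -> triples


def conforms(dataset: Graphs, web: Graphs) -> bool:
    """True if every named graph equals what its name currently yields."""
    return all(web.get(name) == triples for name, triples in dataset.items())


# Hypothetical stand-in for the Web: what each URI serves at this moment.
web: Graphs = {"u": {("s", "p", "o")}}

# A cache snapshots <u>; copy the sets so later changes do not leak in.
cache: Graphs = {name: set(triples) for name, triples in web.items()}
conforms_at_fetch = conforms(cache, web)

# The owner of <u> later publishes another triple; the snapshot is stale.
web["u"].add(("s", "p", "o2"))
conforms_after_change = conforms(cache, web)

# A static graph: its URI always yields the same triples, so a snapshot
# taken under that name keeps matching.
static_web: Graphs = {"u-2011-10-07": {("s", "p", "o")}}
static_cache: Graphs = {n: set(ts) for n, ts in static_web.items()}
static_conforms = conforms(static_cache, static_web)
```

Under this strict reading the cache stops conforming once <u> changes,
while the snapshot named by a static graph URI keeps conforming.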

    -- Sandro

> Best,
> Richard

Received on Saturday, 8 October 2011 02:48:16 UTC