Re: Web Semantics for Datasets

Hi Sandro,

On 7 Oct 2011, at 17:52, Sandro Hawke wrote:
>> What's your reason for wanting to make this normative rather than just a declared good practice?
> 
> Because I want systems to be able to rely on it.   I want people to be
> able to write apps which refer to graphs (really g-boxes) by a single
> URI, etc.    When those apps are dealing with datasets -- via
> TriG, SPARQL, whatever -- I want them to be able to assume that
> graph name is still talking about the same g-box.

You want to stop people from lying by specifying that lying makes their dataset invalid.

>> Let's say I have a TriG file <x.trig>. Now let's assume a scenario A where it is conforming (the web matches its contents) and a scenario B where it's non-conforming (the web doesn't match its contents). What observable difference in the behaviour of software would you like to see?
> 
> If folks were using Web semantics for datasets, and if we can tame the
> temporal validity issues, then consumers could use data that came in via
> datasets.   For instance, if sig.ma fetched that TriG document from
> source <t>:
> 
>    <u> { <s> <p> <o>. }
> 
> then it wouldn't have to dereference <u>.

Sig.ma can't do that unless it validates the TriG file, and the only way to validate it is by dereferencing <u>.

>   It could just add <s> <p>
> <o> tagged for trust/provenance as coming from the combination of
> sources <u> and <t>.

But it doesn't come from <u> at all. <t> claimed that it came from <u>, which is entirely different.
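To make the distinction concrete, here is a minimal sketch of what an honest provenance tag could record. It assumes a hypothetical record type (`TaggedTriple`, not from any existing toolkit) that keeps "the document I actually fetched" separate from "the graph name that document asserts":

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TaggedTriple:
    """Hypothetical provenance record for a triple read out of a TriG file."""
    triple: Tuple[str, str, str]
    retrieved_from: str  # document the consumer actually fetched, e.g. <t>
    claimed_graph: str   # graph name that document asserts, e.g. <u>

    def verified_from_claimed_graph(self) -> bool:
        # Only true when the consumer dereferenced the claimed graph itself;
        # otherwise the triple is just <t>'s claim about <u>.
        return self.retrieved_from == self.claimed_graph

t = TaggedTriple(("s", "p", "o"), "http://example.org/t", "http://example.org/u")
print(t.verified_from_claimed_graph())  # False: it came from <t>, not <u>
```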

> So, a transition plan might be that we have two media types for TriG,
> one for when you're using Web semantics and one for when you're not.

That's like having one media type for valid HTML and one for invalid HTML.

> Sig.ma would only consume the datasets like this when Web semantics were
> flagged as being used.   This is pretty clumsy, but it would technically
> work.

No, it wouldn't. Because Sig.ma can't trust that a dataset is conforming just because its media type says so. Either Sig.ma knows beforehand that a dataset follows the expected practice, or it can't use it. Nothing is won by bringing a notion of conformance into play.
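A rough sketch of the consumer's side, assuming a toy model where a dataset is a dict from graph name to a set of triples and `fetch` is a hypothetical stand-in for HTTP dereferencing plus parsing. Whatever the media type says, checking conformance costs one dereference per graph name. That is exactly the work Web semantics was supposed to save.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

def verified_graphs(dataset: Dict[str, Graph],
                    fetch: Callable[[str], Optional[Graph]]) -> List[str]:
    """Names of graphs whose claimed contents match what dereferencing the name returns."""
    # Every graph name must be dereferenced; a media-type flag can't replace this.
    return [name for name, graph in dataset.items() if fetch(name) == graph]

# Hypothetical stand-in for the web: URI -> graph currently served there.
WEB: Dict[str, Graph] = {"http://example.org/u": frozenset({("s", "p", "o")})}
fetch_log: List[str] = []

def fetch(uri: str) -> Optional[Graph]:
    fetch_log.append(uri)   # record each dereference
    return WEB.get(uri)     # None if nothing is served at that URI

trig: Dict[str, Graph] = {
    "http://example.org/u": frozenset({("s", "p", "o")}),  # matches the web
    "http://example.org/v": frozenset({("s", "p", "x")}),  # nothing served there
}
print(verified_graphs(trig, fetch))  # ['http://example.org/u']
print(len(fetch_log))                # 2
```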

>>> I think we can make it a lot more crisp than AWWW.
>> 
>> That sounds like TAG business to me.
> 
> I don't think anyone outside the RDF community cares how the names in
> named graphs work.  

The TAG and the RDF community intersect.

> So, when you said this:
> 
>        The relationship between <u,G> in a named graph shouldn't be
>        “dereferencing u yields G”. It should be “owner of u gets to say
>        what's in G”, which already *is* the case per AWWW, so we don't
>        actually need to say anything about that when specifying <u,G>.
> 
> were you (1) arguing for a different way to frame Web Semantics for
> Datasets or (2) arguing what the Semantics for Datasets in RDF should
> be?    I first thought it was 2, which seemed like a big change for you,
> so now I think it was 1.

I'm not quite sure what the difference between (1) and (2) is, but I guess I did mean (1). If it's framed as a best practice, then there's not much difference. I like to explain that “URI owners should make their URIs dereferenceable because that's a good way of communicating to the world what they intend their URIs to denote.”

Thinking more about it, your proposal would mean that any conforming dataset would become non-conforming the moment something on the web changes. This means that conformance is logically impossible in many completely reasonable use cases if one actually uses dereferenceable URIs as graph names. For example, datasets that work as caches and keep graph snapshots around for some time would be non-conforming by definition unless they assign new URIs. Whether a dataset conforms or not can literally depend on the weather in Ireland. That is no way to define a data model.
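The cache case can be sketched in a few lines, again assuming a toy model (a dict standing in for the web, a hypothetical graph name `U`) rather than any real store. The cached dataset is never touched, yet it flips from conforming to non-conforming when the owner of `U` edits the document:

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

def conforms(dataset: Dict[str, Graph], web: Dict[str, Graph]) -> bool:
    # Conforming = every graph name currently dereferences to exactly its graph.
    return all(web.get(name) == graph for name, graph in dataset.items())

U = "http://example.org/u"  # hypothetical graph name
web: Dict[str, Graph] = {U: frozenset({("s", "p", "o1")})}
cache: Dict[str, Graph] = {U: web[U]}    # snapshot taken at time t0
print(conforms(cache, web))              # True: snapshot matches the web at t0
web[U] = frozenset({("s", "p", "o2")})   # owner of U edits the document
print(conforms(cache, web))              # False: cache unchanged, now non-conforming
```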

Best,
Richard

Received on Friday, 7 October 2011 18:00:08 UTC