minimal dataset semantics from Sandro Hawke on 2013-06-07 (public-rdf-wg@w3.org from June 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 07 Jun 2013 11:28:40 -0400
To: Andy Seaborne <andy@apache.org>
CC: public-rdf-wg@w3.org
Message-ID: <51B1FC28.1070006@w3.org>
On 06/07/2013 02:17 AM, Andy Seaborne wrote:
> On 06/06/13 16:33, Sandro Hawke wrote:
>> On 06/06/2013 01:14 AM, Andy Seaborne wrote:
>>> On 05/06/13 18:56, Sandro Hawke wrote:
>>>
>>>>  I'm worried
>>>> about how Alice can send Bob a dataset in which the graph names denote
>>>> their graph (and in which she is asserting the triples in the default
>>>> graph).   How can Alice communicate this intent to Bob? Doing it
>>>> out-of-band is of course possible (she calls him on the phone), but
>>>> that's very messy.   Can she do it in-band, such as by adding a magic
>>>> triple to the default graph?     Unless that's licensed by the RDF
>>>> Recommendations, I don't think so.   If the RDF Recommendations say all
>>>> the triples/quads in a dataset are meaningless (as they currently do),
>>>> then Bob isn't licensed to consider them as conveying Alice's intent.
>>>
>>> There's nothing to stop additional information being given - you can
>>> additionally describe the dataset and its usage of graph labels.
>>>
>>> What is needed is that vocabulary definition - it does not even have
>>> to be rdf:*
>>>
>>> When receiving any document, the receiver has to assess what of it
>>> they are going to interpret/trust. (The real question is whether to
>>> trust the publisher has followed RDF specs - we can't do anything
>>> about that.)
>>>
>>> Sandro - what text in the docs do you think blocks that?
>>>
>>> Are you asking that the default graph is interpreted as it would if it
>>> were a single graph at that location?
>>>
>
> But is there any current text that blocks that?  I don't see any.
>

Nothing blocks it, but there's also nothing supporting it, so people can 
do something which breaks interoperability without violating our specs.

With RDF, there's a chain of specs:  the media-type specs point to RDF, 
and RDF says the IRIs denote things, and the graphs have truth 
conditions, etc.   That allows RDF graph syntaxes to have semantics 
which can be extended by defining new IRIs.

If datasets don't have truth conditions at all, then there's no proper 
way in the document to indicate an extension, no way to ever give 
datasets semantics without an out-of-band protocol.

>>
>> Yes, I think that would do it.     If 4.2 were normative, it would
>> provide this functionality in a roundabout way.
>
> 4.2 is specifically about content negotiation.  It says "if 
> expecting an RDF graph", so it does not really say anything about the 
> treatment of the default graph when in the dataset.
>

I'm not sure I agree about 4.2, but it probably doesn't matter since 
it's not normative anyway.

> A client application gets some data; there is a statement that the dataset 
> uses denotational labelling; the app can trust that or not, same as 
> trusting the publisher on any other statement, like the date the 
> dataset/graph/HTML doc was written.
>

No, I don't think there is a statement -- not in the sense I think you 
mean that word.   There is a triple, but it is not sent in a way which 
indicates that it is to be understood as conveying the intent of the sender.

In normal RDF, there are statements being made by something, and you can 
decide whether or not to trust that something.  In datasets as currently 
specified, even if you trust the source and know it to be 100% accurate, 
a dataset document doesn't actually tell you anything except what the 
structure of a dataset is.    The dataset "{<> dc:author 'Sandro 
Hawke'}" does not actually say I wrote it; it just says the dataset 
contains a triple in its default graph that claims I wrote it.   So the 
dataset is perfectly accurate even if I didn't write it.
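
To make the distinction concrete, here is a minimal sketch in Python. It assumes a toy model (not any real RDF library) in which a triple is a tuple of strings and a dataset is a default graph plus a map of named graphs; the function names are hypothetical. It contrasts the current structure-only reading, under which nothing in the dataset is asserted, with the minimal semantics proposed here, under which the default graph is asserted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Toy model (hypothetical, not an RDF library API): a triple is a
# (subject, predicate, object) tuple of strings in Turtle-like notation.
Triple = Tuple[str, str, str]


@dataclass
class Dataset:
    default_graph: Set[Triple] = field(default_factory=set)
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)


def asserted_structure_only(ds: Dataset) -> Set[Triple]:
    """Current reading: a dataset only describes its own structure,
    so no triple in it is asserted by whoever sent it."""
    return set()


def asserted_minimal(ds: Dataset) -> Set[Triple]:
    """Proposed minimal semantics: the meaning of a dataset is (at least)
    the meaning of its default graph, so those triples are asserted."""
    return set(ds.default_graph)


# The example from the text: {<> dc:author 'Sandro Hawke'}
example = Dataset(default_graph={("<>", "dc:author", '"Sandro Hawke"')})

print(asserted_structure_only(example))  # set()
print(asserted_minimal(example))  # {('<>', 'dc:author', '"Sandro Hawke"')}
```

Under either reading the dataset accurately describes its own structure; only the second lets Bob treat the triple as something the sender actually claims.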

>
>>   It says:
>>
>>       If an RDF dataset is returned and the consumer is expecting an
>>       RDF graph, the consumer is expected to use the RDF dataset's
>>       default graph.
>>
>>       [RDF dataset] <https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-rdf-dataset>
>>       [RDF graph] <https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-rdf-graph>
>>

(Sorry for how my cut/paste of the spec put those URLs there.  I'm still 
getting used to my email client.)

>> So, that way a client is licensed to take the default graph as being
>> asserted, essentially if it wants to (by deciding it was expecting an
>> RDF graph).
>>
>> I guess the straightforward way to address this would be to express it
>> as minimal (baseline) dataset semantics:
>
> If we reopen the dataset semantics discussion, why would this 
> be the one semantic condition we have?  For better or worse, we have a WG position.
>

(1) because it's the minimal semantics needed to add more semantics, 
like the rdf:BoundDataset flag or whatever.

(2) because as people have been talking about datasets again in recent 
weeks, I'm seeing that even WG members seem to think the default graph 
conveys statements.

>> the RDF Semantics for a dataset
>> is the RDF Semantics of its default graph.    Then the default graph can
>> tell people what other, additional semantic conditions might apply (like
>> that it's a bound dataset, or whatever).    Also,  then section 4.2
>> becomes properly non-normative (it would just be expressing a logical
>> consequence) instead of carrying important information as it does now.
>>
>> cf 2013/02/06-rdf-wg RESOLVED:  Add a non-normative statement to RDF
>> Concepts explaining that if a RDF serialization format supports
>> expressing both datasets and graphs, that a consumer should use the
>> default graph if it is expecting a graph.   (Actual wording to be
>> handled by editor)
>>
>> There's a sort of theoretical argument against this, that people are
>> currently publishing SPARQL and TriG and N-Quads without having this
>> meaning in mind, but I can't think of how changing this could actually
>> cause any problems for any of these people.    I think it would only
>> cause a problem for people who would have a problem with that resolution
>> and section 4.2 -- people who are somehow using the default graph to
>> contain triples they don't want other systems to use.
>
> I have trouble seeing why anyone would publish triples that they don't want 
> anyone else to use.
>

When we're talking about the default graph of a dataset, I totally 
agree, which is why I think these semantics are (1) already a de facto 
standard, and (2) harmless to formalize in our spec.      In contrast, a 
named graph might contain stuff that shouldn't be used without special 
understanding, because it might be (for example) an out-of-date snapshot.

Getting procedural for a moment:  I don't want to re-open [1] more than 
a tiny crack, but if I'm right (and Pat seems to agree [2]) then [1] was 
based on the mistaken idea that optional additional dataset semantics 
could be provided by defining vocabulary terms.  I now believe that's 
only true if datasets have this minimal semantic condition (that the 
meaning of a dataset is no less than the meaning of its default 
graph).   So I propose we amend [1] very slightly to include this bit.  
That will allow us to achieve the intent of [1], that dataset semantics 
can be defined elsewhere.      If we don't do anything, I imagine most 
of us will proceed as if this were in the spec, but some people might 
not, and that would reduce interoperability.
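
The dependency can be sketched the same way. This assumes the same toy triple model and a hypothetical flag triple using the placeholder name rdf:BoundDataset (not a term defined by any RDF specification). The sketch shows that a consumer can only discover and honour such an extension if the default graph is asserted in the first place; the extra condition is layered on top of the baseline rather than replacing it.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Toy model (hypothetical): triples are tuples of strings.
Triple = Tuple[str, str, str]

# Hypothetical extension flag; "rdf:BoundDataset" is a placeholder name
# from this discussion, not a term defined by any RDF specification.
BOUND_FLAG: Triple = ("<>", "rdf:type", "rdf:BoundDataset")


def interpret(
    default_graph: Set[Triple],
    named_graphs: Dict[str, Set[Triple]],
    minimal_semantics: bool = True,
) -> Tuple[Set[Triple], Optional[Dict[str, FrozenSet[Triple]]]]:
    """Return (asserted triples, graph-name denotations or None)."""
    if not minimal_semantics:
        # Structure-only reading: nothing is asserted, so even a flag
        # triple in the default graph licenses nothing.
        return set(), None
    # Baseline: the meaning of the dataset includes its default graph.
    asserted = set(default_graph)
    denotations = None
    if BOUND_FLAG in asserted:
        # Extension: the asserted flag says each graph name denotes
        # the graph it labels.
        denotations = {name: frozenset(g) for name, g in named_graphs.items()}
    return asserted, denotations


default = {BOUND_FLAG, ("<alice>", "foaf:knows", "<bob>")}
named = {"<g1>": {("<bob>", "foaf:name", '"Bob"')}}

asserted, denotes = interpret(default, named)
print(denotes)  # {'<g1>': frozenset({('<bob>', 'foaf:name', '"Bob"')})}
print(interpret(default, named, minimal_semantics=False))  # (set(), None)
```

Without the baseline condition, the flag is just one more triple in the structure, and Bob has no license to act on it.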

     -- Sandro

[1] https://www.w3.org/2013/meeting/rdf-wg/2012-10-03#resolution_1
[2] http://lists.w3.org/Archives/Public/public-rdf-wg/2013Jun/0059.html

Received on Friday, 7 June 2013 15:28:47 UTC