Re: dataset semantics from Antoine Zimmermann on 2011-12-20 (public-rdf-wg@w3.org from December 2011)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Tue, 20 Dec 2011 11:45:11 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: public-rdf-wg@w3.org
Message-ID: <4EF06737.8030802@emse.fr>

On 20/12/2011 00:12, Pat Hayes wrote:
>
> On Dec 19, 2011, at 4:33 AM, Antoine Zimmermann wrote:
>
>> On 17/12/2011 17:02, Pat Hayes wrote:
>>>
>>> On Dec 17, 2011, at 7:09 AM, Sandro Hawke wrote:
>>>
>>>> On Sat, 2011-12-17 at 10:29 +0000, William Waites wrote:
>>>>> On Sat, 17 Dec 2011 00:43:38 -0500, Sandro Hawke <sandro@w3.org> said:
>>>>>
>>>>> sandro> We haven't quite figured that out yet. I'm proposing one
>>>>> sandro> part of that is that a dataset being true implies its
>>>>> sandro> default graph is true.
>>>
>>>> In terms of an entailment test:
>>>>
>>>> <a>    {<b>   <c>   <d>   }
>>>>
>>>> does NOT entail
>>>>
>>>> {<b>   <c>   <d>   }
>>>>
>>>
>>>
>>> Really?? Is this generally accepted, or is it your own
>>> conclusion? Because this has the (to me surprising) consequence
>>> that publishing a dataset does not assert ANY of the named graphs
>>> in it. Which leaves me wondering what the point of having
>>> datasets can possibly be in the first place. Does the Semantic
>>> Web consist mostly of unasserted fiction?
>>
>> As I hope people won't be publishing datasets, there really isn't
>> any problem with this. Really, we have to preserve the triple
>> format as the standard way of publishing data on the Web. Datasets
>> should only be an exchange format and data model for systems that
>> manage data.
>
> I guess I fail to follow this distinction. Isn't the very idea of the
> semantic web to put, and manage, data on the Web? So if these systems
> that manage data are functioning within the semantic web, how is that
> case distinguished from the other case?

Compare a webpage, in HTML, with a complete website packed in a ZIP 
file. The HTML is like RDF, the ZIP file is like the dataset. Nobody 
publishes multiple web documents as ZIP files; they use ZIP as an 
efficient exchange format that can be sent by email or transferred 
via FTP, etc. Of course you can publish ZIP files if you want, but that's 
not what the format is for.

>
>> different sources, and go for N-Quints and define Dataset-sets and
>> reiterate all the hard work of this WG later.
>
> Again I have trouble understanding this. Why would we need to
> reiterate this if datasets were published?

If datasets are published commonly in the wild, there'll be crawlers 
getting them and they'll have to store them in appropriate data 
structures where one can track the provenance of each dataset. So you 
need a dataset-set.
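
To make the point concrete, here is a minimal sketch in Python of what 
such a crawler store might look like. The names (Quad, Quint, 
add_dataset, dataset_from) and the example URLs are purely illustrative 
assumptions, not anything this WG has defined: each quad of a fetched 
dataset is stored with the source it came from, which gives five-tuples 
instead of four-tuples.

```python
from collections import namedtuple

# A quad: one triple plus the name of the graph that contains it
# (graph would be None for the default graph).
Quad = namedtuple("Quad", "graph subject predicate object")

# A hypothetical "quint": a quad plus the source its dataset was fetched from.
Quint = namedtuple("Quint", "source graph subject predicate object")


def add_dataset(store, source, quads):
    """Record every quad of a crawled dataset together with its source."""
    for q in quads:
        store.add(Quint(source, *q))


def dataset_from(store, source):
    """Recover the quads of the dataset that was fetched from `source`."""
    return {Quad(*q[1:]) for q in store if q.source == source}


# Two datasets from different places reuse the same graph name <a>.
store = set()
add_dataset(store, "http://example.org/d1", [Quad("<a>", "<b>", "<c>", "<d>")])
add_dataset(store, "http://example.org/d2", [Quad("<a>", "<b>", "<c>", "<e>")])
```

Without the extra source column, the two graphs labelled <a> would be 
merged and the crawler could no longer tell which dataset said what.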

> There seems to be a background assumption here, which I may just not
> fully understand, about the reasons why datasets use quads having to
> do with RDF coming from different sources, and the need to keep these
> sources distinct. Is that right? Is this the primary reason for
> considering datasets in the first place, to provide a way to keep
> track of 'provenance' of RDF graphs, rather than simply merge them
> into one large graph? If so, we should get this clear and agree on
> it. So far in these emails I have seen a whole variety of purposes
> suggested for quadstores/datasets.

Provenance is one of the identified use cases. It does not have to be the 
only one to justify the claim that a Web crawler would need to record the 
provenance of published datasets.
Moreover, even if provenance is not the only use case, it is certainly 
one of prominent importance.

>> Once you acknowledge this, you understand that the "named" graphs
>> inside the datasets are just saying "there is an RDF graph
>> containing those triples, it is labelled with this 'name' and we
>> don't care whether it is asserted or not". Do we care whether a
>> relational table is "asserting" something in a database?
>
> Actually, I don't really care whether it is asserted or not, but I
> wanted to establish that it did actually mean something to say that
> it is asserted. In answer to your rhetorical question, though: yes,
> of course we care. That is, anyone who plans to use the data had
> better care, so it had better be clear when a relational table is
> being asserted as opposed, say, to being just held up for ridicule.

Personally, when I use a database, I just care that my queries get the 
results from the database according to the semantics of SQL. Same with 
an RDF dataset. Whether it's "asserted" or not, frankly, I don't give a 
damn.


AZ

>
> Pat
>
>>
>>>
>>> Pat
>>>
>>> ------------------------------------------------------------
>>> IHMC                    (850)434 8903 or (650)494 3973
>>> 40 South Alcaniz St.    (850)202 4416   office
>>> Pensacola               (850)202 4440   fax
>>> FL 32502                (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
>>>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Tuesday, 20 December 2011 10:45:43 UTC