Re: Graph-State Resources (was Re: graphs and documents Re: [ALL] agenda telecon 14 Dec) from Pat Hayes on 2011-12-20 (public-rdf-wg@w3.org from December 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 19 Dec 2011 18:43:19 -0600
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Sandro Hawke <sandro@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <8F732E7B-029A-4962-8435-53827929D6B9@ihmc.us>
On Dec 19, 2011, at 4:49 PM, Richard Cyganiak wrote:

> On 15 Dec 2011, at 17:52, Sandro Hawke wrote:
>>> Unconvinced.  What's an RDFa document?  It's some RDF, some scripts, 
>>> some HTML links, some appearance.  Is that limiting it to RDF?
>> 
>> It's not a Graph-State Resource, as I'm trying to define the term.
> 
> That's a bummer.
> 
>> There's a lot more to its state (except in degenerate cases, like a sort
>> of RDFa-quine) than is conveyed in the triples.
> 
> One of the main use cases that makes me kind of want to have more in RDF datasets than the pure data-structure definition we have right now is web crawling. It would be nice to be able to have a well-defined representation of a web crawl as an RDF dataset. But this critically depends on being able to represent partial state (e.g., only the bits of an RDFa page marked up with RDFa) in the dataset.
> 
>> I'm looking for a class of things which have very similar behavior and
>> attributes.  My most recent angle is trying to document how to use REST
>> with these things.  I want to be able to talk about how HEAD, GET, PUT,
>> and PATCH should work on these things.   RDFa documents have to be
>> handled quite differently -- one could not, for instance, PATCH an RDFa
>> document with an application/sparql-update patch.   I'm trying to focus
>> on the class of things for which SPARQL Update is a meaningful PATCH
>> language.
> 
> I think that's restricting it way too much.
> 
> Given adequate parsers and extractors, bits of RDF can be read out of almost every page on the Web.
> 
> Limiting the applicability of the “web-style dataset” pattern to only things published from SPARQL endpoints (or even only update-capable SPARQL endpoints) would result in something that's not useful to most current RDF users. Most of the RDF out there is read-only at this point and doesn't come from SPARQL stores. I don't see this changing anytime soon – RDF coming from SPARQL stores will grow, but so will RDF coming from CMSes and DBs and Excel sheets and screenscraping and other read-only non-SPARQL sources.
> 
>> Looking for other properties which might apply to GSRs, I thought of
>> VoID and came across this:
>> 
>>       The fundamental concept of VoID is the dataset. A dataset is a
>>       set of RDF triples that are published, maintained or aggregated
>>       by a single provider. Unlike RDF graphs, which are purely
>>       mathematical constructs [RDF-CONCEPTS], the term dataset has a
>>       social dimension: we think of a dataset as a meaningful
>>       collection of triples, that deal with a certain topic, originate
>>       from a certain source or process, are hosted on a certain
>>       server, or are aggregated by a certain custodian. Also,
>>       typically a dataset is accessible on the Web, for example
>>       through resolvable HTTP URIs or through a SPARQL endpoint
>> 
>>               - http://www.w3.org/TR/2011/NOTE-void-20110303/#dataset
>> 
>> Terminology aside, that seems to match g-box rather well.  
> 
> Having written the quoted paragraph, I'm not sure that I agree.

Well, "a dataset is a set of triples...". OK, stop right there: a dataset is, then, an RDF graph. It may be a special kind of RDF graph, but it is an RDF graph. So it is not a g-box, apparently, and it is a mathematical construction (a set). It might not be a "purely" mathematical construction, of course, but then a particular RDF graph need not be "purely" mathematical either. (I'm not sure what "purely" means here, to be honest, but a particular RDF graph can have properties such as having been composed by someone on a certain date, for example.)

However, this paragraph seems to fall into the same trap that we all fell into, by kind of smushing together the mathematical idea of the set with the practical idea of something accessible on the Web. So maybe, when this is straightened out, a dataset is a g-box after all. 

> 
> The prototypical void:Dataset would be something like “all the RDF in DBpedia”. The prototypical g-box would be something like “Bob's FOAF file” (assuming it can change over time).

Prototypes?? Why not stick to definitions instead? Surely the datasets being described here are all (either graphs or) g-boxes. They are rather large g-boxes, with a special kind of presence on the Web, special properties, and perhaps a special social importance, but none of this stops them from being resources whose state is parsable into an RDF graph. They might be peculiar g-boxes, but this does not stop them being g-boxes.
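A minimal sketch of the graph / g-box distinction argued above, in Python. It assumes triples are plain (subject, predicate, object) string tuples; the names `make_graph` and `GBox` are hypothetical illustrations, not part of any RDF library or WG draft. A graph is modeled as a value (a frozenset, equal to any other set with the same triples); a g-box is modeled as a mutable object whose state, at any moment, is such a value.

```python
# Hypothetical model: a triple is a (subject, predicate, object) string tuple.
Triple = tuple[str, str, str]


def make_graph(*triples: Triple) -> frozenset[Triple]:
    """An RDF graph in the mathematical sense: an immutable set of triples.
    Two graphs containing the same triples are the same graph."""
    return frozenset(triples)


class GBox:
    """Hypothetical g-box: a resource whose state is an RDF graph.
    Identity is by object, not by content, and the state may change."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._state: frozenset[Triple] = frozenset()

    def snapshot(self) -> frozenset[Triple]:
        # The graph this g-box holds right now.
        return self._state

    def replace(self, graph: frozenset[Triple]) -> None:
        # Changing the g-box swaps its state; the old graph is untouched.
        self._state = graph


g1 = make_graph(("ex:bob", "foaf:name", '"Bob"'))
g2 = make_graph(("ex:bob", "foaf:name", '"Bob"'))
assert g1 == g2  # same triples, hence the same graph

bob_file = GBox("Bob's FOAF file")
mirror = GBox("a mirror of Bob's FOAF file")
bob_file.replace(g1)
mirror.replace(g1)
assert bob_file.snapshot() == mirror.snapshot()  # same state...
assert bob_file is not mirror                   # ...but two distinct g-boxes

bob_file.replace(make_graph())  # the g-box changes; g1 itself does not
assert len(g1) == 1
```

On the view argued here, something like "all the RDF in DBpedia" would be a (very large) `GBox` rather than a frozenset: its size and social role do not change which side of this distinction it falls on.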

> 
> The term “g-box” evokes storage of a graph. GSR evokes, to me, observation of the result of HTTP prodding. void:Dataset evokes, to me, a larger, socially meaningful collection of RDF data.

I don't think "evoking" and "prototypically" are useful ideas in this kind of discussion.

Pat

> 
> Best,
> Richard
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 20 December 2011 00:44:07 UTC