Re: Graph-State Resources (was Re: graphs and documents Re: [ALL] agenda telecon 14 Dec) from Richard Cyganiak on 2011-12-19 (public-rdf-wg@w3.org from December 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 19 Dec 2011 22:49:38 +0000
To: Sandro Hawke <sandro@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <7B678E64-2857-4522-B255-675DC4FDA8C5@cyganiak.de>

On 15 Dec 2011, at 17:52, Sandro Hawke wrote:
>> Unconvinced.  What's an RDFa document?  It's some RDF, some scripts, 
>> some HTML links, some appearance.  Is that limiting it to RDF?
> 
> It's not a Graph-State Resource, as I'm trying to define the term.

That's a bummer.

> There's a lot more to its state (except in degenerate cases, like a sort
> of RDFa-quine) than is conveyed in the triples.

One of the main use cases that makes me kind of want to have more in RDF datasets than the pure data-structure definition we have right now is web crawling. It would be nice to be able to have a well-defined representation of a web crawl as an RDF dataset. But this critically depends on being able to represent partial state (e.g., only the bits of an RDFa page marked up with RDFa) in the dataset.
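
As a rough illustration of the crawl use case (a sketch only, not a proposal for the spec): the crawl could be modelled as a mapping from page URL to the set of triples extracted from that page. The names `record_crawl`, `Triple` and `Dataset`, and the example.org URLs, are made up for this sketch.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, simplified types: a triple is three strings, and a dataset
# maps a graph name (here, the crawled page's URL) to a set of triples.
Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]


def record_crawl(dataset: Dataset, page_url: str,
                 extracted: Iterable[Triple]) -> Dataset:
    """File the triples extracted from one page under that page's URL.

    Only the extracted triples are kept; the rest of the page's state
    (scripts, styling, non-RDF links) is deliberately not represented.
    """
    dataset.setdefault(page_url, set()).update(extracted)
    return dataset


crawl: Dataset = {}
record_crawl(crawl, "http://example.org/bob", [
    ("http://example.org/bob#me", "http://xmlns.com/foaf/0.1/name", '"Bob"'),
])
```

Each named graph here holds only partial state of the page it is named after, which is exactly the case a Graph-State Resource, as defined so far, would exclude.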

> I'm looking for a class of things which have very similar behavior and
> attributes.  My most recent angle is trying to document how to use REST
> with these things.  I want to be able to talk about how HEAD, GET, PUT,
> and PATCH should work on these things.   RDFa documents have to be
> handled quite differently -- one could not, for instance, PATCH an RDFa
> document with an application/sparql-update patch.   I'm trying to focus
> on the class of things for which SPARQL Update is a meaningful PATCH
> language.

I think that's restricting it way too much.

Given adequate parsers and extractors, bits of RDF can be read out of almost every page on the Web.
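
To make "extractor" concrete, here is a deliberately naive sketch built on Python's standard `html.parser`. It is not a conformant RDFa processor: it only looks at `about`, `property` and `content` attributes, ignores prefixes, nesting and scoping rules, and the class name `ToyExtractor` is invented for this example.

```python
from html.parser import HTMLParser


class ToyExtractor(HTMLParser):
    """Collect (subject, property, value) tuples from RDFa-like attributes.

    Assumption: a toy subset of RDFa. The subject is the most recent
    `about` value seen (or the page URL), and is never reset on close tags.
    """

    def __init__(self, page_url):
        super().__init__()
        self.subject = page_url
        self.pending = None      # property waiting for its text value
        self.triples = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            if "content" in a:
                self.triples.add((self.subject, a["property"], a["content"]))
            else:
                self.pending = a["property"]

    def handle_data(self, data):
        # Use the first non-blank text after a property-bearing tag as its value.
        if self.pending and data.strip():
            self.triples.add((self.subject, self.pending, data.strip()))
            self.pending = None


page = ('<div about="http://example.org/bob#me"><h1>Welcome</h1>'
        '<span property="foaf:name">Bob</span><p>Unrelated text</p></div>')
extractor = ToyExtractor("http://example.org/bob")
extractor.feed(page)
```

The heading and the unrelated paragraph leave no trace in `extractor.triples`; only the marked-up bit of the page survives extraction.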

Limiting the applicability of the “web-style dataset” pattern to only things published from SPARQL endpoints (or even only update-capable SPARQL endpoints) would result in something that's not useful to most current RDF users. Most of the RDF out there is read-only at this point and doesn't come from SPARQL stores. I don't see this changing anytime soon – RDF coming from SPARQL stores will grow, but so will RDF coming from CMSes and DBs and Excel sheets and screen scraping and other read-only non-SPARQL sources.

> Looking for other properties which might apply to GSRs, I thought of
> VoID and came across this:
> 
>        The fundamental concept of VoID is the dataset. A dataset is a
>        set of RDF triples that are published, maintained or aggregated
>        by a single provider. Unlike RDF graphs, which are purely
>        mathematical constructs [RDF-CONCEPTS], the term dataset has a
>        social dimension: we think of a dataset as a meaningful
>        collection of triples, that deal with a certain topic, originate
>        from a certain source or process, are hosted on a certain
>        server, or are aggregated by a certain custodian. Also,
>        typically a dataset is accessible on the Web, for example
>        through resolvable HTTP URIs or through a SPARQL endpoint
> 
>                - http://www.w3.org/TR/2011/NOTE-void-20110303/#dataset
> 
> Terminology aside, that seems to match g-box rather well.  

Having written the quoted paragraph, I'm not sure that I agree.

The prototypical void:Dataset would be something like “all the RDF in DBpedia”. The prototypical g-box would be something like “Bob's FOAF file” (assuming it can change over time).

The term “g-box” evokes storage of a graph. GSR evokes, to me, observation of the result of HTTP prodding. void:Dataset evokes, to me, a larger, socially meaningful collection of RDF data.

Best,
Richard

Received on Monday, 19 December 2011 22:50:13 UTC