Re: SD vocab updates: dataset descriptions from Gregory Williams on 2009-12-02 (public-rdf-dawg@w3.org from October to December 2009)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Wed, 2 Dec 2009 12:04:09 -0500
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <2A68F711-2C16-4A8D-A55C-C817F4A55C0C@evilfunhouse.com>

Trying to catch up on some of this SD email...

On Nov 22, 2009, at 5:35 PM, Andy Seaborne wrote:

> On 21/11/2009 00:08, Gregory Williams wrote:
>> 
>> The primary change is related to the link between a SPARQL service and
>> a dataset description. This is where we're going to be punting a bit
>> to other vocabularies such as voiD, and letting them do the actual
>> dataset descriptions. However, as I discussed at the F2F[1], I thought
>> we needed a way to link a dataset to its default graph since the
>> default graph is very SPARQL-specific and not something likely to show
>> up in a more general dataset description vocabulary. Given this, I've
>> changed the SD vocab in the following ways:
> 
> The diagram [1] has dataset(s) but also the available universe.
> Is that related to describing update services as well?

I haven't yet tried to align this with the update work, and I'm not fully swapped in on it (I believe we still have some open issues there?).


>> * Added a sd:Dataset class. I'm hoping we can work with the voiD group
>> to make sure this aligns with their notion of a dataset so voiD
>> properties could be attached to a sd:Dataset.
> 
> +1 to working with the voiD group on this.
> 
> voiD can be used to describe
> """
> A dataset is a collection of data, published and maintained by a single provider, available as RDF on the Web, where at least some of the resources in the dataset are identified by dereferencable URIs.
> """
> and there are properties that imply it's one graph: void:dataDump points to an RDF graph, so it can't be applied to a whole RDF dataset.

I believe this is going to change in the version they are currently working on, but I will try to find out for sure.


> I used sparql:Dataset rather than sd: as it's a general concept in SPARQL.  Looking at the SD doc, it seems to me that the sd: classes are about SPARQL concepts.

Yeah, I've just put everything into the one vocabulary. If you think it's worth considering splitting things up, I'd be happy to entertain the idea (but I'm not particularly bothered by having them in the same namespace, either).


>> * Replaced sd:datasetDescription with two new terms: sd:defaultDataset
>> and sd:availableDataset. sd:defaultDataset links a sd:Service with a
>> description of the default dataset used for query answering if none is
>> provided by the query or protocol. It may use the defaultGraph
>> property described below. sd:availableDataset links a sd:Service with
>> a description of a dataset containing named graphs that may be used in
>> FROM/FROM NAMED clauses.
> 
> Probably needed, but doesn't it come through listing all the graphs at a service?
> 
> The concept of availableDataset seems to mix two things.
> 
> 1/ It's enumerating the datasets that can be formed from limited choices of FROM/FROM NAMED:
> 
>  FROM one of ...
>  FROM NAMED one of ....
> 
> If you have 5 graphs then I made it 150 possible distinct datasets :-) without allowing for union graphs. (Sum of the n'th row of Pascal's triangle for named graphs * N for choices of default dataset. Not all combinations are useful in practice).
> 
> Maybe describing the range of values for FROM/FROM NAMED is better.
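The growth described in the quoted point 1/ can be checked with a short brute-force count. This is a sketch under stated assumptions: exactly one FROM graph chosen from the n graphs, any subset (including the empty one) of FROM NAMED graphs, and no union graphs. The exact total depends on such counting conventions, so it need not reproduce the figure quoted above; what it shows is that the number of datasets grows as n * 2^n. The graph names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_datasets(graphs):
    """Count datasets with exactly one FROM graph and any subset of FROM NAMED graphs."""
    n = len(graphs)
    # Sum of the nth row of Pascal's triangle: number of FROM NAMED subsets (== 2 ** n).
    named_subsets = sum(comb(n, k) for k in range(n + 1))
    # One of n choices for the single FROM (default) graph.
    return n * named_subsets


def enumerate_datasets(graphs):
    """List every (default graph, frozenset of named graphs) pair explicitly."""
    result = []
    for default in graphs:
        for k in range(len(graphs) + 1):
            for named in combinations(graphs, k):
                result.append((default, frozenset(named)))
    return result


graphs = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical graph names
print(count_datasets(graphs))            # 160 under these assumptions
print(len(enumerate_datasets(graphs)))   # same count, by brute force
```

Allowing no FROM at all, or several FROM graphs merged into the default graph, changes the constant but not the exponential shape, which is why describing the permitted values for FROM/FROM NAMED scales better than enumerating datasets.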

In CVS, the current term for this is now sd:availableGraphDescriptions. I hope this more clearly emphasizes that the property links to descriptions of graphs that can be used in FROM/FROM NAMED. I've also added a sd:feature instance sd:dereferenceURLs to indicate that the endpoint will accept any graph in FROM/FROM NAMED and try to dereference the URLs used. I'd be very happy to have suggestions on better names for sd:dereferenceURLs, though.
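One way an endpoint might use these two terms when it receives a query is sketched below: FROM/FROM NAMED IRIs that appear among the described graphs are accepted, and others are accepted only if the endpoint advertises sd:dereferenceURLs, in which case it would try to fetch them. The namespace IRI, the ServiceDescription class and the plan_dataset function are hypothetical names for illustration, not part of the vocabulary.

```python
from dataclasses import dataclass, field

# Assumed namespace for the sd: vocabulary (illustrative only).
SD = "http://www.w3.org/ns/sparql-service-description#"
DEREFERENCE_URLS = SD + "dereferenceURLs"


@dataclass
class ServiceDescription:
    """Hypothetical in-memory view of part of a service description."""
    # IRIs of graphs described via sd:availableGraphDescriptions
    available_graphs: set = field(default_factory=set)
    # IRIs given as sd:feature values
    features: set = field(default_factory=set)


def plan_dataset(sd, from_graphs, from_named_graphs):
    """Check a query's FROM / FROM NAMED IRIs against the description.

    Returns the IRIs the endpoint would have to dereference; raises
    ValueError if some IRI is neither described nor fetchable.
    """
    requested = set(from_graphs) | set(from_named_graphs)
    unknown = requested - sd.available_graphs
    if unknown and DEREFERENCE_URLS not in sd.features:
        raise ValueError(f"graphs not available here: {sorted(unknown)}")
    return unknown


sd = ServiceDescription(available_graphs={"http://example.org/g1"})
print(plan_dataset(sd, ["http://example.org/g1"], []))  # set()
```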


> 2/ There are some preselected datasets, i.e. the dataset description names only some of all the datasets that are possible from all the graph names via FROM NAMED.
> 
> Without some extension (e.g. rejecting queries that don't meet additional rules, like at most 3 FROM NAMED), this can only be done by a conditional choice of FROM and FROM NAMED. In this case, different service URLs can be used for different datasets. Otherwise we are defining a new first-class concept, and that might be better left until the next WG. (Not sure.)

This isn't what I had intended and hope the new changes make that clear(er).


>> * Added URL variants of the above two terms: sd:defaultDatasetURL and
>> sd:availableDatasetURL. These are meant to allow linking not to the
>> dataset description directly but to a dereferenceable document that
>> contains such descriptions. This allows the service description to be
>> kept small while providing access to very large dataset descriptions.
>> 
>> * Added sd:defaultGraph term for linking a sd:Dataset with a
>> description of the default graph in a dataset. For now I'm leaving the
>> rdfs:range of this term open, allowing vocabularies like voiD to do it
>> themselves.
> 
> See above. And sd:namedGraph, since presumably the named graphs may not be the entire available universe (of an endpoint), if I read the diagram correctly.

I've added a sd:namedGraph for symmetry. Hopefully this will align nicely with existing terms (in voiD or elsewhere).
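To make the shape of the vocabulary concrete, here is a minimal sketch that holds a service description as plain (subject, predicate, object) tuples and follows sd:defaultDataset to a sd:Dataset and from there to its sd:defaultGraph and sd:namedGraph values. The namespace IRI, the example IRIs and the "_:" blank-node labels are assumptions for illustration; "_:dg" stands for a default-graph description whose vocabulary (voiD or otherwise) the rdfs:range deliberately leaves open.

```python
# Assumed namespace for the sd: vocabulary (illustrative only).
SD = "http://www.w3.org/ns/sparql-service-description#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical service description, one tuple per triple.
service = "http://example.org/sparql#service"
triples = {
    (service, RDF_TYPE, SD + "Service"),
    (service, SD + "defaultDataset", "_:ds"),
    ("_:ds", RDF_TYPE, SD + "Dataset"),
    ("_:ds", SD + "defaultGraph", "_:dg"),
    ("_:ds", SD + "namedGraph", "http://example.org/graphs/a"),
    ("_:ds", SD + "namedGraph", "http://example.org/graphs/b"),
}


def objects(triples, s, p):
    """All objects o such that (s, p, o) is in the set."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def default_dataset_graphs(triples, service):
    """Follow sd:defaultDataset; return (default graph nodes, named graph nodes)."""
    default, named = set(), set()
    for ds in objects(triples, service, SD + "defaultDataset"):
        default |= objects(triples, ds, SD + "defaultGraph")
        named |= objects(triples, ds, SD + "namedGraph")
    return default, named


print(default_dataset_graphs(triples, service))
```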


Also, one more recent addition to CVS: a sd:feature IRI sd:unionDefaultGraph to indicate that the default graph of a dataset is the union of all named graphs.
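The effect of advertising sd:unionDefaultGraph can be sketched as follows: when the feature is present, the default graph a query sees is the union of the triples in all named graphs; otherwise it is whatever is stored as the default graph. This is a simplification under stated assumptions: graphs are sets of (s, p, o) tuples, blank-node labels are treated as shared across graphs (a true RDF merge would keep them apart), and the namespace and function name are hypothetical.

```python
# Assumed namespace; the feature name is the one mentioned in the message.
SD = "http://www.w3.org/ns/sparql-service-description#"
UNION_DEFAULT_GRAPH = SD + "unionDefaultGraph"


def effective_default_graph(default_graph, named_graphs, features):
    """Return the triples a query sees in the default graph.

    default_graph: set of (s, p, o) tuples stored as the default graph
    named_graphs: dict mapping graph IRI -> set of (s, p, o) tuples
    features: set of sd:feature IRIs advertised by the service
    """
    if UNION_DEFAULT_GRAPH in features:
        union = set()
        for graph in named_graphs.values():
            union |= graph  # set union; duplicates across graphs collapse
        return union
    return set(default_graph)


named = {
    "http://example.org/g1": {("ex:a", "ex:p", "ex:b")},
    "http://example.org/g2": {("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")},
}
print(len(effective_default_graph(set(), named, {UNION_DEFAULT_GRAPH})))  # 2
```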


.greg
Received on Wednesday, 2 December 2009 17:04:51 UTC