Re: service description vocabulary from Gregory Williams on 2009-09-25 (public-rdf-dawg@w3.org from July to September 2009)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Fri, 25 Sep 2009 11:41:45 -0400
To: Alexandre Passant <Alexandre.Passant@deri.org>
Cc: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-Id: <DA783E71-04F3-4A5A-BA4A-4C6260E5BFCC@evilfunhouse.com>
On Sep 25, 2009, at 3:38 AM, Alexandre Passant wrote:

> On 25 Sep 2009, at 03:41, Gregory Williams wrote:
>
>> Beyond what's currently listed in the vocab section of the service  
>> description page[1], I think we need a way to describe the dataset  
>> provided by the endpoint. This goes beyond what vocabularies like voiD  
>> provide, which is a way to describe a single graph. Therefore, I'd  
>> like to suggest something like this:
>>
>> <endpoint> sd:datasetDescription [
>> 	sd:defaultGraph <void-dataset-for-default-graph> ;
>> 	sd:namedGraph [
>> 		sd:graphName <graph-name> ;
>> 		sd:graphDescription <void-dataset-for-named-graph> ;
>> 	]
>> ] .
>
> So, in a quad store, you would describe each graph separately using  
> voiD?
> Won't that be too much information in the SD? E.g., if I have 1 million  
> RDF files in my store, I will have 1 million voiD descriptions in  
> the SD.
> It may be more useful to query each graph directly to get that  
> voiD-like information, if needed.
>
> What about having a simple description giving the list of graphs +  
> the default one + the voiD description of the complete endpoint?

Say you have 1 million named graphs (based on your description, I  
assume "1 million RDF files" means 1 million named graphs?) each with  
1 million triples, and a single default graph with 100 triples. At  
least for my use cases (optimization and federation), getting a voiD  
description of the graph-merge might very well be worse than having no  
description at all. If I'm trying to estimate how many results I can  
expect for any query against the default graph, having the description  
of the graph merge wouldn't do me any good.
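
To put rough numbers on that scenario, here is a minimal Python sketch. The constant and function names are hypothetical, and it assumes no triples are duplicated across graphs, so the merged count is an upper bound.

```python
# Sizes taken from the scenario described above (names are illustrative only).
NAMED_GRAPHS = 1_000_000
TRIPLES_PER_NAMED_GRAPH = 1_000_000
DEFAULT_GRAPH_TRIPLES = 100


def merged_triple_count() -> int:
    """Triple count a description of the graph merge would report.

    Assumes no triple appears in more than one graph (upper bound).
    """
    return NAMED_GRAPHS * TRIPLES_PER_NAMED_GRAPH + DEFAULT_GRAPH_TRIPLES


def overestimate_factor() -> float:
    """How far off a default-graph size estimate is if it uses the merged count."""
    return merged_triple_count() / DEFAULT_GRAPH_TRIPLES


if __name__ == "__main__":
    print(merged_triple_count())
    print(overestimate_factor())
```

Under these assumptions, an optimizer that sizes the default graph from the merged description is off by roughly ten orders of magnitude.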

On the other side of things, if I look at the dataset description and  
find that there's information about a single foaf:Person in the  
dataset and I want to retrieve that information, how am I meant to get  
to it if I don't know which of the million named graphs it's in?

Having a huge number of named graphs is clearly a challenge w.r.t.  
size of the service description, but I'm worried that a voiD  
description of the merged dataset isn't all that useful if it doesn't  
give you enough information to turn around and query the dataset for  
things you're interested in.
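
As a rough illustration of why per-graph descriptions help here, the following Python sketch picks out the graphs whose (hypothetical) per-graph class counts mention a class, then builds a SPARQL query scoped to one of them. The dictionary layout, graph names, and function names are assumptions made for this example and are not part of any proposed vocabulary.

```python
from typing import Dict, List

FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def graphs_containing(per_graph_classes: Dict[str, Dict[str, int]], cls: str) -> List[str]:
    """Return the names of graphs whose description reports instances of cls."""
    return sorted(
        graph
        for graph, class_counts in per_graph_classes.items()
        if class_counts.get(cls, 0) > 0
    )


def instances_query(graph: str, cls: str) -> str:
    """Build a SPARQL query for instances of cls restricted to one named graph."""
    return f"SELECT ?s WHERE {{ GRAPH <{graph}> {{ ?s a <{cls}> }} }}"


if __name__ == "__main__":
    # Hypothetical per-graph statistics, as a client might extract them
    # from per-graph descriptions in the service description.
    stats = {
        "http://example.org/g1": {},
        "http://example.org/g2": {FOAF_PERSON: 1},
    }
    for g in graphs_containing(stats, FOAF_PERSON):
        print(instances_query(g, FOAF_PERSON))
```

With only a description of the merged dataset, the client would know the foaf:Person exists but would have no basis for choosing the graph to put in the GRAPH clause.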

>> The lack of naming symmetry between sd:defaultGraph (for default  
>> graphs) and sd:graphDescription (for named graphs) could probably  
>> be made better (maybe sd:defaultGraphDescription?), but this  
>> modeling allows each graph in the dataset to be described as well  
>> as things to be said about the entire dataset.
>
> Strictly speaking, isn't the default graph also a named graph (since  
> it generally also has its own URI)?

Possibly, but it doesn't have to have its own name, does it?

> <endpoint> sd:datasetDescription [
> 	sd:defaultGraph [
> 		sd:graphName <graph-name> ;
> 		sd:graphDescription <void-dataset-for-default-graph> ;
> 	] ;
> 	sd:namedGraph [
> 		sd:graphName <graph-name> ;
> 		sd:graphDescription <void-dataset-for-named-graph> ;
> 	]
> ] .

If we can count on all graphs (even a default graph) having a name,  
then this would be a good way to generalize the modeling. For  
consistency, even without a graphName on the default graph, maybe we  
should do this?


> while I'd think a simple way would be
>
> <endpoint> sd:datasetDescription <void-dataset-for-dataset> ;
> 	sd:defaultGraph <graph-name> ;
> 	sd:namedGraph <graph-name> .

Again, I'm not sure how useful the <void-dataset-for-dataset> would  
be, since by this point you've lost the ability to discriminate  
between the named (and default) graphs.

Also, it's come up briefly in the past, but we haven't talked much  
about the difference between including the dataset description in the  
service description document and providing a URL for retrieving the  
dataset description if/when needed. I think both are important, but  
supporting both makes querying harder because clients potentially need  
to handle both cases. Thoughts?
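
To make the "both cases" concern concrete, here is a minimal Python sketch of the check a client might need. It models the service description as plain (subject, predicate, object) string tuples; the `sd:datasetDescription` term is the one proposed in this thread, and the function name and the "_:" blank-node convention are assumptions of the example.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

# Term proposed in this thread; not a finalized vocabulary term.
DATASET_DESCRIPTION = "sd:datasetDescription"


def locate_dataset_description(
    triples: Iterable[Triple], endpoint: str
) -> Optional[Tuple[str, str]]:
    """Decide whether the dataset description is inline or must be fetched.

    Returns ("inline", node) if the service description itself contains
    triples about the description node, ("fetch", url) if only a URL is
    given, or None if no usable description is linked.
    """
    triples = list(triples)
    for s, p, o in triples:
        if s != endpoint or p != DATASET_DESCRIPTION:
            continue
        if any(subj == o for subj, _, _ in triples):
            return ("inline", o)
        if not o.startswith("_:"):  # blank nodes cannot be dereferenced
            return ("fetch", o)
    return None
```

A client would query the document directly in the "inline" case and dereference the URL first in the "fetch" case, which is the extra branch that makes querying harder.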

thanks,
.greg
Received on Friday, 25 September 2009 15:42:24 UTC