Re: service description vocabulary from Steve Harris on 2009-09-26 (public-rdf-dawg@w3.org from July to September 2009)

From: Steve Harris <steve.harris@garlik.com>
Date: Sat, 26 Sep 2009 10:06:36 +0100
To: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-Id: <602BDBF2-9A41-4B82-81C6-6C5424756937@garlik.com>
On 25 Sep 2009, at 16:41, Gregory Williams wrote:

> On Sep 25, 2009, at 3:38 AM, Alexandre Passant wrote:
>
>> On 25 Sep 2009, at 03:41, Gregory Williams wrote:
>>
>>> Beyond what's currently listed in the vocab section of the service  
>>> description page[1], I think we need a way to describe the dataset  
>>> provided by the endpoint. This goes beyond what things like voiD  
>>> provide which is a way to describe a single graph. Therefore, I'd  
>>> like to suggest something like this:
>>>
>>> <endpoint> sd:datasetDescription [
>>> 	sd:defaultGraph <void-dataset-for-default-graph> ;
>>> 	sd:namedGraph [
>>> 		sd:graphName <graph-name> ;
>>> 		sd:graphDescription <void-dataset-for-named-graph> ;
>>> 	]
>>> ] .
>>
>> So, in a quad store, you will describe each graph separately using
>> voiD?
>> Won't that be too much information in the SD? E.g. if I have 1
>> million RDF files in my store, I will have 1 million voiD
>> descriptions in the SD.
>> It may be more useful to query each graph directly to get that
>> voiD-like information, if needed.
>>
>> What about having a simple description giving the list of graphs +
>> the default one + the voiD description of the complete endpoint?
>
> Say you have 1 million named graphs (based on your description, I  
> assume "1 million RDF files" means 1 million named graphs?) each  
> with 1 million triples, and a single default graph with 100 triples.  
> At least for my use cases (optimization and federation), getting a  
> voiD description of the graph-merge might very well be worse than  
> having no description at all. If I'm trying to estimate how many  
> results I can expect for any query against the default graph, having  
> the description of the graph merge wouldn't do me any good.
>
> On the other side of things, if I look at the dataset description  
> and find that there's information about a single foaf:Person in the  
> dataset and I want to retrieve that information, how am I meant to  
> get to it if I don't know which of the million named graphs it's in?

SELECT ?g ?x WHERE { GRAPH ?g { ?x a foaf:Person } } ?

Maybe I misunderstood the question.
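
Spelled out, the query I have in mind is roughly the sketch below. The
foaf: prefix declaration is added here only to make it self-contained,
and it assumes the endpoint lets GRAPH ?g range over all of its named
graphs.

```sparql
# Find which named graph(s) contain a foaf:Person, and who it is.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?g ?x
WHERE {
  GRAPH ?g { ?x a foaf:Person }
}
```

Each result binds ?g to the name of a graph holding a matching triple,
which can then be used in a follow-up query against just that graph.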

> Having a huge number of named graphs is clearly a challenge w.r.t.  
> size of the service description, but I'm worried that a voiD  
> description of the merged dataset isn't all that useful if it  
> doesn't give you enough information to turn around and query the  
> dataset for things you're interested in.

Let's not fixate on voiD. If voiD is not sufficient then the community  
will come up with something more comprehensive.

>>> The lack of naming symmetry between sd:defaultGraph (for default  
>>> graphs) and sd:graphDescription (for named graphs) could probably  
>>> be made better (maybe sd:defaultGraphDescription?), but this  
>>> modeling allows each graph in the dataset to be described as well  
>>> as things to be said about the entire dataset.
>>
>> Strictly speaking, isn't the default graph also a named graph
>> (since it generally also has its own URI)?
>
> Possibly, but it doesn't have to have its own name, does it?

No, it's not a named graph. Or at least, it's not required to be... or  
something :)

>> <endpoint> sd:datasetDescription [
>> 	sd:defaultGraph [
>> 		sd:graphName <graph-name> ;
>> 		sd:graphDescription <void-dataset-for-default-graph> ;
>> 	] ;
>> 	sd:namedGraph [
>> 		sd:graphName <graph-name> ;
>> 		sd:graphDescription <void-dataset-for-named-graph> ;
>> 	]
>> ] .
>
> If we can count on all graphs (even a default graph) having a name,  
> then this would be a good way to generalize the modeling. For  
> consistency, even without a graphName on the default graph, maybe we  
> should do this?
>
>
>> while I'd think a simple way would be
>>
>> <endpoint> sd:datasetDescription <void-dataset-for-dataset> ;
>> 	sd:defaultGraph <graph-name> ;
>> 	sd:namedGraph <graph-name> .
>
> Again, I'm not sure how useful the <void-dataset-for-dataset> would  
> be, since by this point you've lost the ability to discriminate  
> between the named (and default) graphs.

Well, it's a little murky anyway: both the protocol and the query have  
the ability to change the contents of the default graph.
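
To illustrate the query side: a query can assemble its own default
graph with FROM clauses, so whatever the service description says about
the store's default graph need not apply to it. A rough sketch, where
the two graph IRIs are made up for the example:

```sparql
# The default graph for this query is the merge of the two FROM graphs,
# not whatever the endpoint would otherwise use as its default graph.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?x
FROM <http://example.org/graphs/alice>
FROM <http://example.org/graphs/bob>
WHERE {
  ?x a foaf:Person
}
```

On the protocol side, the default-graph-uri request parameter has a
similar effect without touching the query text.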

> Also, it's come up briefly in the past, but we haven't done much  
> talking about the difference between including the dataset  
> description in the service description document, or providing a URL  
> for retrieving the dataset description if/when needed. I think both  
> are important, but it does make querying harder because you need to  
> potentially handle both cases. Thoughts?

That's often the case in the Linked Data world anyway, so I don't  
think it's disastrous.

Requiring systems to return everything in one graph could be onerous  
for both client and server; e.g. in the 2M FOAF graph case, both the  
list of graphs and the description of the store will be large.

- Steve

-- 
Steve Harris
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10  
9AD
Received on Saturday, 26 September 2009 09:07:12 UTC