Re: voiD 1.0 guide comments

Jiri,

Thanks for the feedback.

On 1 Feb 2009, at 21:03, Jiri Prochazka wrote:
> In the article I haven't found a solid definition of what is a dataset
> and when to use another dataset/subset. I think this has to be clearly
> defined.

“A dataset in voiD (void:Dataset) is a collection of data, which is:
- published and maintained by a single provider, and
- available as RDF, and
- accessible, for example, through dereferenceable HTTP URIs or a  
SPARQL endpoint.”

I think this is as clear as possible without becoming overly
constraining.
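
To make the three criteria concrete, here is a minimal sketch in Python that emits N-Triples for one dataset description. It is only an illustration, not text from the guide: the example.org URIs and the function name are made up, and it assumes dcterms:publisher for the provider and void:sparqlEndpoint for the access point.

```python
# Namespaces for voiD, Dublin Core Terms, and rdf:type.
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_dataset(dataset_uri, publisher_uri, sparql_endpoint):
    """Return N-Triples lines describing one void:Dataset (hypothetical helper)."""
    triples = [
        # It is a voiD dataset, i.e. a collection of RDF data.
        (dataset_uri, RDF_TYPE, VOID + "Dataset"),
        # Published and maintained by a single provider.
        (dataset_uri, DCTERMS + "publisher", publisher_uri),
        # Accessible, here through a SPARQL endpoint.
        (dataset_uri, VOID + "sparqlEndpoint", sparql_endpoint),
    ]
    return ["<%s> <%s> <%s> ." % t for t in triples]


# All URIs below are placeholders.
dataset_lines = describe_dataset(
    "http://example.org/void#mydata",
    "http://example.org/#me",
    "http://example.org/sparql",
)
print("\n".join(dataset_lines))
```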

> From what I understood, the publisher which is the "primary key" of
> datasets.

No, it's all three points together; see above.

> I think that it should be emphasized that categorizing datasets should
> only be used, if the data in it are somewhat homogeneous - the
> categorization applies to all of it.

Categorization is an art that is way older than voiD, and we don't
want to tell people how to do it properly! And I definitely don't agree
with you when you say that “a categorization must apply to all of the  
dataset”. For example, I think it would be absolutely adequate to say  
that DBpedia is about people and geography, because it is a sizable  
and valuable resource for both those areas, even though it also  
contains data about lots of other things.
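
As a rough sketch of the DBpedia case, the following Python attaches two subjects to one dataset with dcterms:subject. The dataset URI and the helper name are hypothetical, and the two DBpedia resource URIs are picked purely for illustration.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def subject_triples(dataset_uri, subject_uris):
    """Return one dcterms:subject triple per subject (hypothetical helper).

    A dataset can carry several subjects; none of them has to cover
    every triple in the dataset.
    """
    return [
        "<%s> <%s> <%s> ." % (dataset_uri, DCTERMS_SUBJECT, subject)
        for subject in subject_uris
    ]


subject_lines = subject_triples(
    "http://example.org/void#dbpedia",  # placeholder dataset URI
    [
        "http://dbpedia.org/resource/Person",     # illustrative subject
        "http://dbpedia.org/resource/Geography",  # illustrative subject
    ],
)
print("\n".join(subject_lines))
```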

> I guess the categorization it is fairly unusable in use cases like
> personal website, because the information are various...

Well, http://dbpedia.org/resource/Personal_web_page might be a nice  
subject here. (Assuming that you do have some interesting RDF on your  
site!)

(I note with regret that the Wikipedia article on “Random stuff” has
been deleted; it would have made for another nice DBpedia resource...)

> Another thing - dataset partitioning. Combination of dataset
> categorization and partitioning led me to great confusion - I have
> thought voiD also wanted to categorize the data in the dataset.
> Better to put a notice that partitioning should be used carefully and
> that it was designed for mirroring of datasets.

I don't understand. “I have thought voiD also wanted to categorize
the data in the dataset” -- yes, that IS what we want. “partitioning
was designed for mirroring of datasets” -- no, it was designed for  
cases where voiD authors want to say something about just a part of  
the dataset, and not about the entire dataset, for whatever reason.
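
A minimal sketch of that use of partitioning, again in Python and again with made-up URIs and a hypothetical helper name: the parent dataset points at a part of itself via void:subset, and the subject is stated about that part only.

```python
VOID = "http://rdfs.org/ns/void#"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_subset(dataset_uri, subset_uri, subset_subject):
    """Return N-Triples that describe one part of a dataset (hypothetical helper)."""
    triples = [
        # The part is itself described as a dataset.
        (subset_uri, RDF_TYPE, VOID + "Dataset"),
        # The parent dataset declares the part as its subset.
        (dataset_uri, VOID + "subset", subset_uri),
        # This statement is about the part, not about the whole dataset.
        (subset_uri, DCTERMS_SUBJECT, subset_subject),
    ]
    return ["<%s> <%s> <%s> ." % t for t in triples]


subset_lines = describe_subset(
    "http://example.org/void#mydata",         # placeholder parent dataset
    "http://example.org/void#mydata-places",  # placeholder subset
    "http://dbpedia.org/resource/Geography",  # illustrative subject
)
print("\n".join(subset_lines))
```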

Best,
Richard



>
>
> Best regards,
> Jiri Prochazka
>
>
> PS: Please send the replies also directly to me, as I am not  
> subscribed
> to this list.
>

Received on Monday, 2 February 2009 00:44:57 UTC