- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Mon, 2 Feb 2009 00:44:15 +0000
- To: Jiri Prochazka <ojirio@gmail.com>
- Cc: public-lod@w3.org
Jiri,

Thanks for the feedback.

On 1 Feb 2009, at 21:03, Jiri Prochazka wrote:
> In the article I haven't found a solid definition of what is a dataset
> and when to use another dataset/subset. I think this has to be clearly
> defined.

“A dataset in voiD (void:Dataset) is a collection of data, which is:
- published and maintained by a single provider, and
- available as RDF, and
- accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.”

I think this is as clear as possible without becoming overly constraining.

> From what I understood, the publisher which is the "primary key" of
> datasets.

It's three points, see above.

> I think that it should be emphasized that categorizing datasets should
> only be used, if the data in it are somewhat homogeneous - the
> categorization applies to all of it.

Categorization is an art that is way older than voiD, and we don't want to tell people how to do it properly!

And I definitely don't agree with you when you say that “a categorization must apply to all of the dataset”. For example, I think it would be absolutely adequate to say that DBpedia is about people and geography, because it is a sizable and valuable resource for both those areas, even though it also contains data about lots of other things.

> I guess the categorization it is fairly unusable in use cases like
> personal website, because the information are various...

Well, http://dbpedia.org/resource/Personal_web_page might be a nice subject here. (Assuming that you do have some interesting RDF on your site!)

(I note with regret that the Wikipedia article on “Random stuff” has been deleted; it would make for another nice DBpedia resource...)

> Another thing - dataset partitioning. Combination of dataset
> categorization and partitioning led me to great confusion - I have
> thought voiD also wanted to categorize the data in the dataset.
> Better to put a notice that partitioning should be used carefully and
> that it was designed for mirroring of datasets.

I don't understand.

“I have thought voiD also wanted to categorize the data in the dataset” -- yes, that IS what we want.

“partitioning was designed for mirroring of datasets” -- no, it was designed for cases where voiD authors want to say something about just a part of the dataset, and not about the entire dataset, for whatever reason.

Best,
Richard

> Best regards,
> Jiri Prochazka
>
> PS: Please send the replies also directly to me, as I am not subscribed
> to this list.
Received on Monday, 2 February 2009 00:44:57 UTC