- From: Jiri Prochazka <ojirio@gmail.com>
- Date: Mon, 02 Feb 2009 07:56:12 +0100
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: public-lod@w3.org
- Message-ID: <4986990C.1020200@gmail.com>
Thanks for the reply. Perhaps I should describe the misconception I was led to in more detail, to clear things up.

Let's say I have data with RDF representations of articles, which are categorized by tags. If voiD wants to categorize data, then I would need one global dataset with no categorization and a great number of subsets: one per tag, plus one for each of their intersections. This was my mistake when reading the guide - I think you do not want such categorization.

The questions "For whom or what is the categorization meant?" and "How precise should it be?" are ones I would like the guide to answer.

If it is meant for a search engine, then I would want to be able to define the categorization precisely and treat it as such, and if it did not cover the whole dataset, I would like to specify how much it does cover, but I guess this is work in progress (the statistics). If it is meant for humans, just specifying the important topics, then it does not need to be precise.

I suggest the guide mention that the categorization is meant only for orientation, and that before partitioning a dataset you should look again at the definition of a dataset, so that no more people fall into the confusion I did. :)

Best regards,
Jiri Prochazka

Richard Cyganiak wrote:
> Jiri,
>
> Thanks for the feedback.
>
> On 1 Feb 2009, at 21:03, Jiri Prochazka wrote:
>> In the article I haven't found a solid definition of what is a dataset
>> and when to use another dataset/subset. I think this has to be clearly
>> defined.
>
> “A dataset in voiD (void:Dataset) is a collection of data, which is:
> - published and maintained by a single provider, and
> - available as RDF, and
> - accessible, for example, through dereferenceable HTTP URIs or a SPARQL
> endpoint.”
>
> I think this is as clear as it's possible without becoming overly
> constraining.
>
>> From what I understood, the publisher which is the "primary key" of
>> datasets.
>
> It's three points, see above.
>
>> I think that it should be emphasized that categorizing datasets should
>> only be used, if the data in it are somewhat homogeneous - the
>> categorization applies to all of it.
>
> Categorization is an art that is way older than voiD, and we don't want
> to tell people how to do it properly! And I definitely don't agree with you
> when you say that “a categorization must apply to all of the dataset”.
> For example, I think it would be absolutely adequate to say that DBpedia
> is about people and geography, because it is a sizable and valuable
> resource for both those areas, even though it also contains data about
> lots of other things.
>
>> I guess the categorization it is fairly unusable in use cases like
>> personal website, because the information are various...
>
> Well, http://dbpedia.org/resource/Personal_web_page might be a nice
> subject here. (Assuming that you do have some interesting RDF on your
> site!)
>
> (I note with regret that the Wikipedia article on “Random stuff” has
> been deleted, it would make for another nice DBpedia resource...)
>
>> Another thing - dataset partitioning. Combination of dataset
>> categorization and partitioning led me to great confusion - I have
>> thought voiD also wanted to categorize the data in the dataset.
>> Better to put a notice that partitioning should be used carefully and
>> that it was designed for mirroring of datasets.
>
> I don't understand. “I have thought voiD also wanted to categorizing the
> data in the dataset” -- yes, that IS what we want. “partitioning was
> designed for mirroring of datasets” -- no, it was designed for cases
> where voiD authors want to say something about just a part of the
> dataset, and not about the entire dataset, for whatever reason.
>
> Best,
> Richard
>
>>
>> Best regards,
>> Jiri Prochazka
>>
>> PS: Please send the replies also directly to me, as I am not subscribed
>> to this list.
Received on Monday, 2 February 2009 06:58:34 UTC