- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Tue, 6 Jan 2009 11:36:15 +0000
- To: Yves Raimond <yves.raimond@gmail.com>
- Cc: "Aldo Bucchi" <aldo.bucchi@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
Yves,

On 5 Jan 2009, at 10:04, Yves Raimond wrote:
>> Remember that Aldo was looking for something that allows clients to make
>> smart decisions about when to follow a link out of an RDF document. He was
>> not looking for something to describe the contents of RDF datasets on a
>> high level.
>
> But my point is that it is the same problem: pointing clients in the
> right direction, by expressing "this document holds more persons born
> in NYC" or "this set of RDF triples holds Creative Commons records and
> associated tags".

I think it's not the same problem. Similar, but not the same.

For a high-level description of the contents of a dataset, it's not so
important if the description is compact and efficient and concise. Once a
client has found a dataset of interest, it will probably make a larger number
of queries to that dataset, or process a good deal of data coming from that
dataset. In this case it's okay if the description itself is a few kilobytes
in size or requires a couple of additional HTTP requests. The additional
costs of downloading and parsing and processing the description are still
small compared to the costs of running a large number of queries against the
dataset.

This is different in the link-following scenario that Aldo gave when he
opened the thread, and that I tried to address with the proposal I gave
earlier. Here, the typical scenario would be a Tabulator-style linked data
browser or a SemWebClient-style directed crawler that wants to minimize the
number of requests required to gather some data. The client tries to figure
out if it's worth following a single link from one document to another. It
tries to figure out if it should do that one additional HTTP request or not.
It will probably need to make many of those decisions during a typical
session (in the worst case, one decision for each outgoing link of each
visited document!), and each time the decision is simply “follow that link or
not”.
Thus, there are many small decisions, and the cost resulting from each
decision is small.

So the economics are very different. If I want to decide whether it's worth
following a link, then obviously it would be pointless to make several
additional HTTP requests in order to answer the question.

Also, we probably need only one single high-level description for an entire
dataset of a million documents. But we might need a link-level description
*for each individual link* in the dataset, so we might need millions of those
hints in a large dataset. Thus it's much more important that each one be
compact and concise and easy to deploy.

That's why I'm uncomfortable with the “one-size-fits-all” attitude in this
thread (not just from you, Yves). VoiD is intended as a solution to
dataset-level descriptions. A VoiD description should help clients to figure
out if a dataset contains interesting information. VoiD is *not* intended as
something that tells a data browser whether it's a good idea to follow a link
from one RDF document to another. I believe that this case has different
trade-offs and therefore should have a different solution.

<snip>

> It is easy enough to find examples that
> involve more than just one property in the target document, e.g. "Find
> here female scientists born in NYC",

So, if I'm interested in female actors born in NYC, should I follow the link
or not? If the annotation doesn't actually tell us what *not* to expect in
the linked file, then it is not useful for making a decision.

> "Find here the phone numbers of
> the Tabulator's developers", "Find the start time of chords on that
> audio signal", "Find here my latitude and longitude and the time at
> which they were captured"...

These are just examples of chaining -- if I'm interested in the phone numbers
of Tabulator developers, then it's probably a good idea to follow the
ex:developer link anyway, even if I don't know yet whether I'm going to find
a phone number there.
(Knowing that there are *no* phone numbers in their descriptions would be
helpful, because then I don't need to bother following the links if all I'm
interested in is phone numbers. But your example-based proposal doesn't tell
me that there will be *no* phone numbers.)

Best,
Richard

>
>>>>> But perhaps the approach I proposed when we discussed the
>>>>> void:example property could work, in exactly the same way as in [1].
>>>>>
>>>>> In the representation of :New_York, we could write something like
>>>>> (in N3):
>>>>>
>>>>> <http://example.org/persons_nyc.rdf> void:example { :al_pacino
>>>>> :birthPlace :New_York }.
>>>>
>>>> N3 formulae cannot be expressed in RDFa or RDF/XML. How would you
>>>> serialize this in practice?
>>>
>>> As in the post I referred to: you can point to
>>> http://example.org/dataset-example.rdf where you put these example
>>> triples.
>>
>> Then, to decide if I want to follow any of those links, I need to do an
>> extra HTTP request to retrieve a single-triple document. I think we can do
>> better than that. I also don't like the idea of having to potentially
>> provide an extra example document *per link*.
>>
>>>> As far as I can remember, all the examples that people have given could
>>>> be addressed with a simple property-based approach. Has anyone mentioned
>>>> a use case that goes beyond looking for a single property? If not, then
>>>> what does the additional complexity of this proposal buy us in practice?
>>>
>>> The example mentioned in my post uses more than one property, or the
>>> example above.
>>
>> The example in your post was about describing datasets. I don't see how it
>> makes sense in the context of splitting up the RDF description of an
>> individual resource.
>>
>
> As mentioned above, it is the same problem - providing clues to a
> client.
> IMHO, expressing "This RDF document holds persons born in NYC
> and their birth date" is a similar problem as expressing "This dataset
> holds Creative Commons records".
>
> Cheers!
> y
>
>>>> (I note that the situation here is different from what you described in
>>>> [1]. There it was about annotations on a dataset level. Here it is about
>>>> annotating links that occur within many or all individual documents of a
>>>> dataset.)
>>>
>>> A RDF document is a dataset, and can be described as such :-)
>>
>> This isn't about what *can* be done, it's about what's *useful* to do.
>>
>> I think that you have an interesting approach to describing RDF datasets,
>> but I don't think that it is a good solution to the problem of hinting at
>> the content that is available behind an RDF link.
>>
>> Best,
>> Richard
>>
>>>>> [1]
>>>>> http://blog.dbtune.org/post/2008/06/12/Describing-the-content-of-RDF-datasets
>>
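
Illustration of the link-following decision discussed in this message. This is
only a rough sketch and not part of the proposal in the thread: the `LinkHint`
type, the `should_follow` and `links_to_fetch` functions, the example URLs and
the property names are all hypothetical. It assumes a per-link hint that lists
every property used in the target document, so that the hint also says what
*not* to expect there, and it needs no extra HTTP request to evaluate.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class LinkHint:
    """An outgoing RDF link plus a hypothetical per-link hint."""
    target: str
    # Assumption: the publisher lists *all* properties used in the target
    # document. None means no hint was published for this link.
    properties: Optional[FrozenSet[str]] = None


def should_follow(link: LinkHint, wanted: str) -> bool:
    """Decide locally whether the one extra HTTP request is worth making."""
    if link.properties is None:
        # Without a hint nothing can be ruled out, so follow the link.
        return True
    # An exhaustive list lets the client skip documents that cannot help.
    return wanted in link.properties


def links_to_fetch(links: Iterable[LinkHint], wanted: str) -> List[str]:
    """Return the targets a client would still dereference."""
    return [link.target for link in links if should_follow(link, wanted)]


links = [
    LinkHint("http://example.org/alice.rdf",
             frozenset({"foaf:name", "foaf:phone"})),
    LinkHint("http://example.org/bob.rdf", frozenset({"foaf:name"})),
    LinkHint("http://example.org/carol.rdf"),  # no hint published
]
to_fetch = links_to_fetch(links, "foaf:phone")
print(to_fetch)
```

In this sketch the client skips bob.rdf because its hint rules out phone
numbers, and still fetches carol.rdf because no hint was given. An
example-triple annotation could not justify the skip, since it does not say
what is absent from the target.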
Received on Tuesday, 6 January 2009 11:44:23 UTC