- From: Yves Raimond <yves.raimond@gmail.com>
- Date: Wed, 7 Jan 2009 12:25:18 +0000
- To: "Richard Cyganiak" <richard@cyganiak.de>
- Cc: "Aldo Bucchi" <aldo.bucchi@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
Hello!

> On 5 Jan 2009, at 10:04, Yves Raimond wrote:
>>>
>>> Remember that Aldo was looking for something that allows clients to make
>>> smart decisions about when to follow a link out of an RDF document. He
>>> was not looking for something to describe the contents of RDF datasets
>>> on a high level.
>>
>> But my point is that it is the same problem: pointing clients in the
>> right direction, by expressing "this document holds more persons born
>> in NYC" or "this set of RDF triples holds Creative Commons records and
>> associated tags".
>
> I think it's not the same problem. Similar, but not the same.
>
> For a high-level description of the contents of a dataset, it's not so
> important if the description is compact and efficient and concise. Once a
> client has found a dataset of interest, it will probably make a larger
> number of queries to that dataset, or process a good deal of data coming
> from that dataset. In this case it's okay if the description itself is a few
> kilobytes in size or requires a couple of additional HTTP requests. The
> additional costs of downloading and parsing and processing the description
> are still small compared to the costs of running a large number of queries
> against the dataset.
>
> This is different in the link-following scenario that Aldo gave when he
> opened the thread, and that I tried to address with the proposal I gave
> earlier. Here, the typical scenario would be a Tabulator-style linked data
> browser or a SemWebClient-style directed crawler that wants to minimize the
> number of requests required to gather some data. The client tries to figure
> out if it's worth following a single link from one document to another. It
> tries to figure out if it should do that one additional HTTP request or not.
> It will probably need to make many of those decisions during a typical
> session (in the worst case, one decision for each outgoing link of each
> visited document!), and each time the decision is simply "follow that link
> or not". Thus, there are many small decisions, and the cost resulting from
> each decision is small.
>
> So the economies are very different. If I want to decide whether it's worth
> following a link, then obviously it would be pointless to make several
> additional HTTP requests in order to answer the question. Also, we probably
> need only one single high-level description for an entire dataset of a
> million documents. But we might need a link-level description *for each
> individual link* in the dataset, so we might need millions of those hints in
> a large dataset, thus it's much more important that each one be compact and
> concise and easy to deploy.

I agree with you, but my concern is about limiting the possible range of
target documents. Your proposed solution covers one particular type of use
case: a {:s :p ?o} pattern in the target document. (Or {?s :p :o}? How can I
specify that? This relates to Bernhard's concern, which none of our
proposals fully tackles.)

>
> That's why I'm uncomfortable with the "one-size-fits-all" attitude in this
> thread (not just from you Yves).
>
> VoiD is intended as a solution to dataset-level descriptions. A VoiD
> description should help clients to figure out if a dataset contains
> interesting information.
>
> VoiD is *not* intended as something that tells a data browser whether it's a
> good idea to follow a link from one RDF document to another. I believe that
> this case has different trade-offs and therefore should have a different
> solution.
>
> <snip>
>>
>> It is easy enough to find examples that
>> involve more than just one property in the target document, e.g. "Find
>> here female scientists born in NYC",
>
> So, if I'm interested in female actors born in NYC, should I follow the link
> or not?
> If the annotation doesn't actually tell us what *not* to expect in
> the linked file, then it is not useful for making a decision.

But how would your solution (or any of the solutions we have discussed so
far) deal with that? I agree this is a problem, and that it would be really
valuable information, but apart from embedding SPARQL ASK queries as
literals, together with their expected truth values, in the description of
the original resource, I have no clue how to do that.

>
>> "Find here the phone numbers of
>> the Tabulator's developers", "Find the start time of chords on that
>> audio signal", "Find here my latitude and longitude and the time at
>> which they were captured"...
>
> These are just examples of chaining

The latitude/longitude example isn't, and it seems to me like an important
one: linking to a document holding several values indexed over time.

Cheers!
y

> -- if I'm interested in the phone
> numbers of Tabulator developers, then it's probably a good idea to follow
> the ex:developer link anyways, even if I don't know yet whether I'm going to
> find a phone number there.
>
> (Knowing that there are *no* phone numbers in their descriptions would be
> helpful, because then I don't need to bother following the links if all I'm
> interested in is phone numbers. But your example-based proposal doesn't tell
> me that there will be *no* phone numbers.)
>
> Best,
> Richard
>
>
>
>>
>>
>>
>>>>>> But perhaps the approach I proposed when we discussed the void:example
>>>>>> property could work, in exactly the same way as in [1].
>>>>>>
>>>>>> In the representation of :New_York, we could write something like (in
>>>>>> N3):
>>>>>>
>>>>>> <http://example.org/persons_nyc.rdf> void:example { :al_pacino
>>>>>> :birthPlace :New_York }.
>>>>>
>>>>> N3 formulae cannot be expressed in RDFa or RDF/XML. How would you
>>>>> serialize this in practice?
>>>>
>>>> As in the post I referred to: you can point to
>>>> http://example.org/dataset-example.rdf where you put these example
>>>> triples.
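A minimal client-side sketch of the void:example idea, in Python. It assumes the example triples have already been fetched (for instance from a document like http://example.org/dataset-example.rdf) and parsed into plain (subject, predicate, object) string tuples, so no RDF library is involved. `None` stands in for a variable, which covers both the {:s :p ?o} and {?s :p :o} shapes; `triple_matches` and `matches_examples` are hypothetical helpers, not part of VoiD.

```python
from typing import Optional

Triple = tuple[str, str, str]
# A triple pattern; None plays the role of a variable such as ?s or ?o.
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


def triple_matches(pattern: Pattern, triple: Triple) -> bool:
    """True if every bound position of the pattern equals the triple's term."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def matches_examples(examples: list[Triple], pattern: Pattern) -> bool:
    """True if at least one example triple matches the pattern.

    A False result only means the examples do not show such a triple; since
    examples are not exhaustive, it does not prove the target lacks one.
    """
    return any(triple_matches(pattern, t) for t in examples)


# Hypothetical example triples attached to the link to persons_nyc.rdf,
# modelled on the :al_pacino example quoted above.
examples = [(":al_pacino", ":birthPlace", ":New_York")]

print(matches_examples(examples, (None, ":birthPlace", ":New_York")))  # True: {?s :birthPlace :New_York}
print(matches_examples(examples, (":al_pacino", ":birthPlace", None)))  # True: {:al_pacino :birthPlace ?o}
print(matches_examples(examples, (None, ":phone", None)))               # False: {?s :phone ?o}
```

The third call shows the limitation Richard raises: a `False` answer only means the examples do not mention a phone number, not that the target document has none.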
>>>
>>> Then, to decide if I want to follow any of those links, I need to do an
>>> extra HTTP request to retrieve a single-triple document. I think we can
>>> do better than that. I also don't like the idea of having to potentially
>>> provide an extra example document *per link*.
>>>
>>>>> As far as I can remember, all the examples that people have given could
>>>>> be addressed with a simple property-based approach. Has anyone
>>>>> mentioned a use case that goes beyond looking for a single property? If
>>>>> not, then what does the additional complexity of this proposal buy us
>>>>> in practice?
>>>>
>>>> The example mentioned in my post uses more than one property, as does
>>>> the example above.
>>>
>>> The example in your post was about describing datasets. I don't see how
>>> it makes sense in the context of splitting up the RDF description of an
>>> individual resource.
>>>
>>
>> As mentioned above, it is the same problem: providing clues to a
>> client. IMHO, expressing "This RDF document holds persons born in NYC
>> and their birth date" is a similar problem to expressing "This dataset
>> holds Creative Commons records".
>>
>> Cheers!
>> y
>>
>>>>> (I note that the situation here is different from what you described in
>>>>> [1]. There it was about annotations on a dataset level. Here it is
>>>>> about annotating links that occur within many or all individual
>>>>> documents of a dataset.)
>>>>
>>>> An RDF document is a dataset, and can be described as such :-)
>>>
>>> This isn't about what *can* be done, it's about what's *useful* to do.
>>>
>>> I think that you have an interesting approach to describing RDF datasets,
>>> but I don't think that it is a good solution to the problem of hinting at
>>> the content that is available behind an RDF link.
>>>
>>> Best,
>>> Richard
>>>
>>>
>>>
>>>>>> [1]
>>>>>>
>>>>>> http://blog.dbtune.org/post/2008/06/12/Describing-the-content-of-RDF-datasets
>>>
>
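Richard's point about negative information can be sketched as a three-way decision. This is a hypothetical Python illustration, not the vocabulary proposed earlier in the thread: `LinkHint`, its `complete` flag and `decide` are invented names, and properties are plain strings such as foaf:phone. A hint that only lists examples can justify "follow", but only a hint declared exhaustive can justify "skip".

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    FOLLOW = "follow"    # the hint says a wanted property is there
    SKIP = "skip"        # the hint is exhaustive and does not list it
    UNKNOWN = "unknown"  # the hint is open-ended; nothing can be ruled out


@dataclass(frozen=True)
class LinkHint:
    """Hypothetical per-link hint: properties expected in the target document."""
    properties: frozenset[str]
    complete: bool  # True if the list is claimed to be exhaustive


def decide(hint: LinkHint, wanted: set[str]) -> Decision:
    """Decide whether following one link is worth one extra HTTP request."""
    if hint.properties & wanted:
        return Decision.FOLLOW
    if hint.complete:
        # Only an exhaustive hint tells the client what *not* to expect.
        return Decision.SKIP
    return Decision.UNKNOWN


# Hypothetical hints on an ex:developer link.
developers = LinkHint(frozenset({"foaf:name", "foaf:homepage"}), complete=True)
examples_only = LinkHint(frozenset({"foaf:name"}), complete=False)

print(decide(developers, {"foaf:phone"}))     # Decision.SKIP
print(decide(examples_only, {"foaf:phone"}))  # Decision.UNKNOWN
print(decide(examples_only, {"foaf:name"}))   # Decision.FOLLOW
```

Either way the decision uses only data already present in the source document, so it costs no extra HTTP request per link, which is the constraint Richard stresses for link-level hints.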
Received on Wednesday, 7 January 2009 12:25:59 UTC