Re: Granular dereferencing ( prop by prop ) using REST + LinkedData; Ideas? from Richard Cyganiak on 2009-01-06 (public-lod@w3.org from January 2009)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 6 Jan 2009 11:36:15 +0000
To: Yves Raimond <yves.raimond@gmail.com>
Cc: "Aldo Bucchi" <aldo.bucchi@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
Message-Id: <9CA3C0B3-3881-4EC2-8C01-07CBA0DA3960@cyganiak.de>
Yves,

On 5 Jan 2009, at 10:04, Yves Raimond wrote:
>> Remember that Aldo was looking for something that allows clients to make
>> smart decisions about when to follow a link out of an RDF document. He was
>> not looking for something to describe the contents of RDF datasets on a high
>> level.
>
> But my point is that it is the same problem: pointing clients in the
> right direction, by expressing "this document holds more persons born
> in NYC" or "this set of RDF triples holds Creative Commons records and
> associated tags".

I think it's not the same problem. Similar, but not the same.

For a high-level description of the contents of a dataset, it's not so  
important if the description is compact and efficient and concise.  
Once a client has found a dataset of interest, it will probably make a  
larger number of queries to that dataset, or process a good deal of  
data coming from that dataset. In this case it's okay if the  
description itself is a few kilobytes in size or requires a couple of  
additional HTTP requests. The additional costs of downloading and  
parsing and processing the description are still small compared to the  
costs of running a large number of queries against the dataset.

This is different in the link-following scenario that Aldo gave when  
he opened the thread, and that I tried to address with the proposal I  
gave earlier. Here, the typical scenario would be a Tabulator-style  
linked data browser or a SemWebClient-style directed crawler that  
wants to minimize the number of requests required to gather some data.  
The client tries to figure out if it's worth following a single link  
from one document to another. It tries to figure out if it should do  
that one additional HTTP request or not. It will probably need to make  
many of those decisions during a typical session (in the worst case,  
one decision for each outgoing link of each visited document!), and  
each time the decision is simply “follow that link or not”. Thus,  
there are many small decisions, and the cost resulting from each  
decision is small.

So the economies are very different. If I want to decide whether it's  
worth following a link, then obviously it would be pointless to make  
several additional HTTP requests in order to answer the question.  
Also, we probably need only one single high-level description for an  
entire dataset of a million documents. But we might need a link-level  
description *for each individual link* in the dataset, so we might  
need millions of those hints in a large dataset, thus it's much more  
important that each one be compact and concise and easy to deploy.
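
As a rough back-of-the-envelope illustration of that difference, here is a small Python sketch. All figures (documents visited, links per document, requests per description) are made-up assumptions, not measurements; only the shape of the comparison matters.

```python
# Illustrative assumptions only; none of these figures come from a real dataset.
DOCS_VISITED = 100   # documents a browser visits in one session (assumed)
LINKS_PER_DOC = 20   # outgoing links per visited document (assumed)


def extra_requests_dataset_description(datasets_used, requests_per_description=2):
    """A dataset-level description is fetched once per dataset and then reused."""
    return datasets_used * requests_per_description


def extra_requests_link_hints(requests_per_hint):
    """Worst case: one follow-or-not decision per outgoing link of each document."""
    decisions = DOCS_VISITED * LINKS_PER_DOC
    return decisions * requests_per_hint


# One dataset description, costing a couple of extra requests in total.
dataset_level = extra_requests_dataset_description(datasets_used=1)

# Per-link hints that each live in a separate document: one extra request each.
hint_by_reference = extra_requests_link_hints(requests_per_hint=1)

# Per-link hints embedded in the document that contains the link: no extra requests.
hint_inline = extra_requests_link_hints(requests_per_hint=0)

print(dataset_level, hint_by_reference, hint_inline)
```

Under these assumptions, a hint that needs its own HTTP request costs as many extra requests as there are decisions to make, which is why a per-link hint probably has to travel inside the document that carries the link.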

That's why I'm uncomfortable with the “one-size-fits-all” attitude in  
this thread (not just from you, Yves).

VoiD is intended as a solution to dataset-level descriptions. A VoiD  
description should help clients to figure out if a dataset contains  
interesting information.

VoiD is *not* intended as something that tells a data browser whether  
it's a good idea to follow a link from one RDF document to another. I  
believe that this case has different trade-offs and therefore should  
have a different solution.

<snip>
> It is easy enough to find examples that
> involve more than just one property in the target document, e.g. "Find
> here female scientists born in NYC",

So, if I'm interested in female actors born in NYC, should I follow  
the link or not? If the annotation doesn't actually tell us what *not*  
to expect in the linked file, then it is not useful for making a  
decision.

> "Find here the phone numbers of
> the Tabulator's developers", "Find the start time of chords on that
> audio signal", "Find here my latitude and longitude and the time at
> which they were captured"...

These are just examples of chaining -- if I'm interested in the phone  
numbers of Tabulator developers, then it's probably a good idea to  
follow the ex:developer link anyway, even if I don't know yet whether  
I'm going to find a phone number there.

(Knowing that there are *no* phone numbers in their descriptions would  
be helpful, because then I don't need to bother following the links if  
all I'm interested in is phone numbers. But your example-based  
proposal doesn't tell me that there will be *no* phone numbers.)
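
A minimal Python sketch of that point, assuming a hypothetical per-link hint that lists properties used in the linked document. The function name, the `exhaustive` flag, and the property names are all invented for illustration and are not part of any existing vocabulary.

```python
from typing import Set


def should_follow(wanted_property: str,
                  listed_properties: Set[str],
                  exhaustive: bool) -> bool:
    """Decide whether a link is worth one extra HTTP request.

    listed_properties: properties the hint says occur in the linked document.
    exhaustive: True if the hint claims to list *all* properties there.
    """
    if wanted_property in listed_properties:
        return True  # the hint promises what we are looking for
    # A missing property only justifies skipping the link when the hint is
    # exhaustive. An example-style hint says nothing about what is absent,
    # so the client still has to follow the link to find out.
    return not exhaustive


# Hypothetical hint attached to an ex:developer link.
hint = {"foaf:name", "foaf:homepage"}

skip_with_exhaustive_hint = should_follow("foaf:phone", hint, exhaustive=True)
follow_with_example_hint = should_follow("foaf:phone", hint, exhaustive=False)
follow_when_listed = should_follow("foaf:name", hint, exhaustive=True)
```

Only the exhaustive variant ever lets the client skip a request; with a hint that merely gives examples, the answer is always "follow", so the hint does not help with the decision.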

Best,
Richard



>
>
>
>>>>> But perhaps the approach I proposed when we discussed the void:example
>>>>> property could work, in exactly the same way as in [1].
>>>>>
>>>>> In the representation of :New_York, we could write something like (in
>>>>> N3):
>>>>>
>>>>> <http://example.org/persons_nyc.rdf> void:example { :al_pacino
>>>>> :birthPlace :New_York }.
>>>>
>>>> N3 formulae cannot be expressed in RDFa or RDF/XML. How would you
>>>> serialize this in practice?
>>>
>>> As in the post I referred to: you can point to
>>> http://example.org/dataset-example.rdf where you put these example triples.
>>
>> Then, to decide if I want to follow any of those links, I need to do an
>> extra HTTP request to retrieve a single-triple document. I think we can do
>> better than that. I also don't like the idea of having to potentially
>> provide an extra example document *per link*.
>>
>>>> As far as I can remember, all the examples that people have given could
>>>> be addressed with a simple property-based approach. Has anyone mentioned a
>>>> use case that goes beyond looking for a single property? If not, then what
>>>> does the additional complexity of this proposal buy us in practice?
>>>
>>> The example mentioned in my post uses more than one property, or the
>>> example above.
>>
>> The example in your post was about describing datasets. I don't see how it
>> makes sense in the context of splitting up the RDF description of an
>> individual resource.
>>
>
> As mentioned above, it is the same problem - providing clues to a
> client. IMHO, expressing "This RDF document holds persons born in NYC
> and their birth date" is a similar problem as expressing "This dataset
> holds Creative Commons records".
>
> Cheers!
> y
>
>>>> (I note that the situation here is different from what you described in
>>>> [1]. There it was about annotations on a dataset level. Here it is about
>>>> annotating links that occur within many or all individual documents of a
>>>> dataset.)
>>>
>>> An RDF document is a dataset, and can be described as such :-)
>>
>> This isn't about what *can* be done, it's about what's *useful* to do.
>>
>> I think that you have an interesting approach to describing RDF datasets,
>> but I don't think that it is a good solution to the problem of hinting at
>> the content that is available behind an RDF link.
>>
>> Best,
>> Richard
>>
>>
>>
>>>>> [1]
>>>>> http://blog.dbtune.org/post/2008/06/12/Describing-the-content-of-RDF-datasets
>>
Received on Tuesday, 6 January 2009 11:44:23 UTC