Re: Granular dereferencing ( prop by prop ) using REST + LinkedData; Ideas? from Yves Raimond on 2009-01-07 (public-lod@w3.org from January 2009)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Wed, 7 Jan 2009 12:25:18 +0000
To: "Richard Cyganiak" <richard@cyganiak.de>
Cc: "Aldo Bucchi" <aldo.bucchi@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <82593ac00901070425r55101e55kb5e06f5d6a269963@mail.gmail.com>
Hello!

> On 5 Jan 2009, at 10:04, Yves Raimond wrote:
>>>
>>> Remember that Aldo was looking for something that allows clients to make
>>> smart decisions about when to follow a link out of an RDF document. He was
>>> not looking for something to describe the contents of RDF datasets on a
>>> high level.
>>
>> But my point is that it is the same problem: pointing clients in the
>> right direction, by expressing "this document holds more persons born
>> in NYC" or "this set of RDF triples holds Creative Commons records and
>> associated tags".
>
> I think it's not the same problem. Similar, but not the same.
>
> For a high-level description of the contents of a dataset, it's not so
> important if the description is compact and efficient and concise. Once a
> client has found a dataset of interest, it will probably make a larger
> number of queries to that dataset, or process a good deal of data coming
> from that dataset. In this case it's okay if the description itself is a few
> kilobytes in size or requires a couple of additional HTTP requests. The
> additional costs of downloading and parsing and processing the description
> are still small compared to the costs of running a large number of queries
> against the dataset.
>
> This is different in the link-following scenario that Aldo gave when he
> opened the thread, and that I tried to address with the proposal I gave
> earlier. Here, the typical scenario would be a Tabulator-style linked data
> browser or a SemWebClient-style directed crawler that wants to minimize the
> number of requests required to gather some data. The client tries to figure
> out if it's worth following a single link from one document to another. It
> tries to figure out if it should do that one additional HTTP request or not.
> It will probably need to make many of those decisions during a typical
> session (in the worst case, one decision for each outgoing link of each
> visited document!), and each time the decision is simply "follow that link
> or not". Thus, there are many small decisions, and the cost resulting from
> each decision is small.
>
> So the economies are very different. If I want to decide whether it's worth
> following a link, then obviously it would be pointless to make several
> additional HTTP requests in order to answer the question. Also, we probably
> need only one single high-level description for an entire dataset of a
> million documents. But we might need a link-level description *for each
> individual link* in the dataset, so we might need millions of those hints in
> a large dataset, thus it's much more important that each one be compact and
> concise and easy to deploy.

I agree with you, but my concern is that this limits the possible range
of target documents. Your proposed solution covers one particular type
of use case: {:s :p ?o} in the target document (or {?s :p :o}? How can I
specify that? This relates to Bernhard's concern, which none of our
proposals fully tackles).
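
To make the {:s :p ?o} versus {?s :p :o} distinction concrete, here is a
minimal sketch, assuming a link hint is written as a full triple pattern
rather than a bare property. The Pattern type, the may_answer function and
the :population property are all hypothetical names made up for this
illustration; none of them comes from any of the proposals in this thread.

```python
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    """A triple pattern; None plays the role of a variable such as ?s or ?o."""
    s: Optional[str]
    p: Optional[str]
    o: Optional[str]


def may_answer(hint: Pattern, wanted: Pattern) -> bool:
    """Return True if the hint and the client's pattern agree on every bound term."""
    return all(h is None or w is None or h == w for h, w in zip(hint, wanted))


# Hypothetical hints for two links out of the :New_York document.
born_here = Pattern(None, ":birthPlace", ":New_York")    # {?s :p :o}
city_facts = Pattern(":New_York", ":population", None)   # {:s :p ?o}

# A client looking for people born in NYC would follow the first link only.
print(may_answer(born_here, Pattern(None, ":birthPlace", ":New_York")))
print(may_answer(city_facts, Pattern(None, ":birthPlace", None)))
```

A property-only hint would collapse both cases into ":birthPlace" and lose
the direction, which is the part I don't know how to specify compactly.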

>
> That's why I'm uncomfortable with the "one-size-fits-all" attitude in this
> thread (not just from you Yves).
>
> VoiD is intended as a solution to dataset-level descriptions. A VoiD
> description should help clients to figure out if a dataset contains
> interesting information.
>
> VoiD is *not* intended as something that tells a data browser whether it's a
> good idea to follow a link from one RDF document to another. I believe that
> this case has different trade-offs and therefore should have a different
> solution.
>
> <snip>
>>
>> It is easy enough to find examples that
>> involve more than just one property in the target document, e.g. "Find
>> here female scientists born in NYC",
>
> So, if I'm interested in female actors born in NYC, should I follow the link
> or not? If the annotation doesn't actually tell us what *not* to expect in
> the linked file, then it is not useful for making a decision.

But how would your solution deal with that (or any of the solutions we
have discussed so far)? I agree this is a problem, and that it would be
really valuable information, but apart from embedding SPARQL ASK
queries as literals + expected truth values in the description of the
original resource, I have no clue how to do that.
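
For what it's worth, the "ASK queries as literals + expected truth values"
idea could look roughly like the sketch below. It is only an illustration
under loud assumptions: the hints dictionary, the :gender and :occupation
properties, and the expected_answer helper are hypothetical, and queries are
compared as whitespace-normalized strings rather than evaluated by a SPARQL
engine, which is a crude stand-in for real query equivalence.

```python
from typing import Dict, Optional


def normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially different spellings compare equal."""
    return " ".join(query.split())


def expected_answer(hints: Dict[str, bool], query: str) -> Optional[bool]:
    """Return the advertised truth value for the query, or None if nothing is advertised."""
    table = {normalize(q): value for q, value in hints.items()}
    return table.get(normalize(query))


# Hypothetical hints attached to the link to persons_nyc.rdf.
hints = {
    "ASK { ?p :birthPlace :New_York ; :gender :female ; :occupation :scientist }": True,
    "ASK { ?p :birthPlace :New_York ; :occupation :actor }": False,
}

# False means "do not bother", True means "follow", None means "no information".
decision = expected_answer(hints, "ASK { ?p :birthPlace :New_York ;   :occupation :actor }")
print(decision)
```

The False entry is what carries the "what *not* to expect" information, but
it only helps for the exact questions the publisher thought of in advance.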

>
>> "Find here the phone numbers of
>> the Tabulator's developers", "Find the start time of chords on that
>> audio signal", "Find here my latitude and longitude and the time at
>> which they were captured"...
>
> These are just examples of chaining

The latitude/longitude example isn't, and it seems to me like an
important one: linking to a document holding several values indexed
over time.

Cheers!
y

> -- if I'm interested in the phone
> numbers of Tabulator developers, then it's probably a good idea to follow
> the ex:developer link anyways, even if I don't know yet whether I'm going to
> find a phone number there.
>
> (Knowing that there are *no* phone numbers in their descriptions would be
> helpful, because then I don't need to bother following the links if all I'm
> interested in is phone numbers. But your example-based proposal doesn't tell
> me that there will be *no* phone numbers.)
>
> Best,
> Richard
>
>
>
>>
>>
>>
>>>>>> But perhaps the approach I proposed when we discussed the void:example
>>>>>> property could work, in exactly the same way as in [1].
>>>>>>
>>>>>> In the representation of :New_York, we could write something like (in
>>>>>> N3):
>>>>>>
>>>>>> <http://example.org/persons_nyc.rdf> void:example { :al_pacino
>>>>>> :birthPlace :New_York }.
>>>>>
>>>>> N3 formulae cannot be expressed in RDFa or RDF/XML. How would you
>>>>> serialize this in practice?
>>>>
>>>> As in the post I referred to: you can point to
>>>> http://example.org/dataset-example.rdf where you put these example
>>>> triples.
>>>
>>> Then, to decide if I want to follow any of those links, I need to do an
>>> extra HTTP request to retrieve a single-triple document. I think we can do
>>> better than that. I also don't like the idea of having to potentially
>>> provide an extra example document *per link*.
>>>
>>>>> As far as I can remember, all the examples that people have given could
>>>>> be addressed with a simple property-based approach. Has anyone mentioned a
>>>>> use case that goes beyond looking for a single property? If not, then what
>>>>> does the additional complexity of this proposal buy us in practice?
>>>>
>>>> The example mentioned in my post uses more than one property, as does
>>>> the example above.
>>>
>>> The example in your post was about describing datasets. I don't see how it
>>> makes sense in the context of splitting up the RDF description of an
>>> individual resource.
>>>
>>
>> As mentioned above, it is the same problem - providing clues to a
>> client. IMHO, expressing "This RDF document holds persons born in NYC
>> and their birth date" is a similar problem as expressing "This dataset
>> holds Creative Commons records".
>>
>> Cheers!
>> y
>>
>>>>> (I note that the situation here is different from what you described in
>>>>> [1]. There it was about annotations on a dataset level. Here it is about
>>>>> annotating links that occur within many or all individual documents of a
>>>>> dataset.)
>>>>
>>>> An RDF document is a dataset, and can be described as such :-)
>>>
>>> This isn't about what *can* be done, it's about what's *useful* to do.
>>>
>>> I think that you have an interesting approach to describing RDF datasets,
>>> but I don't think that it is a good solution to the problem of hinting at
>>> the content that is available behind an RDF link.
>>>
>>> Best,
>>> Richard
>>>
>>>
>>>
>>>>>> [1]
>>>>>>
>>>>>> http://blog.dbtune.org/post/2008/06/12/Describing-the-content-of-RDF-datasets
>>>
>
Received on Wednesday, 7 January 2009 12:25:59 UTC