Re: Announce: Linked Data Patterns book

On 7 April 2010 09:31, Ian Davis <lists@iandavis.com> wrote:
> On Wed, Apr 7, 2010 at 12:14 AM, Peter Ansell <ansell.peter@gmail.com> wrote:
>> In the Annotation publishing pattern section there is the following statement:
>>
>> "It is entirely consistent with the Linked Data principles to make
>> statements about third-party resources."
>>
>> I don't believe that to be true, simply because, unless users are
>> always using a quad model (RDF+NamedGraphs), they have no way of
>> retrieving that information just by resolving the foreign identifier
>> which is the subject of the RDF triple. They would have to stumble on
>> the information by knowing to retrieve the object URI, which isn't
>> clear from the pattern description so far. In a triples model it is
>> harmful to have this pattern as Linked Data, as the statements are not
>> discoverable just knowing the URI.
>>
>
> Can you elaborate more on the harm you suggest here?
>
> I don't think we need to limit the data published about a subject to
> that subset retrievable at its URI.  (I wrote a little about this last
> year at http://blog.iandavis.com/2009/10/more-than-the-minimum )
>
> I also don't believe this requires the use of quads. I think it can be
> interlinked using rdfs:seeAlso.

For the interlinks to be useful, they would have to be incorporated by
the original producer, i.e., the one that publishes the basic Linked
Data statements, and that doesn't need to be the case if people create
their own URIs to match the other URIs. There is no difference in
their discoverability, and there is the risk that if one uses the
original URI, the original producer will snub them, so the
statement will not be discovered. I don't really see how it is
natively Linked Data if there is a case where there is no likelihood
of a consistent circle of resolution. To be consistent, a user would
have to be able to get back to the statement just using the Linked Data
URI, imo, although the actual specification doesn't indicate this.

Technically the book is possibly correct, as "some" statements come
from resolution of the URI if it is Linked Data compatible in its own
right. My issue is that the statement that a user originally had
cannot be found if the original publisher chose not to include
the equivalence or see-also statement, and I don't think we should be
promoting situations where this information loss will occur at the
whim of some other authority without any way of rediscovering the
statement.
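
To make the discoverability point concrete, here is a minimal sketch of
a follow-your-nose client over a simulated web. Everything in it is an
assumption for illustration: the example.* URIs, the in-memory `web`
dictionary standing in for HTTP resolution, and the `crawl` helper are
hypothetical, not part of any real library or of the book's pattern.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical URIs: one owned by the original publisher, one by a third party.
THING = "http://original.example/thing"
NOTES = "http://annotator.example/notes"

# The third-party statement, whose subject is the foreign (original) URI.
ANNOTATION = (THING, RDFS_COMMENT, "A third-party remark")


def crawl(start_uri, web):
    """Dereference start_uri, then follow every URI found in object position.

    `web` maps a URI to the triples served when that URI is resolved;
    it stands in for real HTTP lookups.
    """
    seen, queue, found = set(), [start_uri], []
    while queue:
        uri = queue.pop()
        if uri in seen or uri not in web:
            continue
        seen.add(uri)
        for s, p, o in web[uri]:
            found.append((s, p, o))
            if o.startswith("http://"):
                queue.append(o)
    return found


# Case 1: the original publisher does not link to the annotator.
web_without_link = {
    THING: [(THING, RDFS_LABEL, "Thing")],
    NOTES: [ANNOTATION],
}

# Case 2: the original publisher chooses to add an rdfs:seeAlso link.
web_with_link = {
    THING: [(THING, RDFS_LABEL, "Thing"), (THING, RDFS_SEE_ALSO, NOTES)],
    NOTES: [ANNOTATION],
}

found_without = ANNOTATION in crawl(THING, web_without_link)
found_with = ANNOTATION in crawl(THING, web_with_link)
```

In this sketch the annotation is reachable from the subject URI only in
the second case, i.e. only when the original publisher decides to
include the link.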

As one example, there was a recent discussion that suggested DBpedia
should not include any links to other datasets if the link is not
derived from Wikipedia. What is to stop other projects that reach a
critical mass on their own from doing the same and still being
referred to as Linked Data due to the mass of inbound links, even if
there are not many outbound links comparatively?

In my opinion, we should be recommending that people create new URIs
so that others have enough information to be able to choose whether to
go there based on the DNS authority or some other method, even if
there are new URIs created. Basically, I am in favour of having multiple
URIs for things, based on who is publishing the information, so that
the information can be discoverable without some external authority
authorising the statement to be discoverable. In the future, if and/or
when Linked Data is subsumed by some sort of Federated SPARQL
methodology as a best practice, it may be okay to reuse URIs, as
there may be some other mechanism of locating the statement using a
metadata repository. Until then, Linked Data URIs are the only widely
used method, and the resolution of the statement is necessarily
controlled by whoever controls the DNS authority.
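
As a rough illustration of the "circle of resolution" idea, the sketch
below checks whether resolving a statement's subject URI returns that
same statement. The example.* URIs, the in-memory `web` dictionaries
and the `resolves_back` helper are all hypothetical, and using
owl:sameAs for the equivalence link is just one possible choice.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs: the original publisher's, and one minted by the annotator.
ORIGINAL = "http://original.example/thing"
MINTED = "http://annotator.example/thing"


def resolves_back(triple, web):
    """True if resolving the triple's subject URI returns the triple itself.

    `web` maps a URI to the triples served when it is resolved,
    standing in for real HTTP lookups.
    """
    subject = triple[0]
    return triple in web.get(subject, [])


# Option A: reuse the original URI as the subject. Resolution of that URI
# is controlled by the original publisher's DNS authority.
reused = (ORIGINAL, RDFS_COMMENT, "A third-party remark")
web_reuse = {
    ORIGINAL: [(ORIGINAL, RDFS_LABEL, "Thing")],
    "http://annotator.example/notes": [reused],
}

# Option B: mint a new URI under the annotator's own authority and link
# it to the original.
minted = (MINTED, RDFS_COMMENT, "A third-party remark")
web_minted = {
    ORIGINAL: [(ORIGINAL, RDFS_LABEL, "Thing")],
    MINTED: [minted, (MINTED, OWL_SAME_AS, ORIGINAL)],
}

reuse_round_trips = resolves_back(reused, web_reuse)
minted_round_trips = resolves_back(minted, web_minted)
```

Here only the minted URI gets a user back to the statement they
started with, without relying on the original publisher.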

Cheers,

Peter

Received on Wednesday, 7 April 2010 00:09:04 UTC