Re: Linked data

From: David Oliver <david@doliver.co.uk>
Date: Fri, 26 Feb 2010 09:16:57 +0100
Message-ID: <4B878379.1030002@doliver.co.uk>
To: public-rdfa@w3.org
Hi Mike,

> (I
> should really have prioritised the website stuff over this, but hey!)

Sometimes you've just got to go with the inner geek!

> One question I have, however, relates to the video "RDFa Basics (short
> video)". At 5:30 he seems to claim that changing her name will make it
> both visible to humans and the computer simultaneously. This seems fine,
> because there is little chance of someone changing it to be something
> other than her name. However, at 7:30 he talks about the "knows", but in
> this case the statement "Jane is friends with Mac" could easily be changed
> in three ways that would make it not true: either of the names could be
> changed or "is friends with" could be "is unaware of". Or something.
> Anyway, what I am getting at is that I don't think that his comment at
> 5:30 necessarily holds beyond simple cases.

My knowledge of RDFa is only very, very basic, so don't take anything I 
say as necessarily correct, but I think a lot (or all of it?) comes down 
to the trustworthiness of the URIs being used, as well as the page on 
which the data is published.

The particular example given in "RDFa Basics (short video)" at 
http://blog.doliver.co.uk/2010/01/intro-resources-learning-linked-data-semantic-web-rdfa/ 
:

<body xmlns:foaf="http://xmlns.com/foaf/0.1/">

<span about="#jane" typeof="foaf:Person" property="foaf:name">
Jane McJanerson
</span>

<span about="#mac" typeof="foaf:Person" property="foaf:name">
Mac McMacerson
</span>

<span about="#jane" rel="foaf:knows" resource="#mac">
Jane is friends with (or at least is aware of) Mac.
</span>

</body>

If we assume that the publisher of the above data is both honest and 
correct, the only way the statement "Jane knows Mac" could potentially 
be distorted is if http://xmlns.com/foaf/0.1/knows was taken to mean 
something other than "knows" by either machines or humans. (Both Jane 
and Mac themselves are described by the page itself, meaning we trust 
that info.)
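
To make it concrete what a machine actually takes away from that markup, here is a rough Python sketch using only the standard library. It is a toy reader, not a conformant RDFa processor: the class name TinyRdfaReader is my own invention, it only copes with flat elements like the ones in the example, it assumes the RDFa 1.0 attribute name typeof on both Person spans, it leaves "#jane" and "#mac" unresolved against the page's base URI, and it trims whitespace from literals for readability.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

EXAMPLE = """
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
<span about="#jane" typeof="foaf:Person" property="foaf:name">
Jane McJanerson
</span>
<span about="#mac" typeof="foaf:Person" property="foaf:name">
Mac McMacerson
</span>
<span about="#jane" rel="foaf:knows" resource="#mac">
Jane is friends with (or at least is aware of) Mac.
</span>
</body>
"""


class TinyRdfaReader(HTMLParser):
    """Hypothetical toy reader for the flat example above (not real RDFa)."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI, from xmlns:* attributes
        self.triples = []     # (subject, predicate, object) tuples
        self._pending = None  # (subject, predicate) waiting for literal text
        self._text = []

    def expand(self, curie):
        # "foaf:knows" -> "http://xmlns.com/foaf/0.1/knows"
        prefix, _, local = curie.partition(":")
        return self.prefixes[prefix] + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        subject = attrs.get("about")
        if subject is None:
            return
        if "typeof" in attrs:
            self.triples.append((subject, RDF_TYPE, self.expand(attrs["typeof"])))
        if "rel" in attrs and "resource" in attrs:
            self.triples.append((subject, self.expand(attrs["rel"]), attrs["resource"]))
        if "property" in attrs:
            # The literal value is the element's text, collected until the end tag.
            self._pending = (subject, self.expand(attrs["property"]))
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            subject, predicate = self._pending
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._pending = None


reader = TinyRdfaReader()
reader.feed(EXAMPLE)
for triple in reader.triples:
    print(triple)
```

The point to notice is that the machine never sees the word "knows" on its own. It sees the full URI built from the xmlns:foaf prefix, which is why the meaning of the statement depends on what is published at that URI.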

If http://xmlns.com/ was compromised, or the domain was left 
unregistered by mistake in an oversight of tragic proportions, I'm not 
sure what effect that would/could have on all the data that depended on 
it. I fear I don't have the insight required to ascertain to what extent 
that would be a problem, and how it could be solved, but I think it 
would equal a disaster in terms of data accuracy.

If we do *not* assume that the publisher of the above data is 
necessarily both honest and correct, then we have to decide how we work 
out whether or not the data presented at the example page is accurate. 
This is a very interesting and important problem for the semantic web. I 
found this in 
http://tomheath.com/papers/Fbizer-heath-berners-lee-ijswis-linked-data.pdf :

"Trust, Quality and Relevance

A significant consideration for Linked Data applications is how to 
ensure the data most relevant or appropriate to the user's needs is 
identified and made available. For example, in scenarios where data 
quality and trustworthiness are paramount, how can this be determined 
heuristically, particularly where the data set may not have been 
encountered previously?

An overview of different content-, context-, and rating-based techniques 
that can be used to heuristically assess the relevance, quality and 
trustworthiness of data is given in (Bizer & Cyganiak, 2009; Heath, 
2008a). Equivalents to the PageRank algorithm will likely be important 
in determining coarse-grained measures of the popularity or significance 
of a particular data source, as a proxy for relevance or quality of the 
data, however such algorithms will need to be adapted to the linkage 
patterns that emerge on the Web of Data.

From an interface perspective, the question of how to represent the 
provenance and trustworthiness of data drawn from many sources into an 
integrated view is a significant research challenge. (Berners-Lee, 1997) 
proposed that browser interfaces should be enhanced with an “Oh, yeah?” 
button to support the user in assessing the reliability of information 
encountered on the Web. Whenever a user encounters a piece 
of information that they would like to verify, pressing such a button 
would produce an explanation of the trustworthiness of the displayed 
information. This goal has yet to be realised, however existing 
developments such as WIQA (Bizer & Cyganiak, 2009) and InferenceWeb 
(McGuinness & da Silva, 2003) can contribute to work in this area by 
providing explanations about information quality as well as inference 
processes that are used to derive query results."

This sounds a bit vague to me. The PageRank algorithm may be good for 
seeing which pages are popular, but that doesn't necessarily mean they're 
accurate. Treating popularity as synonymous with trustworthiness has been 
the cause of some of our worst fuck ups as a species!
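
Here is a small Python sketch of why I say that. It is a bare-bones PageRank by power iteration, written from my understanding of the published algorithm and not taken from the paper quoted above. The page names, the damping value of 0.85 and the iteration count are all made up for illustration. The only input is who links to whom, so nothing about whether a page is correct ever enters the calculation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to a rumour, nobody links
# to the careful source. Accuracy is not represented anywhere.
links = {
    "fan1": ["popular_rumour"],
    "fan2": ["popular_rumour"],
    "fan3": ["popular_rumour"],
    "popular_rumour": [],
    "careful_source": [],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

The rumour page comes out on top purely because more pages point at it, which is the gap between popularity and accuracy that worries me.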

Hopefully decent methods of using linked data will be found. There are 
certainly big questions to be answered about exactly how the data will 
be found (tools are being developed for searching, etc.) and assessed, 
but the concept itself is amazing in that simple, profound way that 
makes me think this is going to be fundamental to the way we function.

I'll forward a copy of this email to an RDFa email list - perhaps people 
with a decent level of understanding of the situation will be able to 
clarify things for us and correct any mistakes I've made.

As for Jane's name change being visible to both humans and machines, 
that claim holds true. As soon as the text in that span is changed, both 
humans and machines will read the updated value the next time they fetch 
the page.
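
A quick way to convince yourself of this is the Python sketch below, again standard library only. NameReader and machine_readable_name are hypothetical helpers of mine, and "Jane Smith" is just an invented new name. The sketch assumes the span has no content attribute, because in RDFa a content attribute would supply the machine-readable value in place of the visible text.

```python
from html.parser import HTMLParser


class NameReader(HTMLParser):
    """Hypothetical helper: grabs the text of the first property="foaf:name" element."""

    def __init__(self):
        super().__init__()
        self.name = None
        self._inside = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self.name is None and dict(attrs).get("property") == "foaf:name":
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._inside:
            self.name = "".join(self._parts).strip()
            self._inside = False


def machine_readable_name(html):
    reader = NameReader()
    reader.feed(html)
    return reader.name


# The same text node is what a browser shows and what the reader extracts.
TEMPLATE = '<span about="#jane" typeof="foaf:Person" property="foaf:name">{}</span>'

before = machine_readable_name(TEMPLATE.format("Jane McJanerson"))
after = machine_readable_name(TEMPLATE.format("Jane Smith"))
print(before)
print(after)
```

There is only one copy of the name in the page, so there is nothing for the human-readable and machine-readable versions to disagree about.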

> But, anyway, I think it looks very interesting and look forward to
> seeing how it develops! Just think, there are people being paid to dream
> up these new ideas!

Ha, yes! I think there are also a lot of people working on linked data 
who are not being paid for it.

David
Received on Friday, 26 February 2010 08:17:27 GMT
