- From: David Oliver <david@doliver.co.uk>
- Date: Fri, 26 Feb 2010 09:16:57 +0100
- To: public-rdfa@w3.org
Hi Mike,

> (I should really have prioritised the website stuff over this, but hey!)

Sometimes you've just got to go with the inner geek!

> One question I have, however, relates to the video "RDFa Basics (short
> video)". At 5:30 he seems to claim that changing her name will make it
> both visible to humans and the computer simultaneously. This seems fine,
> because there is little chance of someone changing it to be something
> other than her name. However, at 7:30 he talks about the "knows", but in
> this case the statement "Jane is friends with Mac" could easily be changed
> in three ways that would make it not true: either of the names could be
> changed or "is friends with" could be "is unaware of". Or something.
> Anyway, what I am getting at is that I don't think that his comment at
> 5:30 necessarily holds beyond simple cases.

My knowledge of RDFa is only very, very basic, so don't take anything I say as necessarily correct, but I think a lot (or all?) of it comes down to the trustworthiness of the URIs being used, as well as of the page on which the data is published.

The particular example given in "RDFa Basics (short video)" at http://blog.doliver.co.uk/2010/01/intro-resources-learning-linked-data-semantic-web-rdfa/ is:

    <body xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <span about="#jane" typeof="foaf:Person" property="foaf:name">
        Jane McJanerson
      </span>
      <span about="#mac" typeof="foaf:Person" property="foaf:name">
        Mac McMacerson
      </span>
      <span about="#jane" rel="foaf:knows" resource="#mac">
        Jane is friends with (or at least is aware of) Mac.
      </span>
    </body>

If we assume that the publisher of the above data is both honest and correct, the only way the statement "Jane knows Mac" could be distorted is if http://xmlns.com/foaf/0.1/knows were taken to mean something other than "knows" by either machines or humans. (Jane and Mac are both described by the page itself, so we trust that information.)
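As a rough illustration of what a machine actually reads from that markup, here is a minimal Python sketch. It is not a conforming RDFa processor: `TinyRDFaExtractor` and `extract` are hypothetical names made up for this example, and the extractor only understands `about`, `typeof`, `property`, `rel` and `resource` on a single element, treats `xmlns:` prefixes as global, collapses whitespace in literals, and resolves `#jane` and `#mac` against an assumed page URL, http://example.org/people.html.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TinyRDFaExtractor(HTMLParser):
    """Hypothetical toy extractor, NOT a conforming RDFa processor.

    Understands only about/typeof/property/rel/resource on a single element,
    treats xmlns: prefixes as global, and collapses whitespace in literals.
    """

    def __init__(self, base):
        super().__init__()
        self.base = base        # assumed URL of the page, used to resolve "#jane"
        self.prefixes = {}      # prefix -> namespace URI, from xmlns:* attributes
        self.triples = []       # (subject, predicate, object) tuples, in document order
        self.stack = []         # per open element: [subject, predicate, text] or None

    def expand(self, curie):
        # "foaf:knows" -> "http://xmlns.com/foaf/0.1/knows"
        prefix, _, local = curie.partition(":")
        return self.prefixes[prefix] + local

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        pending = None
        if "about" in a:
            subject = urljoin(self.base, a["about"])
            if "typeof" in a:
                self.triples.append((subject, RDF_TYPE, self.expand(a["typeof"])))
            if "rel" in a and "resource" in a:
                # The relation comes from attributes only; the element's text is ignored.
                obj = urljoin(self.base, a["resource"])
                self.triples.append((subject, self.expand(a["rel"]), obj))
            if "property" in a:
                # The literal is the element's text, collected until its end tag.
                pending = [subject, self.expand(a["property"]), ""]
        self.stack.append(pending)

    def handle_data(self, data):
        # Text belongs to every open element that is collecting a literal.
        for entry in self.stack:
            if entry is not None:
                entry[2] += data

    def handle_endtag(self, tag):
        entry = self.stack.pop()
        if entry is not None:
            subject, predicate, text = entry
            self.triples.append((subject, predicate, " ".join(text.split())))


def extract(html, base="http://example.org/people.html"):
    parser = TinyRDFaExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.triples


PAGE = """<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <span about="#jane" typeof="foaf:Person" property="foaf:name">
    Jane McJanerson
  </span>
  <span about="#mac" typeof="foaf:Person" property="foaf:name">
    Mac McMacerson
  </span>
  <span about="#jane" rel="foaf:knows" resource="#mac">
    Jane is friends with (or at least is aware of) Mac.
  </span>
</body>"""

if __name__ == "__main__":
    for triple in extract(PAGE):
        print(triple)
```

Run on the snippet, this yields five triples: a type and a name for each of Jane and Mac, plus Jane foaf:knows Mac. Editing "Jane McJanerson" changes the foaf:name triple, which is the 5:30 point in the video. Rewording "is friends with" to "is unaware of" changes nothing a machine sees, because the foaf:knows triple is built from the rel and resource attributes rather than from the span's text; a real RDFa processor treats that element the same way, which is why Mike's worry about the 7:30 example is a fair one.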
If http://xmlns.com/ were compromised, or the domain were left unregistered by mistake in an oversight of tragic proportions, I'm not sure what effect that would or could have on all the data that depends on it. I don't have the insight needed to judge how big a problem that would be or how it could be solved, but I think it would be a disaster in terms of data accuracy.

If we do *not* assume that the publisher of the above data is necessarily both honest and correct, then we have to decide how to work out whether the data presented on the example page is accurate. This is a very interesting and important problem for the semantic web. I found this in http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf :

"Trust, Quality and Relevance

A significant consideration for Linked Data applications is how to ensure the data most relevant or appropriate to the user's needs is identified and made available. For example, in scenarios where data quality and trustworthiness are paramount, how can this be determined heuristically, particularly where the data set may not have been encountered previously? An overview of different content-, context-, and rating-based techniques that can be used to heuristically assess the relevance, quality and trustworthiness of data is given in (Bizer & Cyganiak, 2009; Heath, 2008a). Equivalents to the PageRank algorithm will likely be important in determining coarse-grained measures of the popularity or significance of a particular data source, as a proxy for relevance or quality of the data, however such algorithms will need to be adapted to the linkage patterns that emerge on the Web of Data.

From an interface perspective, the question of how to represent the provenance and trustworthiness of data drawn from many sources into an integrated view is a significant research challenge.
(Berners-Lee, 1997) proposed that browser interfaces should be enhanced with an “Oh, yeah?” button to support the user in assessing the reliability of information encountered on the Web. Whenever a user encounters a piece of information that they would like to verify, pressing such a button would produce an explanation of the trustworthiness of the displayed information. This goal has yet to be realised, however existing developments such as WIQA (Bizer & Cyganiak, 2009) and InferenceWeb (McGuinness & da Silva, 2003) can contribute to work in this area by providing explanations about information quality as well as inference processes that are used to derive query results."

This sounds a bit vague to me. The PageRank algorithm may be good for seeing which pages are popular, but popularity doesn't necessarily mean accuracy. Treating popularity as a synonym for trustworthiness has been the cause of some of our worst fuck-ups as a species! Hopefully decent methods of using linked data will be found.

There are certainly big questions to be answered about exactly how the data will be found (tools are being developed for searching, etc.) and assessed, but the concept itself is amazing in that simple, profound way that makes me think this is going to be fundamental to the way we function.

I'll forward a copy of this email to an RDFa email list; perhaps people with a decent understanding of the situation will be able to clarify things for us and correct any mistakes I've made.

Regarding the issue of Jane's name change being visible to both humans and machines, that claim itself holds true. As soon as the text in that span is changed, both humans and machines will read the updated value the next time they load the page.

> But, anyway, I think it looks very interesting and look forward to
> seeing how it develops! Just think, there are people being paid to dream
> up these new ideas!

Ha, yes!
I think there are also a lot of people working on linked data who are not being paid for it.

David
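On the PageRank point above, a toy sketch may help. It is a plain power-iteration PageRank (damping factor 0.85, a common default) run over a made-up link graph between five hypothetical data sources; the source names, the links, and which source is "accurate" are all assumptions for illustration, not real data.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank.

    links maps each node to the list of nodes it links to. A node with no
    outgoing links spreads its rank evenly over every node.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the same "random jump" share...
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # ...plus an equal cut of the rank of each node linking to it.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dangling node: distribute its rank evenly over all nodes.
                for target in nodes:
                    new[target] += damping * rank[node] / n
        rank = new
    return rank


# Hypothetical Web of Data: "popular" is widely linked but (by assumption)
# publishes a wrong fact; "careful" is accurate but has a single inbound link.
# Note that accuracy appears nowhere in the input.
LINKS = {
    "blog1": ["popular"],
    "blog2": ["popular"],
    "blog3": ["popular", "careful"],
    "popular": ["blog1"],
    "careful": [],
}

if __name__ == "__main__":
    for source, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{source:8} {score:.3f}")
```

However the exact numbers come out, "popular" ranks above "careful", because the score depends only on who links to whom. A link-based measure like this would have to be combined with the content-, context- or rating-based signals the quoted paper mentions before it says anything about accuracy.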
Received on Friday, 26 February 2010 08:17:27 UTC