- From: <john.nj.davies@bt.com>
- Date: Fri, 22 Oct 2010 11:48:20 +0100
- To: <chris@bizer.de>, <denny.vrandecic@kit.edu>, <martin.hepp@ebusiness-unibw.org>
- CC: <kidehen@openlinksw.com>, <public-lod@w3.org>, <e.motta@open.ac.uk>, <tsteiner@google.com>, <semantic-web@w3.org>, <anja@anjeve.de>, <semanticweb@yahoogroups.com>, <giovanni.tummarello@deri.org>, <m.daquin@open.ac.uk>
- Message-ID: <83F74BDEDE0D3846946275659126C8E23287317340@EMV65-UKRD.domain1.systemhost.net>
This article from the NYT may provide an amusing distraction from the current discussion: I thought the PowerPoint slide shown looked eerily familiar ;-)

http://www.nytimes.com/2010/04/27/world/27powerpoint.html?_r=1

John

PS: excellent post, Denny, IMHO

Dr John Davies
Chief Researcher
Future Business Applications & Services
BT Innovate & Design
__________________________________________________
Tel: +44 1473 609583
Email: john.nj.davies@bt.com

From: semanticweb@yahoogroups.com [mailto:semanticweb@yahoogroups.com] On Behalf Of Chris Bizer
Sent: 22 October 2010 09:36
To: 'Denny Vrandecic'; 'Martin Hepp'
Cc: 'Kingsley Idehen'; 'public-lod'; 'Enrico Motta'; 'Thomas Steiner'; 'Semantic Web'; 'Anja Jentzsch'; 'semanticweb'; 'Giovanni Tummarello'; 'Mathieu d'Aquin'
Subject: [semanticweb] AW: AW: ANN: LOD Cloud - Statistics and compliance with best practices

Hi Denny,

thank you for your smart and insightful comments.

> I also find it a shame that this thread has been hijacked, especially since the
> original topic was so interesting. The original email by Anja was not about the
> LOD cloud, but rather about -- as the title of the thread still suggests -- the
> compliance of LOD with some best practices. Instead of the question "is X in
> the diagram", I would much rather see a discussion on "are the selected
> quality criteria good criteria? why are some of them so little followed? how
> can we improve the situation?"

Absolutely.
Opening up the discussion on these topics is exactly the reason why we compiled the statistics.

In order to guide the discussion back to this topic, maybe it is useful to repost the original link:

http://www4.wiwiss.fu-berlin.de/lodcloud/state/

A quick initial comment concerning the term "quality criteria". I think it is essential to distinguish between:

1. The quality of the way data is published, meaning to what extent the publishers comply with best practices (a possible set of best practices is listed in the document)
2. The quality of the data itself. I think Enrico's comment was going in this direction.

The Web of documents is an open system built on people agreeing on standards and best practices. Open system means in this context that everybody can publish content and that there are no restrictions on the quality of the content. This is in my opinion one of the central facts that made the Web successful.

The same is true for the Web of Data. There obviously cannot be any restrictions on what people can/should publish (including different opinions on a topic, but also including pure SPAM). As on the classic Web, it is the job of the information/data consumer to figure out which data it wants to believe and use (definition of information quality = usefulness of information, which is a subjective thing).

Thus it also does not make sense to discuss the "objective quality" of the data that should be included in the LOD cloud (objective quality just does not exist), and it makes much more sense to discuss the major issues that we are still having in regard to the compliance with publishing best practices.

> Anja has pointed to a wealth of openly
> available numbers (no pun intended), that have not been discussed at all. For
> example, only 7.5% of the data sources provide a mapping of "proprietary
> vocabulary terms" to "other vocabulary terms". For anyone building
> applications to work with LOD, this is a real problem.
Yes, this is also the figure that scared me most.

> but in order to figure out what really needs to be done, and
> what the criteria for good data on the Semantic Web need to look like, we
> need to get back to Anja's original questions. I think that is a question we
> may try to tackle in Shanghai in some form, I at least would find that an
> interesting topic.

Same with me. Shanghai was also the reason for the timing of the post.

Cheers,

Chris

> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On behalf of Denny Vrandecic
> Sent: Friday, 22 October 2010 08:44
> To: Martin Hepp
> Cc: Kingsley Idehen; public-lod; Enrico Motta; Chris Bizer; Thomas Steiner; Semantic Web; Anja Jentzsch; semanticweb; Giovanni Tummarello; Mathieu d'Aquin
> Subject: Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices
>
> I usually dislike to comment on such discussions, as I don't find them
> particularly productive, but 1) since the number of people pointing me to
> this thread is growing, 2) it contains some wrong statements, and 3) I feel
> that this thread has been hijacked from a topic that I consider productive and
> important, I hope you won't mind me giving a comment. I wanted to keep it
> brief, but I failed.
>
> Let's start with the wrong statements:
>
> First, although I take responsibility as a co-creator for Linked Open Numbers,
> I surely cannot take full credit for it. The dataset was a shared effort by a
> number of people in Karlsruhe over a few days, and thus calling the whole
> thing "Denny's numbers dataset" is simply wrong due to the effort spent by
> my colleagues on it. It is fine to call it "Karlsruhe's numbers dataset" or simply
> Linked Open Numbers, but providing me with the sole attribution is too
> much of an honor.
>
> Second, although it is claimed that Linked Open Numbers are "by design and
> known to everybody in the core community, not data but noise", being one
> of the co-designers of the system I have to disagree. It is "noise by design".
> One of my motivations for LON was to raise a few points for discussion, and
> at the same time provide a dataset fully adhering to Linked Open Data
> principles. We were obviously able to get the first goal right, and we didn't do
> too badly on the second, even though we got an interesting list of bugs from
> Richard Cyganiak, which, regrettably, we still have not fixed. I am very sorry for that.
> But, to make the point very clear again, this dataset was designed to follow
> LOD principles as well as possible, to be correct, and to have an
> implementation that is so simple that we are usually up, so anyone can use
> LON as a testing ground. Due to a number of mails and personal
> communications I know that LON has been used in that sense, and some
> developers even found it useful for other features, like our provision of
> number names in several languages. So, what is called "noise by design"
> here is actually an actively used dataset that managed to raise, as we had
> hoped, discussions about the point of counting triples, was a factor in the
> discussion about literals as subjects, made us rethink the notion of
> "semantics" and computational properties of RDF entities in a different way,
> and is involved in the discussion about quality of LOD. With respect to that, in
> my opinion, LON has achieved and exceeded its expectations, but I
> understand anyone who disagrees. Besides that, it was, and is, huge fun.
>
> Now to some topics of the discussion:
>
> On the issue of the LOD cloud diagram. I want to express my gratitude to all
> the people involved, for the effort they voluntarily put in its development
> and maintenance.
> I find it especially great that it is becoming increasingly
> transparent how the diagram is created and how the datasets are selected.
> Chris has referred to a set of conditions that are expected for inclusion, and
> before the creation of the newest iteration there was an explicit call on this
> mailing list to gather more information. I can only echo the sentiment that if
> someone is unhappy with that diagram, they are free to create their own and
> put it online. The data is available, the SVG is available and editable, and they
> use licenses that allow the modification and republishing.
>
> Enrico is right that a system like Watson (or Sindice), that automatically
> gathers datasets from the Web instead of using a manually submitted and
> managed catalog, will probably turn out to be the better approach. Watson
> used to have an overview with statistics on its current content, and I really
> loved that overview, but this feature has been disabled for a few months.
> If it was available, especially in any graphical format that can be easily reused
> in slides -- for example, graphs on the growth of number of triples, datasets,
> etc., graphs on the change of cohesion, vocabulary reuse, etc. over time,
> within the Watson corpus -- I have no doubts that such graphs and data
> would be widely reused, and would in many instances replace the current
> usage of the cloud diagram. (I am furthermore curious about Enrico's
> statement that the Semantic Web =/= Linked Open Data and wonder about
> what he means here, but that is a completely different thread).
>
> Finally, to what I consider most important in this thread:
>
> I also find it a shame that this thread has been hijacked, especially since the
> original topic was so interesting. The original email by Anja was not about the
> LOD cloud, but rather about -- as the title of the thread still suggests -- the
> compliance of LOD with some best practices.
> Instead of the question "is X in
> the diagram", I would much rather see a discussion on "are the selected
> quality criteria good criteria? why are some of them so little followed? how
> can we improve the situation?" Anja has pointed to a wealth of openly
> available numbers (no pun intended), that have not been discussed at all. For
> example, only 7.5% of the data sources provide a mapping of "proprietary
> vocabulary terms" to "other vocabulary terms". For anyone building
> applications to work with LOD, this is a real problem.
>
> Whenever I was working on actual applications using LOD, I got disillusioned.
> The current state of LOD is simply insufficient to sustain serious application
> development on top of it. Current best practices (like follow-your-nose) are
> theoretically sufficient, but not fully practical. To just give a few examples:
> * imagine you get an RDF file with some 100 triples, including some 120
> vocabulary terms. In order to actually display those, you need the label for
> every single one of these terms, preferably in the user's language. But most RDF
> files do not provide such labels for terms they merely reference. In order to
> actually display them, we need to resolve all these 120 terms, i.e. we need to
> make more than a hundred calls to the Web -- and we are only talking about
> the display of a single file! In Semantic MediaWiki we had, from the
> beginning, made sure that all referenced terms are accompanied by some
> minimum definition, providing labels, types, etc. which enables tools to at
> least create a display quickly and then gather further data, but that practice
> was not adopted. Never mind the fact that language labels are basically not
> used for multi-linguality (check out Chapter 4 of my thesis for the data, it's
> devastating).
> * URIs. Perfectly valid URIs, e.g. those used in Geonames, like
> http://sws.geonames.org/3202326/ suddenly cause trouble, because their
> serialization as a QName is, well, problematic.
> * missing definitions. E.g. DBpedia has the properties
> http://dbpedia.org/ontology/capital and
> http://dbpedia.org/property/capital -- used in the very same file about the
> same country. Resolving them will not help you at all to figure out how they
> relate to each other. As a human I may make an educated guess, but for a
> machine agent? And in this case we are talking about the *same* data
> provider, never mind cross-data-provider mapping.
>
> I could go on for a while -- and these are just examples *on top* of the
> problems that Anja raises in her original post, and I am sure that everyone
> who has actually used LOD from the wild has stumbled upon even more such
> problems. She is raising here a very important point, for the practical
> application of the data. But instead of discussing these issues that actually
> matter, we talk about bubble graphs, that are created and maintained
> voluntarily, and why a dataset is included or not, even though the criteria
> have been made transparent and explicit. All these issues seriously hamper
> the uptake of usage of LOD and lead to the result that it is so much easier to
> use dedicated, proprietary APIs in many cases.
>
> At one point it was stated that Chris' criteria were random and hard to fulfill
> in certain cases. If you asked me, I would suggest much more draconian
> criteria, in order to make data reuse as simple as we all envision. I really enjoy
> the work of the pedantic web group with respect to this, providing validators
> and guidelines, but in order to figure out what really needs to be done, and
> what the criteria for good data on the Semantic Web need to look like, we
> need to get back to Anja's original questions. I think that is a question we
> may try to tackle in Shanghai in some form, I at least would find that an
> interesting topic.
>
> Sorry again for the length of this rant, and I hope I have offended everyone
> equally, I really tried not to single anyone out,
> Denny
>
> P.S.: Finally, a major reason why I think I shouldn't have commented on this
> thread is because it involves something I co-created, and thus I am afraid it
> is impossible to stay unbiased. I consider constant advertising of your own
> ideas tiring, impolite, and bound to lead to unproductive discussions due to
> emotional investment. If the work you do is good enough, you will find
> champions for it. If not, improve it or do something else.
>
>
> On Oct 21, 2010, at 20:56, Martin Hepp wrote:
>
> > Hi all:
> >
> > I think that Enrico really made two very important points:
> >
> > 1. The LOD bubbles diagram has very high visibility inside and outside of the community (up to the point that broad audiences believe the diagram would define relevance or quality).
> >
> > 2. Its creators have a special responsibility (in particular as scientists) to maintain the diagram in a way that enhances insight and understanding, rather than conveying false facts and confusing people.
> >
> > So Kingsley's argument that anybody could provide a better diagram does not really hold. It will harm the community as a whole, sooner or later, if the diagram misses the point, simply based on the popularity of this diagram.
> >
> > And to be frank, despite other design decisions, it is really ridiculous that Chris justifies the inclusion of Denny's numbers dataset as valid Linked Data, because that dataset is, by design and known to everybody in the core community, not data but noise.
> >
> > This is the "linked data landfill" mindset that I have kept on complaining about. You make it very easy for others to discard the idea of linked data as a whole.
> >
> > Best
> >
> > Martin
Received on Friday, 22 October 2010 10:49:02 UTC