- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Fri, 22 Oct 2010 08:19:24 -0400
- To: john.nj.davies@bt.com
- CC: public-lod@w3.org, semantic-web@w3.org, semanticweb@yahoogroups.com
- Message-ID: <4CC1814C.6090608@openlinksw.com>
On 10/22/10 6:48 AM, john.nj.davies@bt.com wrote:
>
> This article from the NYT may provide an amusing distraction from the
> current discussion: I thought the powerpoint slide shown looked eerily
> familiar ;-)
>
> http://www.nytimes.com/2010/04/27/world/27powerpoint.html?_r=1
>

LOL! How poignant, really :-)

Kingsley

>
> John
>
> PS excellent post Denny IMHO
>
> Dr John Davies
> Chief Researcher
> Future Business Applications & Services
> BT Innovate & Design
> Tel: +44 1473 609583
> Email: john.nj.davies@bt.com
>
> From: semanticweb@yahoogroups.com
> [mailto:semanticweb@yahoogroups.com] On Behalf Of Chris Bizer
> Sent: 22 October 2010 09:36
> To: 'Denny Vrandecic'; 'Martin Hepp'
> Cc: 'Kingsley Idehen'; 'public-lod'; 'Enrico Motta'; 'Thomas Steiner';
> 'Semantic Web'; 'Anja Jentzsch'; 'semanticweb'; 'Giovanni Tummarello';
> 'Mathieu d'Aquin'
> Subject: [semanticweb] AW: AW: ANN: LOD Cloud - Statistics and
> compliance with best practices
>
> Hi Denny,
>
> thank you for your smart and insightful comments.
>
> > I also find it a shame, that this thread has been hijacked, especially
> > since the original topic was so interesting. The original email by Anja
> > was not about the LOD cloud, but rather about -- as the title of the
> > thread still suggests -- the compliance of LOD with some best practices.
> > Instead of the question "is X in the diagram", I would much rather see
> > a discussion on "are the selected quality criteria good criteria? why
> > are some of them so little followed? how can we improve the situation?"
>
> Absolutely. Opening up the discussion on these topics is exactly the
> reason why we compiled the statistics.
>
> In order to guide the discussion back to this topic, maybe it is useful
> to repost the original link:
>
> http://www4.wiwiss.fu-berlin.de/lodcloud/state/
>
> A quick initial comment concerning the term "quality criteria". I think
> it is essential to distinguish between:
>
> 1. The quality of the way data is published, meaning to what extent the
>    publishers comply with best practices (a possible set of best
>    practices is listed in the document)
> 2. The quality of the data itself. I think Enrico's comment was going in
>    this direction.
>
> The Web of documents is an open system built on people agreeing on
> standards and best practices.
> Open system means in this context that everybody can publish content and
> that there are no restrictions on the quality of the content.
> This is in my opinion one of the central facts that made the Web
> successful.
>
> The same is true for the Web of Data. There obviously cannot be any
> restrictions on what people can/should publish (including different
> opinions on a topic, but also including pure SPAM). As on the classic
> Web, it is the job of the information/data consumer to figure out which
> data it wants to believe and use (definition of information quality =
> usefulness of information, which is a subjective thing).
>
> Thus it also does not make sense to discuss the "objective quality" of
> the data that should be included in the LOD cloud (objective quality
> just does not exist) and it makes much more sense to discuss the major
> issues that we are still having with regard to compliance with
> publishing best practices.
>
> > Anja has pointed to a wealth of openly available numbers (no pun
> > intended), that have not been discussed at all. For example, only 7.5%
> > of the data sources provide a mapping of "proprietary vocabulary terms"
> > to "other vocabulary terms". For anyone building applications to work
> > with LOD, this is a real problem.
>
> Yes, this is also the figure that scared me most.
>
> > but in order to figure out what really needs to be done, and what the
> > criteria for good data on the Semantic Web need to look like, we need
> > to get back to Anja's original questions. I think that is a question we
> > may try to tackle in Shanghai in some form, I at least would find that
> > an interesting topic.
>
> Same with me.
> Shanghai was also the reason for the timing of the post.
>
> Cheers,
>
> Chris
>
> > -----Original Message-----
> > From: semantic-web-request@w3.org
> > [mailto:semantic-web-request@w3.org] On Behalf Of Denny Vrandecic
> > Sent: Friday, 22 October 2010 08:44
> > To: Martin Hepp
> > Cc: Kingsley Idehen; public-lod; Enrico Motta; Chris Bizer; Thomas
> > Steiner; Semantic Web; Anja Jentzsch; semanticweb; Giovanni Tummarello;
> > Mathieu d'Aquin
> > Subject: Re: AW: ANN: LOD Cloud - Statistics and compliance with best
> > practices
> >
> > I usually dislike to comment on such discussions, as I don't find them
> > particularly productive, but 1) since the number of people pointing me
> > to this thread is growing, 2) it contains some wrong statements, and 3)
> > I feel that this thread has been hijacked from a topic that I consider
> > productive and important, I hope you won't mind me giving a comment. I
> > wanted to keep it brief, but I failed.
> >
> > Let's start with the wrong statements:
> >
> > First, although I take responsibility as a co-creator for Linked Open
> > Numbers, I surely cannot take full credit for it.
> > The dataset was a shared effort by a number of people in Karlsruhe over
> > a few days, and thus calling the whole thing "Denny's numbers dataset"
> > is simply wrong due to the effort spent by my colleagues on it. It is
> > fine to call it "Karlsruhe's numbers dataset" or simply Linked Open
> > Numbers, but providing me with the sole attribution is too much of an
> > honor.
> >
> > Second, although it is claimed that Linked Open Numbers are "by design
> > and known to everybody in the core community, not data but noise",
> > being one of the co-designers of the system I have to disagree. It is
> > "noise by design". One of my motivations for LON was to raise a few
> > points for discussion, and at the same time provide a dataset fully
> > adhering to Linked Open Data principles. We were obviously able to get
> > the first goal right, and we didn't do too badly on the second, even
> > though we got an interesting list of bugs from Richard Cyganiak, which,
> > regrettably, we still have not fixed. I am very sorry for that.
> > But, to make the point very clear again, this dataset was designed to
> > follow LOD principles as well as possible, to be correct, and to have
> > an implementation that is so simple that we are usually up, so anyone
> > can use LON as a testing ground. Due to a number of mails and personal
> > communications I know that LON has been used in that sense, and some
> > developers even found it useful for other features, like our provision
> > of number names in several languages. So, what is called "noise by
> > design" here, is actually an actively used dataset, that managed to
> > raise, as we have hoped, discussions about the point of counting
> > triples, was a factor in the discussion about literals as subjects,
> > made us rethink the notion of "semantics" and computational properties
> > of RDF entities in a different way, and is involved in the discussion
> > about quality of LOD.
> > With respect to that, in my opinion, LON has achieved and exceeded its
> > expectations, but I understand anyone who disagrees. Besides that, it
> > was, and is, huge fun.
> >
> > Now to some topics of the discussion:
> >
> > On the issue of the LOD cloud diagram. I want to express my gratitude
> > to all the people involved, for the effort they voluntarily put in its
> > development and maintenance. I find it especially great, that it is
> > becoming increasingly transparent how the diagram is created and how
> > the datasets are selected. Chris has referred to a set of conditions
> > that are expected for inclusion, and before the creation of the newest
> > iteration there was an explicit call on this mailing list to gather
> > more information. I can only echo the sentiment that if someone is
> > unhappy with that diagram, they are free to create their own and put it
> > online. The data is available, the SVG is available and editable, and
> > they use licenses that allow the modification and republishing.
> >
> > Enrico is right that a system like Watson (or Sindice), that
> > automatically gathers datasets from the Web instead of using a manually
> > submitted and managed catalog, will probably turn out to be the better
> > approach. Watson used to have an overview with statistics on its
> > current content, and I really loved that overview, but this feature has
> > been disabled for a few months. If it were available, especially in any
> > graphical format that can be easily reused in slides -- for example,
> > graphs on the growth of number of triples, datasets, etc., graphs on
> > the change of cohesion, vocabulary reuse, etc. over time, within the
> > Watson corpus -- I have no doubts that such graphs and data would be
> > widely reused, and would in many instances replace the current usage of
> > the cloud diagram.
> > (I am furthermore curious about Enrico's statement that the Semantic
> > Web =/= Linked Open Data and wonder about what he means here, but that
> > is a completely different thread).
> >
> > Finally, to what I consider most important in this thread:
> >
> > I also find it a shame, that this thread has been hijacked, especially
> > since the original topic was so interesting. The original email by Anja
> > was not about the LOD cloud, but rather about -- as the title of the
> > thread still suggests -- the compliance of LOD with some best
> > practices. Instead of the question "is X in the diagram", I would much
> > rather see a discussion on "are the selected quality criteria good
> > criteria? why are some of them so little followed? how can we improve
> > the situation?" Anja has pointed to a wealth of openly available
> > numbers (no pun intended), that have not been discussed at all. For
> > example, only 7.5% of the data sources provide a mapping of
> > "proprietary vocabulary terms" to "other vocabulary terms". For anyone
> > building applications to work with LOD, this is a real problem.
> >
> > Whenever I was working on actual applications using LOD, I got
> > disillusioned. The current state of LOD is simply insufficient to
> > sustain serious application development on top of it. Current best
> > practices (like follow-your-nose) are theoretically sufficient, but not
> > fully practical. To just give a few examples:
> > * imagine you get an RDF file with some 100 triples, including some 120
> >   vocabulary terms. In order to actually display those, you need the
> >   label for every single one of these terms, preferably in the user's
> >   language. But most RDF files do not provide such labels for terms
> >   they merely reference. In order to actually display them, we need to
> >   resolve all these 120 terms, i.e. we need to make more than a hundred
> >   calls to the Web -- and we are only talking about the display of a
> >   single file!
> >   In Semantic MediaWiki we had, from the beginning, made sure that all
> >   referenced terms are accompanied by some minimum definition,
> >   providing labels, types, etc. which enables tools to at least create
> >   a display quickly and then gather further data, but that practice was
> >   not adopted. Never mind the fact that language labels are basically
> >   not used for multi-linguality (check out Chapter 4 of my thesis for
> >   the data, it's devastating).
> > * URIs. Perfectly valid URIs, e.g. those used in Geonames, like
> >   http://sws.geonames.org/3202326/ suddenly cause trouble, because
> >   their serialization as a QName is, well, problematic.
> > * missing definitions. E.g. DBpedia has the properties
> >   http://dbpedia.org/ontology/capital and
> >   http://dbpedia.org/property/capital -- used in the very same file
> >   about the same country. Resolving them will not help you at all to
> >   figure out how they relate to each other. As a human I may make an
> >   educated guess, but for a machine agent? And in this case we are
> >   talking about the *same* data provider, never mind
> >   cross-data-provider mapping.
> >
> > I could go on for a while -- and these are just examples *on top* of
> > the problems that Anja raises in her original post, and I am sure that
> > everyone who has actually used LOD from the wild has stumbled upon even
> > more such problems. She is raising here a very important point, for the
> > practical application of the data. But instead of discussing these
> > issues that actually matter, we talk about bubble graphs, that are
> > created and maintained voluntarily, and why a dataset is included or
> > not, even though the criteria have been made transparent and explicit.
> > All these issues seriously hamper the uptake of usage of LOD and lead
> > to the result that it is so much easier to use dedicated, proprietary
> > APIs in many cases.
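
A minimal sketch of the QName problem from the Geonames bullet above, in Python. It assumes a simplified, ASCII-only approximation of the XML NCName rule (the real rule admits many more Unicode characters), and `split_qname` is a hypothetical helper written for this illustration, not part of any RDF library. It looks for the longest suffix of a URI that could serve as the local part of a prefix:local QName.

```python
import re

# ASSUMPTION: simplified ASCII-only stand-in for the XML NCName production.
# The real production also allows a wide range of non-ASCII characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def split_qname(uri):
    """Split a URI into (namespace, local_name) for use as prefix:local_name.

    Hypothetical helper: returns the split with the longest suffix that is a
    valid (approximate) NCName, or None if no such suffix exists.
    """
    # Walk the candidate split points left to right, so the first hit is
    # the longest possible local name.
    for i in range(1, len(uri)):
        local = uri[i:]
        if NCNAME.fullmatch(local):
            return uri[:i], local
    return None


if __name__ == "__main__":
    for uri in (
        "http://dbpedia.org/ontology/capital",
        "http://sws.geonames.org/3202326/",
    ):
        print(uri, "->", split_qname(uri))
```

Under this approximation the DBpedia property splits into the namespace http://dbpedia.org/ontology/ and the local name capital, while the Geonames URI yields no split at all: every non-empty suffix ends in "/", and even without the trailing slash the local part would start with a digit. A serializer that needs a QName for such a URI has to fall back to writing the full URI, where the syntax allows that.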
> >
> > At one point it was stated that Chris' criteria were random and hard to
> > fulfill in certain cases. If you'd ask me, I would suggest much more
> > draconian criteria, in order to make data reuse as simple as we all
> > envision. I really enjoy the work of the pedantic web group with
> > respect to this, providing validators and guidelines, but in order to
> > figure out what really needs to be done, and what the criteria for good
> > data on the Semantic Web need to look like, we need to get back to
> > Anja's original questions. I think that is a question we may try to
> > tackle in Shanghai in some form, I at least would find that an
> > interesting topic.
> >
> > Sorry again for the length of this rant, and I hope I have offended
> > everyone equally, I really tried not to single anyone out,
> > Denny
> >
> > P.S.: Finally, a major reason why I think I shouldn't have commented on
> > this thread is because it involves something I co-created, and thus I
> > am afraid it is impossible to stay unbiased. I consider constant
> > advertising of your own ideas tiring, impolite, and bound to lead to
> > unproductive discussions due to emotional investment. If the work you
> > do is good enough, you will find champions for it. If not, improve it
> > or do something else.
> >
> > On Oct 21, 2010, at 20:56, Martin Hepp wrote:
> >
> > > Hi all:
> > >
> > > I think that Enrico really made two very important points:
> > >
> > > 1. The LOD bubbles diagram has very high visibility inside and
> > >    outside of the community (up to the point that broad audiences
> > >    believe the diagram would define relevance or quality).
> > >
> > > 2. Its creators have a special responsibility (in particular as
> > >    scientists) to maintain the diagram in a way that enhances insight
> > >    and understanding, rather than conveying false facts and confusing
> > >    people.
> > >
> > > So Kingsley's argument that anybody could provide a better diagram
> > > does not really hold. It will harm the community as a whole, sooner
> > > or later, if the diagram misses the point, simply based on the
> > > popularity of this diagram.
> > >
> > > And to be frank, despite other design decisions, it is really
> > > ridiculous that Chris justifies the inclusion of Denny's numbers
> > > dataset as valid Linked Data, because that dataset is, by design and
> > > known to everybody in the core community, not data but noise.
> > >
> > > This is the "linked data landfill" mindset that I have kept on
> > > complaining about. You make it very easy for others to discard the
> > > idea of linked data as a whole.
> > >
> > > Best
> > >
> > > Martin
--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Friday, 22 October 2010 12:19:58 UTC