- From: Gannon Dick <gannon_dick@yahoo.com>
- Date: Tue, 27 Mar 2012 15:03:26 -0700 (PDT)
- To: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>
- Cc: public-lod <public-lod@w3.org>, "eGov IG \(Public\)" <public-egov-ig@w3.org>, "public-gld-wg@w3.org" <public-gld-wg@w3.org>
- Message-ID: <1332885806.27247.YahooMailNeo@web112607.mail.gq1.yahoo.com>
Comments below

________________________________
From: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>
To: Gannon Dick <gannon_dick@yahoo.com>
Cc: public-lod <public-lod@w3.org>; eGov IG (Public) <public-egov-ig@w3.org>; public-gld-wg@w3.org
Sent: Tuesday, March 27, 2012 6:25 AM
Subject: Re: ISO 639 Cookbook was ... LD? algorithm and questions.

Gannon, hi.

I am adding GLD to the list of recipients, as this is relevant to ISSUE-26.

There is a balance to be achieved here between the utility of closed sets, where all instances of the set can safely be assumed to be universally understood, and the open nature of both the world and the Semantic Web. In other words, if each piece of published data were to come up with its own identifiers for each language variation that happened to be pertinent to the data, that would make the data less understood, less linked, and less useful. If, on the other hand, one had to choose one of a closed set of identifiers, none of which is appropriate, this would make the data less accurate and, again, less useful.

=============
I agree
=============

But we do not need to choose between these two extremes, because it is exactly situations like this that differentiate semantic technologies from relational data stores: the ability to extend vocabularies in a way that allows consumers that do not know about the extension to retrieve some (although not all) of the semantics of the data.

=============
I would say the two extremes exist at all times and navigation is critical. The Discovery path is not the inverse of the (data) Supply path: there is a phase change, which involves the collection of statistics. The data is *either* faithful or commercially efficient, but cannot always be both. I can only demonstrate that the case exists *sometimes*. For example, take the UN saying, "give a man a fish and you have fed him for a day; teach a man to fish and you have fed him for a lifetime" (or something like that).
If you ask the Russian Federation or the Greek Government, they point you to their Fishing Lessons page (en). If you have fish and want a list of Russians or Greeks to whom the page is not available, the answer is that every Russian and every Greek can learn to fish if they like. Open Data is just that. That the fishing lessons are in English is of no consequence *other* than rendering anonymous the good fisherman who wrote the lesson.
=============

Coming back to language code lists, IMHO the best approach is to allow language properties to range over URIs beyond the ISO language codes, but only if such language fillers are linked to their closest match in the ISO codeset. In other words, allow ad-hoc extensions of the codeset only if the extended codepoint is linked to the codepoint that would have been used if no extension were allowed.

As a concrete example, let us define a new property, possibly a sub-property of skos:related, that has:

ex:relatedToLanguage rdfs:domain dc:LinguisticSystem .
ex:relatedToLanguage rdfs:range <http://id.loc.gov/vocabulary/iso639-1/iso639-1_Language> .

We can now define arbitrarily fine language varieties and historical forms of languages without losing the link to the main entry. For example:

ex:en_16c rdf:type dc:LinguisticSystem ;
    rdfs:label "16c English" ;
    ex:relatedToLanguage <http://id.loc.gov/vocabulary/iso639-1/en> .

ex:en_Gla rdf:type dc:LinguisticSystem ;
    rdfs:label "English as spoken in Glasgow" ;
    ex:relatedToLanguage <http://id.loc.gov/vocabulary/iso639-1/en> .

==========================================
With reference to the above, this is a good backward-looking interoperability mechanism, as long as you remember that there is no forward-looking "solution". There are no identifiable people in Glasgow who speak the average "this", although there is a large anonymous group who speak "this" fluently. The group (not class) probably includes several Greek and Russian fishing experts on holiday, too.
This is both a feature of governance and a bug of discovery.
==========================================

Coming back to governments, under the regime above, language lists like ISO 639 cannot be used as an excuse not to provide for local or even ad-hoc extensions.

==========================================
Governments gain nothing from solving the forward-looking problem, although it would be nice for the commercial world if they did. Governments do gain a great deal by leveling ambiguity, using a language understood in Glasgow, London, Moscow, Athens and elsewhere, and they can hope no one notices that the fishing lessons sound much the same (the backward problem). They use Artificial Bureaucracy. It differs from Artificial Intelligence in this way: you are in Vienna and notice that the Danube is a handy way to get to Budapest. Rome requires quite a bit more rowing, or a more "intelligent" access to resources (airplanes are good). From the viewpoint of a government-run data repository, the trip down the Danube looks like this: http://www.rustprivacy.org/2012/urn-lex/danube.html (sorry, not all the links work; it is a screenshot). This is the domain model, and the direction of discovery is left to right. However, "propaganda" has no reason to travel right to left, aside from tourism promotion; the sunny beaches of Greenland and the shark-free bathtubs of Australia are just good marketing.

--Gannon

Best,
Stasinos

On Sat Mar 17 14:52:19 2012 Gannon Dick said:
> "A criticism voiced by detractors of Linked Data suggest that Linked Data modeling is too hard or time consuming."
>
> There are some sets of standard codes which are infrequently updated. It might pay for a data set repository to build identifiers to order. In this way, the standards can be maintained complete and, more to the point, applications can "assume" they are complete.
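Stasinos's rule, that any ad-hoc language identifier must link to the ISO 639-1 code point that would otherwise have been used, fits this idea of applications assuming the code list is complete. A minimal Python sketch of the consumer side follows. The ex: namespace, the property URI and the three-code stand-in for the full ISO 639-1 list are hypothetical, and both example varieties are assumed to link to the English code point `en`. A client that knows nothing about ex:en_16c or ex:en_Gla can still recover the base language by following relatedToLanguage once.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.
ISO639_1 = "http://id.loc.gov/vocabulary/iso639-1/"
EX = "http://example.org/lang/"                 # hypothetical namespace for ex:
RELATED_TO_LANGUAGE = EX + "relatedToLanguage"  # hypothetical property URI

# Hypothetical stand-in for the complete ISO 639-1 list (truncated to three codes).
ISO_CODES = {"el", "en", "ru"}

TRIPLES = [
    (EX + "en_16c", RELATED_TO_LANGUAGE, ISO639_1 + "en"),
    (EX + "en_Gla", RELATED_TO_LANGUAGE, ISO639_1 + "en"),
]


def is_iso(uri):
    """True if uri is one of the known ISO 639-1 code points."""
    return uri.startswith(ISO639_1) and uri[len(ISO639_1):] in ISO_CODES


def closest_iso_language(lang, triples):
    """Return the ISO 639-1 URI for lang, or None if lang is an unlinked extension."""
    if is_iso(lang):
        return lang  # already a standard code point
    for subject, predicate, obj in triples:
        if subject == lang and predicate == RELATED_TO_LANGUAGE and is_iso(obj):
            return obj  # a consumer unaware of the extension falls back here
    return None  # violates the rule: every extension must link to an ISO code point


print(closest_iso_language(EX + "en_Gla", TRIPLES))
# prints http://id.loc.gov/vocabulary/iso639-1/en
```

Returning None rather than guessing keeps the check strict: under the proposed rule, an extension with no link is a data error, not an unknown language.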
>
> There is an example (ISO 639 Language Codes) here: http://www.rustprivacy.org/2012/urn/lang/loc.tar.gz
>
> This includes two MySQL databases:
> 1. A "lite" version with just the tables needed to specify either "terminology" or "bibliographic" codes (including currency). I used the D2R Server.
>
> 2. A full maintainable version, which starts with a "maintain table" and regenerates the tables which address the sticky bits.
>
> (The following in case you get caught playing with this at your day job; otherwise, have fun.)
>
> There are a number of little technical issues, but for Government, one huge Moral Hazard. The language of Legislation, Policy and Statistical Reporting is coupled with Jurisdiction. The Moral Hazard arises from the situation where anyone who speaks a language the psychiatrist does not understand is then considered insane. Nobody wants a government that acts like that, and the Open Data Community doesn't want data sets which skip over distinct populations (without saying so) either.
>
>
> --Gannon
>
> ________________________________
> From: Bernadette Hyland <bhyland@3roundstones.com>
> To: Hugh Glaser <hg@ecs.soton.ac.uk>; Yury Katkov <katkov.juriy@gmail.com>
> Cc: Semantic Web <semantic-web@w3.org>; public-lod <public-lod@w3.org>
> Sent: Friday, March 16, 2012 4:11 PM
> Subject: Re: How to find the data I need in LD? algorithm and questions.
>
>
> Hi,
> Hugh, I responded earlier today to Yury, off-list. So I would offer a different perspective, perhaps because the sun is out here today and it is Friday afternoon and the plum blossoms are blooming...
>
> We've moved from:
> * shouting (circa 2003-2006) to
> * the meme of Linked Data by TimBL (2007) [1] to
> * proof-of-concepts (2008-2010) to
> * a couple of academic books, conference talks & keynotes on real-world deployments involving LD/LOD (2010, 2011) to
> * developer books, W3C Recommendations, published use cases/CXO guides (2012)
>
> FWIW, I offered to fold some of Yury's guidance into the draft Linked Data Cookbook [2] and suggested the cookbook as a possible resource for his students.
>
> If you are open to a different viewpoint, here is what I see on the ground in 2012. There are publishers, both in the private & public sector, who are beginning to publish data as Linked Data. It is of course a new approach to data publishing and consumption, and there are some really entrenched players, so it isn't going to happen within one or two years. Furthermore, everyone has a "day job", and learning yet another way to publish your data doesn't sound like a career-building activity on face value...
>
> I contend it will take some public successes, plus a couple of pragmatic Linked Data books for developers, some cookbooks or how-tos, and some well-formed W3C Recommendations for Linked Open Data to be pervasive... all of which is in progress.
>
> It will probably take 10 years before LD/LOD publishing is 'mainstream', but make no mistake, it will happen. A Linked Data approach to publishing data (on the Web of data) is as disruptive as the Web of documents was circa 1995.
>
> It will save organizations millions and governments billions of dollars (or their currency equivalents) in enterprise information integration. Do I have documented ROIs in a glossy printed consulting report to back that up? No, not yet. I believe we (as in the Linked Data ecosystem) will have this soon.
> The numbers & case studies will come from big international organizations involved in issue tracking & customer care, business publishing, healthcare, logistics and defense (the non-secret-squirrel part of defense).
>
> Regardless of whether orgs are doing LD behind the firewall or in front of it, publishing Linked Data makes good economic sense, but we're in the early days. Don't lose heart.
>
> I see university students are learning about LD now in undergrad CS classes. About 20 of us from the UK, Netherlands, Spain, US, India and Australia, in government / academe / private sector, meet weekly on the W3C Government Linked Data Working Group to nut out vocabs, best practices & a cookbook for gov't publication & consumption.
>
> FYR, data.gov recently featured a blog post [3] by a uni student who did a mashup where he didn't know the publisher of US Gov't content, although he did work under the supervision of someone who knows a bit about RDF.
>
>
> Kind regards,
>
> Bernadette Hyland
>
>
> [1] http://www.w3.org/DesignIssues/LinkedData.html
> [2] http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
> [3] http://www.data.gov/communities/node/116/blogs/6170
>
>
> On Mar 16, 2012, at 4:15 PM, Hugh Glaser wrote:
>
> > Hi Yury
> > Well I am sorry to see you have had no response, but it is not so surprising, really.
> > You will find that essentially there are very few people doing what you are trying to do.
> > The Semantic Web and Linked Data world is made up of people who publish, and rarely consume.
> > It is almost unheard of for someone to consume someone else's data, unless they know the publisher.
> > Everyone is shouting, but not many listening.
> > OK, I might not be in a great mood today, but I'm not far wrong.
> >
> > To your problem.
> > Your steps seem reasonable.
> > I would, however, add the use of VoiD (http://www.w3.org/TR/void/, http://semanticweb.org/wiki/VoiD).
> > VoiD is designed to deliver what you want, I think (if it doesn't, then it should be made to).
> > Some sites do publish VoiD descriptions, and these can often be located automatically by looking in the sitemap, which can in turn be discovered by looking in robots.txt.
> > Keith Alexander has a store of collected VoiD descriptions (http://kwijibo.talis.com/voiD/), as do we (http://void.rkbexplorer.com).
> > I would also suggest that my own site, http://sameas.org, might lead from interesting URIs to other related URIs, and hence interesting stores.
> >
> > Hope that helps.
> > Best
> > Hugh
> >
> > On 16 Mar 2012, at 04:58, Yury Katkov wrote:
> >
> >> > >Hi!
> >> > >
> >> > >What do you usually do when you want to find a dataset for your needs?
> >> > >I'm preparing a tiny tutorial on this topic for the students and would
> >> > >like to ask you to share your experience.
> >> > >My typical algorithm is the following:
> >> > >0) Define the topic. I have to know precisely what kind of data I need.
> >> > >1) Look at the Linked Data cloud and other visualizations to ensure that
> >> > >the needed data is present somewhere. If, for example, I want to
> >> > >improve Mendeley or Zotero, I look at these visualizations and search
> >> > >for publication data.
> >> > >2) Search for the needed properties and classes with Sindice, Sig.ma and Swoogle.
> >> > >3) Look at the CKAN description of the dataset, its XML sitemap and VoiD metadata.
> >> > >4) Explore the datasets found in the previous step with some
> >> > >simple SPARQL queries like these:
> >> > >
> >> > >SELECT DISTINCT ?p WHERE {
> >> > >  ?s ?p ?o
> >> > >}
> >> > >
> >> > >PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> >> > >PREFIX owl: <http://www.w3.org/2002/07/owl#>
> >> > >SELECT DISTINCT ?class WHERE {
> >> > >  { ?class a rdfs:Class . }
> >> > >  UNION
> >> > >  { ?class a owl:Class . }
> >> > >}
> >> > >
> >> > >PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> >> > >PREFIX dc: <http://purl.org/dc/elements/1.1/>
> >> > >SELECT DISTINCT ?label WHERE {
> >> > >  { ?a rdfs:label ?label }
> >> > >  UNION
> >> > >  { ?a dc:title ?label }
> >> > >  # and possibly more patterns, e.g. to search foaf:name values
> >> > >}
> >> > >
> >> > >I can also use COUNT and GROUP BY to get some quick statistics
> >> > >about the datasets.
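The COUNT/GROUP BY statistics in step 4 can be mirrored in a minimal Python sketch over an in-memory list of triples. The sample data and example.org URIs are hypothetical; against a real endpoint one would send the SPARQL shown in the comments instead.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample data standing in for a dataset found in step 3.
DATA = [
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/doc1", RDF_TYPE, FOAF + "Document"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
]


def quick_stats(triples):
    """Return (instances per class, triples per property) for a set of triples."""
    graph = set(triples)  # an RDF graph is a set, so duplicate triples count once
    # Like: SELECT ?class (COUNT(?s) AS ?n) WHERE { ?s a ?class } GROUP BY ?class
    classes = Counter(o for s, p, o in graph if p == RDF_TYPE)
    # Like: SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p
    properties = Counter(p for s, p, o in graph)
    return classes, properties


classes, properties = quick_stats(DATA)
for cls, n in classes.most_common():
    print(n, cls)
```

The two counters correspond roughly to what VoiD records as class partitions (void:classPartition with void:entities) and property partitions (void:propertyPartition with void:triples), which is one possible answer to question 2 below.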
> >> > >5) When I find some interesting URIs, I use the semantic web browsers
> >> > >Marbles and Sig.ma to navigate through the dataset.
> >> > >6) Ask these smart guys on the Semantic Web and Public LOD
> >> > >mailing lists. Perhaps go to semanticoverflow and ask for help there
> >> > >as well.
> >> > >======================
> >> > >Here are my questions:
> >> > >
> >> > >1) What else do you typically do to find a dataset?
> >> > >2) Is there a resource where I can find a brief description of a
> >> > >dataset in terms of the properties and classes used in it? And
> >> > >regarding the cool arrows in Richard Cyganiak's diagram: is there a resource
> >> > >where I can find information about the relationship between a given
> >> > >dataset and the rest of the world?
> >> > >3) I have a similar algorithm for searching vocabularies. Can resources
> >> > >like Schemapedia help me in searching for datasets?
> >> > >4) Do you know any other SPARQL queries that can be handy when
> >> > >I search for something in a dataset?
> >> > >
> >> > >Sincerely yours,
> >> > >-----
> >> > >Yury Katkov
> >> > >
> >
> > --
> > Hugh Glaser,
> >   Web and Internet Science
> >   Electronics and Computer Science,
> >   University of Southampton,
> >   Southampton SO17 1BJ
> > Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> > Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> > http://www.ecs.soton.ac.uk/~hg/
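Hugh's discovery route (robots.txt, then the sitemap it declares, then the dataset entries in that sitemap) can be sketched offline with the Python standard library. The example.org URLs and file contents are hypothetical, fetching over HTTP is left out, and the sc: namespace is assumed to be that of the Semantic Sitemaps extension; check it against the published schema before relying on it.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Assumed namespace of the Semantic Sitemaps extension (verify before use).
SC = "{http://sw.deri.org/2007/07/sitemapextension/scschema.xsd}"

# Hypothetical files; a real client would first fetch http://<host>/robots.txt.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""

SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Example dataset</sc:datasetLabel>
    <sc:datasetURI>http://example.org/void#dataset</sc:datasetURI>
    <sc:sparqlEndpointLocation>http://example.org/sparql</sc:sparqlEndpointLocation>
  </sc:dataset>
</urlset>"""


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt file (Python 3.8+)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when none are declared


def datasets_from_sitemap(sitemap_xml):
    """Extract the dataset URI and SPARQL endpoint of each sc:dataset entry."""
    root = ET.fromstring(sitemap_xml)
    return [
        {
            "uri": ds.findtext(SC + "datasetURI"),
            "sparql": ds.findtext(SC + "sparqlEndpointLocation"),
        }
        for ds in root.iter(SC + "dataset")
    ]


for sitemap_url in sitemaps_from_robots(ROBOTS_TXT):
    print("sitemap:", sitemap_url)
print(datasets_from_sitemap(SITEMAP_XML))
```

Where the publisher follows VoiD, the dataset URI may resolve to a VoiD description; when a site declares no sitemap, the collected VoiD stores Hugh mentions are the fallback.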
Received on Tuesday, 27 March 2012 22:03:57 UTC