- From: Dave Reynolds <dave.e.reynolds@gmail.com>
- Date: Sat, 27 Oct 2012 11:33:26 +0100
- To: Makx Dekkers <makx@makxdekkers.com>
- CC: 'Richard Cyganiak' <richard@cyganiak.de>, 'Phil Archer' <phila@w3.org>, 'Public GLD WG' <public-gld-wg@w3.org>
Hi Makx,

On 26/10/12 19:20, Makx Dekkers wrote:
> All,
>
> I was trying to keep out of the public discussion, because I really
> cannot speak for DCMI. For an "official" opinion, you need to ask Tom
> Baker.

Understood.

> Having said that, I do want to react to this:
>
>>> (Dave) Whereas the intent of Dublin Core seems to be that an instance
>>> of dct:LinguisticSystem denotes the Linguistic System itself, not the
>>> name of it.
>>>
>>> So at this level the semantics do not appear to match.
>>
>> (Richard) Strictly speaking you are right. This seems to be a bit of
>> an angels-and-pinheads reason to reject this extremely convenient
>> modelling approach though.
>
> I guess both Dave and Richard are right.
>
> However, from my perspective (purely personal of course) there are two
> practical issues beyond the angels and pinheads discussion:
>
> - consistency: if some people read the DCMI spec and take Dave's
> position, they will use URIs; other people will take Richard's view and
> use the datatype. This means that any application reading stuff from
> different places will have to count on getting a mix of the two.
> Probably no way to get around that -- lots of people don't respect range
> limitations anyway, so maybe this whole discussion is moot.

True, but in producing a W3C spec we should try to encourage best
practice, whatever that is in this case, even if it is not uniformly
respected.

> - linked data: not using URIs for identification of languages throws
> away the whole linked data machinery. Of course, if you know by looking
> at the datatype that the code is from ISO639, a human receiver can look
> it up at http://www.ethnologue.com/web.asp but for the machine it's a
> dead end.
> There are several URI collections for languages:
> http://id.loc.gov/vocabulary/iso639-1/en,
> http://dbpedia.org/resource/English_language,
> http://lexvo.org/id/term/eng/, http://www.lingvoj.org/lang/en, and
> http://downlode.org/rdf/iso-639/languages#en all represent English. I
> like the ones maintained by the Library of Congress the best.
>
> So what is the convenience, anyway? If you're creating metadata, you'll
> look up the code from the code table. In Richard's case you stick the
> two letters "en" in a literal; in Dave's case you append those same two
> letters to the base URI, e.g. "http://id.loc.gov/vocabulary/iso639-1/en"
> in the resource URI. No sweat.

I agree (unsurprisingly), and thanks for the pointers to the URI
collections; I wasn't aware of all of those.

For practical use I think either is fine, and I have a lot of sympathy
with Richard's characterization of the semantic mismatch as "angels and
pinheads".

The convenience difference comes in consuming the data.

If DCAT recommends literals, then consumers of DCAT data can be sure
they are getting an ISO639 code and can deal with that robustly, but
they will need special knowledge to do so (e.g. to present a
human-readable choice of download languages they will need to consult a
table of code -> name mappings; no big deal).

If DCAT recommends resources, then consumers might find data from any of
those URI collections (and others) and will need to cope with any of
them. They should be able to dereference those URIs, but there is no
guarantee over what information will be returned (and in what
vocabularies). The chances are good there will be a presentation label
using rdfs:label or skos:prefLabel, but to find the ISO639 code itself
you either need to know about the URI structure or understand the
vocabulary used to convey the code (e.g. madsrdf:code for the Library of
Congress data).
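To make the two styles concrete, here is a Turtle sketch of both (the
dataset URIs are invented for illustration; dct:ISO639-2 is one of the
DCMI syntax encoding schemes that could serve as the datatype, and the
final triple is an illustrative fragment of what dereferencing the
Library of Congress URI might yield, not a verbatim copy of its
response):

```turtle
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .

# Literal style (Richard): a typed literal carrying the code directly.
<http://example.org/dataset/1>
    dct:language "eng"^^dct:ISO639-2 .

# Resource style (Dave): a URI from one of the collections above.
<http://example.org/dataset/2>
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> .

# What a consumer might see after dereferencing the LoC URI:
<http://id.loc.gov/vocabulary/iso639-1/en>
    madsrdf:code "en" .
```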
If DCAT recommends resources and recommends a particular URI collection,
then that creates a dependency which we might not want. If it recommends
resources described according to some particular vocabulary, then that's
still a dependency (on data being available using that vocabulary) and
is more work for us.

I have a slight preference for sticking to the letter of the semantics,
using resources, with some weasel recommendation like "such as the
Library of Congress URI collection". However, I will not object to the
datatype approach if that's the group preference.

Dave
Received on Saturday, 27 October 2012 10:33:55 UTC