- From: Peter Constable <petercon@microsoft.com>
- Date: Thu, 3 May 2007 06:47:04 -0700
- To: "Elisa F. Kendall" <ekendall@sandsoft.com>, Misha Wolf <Misha.Wolf@reuters.com>
- CC: "Gauri.Salokhe@FAO.ORG" <Gauri.Salokhe@FAO.ORG>, WWW International <www-international@w3.org>, Semantic web list <semantic-web@w3.org>, LTRU Working Group <ltru@ietf.org>
- Message-ID: <DDB6DE6E9D27DD478AE6D1BBBB8357955E34206F3E@NA-EXMSG-C117.redmond.corp.microsoft>
Elisa:

It is unclear to me whether the ontology is clearly defined. (I have not participated in the development area in question, so I am not familiar with the relevant details of this case. This is general feedback.)

In many cases, metadata elements are used for "language" where what is, in fact, meant may or may not be strictly language. In particular, the XML attribute xml:lang has always drawn values from an IETF specification for "language tags", and those cover a complex ontology that includes more than just language in the conventional sense. Specifically, IETF "language" tags can be used to declare language, dialect, written form, orthographic conventions and other such language-related attributes of content.

If an XML schema defines some attribute that takes values from ISO 639, then the ontology involved could be considered that of language in the conventional sense. But if the attribute xml:lang is used, then "language" involves a different ontology.

I discussed these issues in a paper five years ago, which you can find at http://www.unicode.org/notes/tn8/SILEWP2002-003.pdf. (This was a first attempt and should not be taken as reflecting up-to-date understanding.)

Peter

From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Elisa F. Kendall
Sent: Monday, April 23, 2007 10:25 AM
To: Misha Wolf
Cc: Gauri.Salokhe@FAO.ORG; WWW International; Semantic web list; LTRU Working Group
Subject: Re: [Fwd: Language Ontology]

Hi Misha,

We are very aware of it, and have been following the work, but I failed to mention it in the email. I should say that our ontology was developed for offline use in an internal system, as an initial requirement. Having said that, if you look at the RFCs, they only describe tags, not an RDF vocabulary or OWL ontology. Our approach is compatible with the RFCs but adds capabilities that support co-reference resolution, for example, in the target application.
Best,

Elisa

Misha Wolf wrote:

This sounds very worrying as you don't seem to be aware of BCP 47.

Misha

________________________________
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Elisa F. Kendall
Sent: 23 April 2007 17:32
To: Gauri.Salokhe@FAO.ORG
Cc: 'WWW International'; Semantic web list
Subject: Re: [Fwd: Language Ontology]

Hi Gauri,

We've done this for some of our government customers, using essentially the second approach you cite. We're also in the process of relating the ontology to another one we've built to represent ISO 3166, which includes the administrative languages used by countries and non-sovereign territories represented in that standard.

If you can hang out for a few days, we (Sandpiper) are just finalizing a version that includes both ISO 639-1 and 639-2. The approach is more of a hybrid of the two you present, based on customer needs. It includes a fragment of ISO 1087, and also some inverse relations since there is a one-to-one correspondence between languages and codes. We elected to create a 'Language' class, rather than 'LanguageCode', which we reuse in other applications; classes for Alpha-2Code and Alpha-3Code are subclasses of CodeElement, from ISO 5127, with instances of these codes as first class individuals. We use literals (via datatype properties) to represent the set of English, French, and in the case of 639-1 Indigenous names. We've also created subclasses of Alpha-3Code to support distinctions between bibliographic and terminologic, collective, and special identifiers, with individual and macrolanguages to support 639-3.

A subsequent release will include all of the languages described in ISO 639-3, as well as additions to support at least some of the subtagging that Dan mentions, fyi.
Our intent is to publish it on a new portal that will become part of a new service offered by the Ontology PSIG in the OMG, since we've been asked to publish several ontologies in recent RFPs. I'll be happy to send our preliminary version when it's "baked and tested", and follow up with an announcement of the new portal (where a revision using OMG URIs will be posted) once that's available. It may be a couple of months before we're ready to make that announcement, but we're hoping that the service will be useful to many of us in the Semantic Web community.

Best regards,

Elisa

Dan Brickley wrote:

Forwarding from the Dublin Core list, in case folk here can advise.

Gauri, one thing I'd suggest as useful would be to take the concepts implicit in RFC 4646,
http://www.rfc-editor.org/rfc/rfc4646.txt
see also http://www.w3.org/International/articles/language-tags/Overview.en.php
...and in particular the subtag mechanism, script, region, variant etc. It would be great to have those expressed explicitly.

cheers,

Dan

________________________________
Subject: Language Ontology
From: "Salokhe, Gauri (KCEW)" <Gauri.Salokhe@FAO.ORG>
Date: Mon, 23 Apr 2007 17:28:39 +0200
To: DC-GENERAL@JISCMAIL.AC.UK

Dear All,

We are working on creating an ontology for languages. The need came up as we tried to convert our XML metadata files into OWL. In our metadata (XML) records, we have three types of occurrences of language information.
<dc:language scheme="ags:ISO639-1">En</dc:language>
<dc:language scheme="dcterms:ISO639-2">eng</dc:language>
<dc:language>English</dc:language>

We have two options for modelling the language ontology:

1) Create a class for each language, assign a URI to it and add all the other lexical variations and ISO codes (create datatype properties) as follows:

OWL:Thing
|_ Class:Language
   |_ Instance:URI1
      |_ rdfs:label xml:lang="en" English
      |_ rdfs:label xml:lang="es" Inglés
      |_ rdfs:label xml:lang="it" Inglese
      |_ rdfs:label xml:lang="fr" Anglais
      |_ etc.
      |_ property:hasISO639-1Code en (string)
      |_ property:hasISO639-2Code eng (string)
      |_ etc.
   |_ Instance:URI2
   |_ Instance:URI3
   |_ Instance:URI4

2) Create classes called Language and LanguageCode and make links between instances of Language and LanguageCode as follows:

OWL:Thing
|_ Class:Language
   |_ Instance:URI1
      |_ property:hasCode en (link to the en instance of Class ISO639-1 below)
      |_ property:hasCode eng (link to the eng instance of Class ISO639-2 below)
|_ Class:LanguageCode
   |_ SubClass ISO639-1
      |_ Instance:en
      |_ Instance:fr
      |_ etc.
   |_ SubClass ISO639-2
      |_ Instance:eng
      |_ Instance:fra
      |_ etc.
   |_ etc.

Does anyone have similar experience with modelling in OWL? Any suggestions on which model is better (and more extensible)? Does an ontology already exist that we can reuse?

Thank you,

Gauri
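The two modelling options in Gauri's original message can be compared with a minimal sketch in plain Python. This is only an illustration of the shape of each model, not OWL and not anyone's actual ontology: the class names LanguageWithLiterals, LanguageCode and LanguageWithCodes, the helper build_code_index, and the example.org URI are all hypothetical. Option 1 stores codes as string literals on the language; option 2 makes each code an individual of its own, which is what allows an inverse lookup from a code back to its language.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


# Option 1: a Language individual carries labels and codes as plain literals.
@dataclass
class LanguageWithLiterals:
    uri: str
    labels: Dict[str, str] = field(default_factory=dict)  # xml:lang -> label
    iso639_1: Optional[str] = None
    iso639_2: Optional[str] = None


# Option 2: each code is an individual in its own right, tagged with its scheme.
@dataclass(frozen=True)
class LanguageCode:
    scheme: str  # "ISO639-1" or "ISO639-2"
    value: str


@dataclass
class LanguageWithCodes:
    uri: str
    codes: Set[LanguageCode] = field(default_factory=set)  # hasCode links


def build_code_index(languages):
    """Inverse of hasCode: map each code individual to its language."""
    index = {}
    for language in languages:
        for code in language.codes:
            index[code] = language
    return index


# Hypothetical URI standing in for "URI1" in the message.
URI1 = "http://example.org/language/URI1"

english_option1 = LanguageWithLiterals(
    uri=URI1,
    labels={"en": "English", "es": "Inglés", "it": "Inglese", "fr": "Anglais"},
    iso639_1="en",
    iso639_2="eng",
)

en = LanguageCode("ISO639-1", "en")
eng = LanguageCode("ISO639-2", "eng")
english_option2 = LanguageWithCodes(uri=URI1, codes={en, eng})

code_index = build_code_index([english_option2])
```

In this sketch, option 1 needs a scan over all languages to answer "which language has code eng?", while option 2 answers it through code_index. The price of option 2 is an extra set of individuals to create and maintain.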
Received on Thursday, 3 May 2007 13:47:31 UTC