
Re: [Ltru] RE: [Fwd: Language Ontology]

From: JFC Morfin <jefsey@jefsey.com>
Date: Tue, 24 Apr 2007 03:42:22 +0200
To: "Debbie Garside" <debbie@ictmarketing.co.uk>, "'Elisa F. Kendall'" <ekendall@sandsoft.com>, "'Misha Wolf'" <Misha.Wolf@reuters.com>,Gauri.Salokhe@FAO.ORG, maaya@funredes.org,ietf-languages@jefsey.com
Cc: 'WWW International' <www-international@w3.org>, 'Semantic web list' <semantic-web@w3.org>, 'LTRU Working Group' <ltru@ietf.org>
Message-Id: <20070424014145.42E8D17C2E@smtp7-g19.free.fr>
Dear Debbie and all,
I suggest that everyone be extremely careful about the convergence
of language ontologies, in order to avoid duplicate work.

This is a personal mail. A preliminary project document I am working
on should, I hope, be released next week. Its purpose is to permit
stable and easy interoperability in the areas ISO 3166 touches:
regalian (sovereign) issues, statistics, governments, administration,
standards, etc. I will send you its URL, together with the associated
working list for its further releases.

As you know, our Internet multilingual distributed reference system
project has been delayed for several years by the lack of proper
support from ISO 3166, 639, 15924 and 10646. They document "names of
languages", etc., whereas we need a stable and clear "concept of
language", and even "emergence of language", to properly interlink
related registries in an ISO 11179 conformant way. Otherwise it is a
nightmare!

The need has been perceived but not identified on the side of the
language ontology with its script subsection and region
sub-subsection (TC 37/ISO 639), where texts now confuse "language
names" and "languages" (standing for "language concept"?). This has an
impact on the consistency of the whole series, on what macrolanguages
are, etc. This in turn affects BCP 47, with its review system and its
delay in supporting ISO 639-3. We also hear uncertain reports about
the final status of ISO CD 639 and its ISO 11179 conformance. As a
user, I would be really glad if you could answer these two questions:
will 639 be ISO 11179 conformant? Will it stabilize as a 3-letter
code, or as a mix of 2-, 3- and 4-letter codes as in BCP 47?

Now, on the country-oriented ontology with its language subsection
and script sub-subsection, ISO 3166 addressed the issue by adding
administrative languages as a working example. We started working on
it, following the ISO 3166 rules, and validated the concept,
obtaining a very handy, consistent, powerful and easily interoperable
solution. We will be able to complete it later with ISO 639 codes, as
soon as TC 37 and TC 46 have coordinated and taken advantage of the
ISO 3166-1:2006 experience, and ISO 639-1 has added the (at least)
five missing alpha-2 codes.

At 00:54 24/04/2007, Debbie Garside wrote:
>Please be very careful with the use of the "Administrative Language" 
>information from ISO 3166-1.  It is incomplete and therefore not good data.

This is the remark of a TC 37 sociolinguistics expert!
As ISO 3166 users we do not want exhaustive coverage of 20,000 or
30,000 language entities. We just need pragmatic information
validated by the 192 local governments.

>For example, it shows only two "Administrative Languages" for India 
>where there are at least twenty-two.

If the Indian Government said two are "administrative" and the South
African one said eleven, this is their decision.

>I am hoping that this information will be taken out of the standard 
>in the near future.

??? Where do you want to get it documented then?


>  I am currently writing an ISO NWIP for a revision of ISO 3166-1 
> which will include a proposal for the deletion of this data.

Another one?
I understand you just introduced a first NWIP to _add_ Internet
related information (in Unicode points?) to these data?
It took decades to get that info there (I first called for it for the
very reasons you use ... in 1984). Now that we are starting to use it
happily and have built a multilingual taxonomy on it, you want to
delete it???

I am lost. Could you please clarify this? Thank you!
jfc

>Best regards
>
>Debbie Garside
>Editor ISO DIS 639-6
>www.geolang.com
>
>
>
>----------
>From: www-international-request@w3.org 
>[mailto:www-international-request@w3.org] On Behalf Of Elisa F. Kendall
>Sent: 23 April 2007 18:25
>To: Misha Wolf
>Cc: Gauri.Salokhe@FAO.ORG; WWW International; Semantic web list; 
>LTRU Working Group
>Subject: Re: [Fwd: Language Ontology]
>Hi Misha,
>We are very aware of it, and have been following the work, but I 
>failed to mention it in the email.  I should say that our ontology 
>was developed for offline use in an internal system, as an initial 
>requirement.  Having said that, if you look at the RFCs, they only 
>describe tags, not an RDF vocabulary or OWL ontology.  Our approach 
>is compatible with the RFCs but adds capabilities that support 
>co-reference resolution, for example, in the target application.
>Best,
>Elisa
>Misha Wolf wrote:
>>This sounds very worrying as you don't seem to be aware of BCP 47.
>>
>>Misha
>>----------
>>From: www-international-request@w3.org 
>>[mailto:www-international-request@w3.org] On Behalf Of Elisa F. Kendall
>>Sent: 23 April 2007 17:32
>>To: Gauri.Salokhe@FAO.ORG
>>Cc: 'WWW International'; Semantic web list
>>Subject: Re: [Fwd: Language Ontology]
>>Hi Gauri,
>>We've done this for some of our government customers, using 
>>essentially the second approach you cite.  We're also in the 
>>process of relating the ontology to another one we've built to 
>>represent ISO 3166, which includes the administrative languages 
>>used by countries and non-sovereign territories represented in that 
>>standard.
>>If you can hang out for a few days, we (Sandpiper) are just 
>>finalizing a version that includes both ISO 639-1 and 639-2. The 
>>approach is more of a hybrid of the two you present, based on 
>>customer needs.  It includes a fragment of ISO 1087, and also some 
>>inverse relations since there is a one-to-one correspondence 
>>between languages and codes.  We elected to create a 'Language' 
>>class, rather than 'LanguageCode', which we reuse in other 
>>applications; classes for Alpha-2Code and Alpha-3Code are 
>>subclasses of CodeElement, from ISO 5127, with instances of these 
>>codes as first class individuals. We use literals (via datatype 
>>properties) to represent the set of English, French, and in the 
>>case of 639-1 Indigenous names.  We've also created subclasses of 
>>Alpha-3Code to support distinctions between bibliographic and 
>>terminologic, collective, and special identifiers, with individual 
>>and macrolanguages to support 639-3.  A subsequent release will 
>>include all of the languages described in ISO 639-3, as well as 
>>additions to support at least some of the subtagging that Dan 
>>mentions, fyi.  Our intent is to publish it on a new portal that 
>>will become part of a new service offered by the Ontology PSIG in 
>>the OMG, since we've been asked to publish several ontologies in 
>>recent RFPs.  I'll be happy to send our preliminary version when 
>>it's "baked and tested", and follow up with an announcement of the 
>>new portal (where a revision using OMG URIs will be posted) once 
>>that's available.  It may be a couple of months before we're ready 
>>to make that announcement, but we're hoping that the service will 
>>be useful to many of us in the Semantic Web community.
>>
>>Best regards,
>>Elisa
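
The class layout Elisa describes can be sketched roughly as follows. This is a minimal illustration in plain Python, not Sandpiper's ontology: the names CodeElement, Alpha2Code, Alpha3Code and Language come from her message, while Registry, add and language_for are hypothetical helpers introduced here to show the one-to-one inverse relation between codes and languages.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CodeElement:
    """A code as a first-class individual rather than a bare string."""
    value: str


class Alpha2Code(CodeElement):
    """Two-letter code, as in ISO 639-1."""


class Alpha3Code(CodeElement):
    """Three-letter code, as in ISO 639-2."""


@dataclass
class Language:
    """A language, with names kept as literals and codes as linked objects."""
    english_name: str
    french_name: str
    alpha2: Optional[Alpha2Code] = None
    alpha3: Optional[Alpha3Code] = None


class Registry:
    """Hypothetical helper holding the inverse relation: code -> language."""

    def __init__(self) -> None:
        self._by_code: Dict[CodeElement, Language] = {}

    def add(self, language: Language) -> None:
        for code in (language.alpha2, language.alpha3):
            if code is None:
                continue
            # One-to-one: a code may identify only one language.
            if code in self._by_code:
                raise ValueError(f"code already assigned: {code.value}")
            self._by_code[code] = language

    def language_for(self, code: CodeElement) -> Language:
        return self._by_code[code]


registry = Registry()
english = Language("English", "anglais", Alpha2Code("en"), Alpha3Code("eng"))
registry.add(english)
```

Because the two code classes are distinct types, Alpha2Code("en") and Alpha3Code("en") would not collide even if their string values were equal.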
>>Dan Brickley wrote:
>>>Forwarding from the Dublin Core list, in case folk here can advise.
>>>Gauri, one thing I'd suggest as useful would be to take the 
>>>concepts implicit in RFC 4646,
>>>
>>>http://www.rfc-editor.org/rfc/rfc4646.txt
>>>
>>>see also 
>>>http://www.w3.org/International/articles/language-tags/Overview.en.php
>>>
>>>...and in particular the subtag mechanism, script, region, variant etc.
>>>It would be great to have those expressed explicitly.
>>>cheers,
>>>Dan
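
To make Dan's suggestion concrete, here is a rough sketch of pulling the language, script, region and variant subtags out of a tag. It is a deliberately simplified reading of the RFC 4646 syntax and an assumption-laden illustration only: extended language subtags, extensions, private use and grandfathered tags are not handled, nothing is checked against the subtag registry, and the names LanguageTag and parse_tag are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'zh-Hant-TW' into explicit parts (simplified)."""
    if not tag.isascii():
        raise ValueError("language tags are ASCII only")
    parts = tag.split("-")
    language = parts[0]
    # Only 2- or 3-letter primary language subtags are supported here.
    if not (language.isalpha() and 2 <= len(language) <= 3):
        raise ValueError(f"unsupported primary language subtag: {language!r}")
    result = LanguageTag(language=language.lower())
    rest = parts[1:]
    # Script: exactly four letters, conventionally title case (e.g. 'Latn').
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result.script = rest.pop(0).title()
    # Region: two letters (ISO 3166) or three digits (UN M.49).
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        result.region = rest.pop(0).upper()
    # Variants: 5 to 8 alphanumerics, or 4 characters starting with a digit.
    for sub in rest:
        is_variant = sub.isalnum() and (
            5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit())
        )
        if not is_variant:
            raise ValueError(f"unsupported subtag: {sub!r}")
        result.variants.append(sub.lower())
    return result
```

Each field of the result corresponds to one of the concepts that could become an explicit class or property in an ontology, instead of staying implicit in the position of a subtag.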
>>>
>>>Subject: Language Ontology
>>>From: "Salokhe, Gauri (KCEW)" <Gauri.Salokhe@FAO.ORG>
>>>Date: Mon, 23 Apr 2007 17:28:39 +0200
>>>To: DC-GENERAL@JISCMAIL.AC.UK
>>>
>>>Dear All,
>>>
>>>We are working on creating an ontology for languages. The need came up as we
>>>tried to convert our XML metadata files into OWL. In our metadata (XML)
>>>records, we have three types of occurrences of language information.
>>>
>>><dc:language scheme="ags:ISO639-1">En</dc:language>
>>><dc:language scheme="dcterms:ISO639-2">eng</dc:language>
>>><dc:language>English</dc:language>
>>>
>>>We have two options for modelling the language ontology:
>>>
>>>1) Create a Language class with one instance per language, assign a URI to
>>>each instance and add all the other lexical variations and ISO codes as
>>>datatype properties, as follows:
>>>
>>>OWL:Thing
>>>|_ Class:Language
>>>         |_ Instance:URI1
>>>                 |_ rdfs:label xml:lang="en" English
>>>                 |_ rdfs:label xml:lang="es" Inglés
>>>                 |_ rdfs:label xml:lang="it" Inglese
>>>                 |_ rdfs:label xml:lang="fr" Anglais
>>>                 |_ etc.
>>>                 |_ property:hasISO639-1Code  en (string)
>>>                 |_ property:hasISO639-2Code  eng (string)
>>>                 |_ etc.
>>>         |_ Instance:URI2
>>>         |_ Instance:URI3
>>>         |_ Instance:URI4
>>>
>>>2) Create classes called Language and LanguageCode and make links between
>>>instances of Language and LanguageCode as follows:
>>>
>>>OWL:Thing
>>>|_ Class:Language
>>>         |_ Instance:URI1
>>>                 |_ property:hasCode  en  (link to the en instance of Class
>>>                    ISO639-1 below)
>>>                 |_ property:hasCode  eng  (link to the eng instance of Class
>>>                    ISO639-2 below)
>>>|_ Class:LanguageCode
>>>         |_ SubClass ISO639-1
>>>                 |_ Instance:en
>>>                 |_ Instance:fr
>>>                 |_ etc.
>>>         |_ SubClass ISO639-2
>>>                 |_ Instance:eng
>>>                 |_ Instance:fra
>>>                 |_ etc.
>>>         |_ etc.
>>>
>>>Does anyone have similar experience with modelling in OWL? Any suggestions
>>>on which model is better (and extensible)? Does an ontology already exist
>>>that we can reuse?
>>>
>>>Thank you,
>>>
>>>Gauri
>>>
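
As a rough illustration of Gauri's first option, the sketch below keeps the codes and labels as plain string properties on each language instance and then resolves the three kinds of dc:language occurrence from the metadata records to the same instance. It uses ordinary Python objects rather than OWL, and the example.org URI, build_index and resolve are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[Optional[str], str]  # (scheme or None, normalized text)


@dataclass
class Language:
    uri: str                                               # placeholder URI
    labels: Dict[str, str] = field(default_factory=dict)   # xml:lang -> label
    iso639_1: Optional[str] = None                         # e.g. "en"
    iso639_2: Optional[str] = None                         # e.g. "eng"


def build_index(languages: List[Language]) -> Dict[Key, Language]:
    """Index every code and label so any record form finds its language."""
    index: Dict[Key, Language] = {}
    for lang in languages:
        if lang.iso639_1:
            index[("ags:ISO639-1", lang.iso639_1.lower())] = lang
        if lang.iso639_2:
            index[("dcterms:ISO639-2", lang.iso639_2.lower())] = lang
        for label in lang.labels.values():
            index[(None, label.lower())] = lang
    return index


def resolve(index: Dict[Key, Language], scheme: Optional[str],
            text: str) -> Optional[Language]:
    """Look up a dc:language value; case is ignored ('En' matches 'en')."""
    return index.get((scheme, text.strip().lower()))


english = Language(
    uri="http://example.org/language/URI1",
    labels={"en": "English", "es": "Inglés", "it": "Inglese", "fr": "Anglais"},
    iso639_1="en",
    iso639_2="eng",
)
index = build_index([english])
```

Under the second option the two code strings would instead become links to separate code instances, which costs an extra lookup but lets each code carry its own properties.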
Received on Tuesday, 24 April 2007 01:42:15 GMT
