Re: [Fwd: Language Ontology]

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 24 Apr 2007 15:51:35 +0900
Message-ID: <462DA8F7.1080209@w3.org>
To: "Elisa F. Kendall" <ekendall@sandsoft.com>
CC: Debbie Garside <md@ictenterprise.co.uk>, 'WWW International' <www-international@w3.org>, 'Semantic web list' <semantic-web@w3.org>, 'LTRU Working Group' <ltru@ietf.org>

Hello Elisa,

Elisa F. Kendall wrote:
> Hi Debbie,
>
> Thanks for the warning.  We did know that it was incomplete, but are 
> interested in representations of place names in local languages, so 
> having a structure for capturing this information, even if incomplete, 
> is useful.

Debbie might expect me to point you to this: CLDR [1] already has such 
a structure, and it is filled with region (and other) names for many 
"locales". Here is an excerpt of the locale display names for English:

<ldml>
    <identity>
        [...]
        <language type="en"/>
    </identity>
    <localeDisplayNames>
        <languages>
            <language type="de">German</language>
            [...]
        </languages>
        <scripts>
            <script type="Latn">Latin</script>
            [...]
        </scripts>
        <territories>
            <territory type="DE">Germany</territory>
            [...]
        </territories>
        <variants>
            <variant type="1901">Traditional German orthography</variant>
            <variant type="1996">German orthography of 1996</variant>
            [...]
        </variants>
    </localeDisplayNames>
</ldml>

You might want to check whether this is useful for your efforts.
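
As a rough illustration (not part of CLDR's own tooling), a display-name
lookup over such a file could be sketched in Python with the standard
xml.etree.ElementTree module. The LDML_EN string and the display_names
helper are assumptions made up for this example, and the "[...]" elisions
from the excerpt are dropped so that the fragment parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed LDML fragment modelled on the excerpt above.
# The "[...]" elisions are omitted so that the XML is well-formed.
LDML_EN = """
<ldml>
    <identity>
        <language type="en"/>
    </identity>
    <localeDisplayNames>
        <languages>
            <language type="de">German</language>
        </languages>
        <scripts>
            <script type="Latn">Latin</script>
        </scripts>
        <territories>
            <territory type="DE">Germany</territory>
        </territories>
        <variants>
            <variant type="1901">Traditional German orthography</variant>
            <variant type="1996">German orthography of 1996</variant>
        </variants>
    </localeDisplayNames>
</ldml>
"""


def display_names(ldml_text):
    """Return {category: {code: display name}} for one LDML document."""
    root = ET.fromstring(ldml_text)
    names = {}
    container = root.find("localeDisplayNames")
    if container is None:
        return names
    # Each child is a group such as <languages>, <scripts>, <territories>.
    for group in container:
        names[group.tag] = {item.get("type"): item.text for item in group}
    return names


names = display_names(LDML_EN)
print(names["territories"]["DE"])  # prints "Germany"
```

The same lookup run against the LDML file of another locale would give the
names as written in that locale, which is the "place names in local
languages" case.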

Regards, Felix.

[1] http://unicode.org/cldr/index.html

>   We're also looking at other government and research community 
> resources to assist with both structure and content.  If you have 
> suggestions for references, that would be helpful.
>
> Best regards,
>
> Elisa
>
> Debbie Garside wrote:
>> Please be very careful with the use of the "Administrative Language" 
>> information from ISO 3166-1.  It is incomplete and therefore not good 
>> data.
>>  
>> For example, it shows only two "Administrative Languages" for India 
>> where there are at least twenty-two.  I am hoping that this 
>> information will be taken out of the standard in the near future.  I 
>> am currently writing an ISO NWIP for a revision of ISO 3166-1 which 
>> will include a proposal for the deletion of this data.
>>  
>> Best regards
>>  
>> Debbie Garside
>> Editor ISO DIS 639-6
>> www.geolang.com <http://www.geolang.com>
>>
>>     ------------------------------------------------------------------------
>>     *From:* www-international-request@w3.org
>>     [mailto:www-international-request@w3.org] *On Behalf Of *Elisa F.
>>     Kendall
>>     *Sent:* 23 April 2007 18:25
>>     *To:* Misha Wolf
>>     *Cc:* Gauri.Salokhe@FAO.ORG; WWW International; Semantic web
>>     list; LTRU Working Group
>>     *Subject:* Re: [Fwd: Language Ontology]
>>
>>     Hi Misha,
>>
>>     We are very aware of it, and have been following the work, but I
>>     failed to mention it in the email.  I should say that our
>>     ontology was developed for offline use in an internal system, as
>>     an initial requirement.  Having said that, if you look at the
>>     RFCs, they only describe tags, not an RDF vocabulary or OWL
>>     ontology.  Our approach is compatible with the RFCs but adds
>>     capabilities that support co-reference resolution, for example,
>>     in the target application.
>>
>>     Best,
>>
>>     Elisa
>>
>>     Misha Wolf wrote:
>>>     This sounds very worrying as you don't seem to be aware of BCP 47.
>>>      
>>>     Misha
>>>
>>>     ------------------------------------------------------------------------
>>>     *From:* www-international-request@w3.org
>>>     [mailto:www-international-request@w3.org] *On Behalf Of *Elisa
>>>     F. Kendall
>>>     *Sent:* 23 April 2007 17:32
>>>     *To:* Gauri.Salokhe@FAO.ORG
>>>     *Cc:* 'WWW International'; Semantic web list
>>>     *Subject:* Re: [Fwd: Language Ontology]
>>>
>>>     Hi Gauri,
>>>
>>>     We've done this for some of our government customers, using
>>>     essentially the second approach you cite.  We're also in the
>>>     process of relating the ontology to another one we've built to
>>>     represent ISO 3166, which includes the administrative languages
>>>     used by countries and non-sovereign territories  represented in
>>>     that standard.
>>>
>>>     If you can hang out for a few days, we (Sandpiper) are just
>>>     finalizing a version that includes both ISO 639-1 and 639-2. The
>>>     approach is more of a hybrid of the two you present, based on
>>>     customer needs.  It includes a fragment of ISO 1087, and also
>>>     some inverse relations since there is a one-to-one
>>>     correspondence between languages and codes.  We elected to
>>>     create a 'Language' class, rather than 'LanguageCode', which we
>>>     reuse in other applications; classes for Alpha-2Code and
>>>     Alpha-3Code are subclasses of CodeElement, from ISO 5127, with
>>>     instances of these codes as first class individuals. We use
>>>     literals (via datatype properties) to represent the set of
>>>     English, French, and, in the case of 639-1, indigenous names.
>>>     We've also created subclasses of Alpha-3Code to support
>>>     distinctions between bibliographic and terminologic, collective,
>>>     and special identifiers, with individual and macrolanguages to
>>>     support 639-3.  A subsequent release will include all of the
>>>     languages described in ISO 639-3, as well as additions to
>>>     support at least some of the subtagging that Dan mentions, fyi. 
>>>     Our intent is to publish it on a new portal that will become
>>>     part of a new service offered by the Ontology PSIG in the OMG,
>>>     since we've been asked to publish several ontologies in recent
>>>     RFPs.  I'll be happy to send our preliminary version when it's
>>>     "baked and tested", and follow up with an announcement of the
>>>     new portal (where a revision using OMG URIs will be posted) once
>>>     that's available.  It may be a couple of months before we're
>>>     ready to make that announcement, but we're hoping that the
>>>     service will be useful to many of us in the Semantic Web community.
>>>
>>>     Best regards,
>>>
>>>     Elisa
>>>
>>>     Dan Brickley wrote:
>>>>
>>>>     Forwarding from the Dublin Core list, in case folk here can
>>>>     advise.
>>>>
>>>>     Gauri, one thing I'd suggest as useful would be to take the
>>>>     concepts implicit in RFC 4646,
>>>>
>>>>     http://www.rfc-editor.org/rfc/rfc4646.txt
>>>>     see also
>>>>     http://www.w3.org/International/articles/language-tags/Overview.en.php
>>>>
>>>>
>>>>     ...and in particular the subtag mechanism, script, region,
>>>>     variant etc.
>>>>
>>>>     It would be great to have those expressed explicitly.
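
One way to make the subtags explicit is to split a tag into named fields.
The sketch below is a deliberately simplified reading of RFC 4646: it
handles only a 2- or 3-letter primary language subtag followed by optional
script, region and variant subtags, and rejects everything else (extended
language subtags, extensions, private use, grandfathered tags). The
LanguageTag class and parse_tag function are hypothetical names chosen for
this example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


# Subtag shapes, simplified from the RFC 4646 grammar.
_LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
_SCRIPT = re.compile(r"[A-Za-z]{4}")
_REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
_VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'de-Latn-DE-1996' into its named subtags."""
    parts = tag.split("-")
    if not _LANGUAGE.fullmatch(parts[0]):
        raise ValueError(f"unsupported primary language subtag: {parts[0]!r}")
    result = LanguageTag(language=parts[0].lower())
    rest = parts[1:]
    # The subtags are positional: script, then region, then variants.
    if rest and _SCRIPT.fullmatch(rest[0]):
        result.script = rest.pop(0).title()
    if rest and _REGION.fullmatch(rest[0]):
        result.region = rest.pop(0).upper()
    while rest and _VARIANT.fullmatch(rest[0]):
        result.variants.append(rest.pop(0).lower())
    if rest:
        raise ValueError(f"unsupported subtag(s): {rest}")
    return result


tag = parse_tag("de-Latn-DE-1996")
print(tag.language, tag.script, tag.region, tag.variants)
```

Each field could then map to its own property in an RDF vocabulary, rather
than leaving the tag as one opaque string.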
>>>>
>>>>     cheers,
>>>>
>>>>     Dan
>>>>
>>>>     ------------------------------------------------------------------------
>>>>
>>>>     Subject:
>>>>     Language Ontology
>>>>     From:
>>>>     "Salokhe, Gauri (KCEW)" <Gauri.Salokhe@FAO.ORG>
>>>>     Date:
>>>>     Mon, 23 Apr 2007 17:28:39 +0200
>>>>     To:
>>>>     DC-GENERAL@JISCMAIL.AC.UK
>>>>
>>>>
>>>>     Dear All, 
>>>>
>>>>     We are working on creating an ontology for languages. The need came up as we
>>>>     tried to convert our XML metadata files into OWL. In our XML metadata
>>>>     records, language information occurs in three forms:
>>>>
>>>>     <dc:language scheme="ags:ISO639-1">En</dc:language>
>>>>     <dc:language scheme="dcterms:ISO639-2">eng</dc:language>
>>>>     <dc:language>English</dc:language>
>>>>
>>>>
>>>>     We have two options for modelling the language ontology:
>>>>
>>>>     1) Create a Language class with one instance per language, assign a URI to
>>>>     each instance, and add the lexical variants and ISO codes as datatype
>>>>     properties, as follows:
>>>>
>>>>     OWL:Thing
>>>>     |_ Class:Language
>>>>     	|_ Instance:URI1
>>>>     		|_ rdfs:label xml:lang="en" English
>>>>     		|_ rdfs:label xml:lang="es" Inglés
>>>>     		|_ rdfs:label xml:lang="it" Inglese
>>>>     		|_ rdfs:label xml:lang="fr" Anglais
>>>>     		|_ etc.
>>>>     		|_ property:hasISO639-1Code  en (string)
>>>>     		|_ property:hasISO639-2Code  eng (string)
>>>>     		|_ etc.
>>>>     	|_ Instance:URI2
>>>>     	|_ Instance:URI3
>>>>     	|_ Instance:URI4
>>>>
>>>>
>>>>     2) Create classes called Language and LanguageCode, and link instances of
>>>>     Language to instances of LanguageCode, as follows:
>>>>
>>>>
>>>>     OWL:Thing
>>>>     |_ Class:Language
>>>>     	|_ Instance:URI1
>>>>     		|_ property:hasCode  en  (link to the en instance of Class ISO639-1 below)
>>>>     		|_ property:hasCode  eng  (link to the eng instance of Class ISO639-2 below)
>>>>
>>>>     |_ Class:LanguageCode
>>>>     	|_ SubClass ISO639-1
>>>>     		|_ Instance:en
>>>>     		|_ Instance:fr
>>>>     		|_ etc.
>>>>     	|_ SubClass ISO639-2
>>>>     		|_ Instance:eng
>>>>     		|_ Instance:fra
>>>>     		|_ etc.
>>>>     	|_ etc.
>>>>
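
To make the difference between the two options concrete, here is a minimal
sketch using plain Python dataclasses rather than OWL. All names in it
(LanguageV1, LanguageV2, LanguageCode, index_by_code and the example.org
URI) are hypothetical and only mirror the two trees above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

EX = "http://example.org/lang/"  # hypothetical namespace for the example


# Option 1: codes are plain string values attached to the language.
@dataclass
class LanguageV1:
    uri: str
    labels: Dict[str, str]
    iso639_1: str
    iso639_2: str


# Option 2: codes are individuals of their own, grouped by scheme.
@dataclass(frozen=True)
class LanguageCode:
    scheme: str  # "ISO639-1" or "ISO639-2"
    value: str


@dataclass
class LanguageV2:
    uri: str
    labels: Dict[str, str] = field(default_factory=dict)
    codes: List[LanguageCode] = field(default_factory=list)


def index_by_code(languages):
    """Inverse of hasCode: map each code individual back to its language."""
    index = {}
    for lang in languages:
        for code in lang.codes:
            index[code] = lang
    return index


labels = {"en": "English", "es": "Inglés", "it": "Inglese", "fr": "Anglais"}

english_v1 = LanguageV1(EX + "URI1", labels, iso639_1="en", iso639_2="eng")

en = LanguageCode("ISO639-1", "en")
eng = LanguageCode("ISO639-2", "eng")
english_v2 = LanguageV2(EX + "URI1", dict(labels), codes=[en, eng])

by_code = index_by_code([english_v2])
print(by_code[eng].labels["fr"])  # prints "Anglais"
```

In this sketch option 1 is the shorter of the two, while option 2 lets a
code carry its own scheme and be looked up in the inverse direction, at the
cost of an extra class of individuals.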
>>>>     Does anyone have similar experience with modelling in OWL? Any suggestions on
>>>>     which model is better (and more extensible)? Does an ontology already exist that
>>>>     we can reuse?
>>>>
>>>>     Thank you,
>>>>     Gauri
>>>>       
>>>
Received on Tuesday, 24 April 2007 06:51:47 GMT
