
RE: [Ltru] RE: [Fwd: Language Ontology]

From: Debbie Garside <debbie@ictmarketing.co.uk>
Date: Tue, 24 Apr 2007 08:40:31 +0100
To: <jefsey@jefsey.com>, "'Elisa F. Kendall'" <ekendall@sandsoft.com>, "'Misha Wolf'" <Misha.Wolf@reuters.com>, <Gauri.Salokhe@FAO.ORG>, <maaya@funredes.org>, <ietf-languages@jefsey.com>
Cc: "'WWW International'" <www-international@w3.org>, "'Semantic web list'" <semantic-web@w3.org>
Message-ID: <E1HgFda-0002uY-Pe@aji.w3.org>
JFC wrote:
 
At 00:54 24/04/2007, Debbie Garside wrote:


Please be very careful with the use of the "Administrative Language"
information from ISO 3166-1.  It is incomplete and therefore not good data.


>>This is a remark from a TC 37 sociolinguistics expert!
>>As ISO 3166 users we do not want exhaustive coverage of 20,000 or 30,000
>>language entities. We just need pragmatic information validated by the 192
>>local governments.

If the data in ISO 3166-1 were validated by local governments, it would show 22
Administrative Languages for India.
 
>>If the Indian Gov said two are "administrative" and the South African one
>>said eleven, this is their decision.
 
They didn't say that.  Nobody asked them!
 
>>??? Where do you want to get it documented then?
 
Most likely within the ISO "Standards as Database" initiative.
 
Debbie
 
 


  _____  

From: JFC Morfin [mailto:jefsey@jefsey.com] 
Sent: 24 April 2007 02:42
To: Debbie Garside; 'Elisa F. Kendall'; 'Misha Wolf'; Gauri.Salokhe@FAO.ORG;
maaya@funredes.org; ietf-languages@jefsey.com
Cc: 'WWW International'; 'Semantic web list'; 'LTRU Working Group'
Subject: Re: [Ltru] RE: [Fwd: Language Ontology]


Dear Debbie and all,
I suggest that everyone be extremely careful about the convergence of
language ontologies, in order to avoid duplicate work.

This is a personal message. I am working on a preliminary project document
that should hopefully be released next week. Its purpose is to enable stable
and easy interoperability in the areas of ISO 3166 sovereign (regalian)
matters, statistics, governments, administration, standards, etc. I will
send you its URL together with the associated working list for its further
releases.

As you know, our Internet multilingual distributed reference system project
has been delayed for several years by the lack of proper support from ISO
3166, 639, 15924 and 10646. These standards document "names of languages",
etc., whereas we need a stable and clear "concept of language", and even an
"emergence of language", to properly interlink related registries in an
ISO 11179-conformant way. Otherwise it is a nightmare!

The need has been perceived, but not identified, on the language ontology
side (with script subsections and region sub-subsections), that is TC 37/ISO
639, where texts now confuse "language names" and "languages" (standing for
"language concept"?). This affects the consistency of the whole series, what
macrolanguages are, etc. It then affects BCP 47, with its review system and
its delay in supporting ISO 639-3. We also hear uncertain reports about the
final status of ISO CD 639 and its ISO 11179 conformance. As a user, I would
be really glad if you could answer these two questions: Will 639 be ISO
11179 conformant? Will it stabilize as a 3-letter code, or as a mix of 2-,
3- and 4-letter codes as in BCP 47?

Now, on the country-oriented ontology (with language subsections and script
sub-subsections), ISO 3166 addressed the issue by adding administrative
languages as a working example. We started working on it, following the ISO
3166 rules, and validated the concept, obtaining a very handy, consistent,
powerful and easily interoperable solution. We will be able to complete it
later with ISO 639 codes, as soon as TC 37 and TC 46 have coordinated and
taken advantage of the ISO 3166-1:2006 experience, and ISO 639-1 has added
the (at least) five missing alpha-2 codes.

At 00:54 24/04/2007, Debbie Garside wrote:


Please be very careful with the use of the "Administrative Language"
information from ISO 3166-1.  It is incomplete and therefore not good data.


This is a remark from a TC 37 sociolinguistics expert!
As ISO 3166 users we do not want exhaustive coverage of 20,000 or 30,000
language entities. We just need pragmatic information validated by the 192
local governments.



For example, it shows only two "Administrative Languages" for India where
there are at least twenty-two.


If the Indian Gov said two are "administrative" and the South African one
said eleven, this is their decision.



I am hoping that this information will be taken out of the standard in the
near future. 


??? Where do you want to get it documented then?




 I am currently writing an ISO NWIP for a revision of ISO 3166-1 which will
include a proposal for the deletion of this data.


Another one?
I understand you just introduced a first NWIP to _add_ Internet-related
information (in Unicode code points?) to these data?
It took decades to get that information there (I first called for it, for
the very reasons you give, in 1984). Now that we are starting to enjoy using
it and have built a multilingual taxonomy on it, you want to delete it???

I am lost. Could you please clarify this? Thank you!
jfc



Best regards
 
Debbie Garside
Editor ISO DIS 639-6
www.geolang.com



  _____  


From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Elisa F. Kendall
Sent: 23 April 2007 18:25
To: Misha Wolf
Cc: Gauri.Salokhe@FAO.ORG; WWW International; Semantic web list; LTRU Working Group
Subject: Re: [Fwd: Language Ontology]

Hi Misha, 

We are very aware of it, and have been following the work, but I failed to
mention it in the email. I should say that our ontology was developed for
offline use in an internal system, as an initial requirement. Having said
that, if you look at the RFCs, they only describe tags, not an RDF
vocabulary or OWL ontology. Our approach is compatible with the RFCs but
adds capabilities that support co-reference resolution, for example, in a
target application.

Best, 

Elisa 

Misha Wolf wrote: 


This sounds very worrying as you don't seem to be aware of BCP 47. 



Misha 

  _____  

From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Elisa F. Kendall
Sent: 23 April 2007 17:32
To: Gauri.Salokhe@FAO.ORG
Cc: 'WWW International'; Semantic web list
Subject: Re: [Fwd: Language Ontology]

Hi Gauri, 

We've done this for some of our government customers, using essentially the
second approach you cite.  We're also in the process of relating the
ontology to another one we've built to represent ISO 3166, which includes
the administrative languages used by countries and non-sovereign territories
represented in that standard. 

If you can hang on for a few days, we (Sandpiper) are just finalizing a
version that includes both ISO 639-1 and 639-2. The approach is more of a
hybrid of the two you present, based on customer needs. It includes a
fragment of ISO 1087, and also some inverse relations, since there is a
one-to-one correspondence between languages and codes. We elected to create
a 'Language' class, rather than 'LanguageCode', which we reuse in other
applications; classes for Alpha-2Code and Alpha-3Code are subclasses of
CodeElement, from ISO 5127, with instances of these codes as first-class
individuals. We use literals (via datatype properties) to represent the
English and French names and, in the case of 639-1, the indigenous names.
We've also created subclasses of Alpha-3Code to support distinctions between
bibliographic and terminologic, collective, and special identifiers, with
individual languages and macrolanguages to support 639-3. A subsequent
release will include all of the languages described in ISO 639-3, as well as
additions to support at least some of the subtagging that Dan mentions, fyi.
Our intent is to publish it on a new portal that will become part of a new
service offered by the Ontology PSIG in the OMG, since we've been asked to
publish several ontologies in recent RFPs. I'll be happy to send our
preliminary version when it's "baked and tested", and to follow up with an
announcement of the new portal (where a revision using OMG URIs will be
posted) once that's available. It may be a couple of months before we're
ready to make that announcement, but we're hoping that the service will be
useful to many of us in the Semantic Web community.
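The class structure Elisa describes (codes as first-class individuals, Alpha-2 and Alpha-3 codes under a common CodeElement, bibliographic vs terminologic Alpha-3 codes, and an inverse link from code to language) can be pictured with a small Python sketch. This uses plain dataclasses rather than OWL, purely to make the shape concrete; the class names follow her message, while `codes_of` and `language_by_code` are hypothetical helpers. French is used because ISO 639-2 really does give it two alpha-3 codes: bibliographic `fre` and terminologic `fra`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass(frozen=True)
class CodeElement:
    """A code treated as a first-class individual, not just a string."""
    value: str


@dataclass(frozen=True)
class Alpha2Code(CodeElement):
    pass


@dataclass(frozen=True)
class Alpha3Code(CodeElement):
    pass


@dataclass(frozen=True)
class BibliographicCode(Alpha3Code):
    pass


@dataclass(frozen=True)
class TerminologicCode(Alpha3Code):
    pass


@dataclass
class Language:
    names: Dict[str, str]  # language of the name -> the name itself
    codes: List[CodeElement] = field(default_factory=list)

    def codes_of(self, kind: Type[CodeElement]) -> List[str]:
        """Values of this language's codes that are instances of `kind` (subclasses included)."""
        return [c.value for c in self.codes if isinstance(c, kind)]


french = Language({"en": "French", "fr": "français"},
                  [Alpha2Code("fr"), BibliographicCode("fre"), TerminologicCode("fra")])
english = Language({"en": "English", "fr": "anglais"},
                   [Alpha2Code("en"), Alpha3Code("eng")])

# Hypothetical inverse relation: from each code individual back to its language.
language_by_code = {code: lang for lang in (french, english) for code in lang.codes}
```

Because the codes are typed objects, `BibliographicCode("fre")` and `TerminologicCode("fre")` stay distinct keys even though their strings match, which is the practical benefit of modelling codes as individuals.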



Best regards, 

Elisa 

Dan Brickley wrote: 


Forwarding from the Dublin Core list, in case folk here can advise. 

Gauri, one thing I'd suggest as useful would be to take the concepts
implicit in RFC 4646,

http://www.rfc-editor.org/rfc/rfc4646.txt

see also
http://www.w3.org/International/articles/language-tags/Overview.en.php

...and in particular the subtag mechanism: script, region, variant, etc.

It would be great to have those expressed explicitly.
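To show what "expressing the subtags explicitly" could start from, here is a rough Python sketch that splits a tag into language, script, region and variant subtags. It covers only a simplified subset of the RFC 4646 grammar (2- or 3-letter primary language subtags; no extlang, extension, private-use or grandfathered tags). The names `LanguageTag` and `parse_tag` are hypothetical, and this is an illustration, not a conformant validator.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified subset of the RFC 4646 langtag production.
_LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"                              # primary language
    r"(?:-(?P<script>[a-z]{4}))?"                             # ISO 15924 script
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"                    # ISO 3166-1 or UN M.49 region
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",  # registered variants
    re.IGNORECASE,
)


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    m = _LANGTAG.match(tag)
    if m is None:
        raise ValueError(f"not a tag this simplified parser accepts: {tag!r}")
    script, region = m.group("script"), m.group("region")
    variants = [v for v in m.group("variants").split("-") if v]
    # Normalise to the usual casing: lowercase language, titlecase script,
    # uppercase region, lowercase variants. Tags are case-insensitive anyway.
    return LanguageTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=[v.lower() for v in variants],
    )
```

For example, `parse_tag("zh-Hant-TW")` gives language `zh`, script `Hant` and region `TW`, while `parse_tag("sl-rozaj")` yields a single variant, `rozaj`. Each of these fields could then become a separate property in an RDF vocabulary.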

cheers, 

Dan 



Subject: Language Ontology
From: "Salokhe, Gauri (KCEW)" <Gauri.Salokhe@FAO.ORG>
Date: Mon, 23 Apr 2007 17:28:39 +0200
To: DC-GENERAL@JISCMAIL.AC.UK

Dear All,

We are working on creating an ontology for languages. The need came up as
we tried to convert our XML metadata files into OWL. In our metadata (XML)
records, we have three types of occurrences of language information:

<dc:language scheme="ags:ISO639-1">En</dc:language>
<dc:language scheme="dcterms:ISO639-2">eng</dc:language>
<dc:language>English</dc:language>
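Whichever model is chosen, the first conversion step is to map all three forms onto one language resource. The following is a minimal sketch using only the Python standard library. It assumes a tiny hypothetical lookup table with `example.org` URIs, assumes that a value without a `scheme` attribute is an English language name, and simply matches the end of the `scheme` string rather than resolving the `ags:` and `dcterms:` prefixes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical lookup table; a real one would cover every ISO 639 entry.
LANGUAGES = {
    "http://example.org/lang/English": {"iso639-1": "en", "iso639-2": "eng", "name": "english"},
    "http://example.org/lang/French": {"iso639-1": "fr", "iso639-2": "fra", "name": "french"},
}


def resolve(value: str, scheme: Optional[str]) -> Optional[str]:
    """Map one dc:language value to a language URI, or None if unknown."""
    key = value.strip().lower()  # "En" and "en" denote the same ISO 639-1 code
    if scheme is not None and scheme.endswith("ISO639-1"):
        column = "iso639-1"
    elif scheme is not None and scheme.endswith("ISO639-2"):
        column = "iso639-2"
    else:
        column = "name"  # assumption: no scheme means an English language name
    for uri, entry in LANGUAGES.items():
        if entry[column] == key:
            return uri
    return None


RECORD = f"""<record xmlns:dc="{DC}">
  <dc:language scheme="ags:ISO639-1">En</dc:language>
  <dc:language scheme="dcterms:ISO639-2">eng</dc:language>
  <dc:language>English</dc:language>
</record>"""

root = ET.fromstring(RECORD)
uris = [resolve(el.text or "", el.get("scheme"))
        for el in root.findall(f"{{{DC}}}language")]
```

All three elements in the sample record resolve to the same URI, so the OWL output can point at a single language resource instead of three different strings.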










We have two options for modelling the language ontology:

1) Create a class for each language, assign a URI to it and add all the
other lexical variations and ISO codes (as datatype properties) as follows:

OWL:Thing
|_ Class:Language
        |_ Instance:URI1
                |_ rdfs:label xml:lang="en" English
                |_ rdfs:label xml:lang="es" Inglés
                |_ rdfs:label xml:lang="it" Inglese
                |_ rdfs:label xml:lang="fr" Anglais
                |_ etc.
                |_ property:hasISO639-1Code  en (string)
                |_ property:hasISO639-2Code  eng (string)
                |_ etc.
        |_ Instance:URI2
        |_ Instance:URI3
        |_ Instance:URI4
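As a rough illustration of what option 1 produces, this Python sketch writes the triples for one language as N-Triples lines. The namespace `http://example.org/lang#` is hypothetical, the property names are taken from the tree above, and string escaping is reduced to backslashes and double quotes, which is enough for these labels but not a complete N-Triples serializer.

```python
from typing import Dict, List

EX = "http://example.org/lang#"  # hypothetical namespace for the ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escapes backslash and quote only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def option1_triples(local_name: str, labels: Dict[str, str],
                    iso639_1: str, iso639_2: str) -> List[str]:
    """Option 1: one Language individual carrying its labels and codes as literals."""
    s = f"<{EX}{local_name}>"
    lines = [f"{s} <{RDF_TYPE}> <{EX}Language> ."]
    for lang in sorted(labels):  # one language-tagged rdfs:label per language
        lines.append(f"{s} <{RDFS_LABEL}> {literal(labels[lang])}@{lang} .")
    # The codes are plain strings attached through datatype properties.
    lines.append(f"{s} <{EX}hasISO639-1Code> {literal(iso639_1)} .")
    lines.append(f"{s} <{EX}hasISO639-2Code> {literal(iso639_2)} .")
    return lines


english = option1_triples(
    "English",
    {"en": "English", "es": "Inglés", "it": "Inglese", "fr": "Anglais"},
    "en", "eng",
)
```

The trade-off is visible in the output: the codes are just string values, so nothing in the graph says that `"en"` belongs to the ISO 639-1 code list beyond the property name itself.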










2) Create classes called Language and LanguageCode and make links between
instances of Language and LanguageCode as follows:

OWL:Thing
|_ Class:Language
        |_ Instance:URI1
                |_ property:hasCode  en  (link to the en instance of Class ISO639-1 below)
                |_ property:hasCode  eng  (link to the eng instance of Class ISO639-2 below)
|_ Class:LanguageCode
        |_ SubClass ISO639-1
                |_ Instance:en
                |_ Instance:fr
                |_ etc.
        |_ SubClass ISO639-2
                |_ Instance:eng
                |_ Instance:fra
                |_ etc.
        |_ etc.
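For comparison, option 2 can be sketched the same way. Here the graph is just a Python set of (subject, predicate, object) tuples rather than an RDF library, and the namespace and the `iso639-1/` and `iso639-2/` URI patterns for code individuals are hypothetical. Because the codes are resources with their own types, one can walk from a code back to its language, or ask for a language's codes within one scheme.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/lang#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
HAS_CODE = EX + "hasCode"


def option2_triples(language: str, code_1: str, code_2: str) -> Set[Triple]:
    """Option 2: codes are individuals of ISO639-1 / ISO639-2, linked by hasCode."""
    lang = EX + language
    c1 = EX + "iso639-1/" + code_1  # hypothetical URI pattern for code individuals
    c2 = EX + "iso639-2/" + code_2
    return {
        (EX + "ISO639-1", RDFS_SUBCLASS_OF, EX + "LanguageCode"),
        (EX + "ISO639-2", RDFS_SUBCLASS_OF, EX + "LanguageCode"),
        (lang, RDF_TYPE, EX + "Language"),
        (c1, RDF_TYPE, EX + "ISO639-1"),
        (c2, RDF_TYPE, EX + "ISO639-2"),
        (lang, HAS_CODE, c1),
        (lang, HAS_CODE, c2),
    }


def language_of(graph: Set[Triple], code: str) -> Optional[str]:
    """Follow hasCode backwards from a code individual to its language."""
    for s, p, o in graph:
        if p == HAS_CODE and o == code:
            return s
    return None


def codes_in_scheme(graph: Set[Triple], language: str, scheme: str) -> Set[str]:
    """Codes of `language` that are typed as members of the `scheme` class."""
    return {o for s, p, o in graph
            if s == language and p == HAS_CODE and (o, RDF_TYPE, scheme) in graph}


graph = option2_triples("English", "en", "eng") | option2_triples("French", "fr", "fra")
```

The price of this flexibility is one extra resource per code, but new code lists (for example ISO 639-3) can be added as further subclasses of LanguageCode without changing the Language class.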








Does anyone have similar experience with modelling in OWL? Any suggestions
on which model is better (and more extensible)? Does an ontology already
exist that we can reuse?

Thank you,

Gauri






  


Received on Tuesday, 24 April 2007 08:55:41 GMT
