Languages as RDF resources Re: Could ISO-639 languages be defined as skos concepts? from Bernard Vatant on 2006-12-26 (public-esw-thes@w3.org from December 2006)

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Tue, 26 Dec 2006 18:27:22 +0100
To: Sue Ellen Wright <sellenwright@gmail.com>
Cc: Felix Sasaki <fsasaki@w3.org>, Gerhard Budin <gerhard.budin@univie.ac.at>, Addison Phillips <addison@yahoo-inc.com>, Mark Davis <mark.davis@jtcsv.com>, Thomas Baker <baker@sub.uni-goettingen.de>, public-esw-thes@w3.org
Message-ID: <45915B7A.2050709@mondeca.com>
Hi all

To keep a summary in a permanent place, I've created a page on the ESW 
wiki to track this question:
http://esw.w3.org/topic/Languages_as_RDF_Resources

Now I think I've got the point made by Felix and Sue Ellen, thanks!
So I came up with an approach where Language (langtag) is defined as a 
class but, because subtags can be combined so freely, the instances of 
this class are not identified by a URI. They are defined as anonymous 
resources to which subtag values are attached as properties. Subtags 
themselves are defined as SKOS concepts, with URIs based on the subtag 
type and value, and additional information can be attached to them 
using the SKOS vocabulary, for example:

  <bcp47:LanguageSubtag rdf:about="#language-fr" skos:prefLabel="fr">
    <bcp47:suppressScript rdf:resource="#script-Latn"/>
    <skos:definition xml:lang="en">French</skos:definition>
    <skos:definition xml:lang="fr">Français</skos:definition>
  </bcp47:LanguageSubtag>

Definitions can be added in other languages of course.
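
To make the URI convention concrete, here is a minimal Python sketch of 
how an application might derive the fragment identifiers used above 
(#language-fr, #script-Latn) from a tag. It is only an illustration 
under stated assumptions: the function name subtag_fragments is 
hypothetical, and the pattern covers only the simple 
language[-script][-region] shape, not the full RFC 4646 grammar.

```python
import re

# Assumption: only language[-Script][-REGION] is handled. Real language tags
# may also carry extlang, variant, extension and private-use subtags.
_LANGTAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def subtag_fragments(langtag):
    """Return {subtag type: fragment identifier} for a simple language tag."""
    match = _LANGTAG.match(langtag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {langtag!r}")
    # Conventional casing: language lowercase, script Titlecase, region uppercase.
    casing = {"language": str.lower, "script": str.title, "region": str.upper}
    return {
        kind: f"#{kind}-{casing[kind](value)}"
        for kind, value in match.groupdict().items()
        if value is not None
    }


print(subtag_fragments("fr-CA"))
# {'language': '#language-fr', 'region': '#region-CA'}
```

The fragments are built from the subtag type and value only, so any 
meaning still has to come from the properties attached to the concept, 
not from the URI string.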

The most important point is how this is used in metadata, as in the 
following example. We have a document written in en-US (as far as I can 
tell) whose subject is French as it is spoken in Québec.
The code for that language should certainly be fr-CA; BCP 47 makes no 
provision for different flavours of Canadian French. But note that, 
since the language is defined as an anonymous node, there is no absolute 
identification rule. It's up to applications to decide on the "same-ness 
rules". Some could use the "langtag" value; others, which don't care 
about regional distinctions, would rely on the "primaryLanguage" value only.

  <foaf:Document rdf:about="http://en.wikipedia.org/wiki/Quebec_French">
    <dc:language>
      <bcp47:Language bcp47:langtag="en-US">
        <bcp47:primaryLanguage rdf:resource="#language-en"/>
        <bcp47:region rdf:resource="#region-US"/>
        <rdfs:label xml:lang="en">US English</rdfs:label>
        <rdfs:label xml:lang="fr">Anglais américain</rdfs:label>
      </bcp47:Language>
    </dc:language>
    <dc:subject>
      <bcp47:Language bcp47:langtag="fr-CA">
        <bcp47:primaryLanguage rdf:resource="#language-fr"/>
        <bcp47:region rdf:resource="#region-CA"/>
        <rdfs:label xml:lang="en">Quebec French</rdfs:label>
        <rdfs:label xml:lang="fr">Français québécois</rdfs:label>
      </bcp47:Language>
    </dc:subject>
  </foaf:Document>

A full RDF file with these examples is at
http://perso.orange.fr/universimmedia/lang/bcp47_sample.rdf
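
The "same-ness rules" are left to applications, so the following is 
just one possible sketch in Python, not part of the proposal itself. 
The Language class and the two comparison functions are hypothetical 
names; they stand in for an anonymous bcp47:Language node and for two 
alternative rules, one strict (full langtag) and one loose 
(primaryLanguage only).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    """Stand-in for one anonymous bcp47:Language node and its properties."""
    langtag: str
    primary_language: str
    region: Optional[str] = None


def same_by_langtag(a, b):
    # Strict rule: the full tags must match (compared case-insensitively).
    return a.langtag.lower() == b.langtag.lower()


def same_by_primary_language(a, b):
    # Loose rule: regional distinctions are ignored.
    return a.primary_language.lower() == b.primary_language.lower()


fr_ca = Language("fr-CA", "fr", "CA")
fr_fr = Language("fr-FR", "fr", "FR")

print(same_by_langtag(fr_ca, fr_fr))           # False
print(same_by_primary_language(fr_ca, fr_fr))  # True
```

An application indexing documents by subject could pick either rule, 
depending on whether it wants fr-CA and fr-FR to land in the same bucket.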

Comments are welcome, of course, though I certainly don't expect any 
before next year!

Best to all

Bernard

Sue Ellen Wright wrote:
> Hi, All,
> I'm sure Felix will get back to us, but I know what he means about a 
> finite list. RFC 4646 defines rules for generating language tags 
> based on the various code components that can be included. The 
> number of possible combinations is huge, although, as the document 
> points out, some combinations are unrealistic or silly (Aleut as 
> spoken in Belgium is a great example).
> Bye for now
> Sue Ellen
>
>  
> On 12/22/06, Bernard Vatant <bernard.vatant@mondeca.com> wrote:
>
>     Hi Felix
>
>     Thanks for jumping in.
>
>     > I'm trying to understand what you want to achieve: Is it URIs for
>     > language values, e.g. http://www.w3.org/2004/02/skos/language#en-US ?
>     >
>     Indeed. The whole point is to identify and represent languages as
>     concepts, in order to be able to make RDF assertions about them,
>     beyond the "tag" use.
>
>     > I don't think that it is feasible to have everything after "#" as a
>     > URI, since RFC 4646 or its successor define a grammar for language tags.
>     >
>     Do you mean there is a technical issue that prevents building valid
>     URIs out of language tags?
>     Note that although a single # namespace is the first idea that comes
>     to mind, it's not the only option.
>     It could just as well be http://www.w3.org/2004/02/skos/language/en/US
>     or even an opaque URI http://www.w3.org/2004/02/skos/language#1234
>     In any case, subtag elements and other properties such as revision
>     date will be explicitly attached as properties. You can't rely on the
>     URI string to carry semantics. This is a "Semantic Web Axiom" :-)
>     > That is, you cannot have a finite set of URIs built out of that.
>     >
>     Sorry, I don't get the point. What do you mean by a "finite set"?
>     Could you expand on that?
>     > Have you thought of registering an XPointer scheme at W3C? E.g.
>     > something like "language()" which can be used e.g. in
>     > http://www.w3.org/2004/02/skos/language#(en-US) . You would have to
>     > define that the scheme data "()" contains a BCP 47 identifier.
>     >
>     I think I see what you have in mind, but remember RDF is not mainly
>     about the structure of a published XML document, but about the
>     semantics of URIs.
>     Besides the language values themselves, and even before them, we need
>     a namespace for the ontology, the "Language" class, the different
>     "subtag" properties etc.
>     And defining a namespace is more or less dependent on how the
>     vocabulary is published.
>     See e.g. http://www.w3.org/TR/swbp-vocab-pub/
>
>     Hope that helps, and that we aren't talking past each other.
>
>     Regards
>

-- 

Bernard Vatant
Knowledge Engineering
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web:    www.mondeca.com
----------------------------------------------------
Tel:       +33 (0) 871 488 459
Mail:     bernard.vatant@mondeca.com
Blog:    Leçons de Choses <http://mondeca.wordpress.com/>
Received on Tuesday, 26 December 2006 17:27:35 UTC