Re: Language in schema.org

Hi all,

I'm forwarding a message from my colleague Elena (in CC). HTH.


Dear Jeni, all,

In the Ontology Engineering Group (http://www.oeg-upm.net/) at the UPM 
(Universidad Politécnica de Madrid), Spain, we have been working on the 
representation of multilingual information in ontologies for some time now.

I would say that depending on the final needs of the application, there 
are several options currently available to represent multilingual 
information in ontologies.
The simplest (but also most limited) option is the one offered by RDF(S) 
or SKOS, in which the language of a label is indicated by a language 
tag (@en), similar to what Jeni illustrates below.

This has the drawback that if you have several labels per language, 
you cannot establish any explicit relations among them. For example, you 
cannot say that one label is the full form and the other label is the 
acronym, and you cannot relate these labels to their respective 
translations.
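
To make the limitation concrete, here is a minimal sketch in Python. It 
uses plain tuples rather than a real RDF library, and the ex: concept 
name and the sample labels are made up for illustration. Labels are bare 
language-tagged literals, so they can be filtered by language, but there 
is nothing to attach a further statement to.

```python
# Each label is a plain (text, language tag) literal attached to a concept,
# as with rdfs:label or skos:prefLabel / skos:altLabel.
# The concept name and the labels are illustrative assumptions.
labels = {
    "ex:UnitedNations": [
        ("United Nations", "en"),
        ("UN", "en"),
        ("Naciones Unidas", "es"),
        ("ONU", "es"),
    ],
}


def labels_in(concept, lang):
    """Return the label texts of `concept` whose language tag equals `lang`."""
    return [text for text, tag in labels[concept] if tag == lang]


english = labels_in("ex:UnitedNations", "en")
spanish = labels_in("ex:UnitedNations", "es")
print(english)
print(spanish)
```

Filtering by language works, but the data cannot express that "UN" is 
the acronym of "United Nations", or that "ONU" corresponds to "UN" 
rather than to "United Nations".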

SKOS-XL provides a kind of /ad hoc/ solution to this problem by 
providing the class skosxl:Label and giving labels the status of 
first-class RDF resources. Now you can make statements about labels and 
also establish relations between them.
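
As a rough sketch of the idea (again in plain Python, not a real RDF 
API), each label below gets its own identifier, so it can be the subject 
or object of a statement. The ex: identifiers and the ex:acronymOf and 
ex:translationOf predicates are hypothetical; SKOS-XL itself offers the 
generic skosxl:labelRelation, which predicates like these could 
specialize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """A label as a resource in its own right, in the spirit of skosxl:Label."""
    iri: str           # identifier of the label resource (hypothetical ex: name)
    literal_form: str  # plays the role of skosxl:literalForm
    lang: str          # language tag of the literal form


un_full = Label("ex:label/un-full-en", "United Nations", "en")
un_acro = Label("ex:label/un-acro-en", "UN", "en")
onu_acro = Label("ex:label/un-acro-es", "ONU", "es")

# Statements whose subject and object are both labels.
# The predicates are illustrative assumptions, not SKOS-XL terms.
label_relations = [
    (un_acro.iri, "ex:acronymOf", un_full.iri),
    (onu_acro.iri, "ex:translationOf", un_acro.iri),
]


def related(label, predicate):
    """Return the IRIs of the labels that `label` points to via `predicate`."""
    return [o for s, p, o in label_relations if s == label.iri and p == predicate]


print(related(un_acro, "ex:acronymOf"))
print(related(onu_acro, "ex:translationOf"))
```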

In the framework of the European project *Monnet* (Multilingual 
Ontologies for Networked Knowledge), we go a step further and propose a 
more principled link between ontology elements (classes, properties, 
etc.) and linguistic descriptions in what we have called the /*lemon*/ 
model, a model developed with the purpose of representing linguistic 
descriptions associated with ontologies (see The lemon cookbook for a 
thorough description of the model at: http://lexinfo.net/).
By keeping the ontology separate from the linguistic model, both models 
can evolve independently, and the linguistic descriptions can be as 
complex as required by the final application.

This is a modular model that consists of a simple core and several 
modules that you can use if you need to model certain linguistic 
properties of labels (terminology variation, morphology or decomposition 
of terms, syntactic frames, translation relations...).

If you just need to associate several labels in several languages to 
classes and properties, you may simply need to have the lemon core, and 
if you want to explicitly say that some labels are translations of 
others, you may want to use the translation module.
(See a paper on this we just presented at the Multilingual Semantic Web 
Workshop at ISWC: 
http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/MSW/Elena.pdf)
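
A very reduced sketch of the core idea follows, in plain Python rather 
than RDF. The class and field names only loosely mirror the lemon 
notions of lexicon, lexical entry, written representation and 
reference; they are simplifications for illustration, and the sample 
labels are invented. The point is that the ontology is referenced only 
by IRI, while the labels live in per-language lexicons beside it.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    written_rep: str  # written representation of the entry's canonical form
    reference: str    # IRI of the ontology element the entry refers to


@dataclass
class Lexicon:
    language: str
    entries: list = field(default_factory=list)


# The ontology side stays untouched: it is only referenced by IRI.
CREATIVE_WORK = "http://schema.org/CreativeWork"

# One lexicon per language; the labels are illustrative assumptions.
lexicons = [
    Lexicon("en", [LexicalEntry("creative work", CREATIVE_WORK)]),
    Lexicon("es", [LexicalEntry("obra creativa", CREATIVE_WORK)]),
]


def labels_for(entity, language):
    """Collect the written forms that refer to `entity` in `language`."""
    return [
        entry.written_rep
        for lexicon in lexicons
        if lexicon.language == language
        for entry in lexicon.entries
        if entry.reference == entity
    ]


print(labels_for(CREATIVE_WORK, "es"))
```

Adding a language here means adding a lexicon, not editing the 
ontology. Explicit translation links between entries would be a further 
layer on top of this, which the sketch does not attempt to model.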

I hope this gives you a broader picture of the representation 
possibilities available for ontologies/vocabularies.
Do not hesitate to contact us if you need more information on this.

Btw, we have just started a W3C Community Group on Ontology-Lexica 
(http://www.w3.org/community/ontolex/) to deal with precisely these issues!

Regards,
Elena.



On 25/10/2011 22:51, Jeni Tennison wrote:
> Hi,
>
> How should multi-lingual content be handled in schema.org expressed in microdata?
>
> Language is not part of the microdata data model, and microdata vocabularies must provide vocabulary-specific mechanisms for supporting values that have an associated language [1].
>
> The schema.org vocabulary supports publishers indicating the language of the content of a CreativeWork through the inLanguage property [2]. From what I can tell, that's the only language-related schema.org property.
>
> How does schema.org deal with multi-lingual values for other properties? For example, I have a web page [3] which lists items of legislation that are available in both English and Welsh; it has the markup (simplified for this example)
>
> <tr class="oddRow">
>    <td class="bilingual en">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</td>
>    <td rowspan="2">
>      <a href="/wsi/2011/2469/contents/made">2011 No. 2469</a>
>    </td>
>    <td rowspan="2">Wales Statutory Instruments</td>
> </tr>
> <tr class="oddRow">
>    <td class="bilingual cy" lang="cy" xml:lang="cy">Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</td>
> </tr>
>
> I'd like to indicate that these two table rows relate to the same CreativeWork and that it has two titles, one in English and one in Welsh. There doesn't seem to be a way to do this in schema.org.
>
> One way that could work would be to introduce a http://schema.org/LanguageString (or something less horrendously named) type and use that as an acceptable value for any natural language property, such as name:
>
> <tr class="oddRow" itemscope itemtype="http://schema.org/CreativeWork" itemref="welsh">
>    <td class="bilingual en">
>      <span itemprop="name" itemscope itemtype="http://schema.org/LanguageString">
>        <meta itemprop="lang" content="en">
>        <span itemprop="value">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</span>
>      </span>
>    </td>
>    <td rowspan="2">
>      <a itemprop="url" href="/wsi/2011/2469/contents/made">2011 No. 2469</a>
>    </td>
>    <td rowspan="2">Wales Statutory Instruments</td>
> </tr>
> <tr class="oddRow" id="welsh">
>    <td class="bilingual cy" lang="cy" xml:lang="cy">
>      <span itemprop="name" itemscope itemtype="http://schema.org/LanguageString">
>        <meta itemprop="lang" content="cy">
>        <span itemprop="value">Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</span>
>      </span>
>    </td>
> </tr>
>
> Might schema.org introduce a LanguageString class or is there some other method of supplying the language of a property value that's supported by schema.org?
>
> Thanks,
>
> Jeni
>
> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470#c1
> [2] http://schema.org/CreativeWork
> [3] http://www.legislation.gov.uk/wsi/2011

Received on Thursday, 27 October 2011 17:50:15 UTC