Re: Language in schema.org

From: Boris Villazón Terrazas <bvillazon@fi.upm.es>
Date: Fri, 28 Oct 2011 17:24:46 +0200
Message-ID: <4EAAC93D.9060908@fi.upm.es>
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
CC: Jeni Tennison <jeni@jenitennison.com>, public-vocabs@w3.org, HTML Data Task Force WG <public-html-data-tf@w3.org>, Elena Montiel <emontiel@fi.upm.es>
Hi Martin

You are right, there are two distinct problems and we need *simple* 
solutions for them.
Elena's proposals are good for the second problem (#2) and she's working 
on improving and simplifying them.

Regarding problem #1, you are the expert and I agree with you  ...

thanks and best

Boris



On 28/10/2011 9:44, Martin Hepp wrote:
> Hi all:
>
> I think the important thing here is that there are two distinct problems:
>
> 1. We need a simple mechanism for indicating the natural language of text literals.
> 2. There may be more advanced cases where you want to indicate the natural language of a structured value as a whole, or of any other entity that is not a text literal. This includes the cases where you want the text element to be an identifiable entity, which literals are not.
>
> For #2, there are different approaches possible and it is not even clear whether this must be solved at the Microdata spec level; it could also be a vocabulary-specific solution.
>
> But for #1, we should, IMO,
>
> 1. stay as close as possible to the way it is handled in RDFa, and
> 2. reuse the same mechanisms employed for indicating the language of visible content.
>
> Language information is very valuable for any NLP-based operation on semi-structured data, so we should provide a stable mechanism for at least the simple cases.
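>
> A rough sketch of what #1 could look like, assuming a microdata consumer were to honour the ordinary HTML lang attribute on (or inherited by) the element carrying itemprop. Microdata does not define this behaviour; the second snippet is hypothetical and only mirrors what HTML+RDFa does in the first, where lang yields a language-tagged literal.
>
> ```html
> <!-- HTML+RDFa: lang on the element gives the literal a language tag -->
> <div vocab="http://schema.org/" typeof="CreativeWork">
>   <span property="name" lang="en">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</span>
>   <span property="name" lang="cy">Gorchymyn Cefnffordd yr A477 (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</span>
> </div>
>
> <!-- Hypothetical microdata equivalent: same lang attribute, assumed to be
>      picked up by the consumer (not specified by microdata itself) -->
> <div itemscope itemtype="http://schema.org/CreativeWork">
>   <span itemprop="name" lang="en">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</span>
>   <span itemprop="name" lang="cy">Gorchymyn Cefnffordd yr A477 (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</span>
> </div>
> ```
>
> The point of the parallel is that publishers would not need any markup beyond what they already use to declare the language of visible content.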
>
> Best
>
> Martin
>
>
> On Oct 27, 2011, at 7:50 PM, Boris Villazón Terrazas wrote:
>
>> Hi all
>>
>> I'm forwarding a message from my colleague Elena (in CC), HTH
>>
>>
>> Dear Jeni, all,
>>
>> In the Ontology Engineering Group (http://www.oeg-upm.net/) at the UPM (Universidad Politécnica de Madrid), Spain, we have been working on the representation of multilingual information in ontologies for some time now.
>>
>> I would say that depending on the final needs of the application, there are several options currently available to represent multilingual information in ontologies.
>> The simplest (but also most limited) option is the one offered by RDF(S) or SKOS, in which the language of a label can be specified with language tags (@en), similar to what Jeni illustrates below.
>>
>> The drawback is that if you have several labels per language, you cannot establish any explicit relations among them. For example, you cannot say that one label is the full form and another is the acronym, nor can you relate these labels to their respective translations.
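>>
>> A small HTML+RDFa sketch of this limitation (the #upm identifier and the choice of skos:prefLabel and skos:altLabel are illustrative assumptions). Both labels end up as plain Spanish-tagged literals:
>>
>> ```html
>> <p prefix="skos: http://www.w3.org/2004/02/skos/core#" about="#upm">
>>   <!-- each literal receives the language tag "es" from its lang attribute -->
>>   <span property="skos:prefLabel" lang="es">Universidad Politécnica de Madrid</span>
>>   (<span property="skos:altLabel" lang="es">UPM</span>)
>> </p>
>> ```
>>
>> Nothing in the resulting triples records that "UPM" is the acronym of the full form, because a literal cannot be the subject of a further statement.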
>>
>> SKOS-XL provides a kind of ad hoc solution to this problem by providing the class skosxl:Label and giving labels the status of RDF first-order resources. Now, you can make statements about labels and also establish relations between them.
>>
>> In the framework of the European project Monnet (Multilingual Ontologies for Networked Knowledge), we go a step further and propose a more principled link between ontology elements (classes, properties, etc.) and linguistic descriptions in what we have called the lemon model, a model developed with the purpose of representing linguistic descriptions associated with ontologies (see The lemon cookbook for a thorough description of the model at: http://lexinfo.net/).
>> By keeping the ontology separate from the linguistic model, both models can evolve independently, and the linguistic descriptions can be as complex as required by the final application.
>>
>> This is a modular model that consists of a simple core and several modules that you can use if you need to model certain linguistic properties of labels (terminology variation, morphology or decomposition of terms, syntactic frames, translation relations...).
>>
>> If you just need to associate several labels in several languages to classes and properties, you may simply need to have the lemon core, and if you want to explicitly say that some labels are translations of others, you may want to use the translation module.
>> (See a paper on this we just presented at the Multilingual Semantic Web Workshop at ISWC: http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/MSW/Elena.pdf)
>>
>> I hope this gives you a broader picture of the available representation possibilities for ontologies/vocabularies.
>> Do not hesitate to contact us if you need more information on this.
>>
>> Btw, we have just started a W3C Community Group on Ontology-Lexica (http://www.w3.org/community/ontolex/) to deal with precisely these issues!
>>
>> Regards,
>> Elena.
>>
>>
>>
>> On 25/10/2011 22:51, Jeni Tennison wrote:
>>> Hi,
>>>
>>> How should multi-lingual content be handled in schema.org expressed in microdata?
>>>
>>> Language is not part of the microdata data model, and microdata vocabularies must provide vocabulary-specific mechanisms for supporting values that have an associated language [1].
>>>
>>> The schema.org vocabulary supports publishers indicating the language of the content of a CreativeWork through the inLanguage property [2]. From what I can tell, that's the only language-related schema.org property.
>>>
>>> How does schema.org deal with multi-lingual values for other properties? For example, I have a web page [3] which lists items of legislation that are available in both English and Welsh; it has the markup (simplified for this example)
>>>
>>> <tr class="oddRow">
>>>    <td class="bilingual en">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</td>
>>>    <td rowspan="2">
>>>      <a href="/wsi/2011/2469/contents/made">2011 No. 2469</a>
>>>    </td>
>>>    <td rowspan="2">Wales Statutory Instruments</td>
>>> </tr>
>>> <tr class="oddRow">
>>>    <td class="bilingual cy" lang="cy" xml:lang="cy">Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</td>
>>> </tr>
>>>
>>> I'd like to indicate that these two table rows relate to the same CreativeWork and that it has two titles, one in English and one in Welsh. There doesn't seem to be a way to do this in schema.org.
>>>
>>> One way that could work would be to introduce a http://schema.org/LanguageString (or something less horrendously named) type and use that as an acceptable value for any natural language property, such as name:
>>>
>>> <tr class="oddRow" itemscope itemtype="http://schema.org/CreativeWork" itemref="welsh">
>>>    <td class="bilingual en">
>>>      <span itemprop="name" itemscope itemtype="http://schema.org/LanguageString">
>>>        <meta itemprop="lang" content="en">
>>>        <span itemprop="value">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</span>
>>>      </span>
>>>    </td>
>>>    <td rowspan="2">
>>>      <a itemprop="url" href="/wsi/2011/2469/contents/made">2011 No. 2469</a>
>>>    </td>
>>>    <td rowspan="2">Wales Statutory Instruments</td>
>>> </tr>
>>> <tr class="oddRow" id="welsh">
>>>    <td class="bilingual cy" lang="cy" xml:lang="cy">
>>>      <span itemprop="name" itemscope itemtype="http://schema.org/LanguageString">
>>>        <meta itemprop="lang" content="cy">
>>>        <span itemprop="value">Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</span>
>>>      </span>
>>>    </td>
>>> </tr>
>>>
>>> Might schema.org introduce a LanguageString class or is there some other method of supplying the language of a property value that's supported by schema.org?
>>>
>>> Thanks,
>>>
>>> Jeni
>>>
>>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470#c1
>>>
>>> [2] http://schema.org/CreativeWork
>>>
>>> [3] http://www.legislation.gov.uk/wsi/2011
>
Received on Friday, 28 October 2011 15:25:07 GMT
