
Re: Language in schema.org

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Tue, 25 Oct 2011 19:58:53 -0400
To: Jeni Tennison <jeni@jenitennison.com>
CC: Jason Douglas <jasondouglas@google.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <B81D9D49-EDD5-41D7-AB8A-0E54A1D52EFF@greggkellogg.net>
On Oct 25, 2011, at 2:16 PM, Jeni Tennison wrote:

> Jason,
> 
> Yes, I agree, but from what I think Hixie is saying, it's not conformant with microdata to use the HTML lang attribute to provide language information: it has to be explicitly indicated using a property in the vocabulary to carry the language.

I guess my take is a bit different. Microdata-to-RDF processing allows certain forms of typed literals, and the use of in-scope @lang attributes to tag literals is currently defined and would seem to be in order. The JSON representation of Microdata doesn't carry any language information, so what is not conformant is depending on language information extracted from the value, not the fact that values may carry extra information.

This becomes a statement about vocabulary requirements for Microdata: for a vocabulary to be compatible with Microdata and represent language information for string literals, it must provide semantic representations that include this language information, rather than rely upon information contained within the value.

Requiring that vocabularies provide semantic support for expressing language information would then be part of the guidance given to publishers when choosing a serialization format.

Of course, another alternative would be for the Microdata JSON representation to support language-tagged values, as JSON-LD does, for example:

{
  "@context": { "@vocab": "http://schema.org/" },
  "name": [
    { "@lang": "en", "@literal": "The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011"},
    { "@lang": "cy", "@literal": "Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011"}
  ]
}
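
As a rough illustration of why a consumer would want this, here is a minimal Python sketch that picks a title by language from a document shaped like the one above. The helper name pick_literal and its fallback behaviour are my own assumptions, not part of any spec, and the titles are abbreviated to keep the lines short:

```python
import json

# Same shape as the example above ("@lang"/"@literal" keys as written in
# this message); the titles are abbreviated here for readability.
doc = json.loads("""
{
  "@context": { "@vocab": "http://schema.org/" },
  "name": [
    { "@lang": "en", "@literal": "The A477 Trunk Road Order 2011" },
    { "@lang": "cy", "@literal": "Gorchymyn Cefnffordd yr A477 2011" }
  ]
}
""")


def pick_literal(values, preferred, fallback=None):
    """Return the literal tagged `preferred`; otherwise the one tagged
    `fallback` (default None, meaning an untagged plain string);
    otherwise None.

    Hypothetical helper for illustration only.
    """
    by_lang = {}
    for value in values:
        if isinstance(value, dict):
            # Language-tagged value: keep the first literal seen per language.
            by_lang.setdefault(value.get("@lang"), value.get("@literal"))
        else:
            # Plain string: no language information available.
            by_lang.setdefault(None, value)
    if preferred in by_lang:
        return by_lang[preferred]
    return by_lang.get(fallback)


print(pick_literal(doc["name"], "cy"))
```

With the plain Microdata JSON there would be nothing for such a lookup to key on, since both names would arrive as bare strings.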

Given that language information for values is available in the DOM, it would seem that the Microdata API allows developers working within the DOM to access this information; it's just the JSON serialization that loses it.
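
To make the point concrete, here is a small sketch, using only Python's standard html.parser, of how a processor could pair each text-valued itemprop with its in-scope language. The class name LangAwareProps and the output triples are my own invention, and this is deliberately not a conforming Microdata extractor: itemscope nesting, itemref, multi-token itemprop values and URL- or meta-valued properties are all ignored. The markup is a cut-down version of Jeni's table with abbreviated titles.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not tracked on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LangAwareProps(HTMLParser):
    """Collect (itemprop, text, in-scope language) triples from HTML.

    Hypothetical helper: handles text-content properties only.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # one [tag, lang, itemprop, text_parts] per open element
        self.found = []   # collected (itemprop, text, lang) triples

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else None
        # An explicit lang attribute overrides the inherited language;
        # an empty value means "unknown", recorded here as None.
        lang = (attrs["lang"] or None) if "lang" in attrs else inherited
        self.stack.append([tag, lang, attrs.get("itemprop"), []])

    def handle_data(self, data):
        # Text contributes to every open element that carries an itemprop.
        for entry in self.stack:
            if entry[2]:
                entry[3].append(data)

    def handle_endtag(self, tag):
        # Find the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                break
        else:
            return
        # Close it together with any unclosed descendants.
        while len(self.stack) > i:
            _, lang, prop, parts = self.stack.pop()
            if prop:
                text = " ".join("".join(parts).split())
                self.found.append((prop, text, lang))


html = """
<table lang="en">
  <tr><td itemprop="name">The A477 Trunk Road Order 2011</td></tr>
  <tr><td itemprop="name" lang="cy">Gorchymyn Cefnffordd yr A477 2011</td></tr>
</table>
"""

parser = LangAwareProps()
parser.feed(html)
parser.close()
for prop, text, lang in parser.found:
    print(prop, lang, text)
```

The language is there for the taking while walking the tree; it only disappears once the values are flattened to bare strings in the JSON output.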

Gregg

> Jeni
> 
> On 25 Oct 2011, at 22:05, Jason Douglas wrote:
> 
>> I've been personally suggesting allowing multiple itemprops, even for unique properties, as long as they have different html5 lang= attribute values. Even supporting that capability on only Thing/name would cover a lot of use cases without much added complexity. 
>> 
>> -jason
>> 
>> On Tuesday, October 25, 2011, Jeni Tennison <jeni@jenitennison.com> wrote:
>>> Hi,
>>> 
>>> How should multi-lingual content be handled in schema.org expressed in microdata?
>>> 
>>> Language is not part of the microdata data model, and microdata vocabularies must provide vocabulary-specific mechanisms for supporting values that have an associated language [1].
>>> 
>>> The schema.org vocabulary supports publishers indicating the language of the content of a CreativeWork through the inLanguage property [2]. From what I can tell, that's the only language-related schema.org property.
>>> 
>>> How does schema.org deal with multi-lingual values for other properties? For example, I have a web page [3] which lists items of legislation that are available in both English and Welsh; it has the markup (simplified for this example)
>>> 
>>> <tr class="oddRow">
>>> <td class="bilingual en">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</td>
>>> <td rowspan="2">
>>>   <a href="/wsi/2011/2469/contents/made">2011 No. 2469</a>
>>> </td>
>>> <td rowspan="2">Wales Statutory Instruments</td>
>>> </tr>
>>> <tr class="oddRow">
>>> <td class="bilingual cy" lang="cy" xml:lang="cy">Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</td>
>>> </tr>
>>> 
>>> I'd like to indicate that these two table rows related to the same CreativeWork and that it has two titles, one in English and one in Welsh. There doesn't seem to be a way to do this in schema.org.
>>> 
>>> One way that could work would be to introduce a http://schema.org/LanguageString (or something less horrendously named) type and use that as an acceptable value for any natural language property, such as name:
>>> 
>>> <tr class="oddRow" itemscope itemtype="http://schema.org/CreativeWork" itemref="welsh">
>>> <td class="bilingual en">
>>>   <span itemprop="name" itemscope itemtype="http://schema.org/LanguageString">
>>>     <meta itemprop="lang" content="en">
>>>     <span itemprop="value">The A477 Trunk Road (Backe Road Junction to Llanddowror, Carmarthenshire) (Temporary Traffic Restrictions and Prohibition) Order 2011</span>
>>>   </span>
>>> </td>
>>> <td rowspan="2">
>>>   <a itemprop="url" href="/wsi/2011/2469/contents/made">2011 No. 2469</a>
>>> </td>
>>> <td rowspan="2">Wales Statutory Instruments</td>
>>> </tr>
>>> <tr class="oddRow" id="welsh">
>>> <td class="bilingual cy" lang="cy" xml:lang="cy">
>>>   <span itemprop="name" itemscope itemtype="http://schema.org/LanguageString">
>>>     <meta itemprop="lang" content="cy">
>>>     <span itemprop="value">Gorchymyn Cefnffordd yr A477  (Cyffordd Ffordd Bace i Landdowror, Sir Gaerfyrddin) (Cyfyngiadau a Gwaharddiad Traffig Dros Dro) 2011</span>
>>>   </span>
>>> </td>
>>> </tr>
>>> 
>>> Might schema.org introduce a LanguageString class or is there some other method of supplying the language of a property value that's supported by schema.org?
>>> 
>>> Thanks,
>>> 
>>> Jeni
>>> 
>>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470#c1
>>> [2] http://schema.org/CreativeWork
>>> [3] http://www.legislation.gov.uk/wsi/2011
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com
>>> 
>>> 
>>> 
> 
> -- 
> Jeni Tennison
> http://www.jenitennison.com
> 
> 
Received on Tuesday, 25 October 2011 23:59:33 GMT
