Re: WOFF and extended metadata

>> Of course, even within Tal's proposal there's nothing to stop the author doing...
> 
> It's not just that. With the name and value stored in different nodes, we must rely on something else to pair them together and that seems to be the lang attribute.

No, <name> and <value> are paired by virtue of being children of the same <item>.

> So now we get to define what should happen when the pairing is broken e.g. using Tal's example:
> 
>  <item>
>   <name>
>    <text lang="en">Message</text>
>    <text lang="nl">Bericht</text>
>    <text lang="fr">Message</text> <!-- dangling property with no value -->
>   </name>
>   <value>
>    <text lang="en">Hello!</text>
>    <text lang="nl">Hallo!</text>
>    <text lang="jp">ohayoo</text> <!-- dangling value with no property name -->
>   </value>
>  </item>
> 
> One option is to show the French 'Message' with no value; another is to ignore it; yet another is to be very strict and treat the entire item as invalid.

The lang attribute does not have anything to do with "pairing", and so there's nothing broken about this.

For each element that uses <text> subelements to carry localizable content, the UA will choose the best match from among the available languages. If the user's locale is French, but some elements don't have a "fr" localization, the UA will fall back to the next-best option. If it is aware of a hierarchy of language preferences, it can use that; if not, we should define what the last-resort fallback will be. My suggestion is to specify that if no "preferred" language is found, the UA is to use the first of the <text> elements; that means the author can determine what the fallback will be, simply by ordering the languages appropriately.
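
To make the suggested rule concrete, here is a minimal sketch in Python. It is only an illustration, not proposed spec text. The function name choose_text and its parameters are hypothetical, and the tag-truncation step (trying "fr" after "fr-CA") is an assumed stand-in for whatever "hierarchy of language preferences" a UA might actually know about, loosely in the spirit of the RFC 4647 lookup scheme.

```python
from typing import List, Optional, Sequence, Tuple


def choose_text(
    candidates: Sequence[Tuple[Optional[str], str]],
    preferred: List[str],
    default_lang: Optional[str] = None,
) -> Optional[str]:
    """Pick one localized string for a single element.

    candidates   -- (lang, text) pairs in document order; lang is None when
                    the <text> element carries no lang attribute
    preferred    -- the user's language tags, most preferred first
    default_lang -- language inherited from an ancestor (assumed behaviour)
    """
    def norm(tag: Optional[str]) -> Optional[str]:
        return tag.lower() if tag else None

    # Untagged entries take the inherited language, if there is one.
    tagged = [(norm(lang) or norm(default_lang), text) for lang, text in candidates]

    for want in preferred:
        tag = norm(want)
        # Try the full tag first, then progressively shorter prefixes.
        while tag:
            for lang, text in tagged:
                if lang == tag:
                    return text
            tag = tag.rpartition("-")[0]

    # Last resort: the first <text> element, so the author controls the fallback.
    return tagged[0][1] if tagged else None
```

Each element is resolved independently of its siblings, which is the point: the function never looks at what language was chosen for any other element.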

It is of course expected that authors will normally provide the same set of languages for all elements they localize, and so the UA will end up picking the same language every time. But nothing breaks if this pattern is not followed.

<extension lang="en"><!-- untagged subelements are English -->
  <item>
    <name>
      <text>Message</text>
      <text lang="nl">Bericht</text>
      <text lang="fr">Message</text>
    </name>
    <value>
      <text>Hello!</text>
      <text lang="nl">Hallo!</text>
      <text lang="fr">Salut!</text>
      <text lang="fr-CA">Bonjour!</text>
      <text lang="ja">こんにちは。</text>
    </value>
  </item>
</extension>

Japanese users would see the English label for this item, as no "ja" localization was provided for that, but they'd get their own version of the value. And French-Canadian users would see a customized value, but the generic French label. There's no "pairing" by language.
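
The behaviour just described can be checked with a short Python sketch run against the example. This is an illustration under stated assumptions rather than a reference implementation: the helper names pick and render_item are hypothetical, the lang value on <extension> is assumed to be inherited by untagged <text> elements, and prefix truncation ("fr-CA" falling back to "fr") is one assumed way of modelling the UA's preference hierarchy.

```python
import xml.etree.ElementTree as ET

DOC = """<extension lang="en">
  <item>
    <name>
      <text>Message</text>
      <text lang="nl">Bericht</text>
      <text lang="fr">Message</text>
    </name>
    <value>
      <text>Hello!</text>
      <text lang="nl">Hallo!</text>
      <text lang="fr">Salut!</text>
      <text lang="fr-CA">Bonjour!</text>
      <text lang="ja">こんにちは。</text>
    </value>
  </item>
</extension>"""


def pick(parent, inherited_lang, preferred):
    """Choose one <text> child of parent, independently of any sibling element."""
    texts = parent.findall("text")
    if not texts:
        return None
    by_lang = {}
    for t in texts:
        lang = (t.get("lang") or inherited_lang or "").lower()
        by_lang.setdefault(lang, t.text)  # first occurrence of a language wins
    for want in preferred:
        tag = want.lower()
        while tag:  # full tag first, then shorter prefixes
            if tag in by_lang:
                return by_lang[tag]
            tag = tag.rpartition("-")[0]
    return texts[0].text  # last resort: the first <text> in document order


def render_item(doc, preferred):
    """Return the (name, value) strings a user with these preferences would see."""
    root = ET.fromstring(doc)
    item = root.find("item")
    lang = item.get("lang") or root.get("lang")
    return (pick(item.find("name"), lang, preferred),
            pick(item.find("value"), lang, preferred))


print(render_item(DOC, ["ja"]))     # English label, Japanese value
print(render_item(DOC, ["fr-CA"]))  # generic French label, fr-CA value
```

Because name and value are resolved by two separate calls to pick, a missing localization in one never affects the other.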

I don't see this as either confusing or fragile. The depth could be reduced by the alternate form:

<extension lang="en"><!-- untagged subelements are English -->
  <item>
    <name>Message</name>
    <name lang="nl">Bericht</name>
    <name lang="fr">Message</name>
    <value>Hello!</value>
    <value lang="nl">Hallo!</value>
    <value lang="fr">Salut!</value>
    <value lang="fr-CA">Bonjour!</value>
    <value lang="ja">こんにちは。</value>
  </item>
</extension>

but it seems to me that the consistent use of <text> subelements for all localizable content is conceptually simpler.

JK

Received on Wednesday, 2 June 2010 18:20:12 UTC