[Bug 14470] Microdata: Language handling from bugzilla@jessica.w3.org on 2011-11-11 (public-html-bugzilla@w3.org from November 2011)

From: <bugzilla@jessica.w3.org>
Date: Fri, 11 Nov 2011 20:01:21 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1ROxHt-00064R-H5@jessica.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14470

--- Comment #8 from Ian 'Hixie' Hickson <ian@hixie.ch> 2011-11-11 20:01:18 UTC ---
(In reply to comment #7)
> A use case is that a search engine wants to bring together reviews and other
> information about films into film-centric pages. It gathers that information
> about that film from all over the web and wants to present people with reviews
> in their preferred language(s). This requires it to preserve information about
> the language of the reviews.

(I assume you mean aggregator, not search engine.)

The above can be solved today; you just need to include the language
information in the microdata itself:

   <p itemscope itemtype="http://example.com/movie/review">
    <span itemprop=text> bla bla bla </span>
    <meta itemprop=language content="en">
   </p>
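A consumer then reads the language as just another property. Below is a
minimal sketch, assuming a single non-nested item and well-formed markup. It
is not the full microdata extraction algorithm. It takes the `content`
attribute as the value of a `<meta>` and the concatenated text as the value of
any other element, using Python's standard `html.parser`. The class name
`FlatItemExtractor` is hypothetical.

```python
from html.parser import HTMLParser


class FlatItemExtractor(HTMLParser):
    """Collects itemprop values from a single, non-nested item.

    Simplified sketch, not the spec's microdata algorithm: assumes
    well-formed markup, uses <meta content> as the value of a <meta>,
    and the concatenated text for any other element.
    """

    def __init__(self):
        super().__init__()
        self.props = {}    # property name -> list of values
        self.stack = []    # itemprop name (or None) of each open element
        self.buffers = {}  # stack depth -> text chunks for an open itemprop

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop")
        if tag == "meta":
            # <meta> is a void element: its value is the content attribute.
            if name:
                self.props.setdefault(name, []).append(a.get("content") or "")
            return
        self.stack.append(name)
        if name:
            self.buffers[len(self.stack) - 1] = []

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries itemprop.
        for chunks in self.buffers.values():
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "meta" or not self.stack:
            return
        name = self.stack.pop()
        chunks = self.buffers.pop(len(self.stack), None)
        if name and chunks is not None:
            self.props.setdefault(name, []).append("".join(chunks))


review = """
<p itemscope itemtype="http://example.com/movie/review">
 <span itemprop=text> bla bla bla </span>
 <meta itemprop=language content="en">
</p>
"""

parser = FlatItemExtractor()
parser.feed(review)
parser.close()
print(parser.props)  # {'text': [' bla bla bla '], 'language': ['en']}
```

The aggregator can then filter reviews on the "language" value without
consulting lang="" at all.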

It's redundant with lang="", but lang="" doesn't have the same granularity as
microdata: it can change in the middle of a single property value. Consider:

   <p itemscope itemtype="http://example.com/movie/review" lang="en">
    <span itemprop=text>
     <span lang="de">bla</span>
     <span lang="fr">bla</span>
    </span>
   </p>

What language would you associate with the "text" property?
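One way to see the problem is to resolve lang="" per text run rather than per
property. The minimal sketch below assumes well-formed markup without void
elements. It inherits the language only from ancestor lang attributes, ignoring
HTTP headers and <meta http-equiv>, and it skips whitespace-only runs. The
class name `LangRuns` is hypothetical.

```python
from html.parser import HTMLParser


class LangRuns(HTMLParser):
    """Resolves lang="" for each text run inside one itemprop.

    Simplified sketch: assumes well-formed markup without void elements,
    inherits lang only from ancestor attributes, and skips
    whitespace-only runs.
    """

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.stack = []           # (effective lang, inside the property?) per open element
        self.element_lang = None  # language of the element carrying the itemprop
        self.runs = []            # (lang, text) for each text run in the property

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent_lang, parent_inside = self.stack[-1] if self.stack else (None, False)
        lang = a.get("lang", parent_lang)
        is_prop = self.prop in (a.get("itemprop") or "").split()
        if is_prop and not parent_inside:
            self.element_lang = lang
        self.stack.append((lang, parent_inside or is_prop))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1][1] and data.strip():
            self.runs.append((self.stack[-1][0], data.strip()))


review = """
<p itemscope itemtype="http://example.com/movie/review" lang="en">
 <span itemprop=text>
  <span lang="de">bla</span>
  <span lang="fr">bla</span>
 </span>
</p>
"""

tracker = LangRuns("text")
tracker.feed(review)
tracker.close()
print(tracker.element_lang)  # en
print(tracker.runs)          # [('de', 'bla'), ('fr', 'bla')]
```

The element carrying the property is in "en", yet none of its text is
English, so any single language attached to the "text" value would be wrong
for at least part of it.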

Also, note that microdata isn't currently intended for handling cases where
entire blobs of HTML content are aggregated. For example, it would completely
fail with something like:

  <div itemprop=adcopy>
   <style scoped> em { color: purple } </style>    
   This product costs <s>$500</s> just $100!
   You should get <em>this</em> version, not any version.
  </div>

The microdata extraction would get:

   "adcopy": [ "\n    em { color: purple }     \n   This product costs $500 just $100!\n   You should get this version, not any version.\n  " ]

...which isn't at all what was intended.
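The reason is that, for an ordinary element like <div>, the property value is
the element's textContent: all descendant text, including the raw contents of
<style>, with every tag dropped. The minimal sketch below approximates that
with Python's `html.parser`. This is an assumption: `html.parser` is not a
spec-compliant HTML parser, and the sketch skips the itemprop lookup entirely.
The class name `TextContent` is hypothetical.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Approximates DOM textContent: keeps every text run, drops all tags.

    Sketch only. Like the DOM, html.parser passes the raw contents of
    <style> through as text.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


adcopy = """<div itemprop=adcopy>
 <style scoped> em { color: purple } </style>
 This product costs <s>$500</s> just $100!
 You should get <em>this</em> version, not any version.
</div>"""

tc = TextContent()
tc.feed(adcopy)
tc.close()
value = "".join(tc.parts)
print(repr(value))
# The CSS rule survives as plain text, while <s> and <em> vanish,
# so "costs $500 just $100!" no longer shows that $500 was struck out.
```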


> A perhaps more esoteric use case: translation services such as Google Translate
> might look for examples where the same information about an item was given in
> different languages as potential sources for improving its translation
> services.

Such a tool would presumably want intra-text language annotations, not just
coarse language annotations.


I think if we're to address the use cases presented, we need to add more than
just lang="" support; we need to add subtree support (which would give us
language support for free). I don't think it makes sense to make such a radical
addition so early in the technology's development. We should wait to see how
people are using it, first.


Received on Friday, 11 November 2011 20:01:26 UTC