Re: Type of schema.org/Text, and limitations of same from Jeni Tennison on 2011-11-23 (public-vocabs@w3.org from November 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 23 Nov 2011 08:00:05 +0000
To: John Panzer <jpanzer@google.com>
Cc: public-vocabs <public-vocabs@w3.org>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <AD2CD01D-EC22-4760-915A-C6D64DB4465E@jenitennison.com>

John,

On 23 Nov 2011, at 04:11, John Panzer wrote:
> Reading the implied restrictions on microdata, which schema.org operates in, I assume any property with type Text is supposed to be non-HTML and specifically limited to the results of "innerText" in the DOM.  If this is a correct interpretation:  How would one handle standard bold, italic, etc. markup inside properties such as articleBody (http://schema.org/Article)?  This seems like a fairly major limitation, in that receivers don't even necessarily have access to the original HTML content in order to perform their own transformations or sanitization.

I've made an attempt to write up the issues for potential submission as a bug on microdata here:

  http://www.w3.org/wiki/HTML_Data_Improvements#Structured_Values

As I note in that text, the inability of microdata to capture HTML values has been raised as a bug before [1] and was closed by Hixie. I think that to be persuaded of the need to capture HTML structures, he would need to see evidence that consumers currently use HTML structures in whatever they do.
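To make the limitation concrete, here is a minimal sketch using only the Python standard library. Under the microdata processing model, the value of an itemprop on an ordinary element such as a <div> is that element's text content, so inline markup inside articleBody is dropped. The ItempropExtractor class is hypothetical and deliberately simplified. It assumes well-formed input and ignores itemscope nesting and attribute-valued elements such as <meta>, <a> and <img>. For each itemprop it records two things: the flattened text a microdata parser would report, and the inner HTML that a consumer wanting to keep bold or italic formatting would need.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class ItempropExtractor(HTMLParser):
    """Hypothetical, simplified extractor: for each itemprop element, keep
    both its descendant text (the microdata value) and its inner HTML
    (what microdata discards). Assumes well-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.stack = []    # open captures: [name, depth, text_parts, html_parts]
        self.results = {}  # itemprop name -> (text_value, inner_html)

    def handle_starttag(self, tag, attrs):
        # Re-serialise the start tag into every enclosing capture's HTML buffer.
        rendered = "<" + tag + "".join(
            f' {k}="{escape(v or "", quote=True)}"' for k, v in attrs) + ">"
        for cap in self.stack:
            cap[3].append(rendered)
        if tag in VOID:
            return
        self.depth += 1
        prop = dict(attrs).get("itemprop")
        if prop:
            self.stack.append([prop, self.depth, [], []])

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # If the element closing now is an itemprop element, finish its capture.
        if self.stack and self.stack[-1][1] == self.depth:
            name, _, text_parts, html_parts = self.stack.pop()
            self.results[name] = ("".join(text_parts), "".join(html_parts))
        for cap in self.stack:
            cap[3].append(f"</{tag}>")
        self.depth -= 1

    def handle_data(self, data):
        for cap in self.stack:
            cap[2].append(data)                          # plain text value
            cap[3].append(escape(data, quote=False))     # markup-preserving copy


doc = ('<div itemscope itemtype="http://schema.org/Article">'
       '<div itemprop="articleBody">Some <b>bold</b> and <i>italic</i> text.</div>'
       '</div>')
parser = ItempropExtractor()
parser.feed(doc)
parser.close()
text_value, inner_html = parser.results["articleBody"]
print(text_value)  # Some bold and italic text.
print(inner_html)  # Some <b>bold</b> and <i>italic</i> text.
```

Only the first string is what a microdata consumer receives. The second shows the structure a receiver would need in order to run its own transformations or sanitisation.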

Is this something that schema.org processors currently do, and if so, how do they use the HTML content? Does it come through in Rich Snippets?

Thanks,

Jeni

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Wednesday, 23 November 2011 08:00:45 UTC