
Re: Type of schema.org/Text, and limitations of same

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 23 Nov 2011 08:00:05 +0000
Cc: public-vocabs <public-vocabs@w3.org>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <AD2CD01D-EC22-4760-915A-C6D64DB4465E@jenitennison.com>
To: John Panzer <jpanzer@google.com>

On 23 Nov 2011, at 04:11, John Panzer wrote:
> Reading the implied restrictions on microdata, which schema.org operates in, I assume any property with type Text is supposed to be non-HTML and specifically limited to the results of "innerText" in the DOM.  If this is a correct interpretation:  How would one handle standard bold, italic, etc. markup inside properties such as articleBody (http://schema.org/Article)?  This seems like a fairly major limitation, in that receivers don't even necessarily have access to the original HTML content in order to perform their own transformations or sanitization.

I've made an attempt to write up the issues for potential submission as a bug on microdata here:


As I note in that text, the inability of microdata to capture HTML values has been raised as a bug before [1] and was closed by Hixie. I think that, to be persuaded that there is a need to capture HTML structures, he would need to see evidence of consumers currently using HTML structures in whatever they do.
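
To make the limitation concrete, here is a minimal sketch of what a consumer ends up with for a text-valued property such as articleBody. It is not taken from any schema.org processor: the class ItempropText, the helper text_value and the SAMPLE markup are all made up for illustration, and the sketch only handles the simple case where the property sits on an ordinary element, whose value is its text content.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItempropText(HTMLParser):
    """Collect the text content of the first non-void element with a given itemprop."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.depth = 0      # > 0 while inside the matching element
        self.chunks = []    # text nodes seen inside the matching element
        self.done = False   # set once the matching element has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID:
            return
        if self.depth:
            self.depth += 1
        else:
            # itemprop may hold several space-separated names, or no value at all.
            names = (dict(attrs).get("itemprop") or "").split()
            if self.prop in names:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_value(html, prop):
    """Return the text-only value of the property; all markup is dropped."""
    parser = ItempropText(prop)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical page fragment, for illustration only.
SAMPLE = (
    '<div itemscope itemtype="http://schema.org/Article">'
    '<div itemprop="articleBody">This is <b>bold</b> and <i>italic</i>.</div>'
    '</div>'
)

print(text_value(SAMPLE, "articleBody"))  # prints: This is bold and italic.
```

The <b> and <i> elements are gone from the extracted value, so a receiver that only sees the microdata has nothing left to transform or sanitise.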

Is this something that schema.org processors currently do and if so how do they use the HTML content? Does it come through in Rich Snippets?



[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468
Jeni Tennison
Received on Wednesday, 23 November 2011 08:00:42 UTC
