- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sat, 12 Nov 2011 20:45:22 +0000
- To: Toby Inkster <tai@g5n.co.uk>, Henri Sivonen <hsivonen@iki.fi>
- Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
(Henri, question for you embedded in here: aside from the movement of <link> and <meta> which we know about, is being a valid HTML5 document enough to avoid unanticipated movement of elements that might result in changes to a publisher's intended microdata/RDFa/microformats?)
On 12 Nov 2011, at 00:58, Toby Inkster wrote:
> A better example, which, although invalid, should be accepted but
> interpreted differently under HTML and XML parsers is:
>
> <div about="">
> <table about="#me">
> <p property="dc:title">Hello World</p>
> </table>
> </div>
>
> An HTML parser will lift the <p> element out of the table and place it
> as a child of the <div> (IIRC it will be inserted prior to the <table>)
> because <p> is not an allowed child of <table>.
>
> So in HTML+RDFa we'll get:
>
> <> dc:title "Hello World" .
> # and no triples about <#me>
>
> And in XHTML+RDFa:
>
> <#me> dc:title "Hello World" .
> # and no triples about <>
>
> Yay, fun. :-)
It looks as though this is something that hits microdata too. If you have:
<div itemscope itemid="#foo" itemtype="http://schema.org/CreativeWork">
<table itemscope itemid="#bar" itemtype="http://schema.org/CreativeWork">
<p itemprop="title">Hello World</p>
</table>
</div>
then the same re-jigging of the content (the <p> is lifted out of the <table> and becomes a child of the <div>) causes the 'title' property to be associated with #foo rather than #bar.
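To make the difference concrete, here is a minimal sketch in Python. It is not a real microdata processor: it only looks for the nearest ancestor carrying itemscope and ignores itemref and URL resolution. The second tree is written by hand to match the re-parenting Toby describes rather than produced by an HTML parser, itemscope is spelled itemscope="" so that the XML parser accepts it, and the helper name owning_item is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The markup as the publisher wrote it (what an XML parser would build).
AS_AUTHORED = """
<div itemscope="" itemid="#foo" itemtype="http://schema.org/CreativeWork">
  <table itemscope="" itemid="#bar" itemtype="http://schema.org/CreativeWork">
    <p itemprop="title">Hello World</p>
  </table>
</div>
"""

# Hand-written approximation of the tree after an HTML parser has moved
# the <p> out of the <table> (assumption: it lands before the table).
AFTER_HTML_PARSE = """
<div itemscope="" itemid="#foo" itemtype="http://schema.org/CreativeWork">
  <p itemprop="title">Hello World</p>
  <table itemscope="" itemid="#bar" itemtype="http://schema.org/CreativeWork">
  </table>
</div>
"""


def owning_item(markup, prop):
    """Return the itemid of the nearest itemscope ancestor of `prop`.

    Simplification: real microdata processing also handles itemref and
    resolves itemid against the document URL; this sketch does neither.
    """
    root = ET.fromstring(markup.strip())
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = next(el for el in root.iter() if el.get("itemprop") == prop)
    node = parent_of.get(node)
    while node is not None:
        if "itemscope" in node.attrib:
            return node.get("itemid")
        node = parent_of.get(node)
    return None


print(owning_item(AS_AUTHORED, "title"))
print(owning_item(AFTER_HTML_PARSE, "title"))
```

Under those assumptions the first call reports #bar and the second reports #foo, which is the change of owner described above.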
I think this is probably worth documenting in the section on Good Publishing Practice [1] in the wiki. Is it enough to say that documents should be valid HTML(+RDFa/microdata) to avoid the potential for the movement of properties?
> What other fun ones can I think of...?
>
> <html about="#foo">
> <h1 property="dc:title">Hello World</h1>
> <p property="dc:description">A global greeting for all.</p>
> </html>
>
> Anyone want to guess the subject URI for the two triples when processed
> as HTML? Hint: it ain't <#foo>.
So this one expands to:
<html about="#foo">
<head></head>
<body>
<h1 property="dc:title">Hello World</h1>
<p property="dc:description">A global greeting for all.</p>
</body>
</html>
and because the <body> has no @about attribute, the title and description are about the base URI for the document rather than #foo, right?
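A rough sketch of that reasoning in Python, in case it helps to check it. It assumes the RDFa 1.0 rule that a <head> or <body> without a resource attribute is treated as if it had an empty @about; that rule is my reading of the spec, not something established in this thread. It only considers @about (not @resource, @href, @src or @typeof), the expanded tree is the hand-written one above, and the name subject_for is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Toby's markup as written (what an XML parser would build).
RDFA_AS_AUTHORED = """
<html about="#foo">
  <h1 property="dc:title">Hello World</h1>
  <p property="dc:description">A global greeting for all.</p>
</html>
"""

# The same markup with the <head> and <body> an HTML parser would imply.
RDFA_EXPANDED = """
<html about="#foo">
  <head></head>
  <body>
    <h1 property="dc:title">Hello World</h1>
    <p property="dc:description">A global greeting for all.</p>
  </body>
</html>
"""


def subject_for(markup, prop):
    """Return the raw @about value that governs the element carrying `prop`.

    An empty string stands for the document's base URI.
    """
    root = ET.fromstring(markup.strip())
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = next(el for el in root.iter() if el.get("property") == prop)
    while node is not None:
        about = node.get("about")
        # Assumption: head/body without @about behave as about="".
        if about is None and node.tag in ("head", "body"):
            about = ""
        if about is not None:
            return about
        node = parent_of.get(node)
    return ""  # no @about anywhere: fall back to the base URI


print(repr(subject_for(RDFA_AS_AUTHORED, "dc:title")))
print(repr(subject_for(RDFA_EXPANDED, "dc:title")))
```

With that assumption in place, the as-written tree gives '#foo' and the expanded tree gives '' (the base URI), for both the title and the description.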
(Hmm, validator.nu shows
<!DOCTYPE html>
<html>
<head><title></title></head>
<h1>Hello World</h1>
<p>A global greeting for all.</p>
</html>
as a valid HTML5 document despite it not having a <body> element, but perhaps that's a validator.nu bug…)
Cheers,
Jeni
[1] http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Good_Publishing_Practice
--
Jeni Tennison
http://www.jenitennison.com
Received on Saturday, 12 November 2011 20:45:42 UTC