Re: RDFa in HTML vs XHTML

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sat, 12 Nov 2011 20:45:22 +0000
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <65AAD1CE-2D3A-46C6-A670-4CE00195190F@jenitennison.com>
To: Toby Inkster <tai@g5n.co.uk>, Henri Sivonen <hsivonen@iki.fi>
(Henri, question for you embedded in here: aside from the movement of <link> and <meta> which we know about, is being a valid HTML5 document enough to avoid unanticipated movement of elements that might result in changes to a publisher's intended microdata/RDFa/microformats?)

On 12 Nov 2011, at 00:58, Toby Inkster wrote:
> A better example, which, although invalid, should be accepted but
> interpreted differently under HTML and XML parsers, is:
> 
> 	<div about="">
> 	  <table about="#me">
> 	    <p property="dc:title">Hello World</p>
> 	  </table>
> 	</div>
> 
> An HTML parser will lift the <p> element out of the table and place it
> as a child of the <div> (IIRC it will be inserted prior to the <table>)
> because <p> is not an allowed child of <table>.
> 
> So in HTML+RDFa we will get:
> 
> 	<> dc:title "Hello World" .
> 	# and no triples about <#me>
> 
> And in XHTML+RDFa:
> 
> 	<#me> dc:title "Hello World" .
> 	# and no triples about <>
> 
> Yay, fun. :-)

It looks as though this is something that hits microdata too. If you have:

  <div itemscope itemid="#foo" itemtype="http://schema.org/CreativeWork">
    <table itemscope itemid="#bar" itemtype="http://schema.org/CreativeWork">
      <p itemprop="title">Hello World</p>
    </table>
  </div>

then the same re-jigging of the content causes the 'title' property to be associated with #foo rather than #bar.
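To make the mechanism concrete, here is a toy Python model of both cases. Everything in it (Node, el, foster_parent, rdfa_subject, microdata_item) is hypothetical: it is not html5lib or any real parser, it only knows a handful of element names that may stay directly inside <table>, it drops @itemtype, and it reduces subject/item resolution to "nearest @about" and "nearest ancestor with @itemscope", ignoring @resource, @href, @itemref and friends. It's a sketch of the idea under those assumptions, not an implementation of the HTML5, RDFa or microdata processing rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality, so list.index() finds the exact node
class Node:
    """Hypothetical, heavily simplified DOM node (not a real parser's API)."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = None

    def append(self, child: Node) -> None:
        child.parent = self
        self.children.append(child)


def el(tag: str, attrs: Optional[dict[str, str]] = None, *children: Node) -> Node:
    """Build a node with the given attributes and children."""
    node = Node(tag, dict(attrs or {}))
    for child in children:
        node.append(child)
    return node


# Children the toy model allows directly inside <table>; the real HTML5
# tree-construction rules (including foster parenting) are far more detailed.
TABLE_CONTENT = {"caption", "colgroup", "col", "thead", "tbody", "tfoot", "tr"}


def foster_parent(node: Node) -> None:
    """Move non-table children of every <table> to just before that table."""
    for child in list(node.children):
        foster_parent(child)
    if node.tag == "table" and node.parent is not None:
        stray = [c for c in node.children if c.tag not in TABLE_CONTENT]
        node.children = [c for c in node.children if c.tag in TABLE_CONTENT]
        parent = node.parent
        idx = parent.children.index(node)
        for c in stray:
            c.parent = parent
        parent.children[idx:idx] = stray  # insert in front of the table


def rdfa_subject(node: Node, base: str = "") -> str:
    """Toy RDFa rule: nearest @about on the element or an ancestor, else the base."""
    n: Optional[Node] = node
    while n is not None:
        if "about" in n.attrs:
            return n.attrs["about"]
        n = n.parent
    return base


def microdata_item(node: Node) -> Optional[str]:
    """Toy microdata rule: an itemprop belongs to the nearest ancestor with itemscope."""
    n = node.parent
    while n is not None:
        if "itemscope" in n.attrs:
            return n.attrs.get("itemid")
        n = n.parent
    return None


p_rdfa = el("p", {"property": "dc:title"})
rdfa_doc = el("div", {"about": ""}, el("table", {"about": "#me"}, p_rdfa))

p_md = el("p", {"itemprop": "title"})
md_doc = el("div", {"itemscope": "", "itemid": "#foo"},
            el("table", {"itemscope": "", "itemid": "#bar"}, p_md))

# Tree as written (what an XML parser keeps):
print(repr(rdfa_subject(p_rdfa)), microdata_item(p_md))  # '#me' #bar

# Tree after an HTML parser foster-parents the <p> out of the <table>:
foster_parent(rdfa_doc)
foster_parent(md_doc)
print(repr(rdfa_subject(p_rdfa)), microdata_item(p_md))  # '' #foo
```

Run as-is, the two print lines show the property moving from #me/#bar to the base URI/#foo once the <p> has been placed in front of the <table>.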

I think this is probably worth documenting in the section on Good Publishing Practice [1] in the wiki. Is it enough to say that documents should be valid HTML(+RDFa/microdata) to avoid the potential for the movement of properties?

> What other fun ones can I think of...?
> 
> 	<html about="#foo">
> 	  <h1 property="dc:title">Hello World</h1>
> 	  <p property="dc:description">A global greeting for all.</p>
> 	</html>
> 
> Anyone want to guess the subject URI for the two triples when processed
> as HTML? Hint: it ain't <#foo>.


So this one expands to:

  <html about="#foo">
    <head></head>
    <body>
      <h1 property="dc:title">Hello World</h1>
      <p property="dc:description">A global greeting for all.</p>
    </body>
  </html>

and because the <body> has no @about attribute, the title and description are about the base URI for the document rather than #foo, right?
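A similar toy sketch for this case, again with hypothetical names (Node, el, imply_head_and_body, rdfa10_subject) and assuming the RDFa 1.0 rule that a <head> or <body> without @about takes the document's base URI as its subject. It only models input with no explicit <head> or <body>, as in Toby's example, not the full HTML5 insertion-mode machinery.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical, heavily simplified DOM node (not a real parser's API)."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = None

    def append(self, child: Node) -> None:
        child.parent = self
        self.children.append(child)


def el(tag: str, attrs: Optional[dict[str, str]] = None, *children: Node) -> Node:
    """Build a node with the given attributes and children."""
    node = Node(tag, dict(attrs or {}))
    for child in children:
        node.append(child)
    return node


# Metadata elements the toy model routes into the implied <head>.
HEAD_CONTENT = {"title", "base", "link", "meta", "style", "script"}


def imply_head_and_body(html: Node) -> None:
    """Toy version of the implied <head>/<body>; assumes the source omitted both."""
    old, html.children = html.children, []
    head, body = Node("head"), Node("body")
    html.append(head)
    html.append(body)
    for child in old:
        # Metadata seen before any flow content goes in <head>, the rest in <body>.
        target = head if child.tag in HEAD_CONTENT and not body.children else body
        target.append(child)


def rdfa10_subject(node: Node, base: str = "") -> str:
    """Toy RDFa 1.0 rule: nearest @about wins, but a <head> or <body>
    without @about resets the subject to the document's base URI."""
    n: Optional[Node] = node
    while n is not None:
        if "about" in n.attrs:
            return n.attrs["about"]
        if n.tag in ("head", "body"):
            return base
        n = n.parent
    return base


h1 = el("h1", {"property": "dc:title"})
p = el("p", {"property": "dc:description"})
html = el("html", {"about": "#foo"}, h1, p)

print(rdfa10_subject(h1))  # #foo  (tree as written, no <body>)
imply_head_and_body(html)
print([c.tag for c in html.children], repr(rdfa10_subject(p)))  # ['head', 'body'] ''
```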

(Hmm, validator.nu shows 

 <!DOCTYPE html>
 <html>
  <head><title></title></head>
  <h1>Hello World</h1>
  <p>A global greeting for all.</p>
 </html>

as a valid HTML5 document despite it not having a <body> element, but perhaps that's a validator.nu bug…)

Cheers,

Jeni

[1] http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Good_Publishing_Practice
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Saturday, 12 November 2011 20:45:42 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Saturday, 12 November 2011 20:45:42 GMT