- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sat, 12 Nov 2011 20:45:22 +0000
- To: Toby Inkster <tai@g5n.co.uk>, Henri Sivonen <hsivonen@iki.fi>
- Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
(Henri, question for you embedded in here: aside from the movement of <link> and <meta> which we know about, is being a valid HTML5 document enough to avoid unanticipated movement of elements that might result in changes to a publisher's intended microdata/RDFa/microformats?)

On 12 Nov 2011, at 00:58, Toby Inkster wrote:
> A better example, which, although invalid, should be accepted but
> interpreted differently under HTML and XML parsers is:
>
> <div about="">
>   <table about="#me">
>     <p property="dc:title">Hello World</p>
>   </table>
> </div>
>
> An HTML parser will lift the <p> element out of the table and place it
> as a child of the <div> (IIRC it will be inserted prior to the <table>)
> because <p> is not an allowed child of <table>.
>
> So in HTML+RDFa will get:
>
> <> dc:title "Hello World" .
> # and no triples about <#me>
>
> And in XHTML+RDFa:
>
> <#me> dc:title "Hello World" .
> # and no triples about <>
>
> Yay, fun. :-)

It looks as though this is something that hits microdata too. If you have:

<div itemscope itemid="#foo" itemtype="http://schema.org/CreativeWork">
  <table itemscope itemid="#bar" itemtype="http://schema.org/CreativeWork">
    <p itemprop="title">Hello World</p>
  </table>
</div>

then the same re-jigging of the content causes the 'title' property to be associated with #foo rather than #bar.

I think this is probably worth documenting in the section on Good Publishing Practice [1] in the wiki. Is it enough to say that documents should be valid HTML(+RDFa/microdata) to avoid the potential for the movement of properties?

> What other fun ones can I think of...?
>
> <html about="#foo">
>   <h1 property="dc:title">Hello World</h1>
>   <p property="dc:description">A global greeting for all.</p>
> </html>
>
> Anyone want to guess the subject URI for the two triples when processed
> as HTML? Hint: it ain't <#foo>.

So this one expands to:

<html about="#foo">
  <head></head>
  <body>
    <h1 property="dc:title">Hello World</h1>
    <p property="dc:description">A global greeting for all.</p>
  </body>
</html>

and because the <body> has no @about attribute, the title and description are about the base URI for the document rather than #foo, right?

(Hmm, validator.nu shows

<!DOCTYPE html>
<html>
  <head><title></title></head>
  <h1>Hello World</h1>
  <p>A global greeting for all.</p>
</html>

as a valid HTML5 document despite it not having a <body> element, but perhaps that's a validator.nu bug…)

Cheers,

Jeni

[1] http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Good_Publishing_Practice
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Saturday, 12 November 2011 20:45:42 UTC
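
Illustration (not part of the original message): a minimal Python sketch of why the table example in the message yields different subjects under the two parsers. It rests on two loudly stated assumptions. First, the subject of a @property is taken to be the nearest ancestor carrying @about, which is a simplification of real RDFa processing (which also considers other attributes). Second, the Python standard library has no HTML5 tree builder, so the "after HTML parsing" tree is written out by hand, following the description in the message that the <p> is moved out of the <table> and placed before it inside the <div>. The names nearest_subject, as_written, and after_html_parse are hypothetical.

```python
import xml.etree.ElementTree as ET


def nearest_subject(root, target_tag, attr="about"):
    """Return the attr value on the closest ancestor of the first
    target_tag element under root, or None if no ancestor has it.

    Simplified stand-in for RDFa subject resolution (assumption).
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = parent_of.get(root.find(f".//{target_tag}"))
    while node is not None:
        if attr in node.attrib:
            return node.attrib[attr]
        node = parent_of.get(node)
    return None


# The markup as authored: an XML parser keeps <p> inside <table>.
as_written = (
    '<div about="">'
    '<table about="#me">'
    '<p property="dc:title">Hello World</p>'
    '</table>'
    '</div>'
)

# Hand-written result of HTML parsing as described in the message:
# <p> has been moved out of <table> and sits before it in <div>.
after_html_parse = (
    '<div about="">'
    '<p property="dc:title">Hello World</p>'
    '<table about="#me"></table>'
    '</div>'
)

print(nearest_subject(ET.fromstring(as_written), "p"))              # #me
print(repr(nearest_subject(ET.fromstring(after_html_parse), "p")))  # ''
```

The empty string in the second result stands for about="", i.e. the document itself, which matches the <> subject in the HTML+RDFa triples quoted in the message.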