Re: RDFa in HTML vs XHTML from Jeni Tennison on 2011-11-14 (public-html-data-tf@w3.org from November 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon, 14 Nov 2011 19:29:21 +0000
To: HTML Data Task Force WG <public-html-data-tf@w3.org>
Cc: Henri Sivonen <hsivonen@iki.fi>, Toby Inkster <tai@g5n.co.uk>
Message-Id: <F63971EF-954B-46C8-BB38-1CF1CCD6C783@jenitennison.com>

Hi,

I've written some text warning people about potential restructuring of invalid HTML [1] which I've reproduced below.

I haven't mentioned the issue around omitted tags for <head> and <body>, which, having thought about it, I think is an HTML+RDFa bug. It is, after all, HTML+RDFa which introduces the rules that rely on the presence of head/body [2]:

* In Section 7.5: Sequence, processing step 6, if no URI is provided by a resource attribute, then first check to see if the element is the head or body element. If it is, then act as if there is an empty @about present, and process it according to the rule for @about.
* In Section 7.5: Sequence, processing step 7, if no URI is provided, then first check to see if the element is the head or body element. If it is, then act as if there is an empty @about present, and process it according to the rule for @about.
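To make the discrepancy concrete, here is a rough sketch of how I understand these rules to interact with omitted tags. It is not an RDFa processor: the subject_for helper, the example.org URIs and the two hand-built trees are all assumptions for illustration, and relative URI resolution is ignored. The first tree is what an XML parser would report when the head/body tags are left out; the second is what an HTML parser builds from the same source, since it inserts head and body itself.

```python
import xml.etree.ElementTree as ET

DOC_URI = "http://example.org/doc"  # hypothetical document URI


def subject_for(root, target_tag):
    """Return the subject in effect at the first element named target_tag.

    Simplified: only @about and the head/body rule are modelled.
    """
    def walk(el, subject):
        if "about" in el.attrib:
            # An empty @about refers to the document itself.
            subject = el.get("about") or DOC_URI
        elif el.tag in ("head", "body"):
            # HTML+RDFa rule: act as if an empty @about were present.
            subject = DOC_URI
        if el.tag == target_tag:
            return subject
        for child in el:
            found = walk(child, subject)
            if found is not None:
                return found
        return None

    return walk(root, DOC_URI)


# Tree as an XML parser sees the source: no head or body elements.
as_xml = ET.fromstring(
    '<html about="http://example.org/thing">'
    '<p property="name">x</p></html>'
)
# Tree as an HTML parser builds it: head and body are implied.
as_html = ET.fromstring(
    '<html about="http://example.org/thing"><head/><body>'
    '<p property="name">x</p></body></html>'
)

xml_subject = subject_for(as_xml, "p")
html_subject = subject_for(as_html, "p")
print(xml_subject)
print(html_subject)
```

Under these assumptions the XML view keeps the subject set on <html>, while the HTML view has it reset to the document by the implied body.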

I think the solution is probably to add a rule that RDFa attributes such as @about aren't permitted on the <html> element.

What do you think, worth raising?

Jeni

[1] http://www.w3.org/wiki/Choosing_an_HTML_Data_Format#Good_Publishing_Practice
[2] http://dev.w3.org/html5/rdfa/#additional-rdfa-processing-rules

---

Valid HTML is particularly important in pages that contain embedded markup. All methods of embedding data within HTML use the structure of the HTML to determine the meaning of the additional markup. For example, the item to which an element with an @itemprop attribute assigns a property is usually the closest ancestor element with an @itemscope attribute.
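As a minimal sketch of that "closest ancestor" rule, assuming a well-formed fragment and ignoring other microdata features such as @itemref: the owning_item helper, the id values and the sample content below are hypothetical, and the standard library's xml.etree.ElementTree stands in for a real DOM.

```python
import xml.etree.ElementTree as ET


def owning_item(root, prop_el):
    """Return the closest ancestor of prop_el that carries @itemscope."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = parents.get(prop_el)
    while node is not None:
        if "itemscope" in node.attrib:
            return node
        node = parents.get(node)
    return None


doc = ET.fromstring(
    '<div itemscope="" id="person">'
    '<span itemprop="name">Alice</span>'
    '<div itemprop="address" itemscope="" id="addr">'
    '<span itemprop="locality">Exampleton</span>'
    '</div></div>'
)

# Map each property name to the id of the item it is assigned to.
owners = {
    el.get("itemprop"): owning_item(doc, el).get("id")
    for el in doc.iter()
    if "itemprop" in el.attrib
}
print(owners)
```

Note that the search starts from the parent, so the address property belongs to the outer item even though its own element also starts a new item.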

In some cases, elements can be moved when HTML is parsed into a DOM. This can lead to properties unexpectedly referring to the wrong entity, and, if you are serving your documents as XHTML (with an application/xhtml+xml MIME type), it can cause discrepancies between the data gleaned by XML-based consumers and HTML-aware consumers. There are two causes for this:

* Error correction in HTML parsing can restructure invalid HTML to make it valid; for example, non-table markup within a table is moved to before the table. You can avoid this restructuring by making sure that your HTML is valid, so that no error correction is needed.
* Older browsers move meta and link elements in the body of an HTML document to after the head element, because they could not validly appear within the body in older versions of HTML. If you are targeting consumers which run within older browsers, such as scripts or plug-ins, you can avoid this restructuring by using empty span or other elements instead of link or meta; other consumers should be using an up-to-date HTML5 parser, which will not do this.
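The table case can be sketched as follows. This is only an illustration: the standard library has no error-correcting HTML parser, so the "HTML view" is simulated by hand-moving the stray element to before the table, as described above. The nearest_itemscope_id helper and the id values are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def nearest_itemscope_id(root, el):
    """Return the id of the closest ancestor of el that has @itemscope."""
    parents = {child: parent for parent in root.iter() for child in parent}
    node = parents.get(el)
    while node is not None and "itemscope" not in node.attrib:
        node = parents.get(node)
    return None if node is None else node.get("id")


source = (
    '<div itemscope="" id="outer">'
    '<table itemscope="" id="inner">'
    '<span itemprop="name">Example</span>'  # not valid directly inside table
    '<tr><td>cell</td></tr>'
    '</table></div>'
)

# What an XML-based consumer sees: the span stays inside the table.
xml_view = ET.fromstring(source)

# Simulated HTML-aware view: the span is moved to just before the table.
html_view = copy.deepcopy(xml_view)
table = html_view.find("table")
stray = table.find("span")
table.remove(stray)
html_view.insert(list(html_view).index(table), stray)

xml_owner = nearest_itemscope_id(xml_view, xml_view.find(".//span"))
html_owner = nearest_itemscope_id(html_view, html_view.find(".//span"))
print(xml_owner, html_owner)
```

In this sketch the same property ends up on the inner item for the XML-based consumer and on the outer item for the HTML-aware one, which is the kind of discrepancy valid markup avoids.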

--
Jeni Tennison
http://www.jenitennison.com

Received on Monday, 14 November 2011 19:29:48 UTC