- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Wed, 3 Apr 2002 15:46:33 +0200
- To: <www-annotation@w3.org>
- Cc: "HTML WG" <w3c-html-wg@w3.org>
I have been asked to communicate to this group the HTML Working Group's feelings about using XPointer to index into HTML (i.e. the SGML version, not XHTML). As I understand it, it is particularly in reference to elements such as <tbody> which may or may not be in the markup.

The group discussed this topic recently. Many thanks to Masayasu Ishikawa for his comments, many of which are echoed here.

We understand the motivation for wanting to annotate HTML. But:

Firstly, a technical caveat: The abstract to XPointer says that it's for "a resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity". Unless we update RFC 2854, "XPointer for HTML" would be non-conformant.
(http://www.w3.org/TR/2001/CR-xptr-20010911/#abstract)

Secondly, an observation: most HTML documents are seriously broken. Trying to create a robust mapping from broken HTML to XML is a minefield we do not wish to step on.

Thirdly, because of the difference between XML and SGML, XHTML and HTML have different but compatible content models. This means that an XHTML document served as text/html will have a different parse tree to that of the physically same document served as text/xml or application/xhtml+xml. This means that depending on the mime type you would need different XPointers to get to the same element.

However, if you persist, let us observe that the DTD for HTML 4.01 says of <tbody>:

    <!ELEMENT TABLE - - (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)>
    <!ELEMENT TBODY O O (TR)+ -- table body -->

(http://www.w3.org/TR/html401/struct/tables.html#edef-TBODY)

This says that <tbody> is an element with optional begin and end tags. This means that whether or not <tbody> is present in the markup, it is present in the document, and therefore in the tree.

On the other hand, the DTD for XHTML says:

    <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
    <!ELEMENT tbody (tr)+>

(http://www.w3.org/MarkUp/Group/2002/REC-xhtml1-20020301/dtds.html#a_dtd_XHTML-1.0-Strict)

This says that <tbody> is an optional element: if it is not in the markup it is not in the tree. (We had to do it this way, because XML does not give you optional tags).

Therefore the answer to the question "what should an XPointer into HTML look like?" is a very loud "it depends".

Best wishes,

Steven Pemberton
Chair, W3C HTML Working Group
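
Editorial illustration (not part of the original message): the <tbody> point can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree as a stand-in for an XML parser, and a hypothetical helper, html401_tree, that imitates the SGML view by wrapping bare <tr> children in an implied <tbody>. It is a deliberately simplified model, not a real SGML or HTML parser, and the paths use ElementTree's limited path syntax rather than actual XPointer expressions.

```python
import xml.etree.ElementTree as ET

# Markup exactly as an author might write it: no <tbody> tags.
MARKUP = "<table><tr><td>cell</td></tr></table>"


def xhtml_tree(markup):
    """XML view: only elements present in the markup appear in the tree."""
    return ET.fromstring(markup)


def html401_tree(markup):
    """Hypothetical model of the HTML 4.01 (SGML) view.

    TBODY has optional start and end tags, so rows written directly
    inside <table> still belong to an implied <tbody> in the tree.
    Simplification: only direct <tr> children of the root are handled.
    """
    table = ET.fromstring(markup)
    rows = [child for child in table if child.tag == "tr"]
    if rows:
        position = list(table).index(rows[0])
        tbody = ET.Element("tbody")
        for row in rows:
            table.remove(row)
            tbody.append(row)
        table.insert(position, tbody)
    return table


def find_cell(tree, path):
    """Return the text of the first node matching path, or None."""
    node = tree.find(path)
    return None if node is None else node.text


xml_view = xhtml_tree(MARKUP)
sgml_view = html401_tree(MARKUP)

# The same path resolves in one tree and fails in the other.
print(find_cell(xml_view, "tr/td"))         # cell
print(find_cell(xml_view, "tbody/tr/td"))   # None
print(find_cell(sgml_view, "tr/td"))        # None
print(find_cell(sgml_view, "tbody/tr/td"))  # cell
```

Under these assumptions, a pointer written against one view of the document does not reach the same cell in the other view, which is the "it depends" described above.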
Received on Wednesday, 3 April 2002 08:46:35 UTC