Using XPointer with HTML

I have been asked to communicate to this group the HTML Working Group's
views on using XPointer to index into HTML (i.e. the SGML version, not
XHTML). As I understand it, the question concerns in particular elements
such as <tbody>, which may or may not be present in the markup.

The group discussed this topic recently. Many thanks to Masayasu Ishikawa
for his comments, many of which are echoed here.

We understand the motivation for wanting to annotate HTML. But:

Firstly, a technical caveat: the abstract of the XPointer specification says
that it is for "a resource whose Internet media type is one of text/xml,
application/xml, text/xml-external-parsed-entity, or
application/xml-external-parsed-entity". HTML is served as text/html, whose
media type registration is RFC 2854, so unless we update RFC 2854, "XPointer
for HTML" would be non-conformant.
(http://www.w3.org/TR/2001/CR-xptr-20010911/#abstract)
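The caveat amounts to a membership test on the media type. The Python sketch
below is illustrative only: xpointer_applies is a hypothetical helper, and it
assumes the Content-Type value has already been obtained (for example from an
HTTP response).

```python
# Media types listed in the abstract of the XPointer Candidate Recommendation
# (http://www.w3.org/TR/2001/CR-xptr-20010911/#abstract).
XPOINTER_MEDIA_TYPES = frozenset({
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
})


def xpointer_applies(content_type: str) -> bool:
    """Hypothetical check: does a Content-Type value name one of the media
    types XPointer is defined for? Parameters such as charset are ignored
    and the comparison is case-insensitive."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in XPOINTER_MEDIA_TYPES


print(xpointer_applies("application/xml"))                # True
print(xpointer_applies("text/html; charset=ISO-8859-1"))  # False
```

text/html falls outside the list, which is why the registration in RFC 2854
would have to change.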

Secondly, an observation: most HTML documents are seriously broken. Trying
to create a robust mapping from broken HTML to XML is a minefield we would
rather not walk into.

Thirdly, because of the differences between XML and SGML, XHTML and HTML
have different, though compatible, content models. As a result, an XHTML
document served as text/html will have a different parse tree from that of
the byte-for-byte identical document served as text/xml or
application/xhtml+xml, so depending on the media type you would need
different XPointers to reach the same element.

However, if you persist, let us observe that the DTD for HTML 4.01 says of
<tbody>:

    <!ELEMENT TABLE - -
         (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)>
    <!ELEMENT TBODY    O O (TR)+           -- table body -->

(http://www.w3.org/TR/html401/struct/tables.html#edef-TBODY)

This says that <tbody> is an element with optional start and end tags, and
that every <table> contains at least one (TBODY+). This means that whether or
not <tbody> is present in the markup, it is present in the document, and
therefore in the tree.

On the other hand, the DTD for XHTML says:

    <!ELEMENT table
         (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
    <!ELEMENT tbody    (tr)+>

(http://www.w3.org/MarkUp/Group/2002/REC-xhtml1-20020301/dtds.html#a_dtd_XHTML-1.0-Strict)

This says that <tbody> is an optional element: since <tr> may appear directly
inside <table> (tbody+|tr+), if <tbody> is not in the markup it is not in the
tree. (We had to do it this way because XML does not give you optional tags.)
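To make the difference concrete, here is a minimal Python sketch. It parses
one small document with xml.etree.ElementTree (the XML view, with no <tbody>),
builds an HTML 4.01 view by wrapping any <tr> found directly inside <table> in
an implied <tbody>, and prints an XPointer element() child sequence for the
same <td> in each tree. The helper names imply_tbody and child_sequence are
hypothetical; imply_tbody is a deliberate simplification of SGML tag
inference rather than a real HTML parser, and the XHTML namespace is omitted
for brevity.

```python
import copy
import xml.etree.ElementTree as ET


def child_sequence(root, target):
    """Return an XPointer element() child sequence, e.g. element(/1/2/1),
    locating target. /1 is the document element; each later step is a
    1-based index counting element children only."""

    def walk(node, steps):
        if node is target:
            return steps
        for index, child in enumerate(node, start=1):
            found = walk(child, steps + [index])
            if found is not None:
                return found
        return None

    steps = walk(root, [1])
    if steps is None:
        raise ValueError("target is not in this tree")
    return "element(/" + "/".join(map(str, steps)) + ")"


def imply_tbody(root):
    """Hypothetical helper approximating the HTML 4.01 DTD: TABLE requires
    TBODY+, and TBODY's start and end tags are omissible, so a run of <tr>
    directly inside <table> belongs to an implied <tbody>."""
    for table in list(root.iter("table")):
        rebuilt = []
        implied = None  # the <tbody> currently collecting bare <tr> rows
        for child in list(table):
            if child.tag == "tr":
                if implied is None:
                    implied = ET.Element("tbody")
                    rebuilt.append(implied)
                implied.append(child)
            else:
                implied = None  # any other element ends the current run
                rebuilt.append(child)
        table[:] = rebuilt  # replace the table's children in place
    return root


# The same markup, without an explicit <tbody> (namespace omitted).
SOURCE = (
    "<html><head><title>t</title></head>"
    "<body><table><tr><td>cell</td></tr></table></body></html>"
)

xml_view = ET.fromstring(SOURCE)  # XHTML as XML: no <tbody> in the tree
html_view = imply_tbody(copy.deepcopy(xml_view))  # HTML 4.01: <tbody> implied

xml_pointer = child_sequence(xml_view, xml_view.find(".//td"))
html_pointer = child_sequence(html_view, html_view.find(".//td"))
print(xml_pointer)   # element(/1/2/1/1/1)
print(html_pointer)  # element(/1/2/1/1/1/1)
```

Applied to the HTML view, the first pointer would land on the <tr> rather
than the <td>.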

Therefore the answer to the question "what should an XPointer into HTML look
like?" is a very loud "it depends".

Best wishes,

Steven Pemberton
Chair, W3C HTML Working Group

Received on Wednesday, 3 April 2002 08:46:35 UTC