Re: Using XPointer with HTML from Jim Ley on 2002-04-03 (www-annotation@w3.org from January to June 2002)

From: Jim Ley <jim@jibbering.com>
Date: Wed, 3 Apr 2002 14:39:28 -0000
To: <www-annotation@w3.org>, "HTML WG" <w3c-html-wg@w3.org>
Message-ID: <00e101c1db1d$5c6768e0$ca969dc3@emedia.co.uk>
"Steven Pemberton" <steven.pemberton@cwi.nl>
> We understand the motivation for wanting to annotate HTML. But:
>
> Firstly a technical caveat: The abstract to XPointer says that it's for "a
> resource whose Internet media type is one of text/xml, application/xml,
> text/xml-external-parsed-entity, or application/xml-external-parsed-entity".
> Unless we update RFC 2854, "XPointer for HTML" would be non-conformant.
> (http://www.w3.org/TR/2001/CR-xptr-20010911/#abstract)

So never mind HTML documents, XHTML ones can't be annotated either?
(RFC 3236 defines application/xhtml+xml, which is not in that list.)

> Secondly, an observation: most HTML documents are seriously broken. Trying
> to create a robust mapping from broken HTML to XML is a minefield we do not
> wish to step on.

But neither is such a thing necessary to be able to use XPointer-like
constructs on HTML documents. As you say, they are obviously not
XPointers, because XPointers are only defined on XML documents (and even
then on a very restricted set of MIME types). All you need is to be
able to identify elements in the parse tree; XPointer-like syntax is
good for this, especially if the document has ids.
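
To make the point about ids concrete, here is a minimal sketch in Python
using the standard html.parser module. The names IdFinder, find_by_id and
BROKEN are made up for illustration, and the sample markup is invented. It
only shows that an element carrying an id can be located in tag soup
without first mapping the document to XML; it is not an XPointer
implementation.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Remember the first start tag whose id attribute matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = None  # becomes (tag, attrs_dict) once located

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.found is None and attr_map.get("id") == self.target_id:
            self.found = (tag, attr_map)


def find_by_id(html_text, target_id):
    """Return (tag, attrs) for the element with the given id, or None."""
    finder = IdFinder(target_id)
    finder.feed(html_text)
    finder.close()
    return finder.found


# Invented, deliberately broken markup: unclosed <p> and <b>, a stray </i>,
# unquoted attribute values, and no closing tags for tr, td, body or html.
BROKEN = (
    "<html><body><p>First<p id=Moomin class=x><b>Second</i>"
    "<table><tr><td>cell</table>"
)

if __name__ == "__main__":
    print(find_by_id(BROKEN, "Moomin"))
```

The lookup never needs the tree to be well formed; it only needs the
parser to report start tags and their attributes.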

There are of course other problems with XPointer that to me have much
more serious implications than this: what a URI points to is not a
particular XML representation, so it can't have an XPointer dangling off
of it. It's a Resource, which may contain all sorts of different things,
including non-XML representations.

> Thirdly, because of the difference between XML and SGML, XHTML and HTML have
> different but compatible content models. This means that an XHTML document
> served as text/html will have a different parse tree to that of the
> physically same document served as text/xml or application/xhtml+xml. This
> means that depending on the mime type you would need different XPointers to
> get to the same element.

Well of course you would, they're different documents!  However, if the
pointer is xpointer(id('Moomin')) then it will happily point to the same
element within the Resource (assuming the resources are appropriately
authored, and content negotiation on the document returned from the URI
is logical).

> However, if you persist, let us observe that the DTD for HTML 4.01 says of
> <tbody>:
> On the other hand, the DTD for XHTML says:
>
> This says that <tbody> is an optional element: if it is not in the markup it
> is not in the tree.
> (We had to do it this way, because XML does not give you optional tags).
>
> Therefore the answer to the question "what should an XPointer into HTML look
> like?" is a very loud "it depends".

This seems very confused to me. In one part you define XPointer to work
only with the idea that a URI returns a particular document (not a
particular resource), and now you argue against XPointers in HTML
documents by comparing them with XHTML. The differences between XHTML
and HTML are irrelevant to the purposes of XPointers in HTML: in HTML,
we know there's a TBODY in the parse tree; in an XHTML document we know
there's not unless it's in the source document.  Those differences are
irrelevant to whether you can point to a particular part of an HTML parse
tree.
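
A rough sketch of the TBODY difference, in Python with
xml.etree.ElementTree. Both trees are written out by hand: HTML_TREE is
only a stand-in for what an HTML 4.01 parser would build with the implied
TBODY, not the output of a real SGML parser, and the names find_by_id and
child_sequence are invented for this example. It shows that a positional
path to the cell differs between the two trees, while a lookup by id
lands on the same element in each.

```python
import xml.etree.ElementTree as ET

# What an XML parser sees for the XHTML source: no <tbody> in the markup,
# so none in the tree.
XHTML_TREE = ET.fromstring(
    "<table><tr><td id='Moomin'>cell</td></tr></table>"
)

# Hand-written stand-in (an assumption, not real parser output) for the
# tree an HTML 4.01 parser builds from the same markup, with TBODY implied.
HTML_TREE = ET.fromstring(
    "<table><tbody><tr><td id='Moomin'>cell</td></tr></tbody></table>"
)


def find_by_id(root, target_id):
    """Return the first element whose id attribute equals target_id."""
    for elem in root.iter():
        if elem.get("id") == target_id:
            return elem
    return None


def child_sequence(root, target):
    """Return the 1-based child positions leading from root to target."""
    def walk(node, path):
        if node is target:
            return path
        for index, child in enumerate(node, start=1):
            result = walk(child, path + [index])
            if result is not None:
                return result
        return None
    return walk(root, [])


if __name__ == "__main__":
    for name, tree in (("XHTML", XHTML_TREE), ("HTML", HTML_TREE)):
        cell = find_by_id(tree, "Moomin")
        print(name, cell.tag, child_sequence(tree, cell))
```

Within either tree on its own the positional path is perfectly well
defined, which is all that pointing into an HTML parse tree requires.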

The question is not how does XPointer into HTML compare to XPointer into
XHTML, but can we point to something in an HTML document?

Jim.
Received on Wednesday, 3 April 2002 09:44:21 UTC