Re: Using XPointer with HTML

On Wed, 3 Apr 2002, Steven Pemberton wrote:

> Secondly, an observation: most HTML documents are seriously broken. Trying
> to create a robust mapping from broken HTML to XML is a minefield we do not
> wish to step on.

We are well aware of that, but it affects the practice, not the
principle, of XPointers into HTML.

> Therefore the answer to the question "what should an XPointer into HTML look
> like?" is a very loud "it depends".

Indeed.  It depends on defining a canonical normalisation of HTML.
If we can do that, we're fine.

This has come up in ER and in Annotea.  I wasn't involved in Annotea
at the time of its introduction, so I won't comment on that now.

The subject arose in ER because we wanted an XPointer-like mechanism
available in Valet reports for client software - specifically Jim's
stuff.  Jim and I called them "Fuzzy XPointers".  We haven't properly
formalised them, but rather we have an empirical working model:
  * Construct an XML normalisation of the HTML
  * Use a (simplified) XPointer into that
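
To make the two steps concrete, here is a minimal sketch in Python.  It
is not the Valet code and not a proposed canonical form: the recovery
rules (the VOID set, "a new <p> closes an open <p>", "stray end tags
are ignored"), the synthetic "document" root, and the names Normaliser,
normalise and resolve are all assumptions made for illustration.  The
pointer is a bare child sequence such as /1/1/3/1, counting element
children from 1.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed list of empty elements; a real normalisation would need the full set.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Normaliser(HTMLParser):
    """Build a well-formed element tree from possibly broken HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # synthetic root, stands in for the document node
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.handle_endtag("p")         # assumed rule: a new <p> closes an open <p>
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close everything up to the nearest open element of this name;
        # an end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                     # text after a child goes in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def normalise(html):
    """Step 1: construct an XML normalisation of the HTML."""
    parser = Normaliser()
    parser.feed(html)
    parser.close()
    return parser.root


def resolve(root, pointer):
    """Step 2: follow a simplified child-sequence pointer, e.g. '/1/1/3/1'."""
    node = root
    for step in pointer.strip("/").split("/"):
        node = node[int(step) - 1]          # 1-based index among element children
    return node


broken = "<HTML><BODY><h1>Title</h1><p>one<p>two <b>bold</p><br><p>three</BODY>"
tree = normalise(broken)
print(resolve(tree, "/1/1/3/1").text)       # prints: bold
```

The pointer only means something relative to the recovery rules above;
change the rules and the same sequence may select a different element.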

The mechanism works provided our respective parsers make compatible
HTML->XML normalisations.  After a couple of iterations, we were
able to make it work with OpenSP, MSXML and Mozilla - the tools we
were using in the applications in question.  This looks like a
reasonable starting point to define a canonical normalisation.
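
One rough way to test whether two parsers are "compatible" in this
sense is to compare only the element structure of their XML output,
since that is all a child-sequence pointer depends on.  The sketch
below is an illustration, not what we actually ran: the function names
skeleton and compatible are made up, and it assumes each tool's
normalised output is available as a well-formed XML string.

```python
import xml.etree.ElementTree as ET


def skeleton(el):
    """Reduce an element to its (lower-cased) name and its children's skeletons."""
    return (el.tag.lower(), tuple(skeleton(child) for child in el))


def compatible(xml_a, xml_b):
    """True if both normalisations have the same element structure.

    Text, whitespace, attributes and tag-name case are ignored on purpose:
    only element nesting and order affect a child-sequence pointer.
    """
    return skeleton(ET.fromstring(xml_a)) == skeleton(ET.fromstring(xml_b))


# Two hypothetical outputs for the same source that differ only in case and whitespace.
a = "<html><body><p>one</p><p>two</p></body></html>"
b = "<HTML><BODY><P>one </P><P>two</P></BODY></HTML>"
# A third that nests the second paragraph inside the first.
c = "<html><body><p>one<p>two</p></p></body></html>"

print(compatible(a, b), compatible(a, c))
```

Pointers that rely on more than element position (attributes or text
offsets, say) would need a stricter comparison than this.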

-- 
Nick Kew

Site Valet - the mark of Quality on the Web.
<URL:http://valet.webthing.com/>

Received on Wednesday, 3 April 2002 12:40:43 UTC