Re: Using XPointer with HTML from Nick Kew on 2002-04-03 (www-annotation@w3.org from January to June 2002)

From: Nick Kew <nick@webthing.com>
Date: Wed, 3 Apr 2002 18:40:36 +0100 (BST)
To: Steven Pemberton <steven.pemberton@cwi.nl>
cc: <www-annotation@w3.org>, <w3c-wai-er-ig@w3.org>, HTML WG <w3c-html-wg@w3.org>
Message-ID: <20020403182002.X684-100000@fenris.webthing.com>

On Wed, 3 Apr 2002, Steven Pemberton wrote:

> Secondly, an observation: most HTML documents are seriously broken. Trying
> to create a robust mapping from broken HTML to XML is a minefield we do not
> wish to step on.

We are well aware of that, but it affects the practice, not the
principle, of XPointers into HTML.

> Therefore the answer to the question "what should an XPointer into HTML look
> like?" is a very loud "it depends".

Indeed.  It depends on defining a canonical normalisation of HTML.
If we can do that, we're fine.

This has come up in ER and in Annotea.  I wasn't involved in Annotea
at the time of its introduction, so I won't comment on that now.

The subject arose in ER because we wanted an xpointer-like mechanism
available in Valet reports for client software - specifically Jim's
stuff.  Jim and I called them "Fuzzy Xpointers".  We haven't properly
formalised them, but rather we have an empirical working model:
  * Construct an XML normalisation of the HTML
  * Use a (simplified) Xpointer into that
We called them "Fuzzy Xpointers".

The mechanism works provided our respective parsers make compatible
HTML->XML normalisations.  After a couple of iterations, we were
able to make it work with OpenSP, MSXML and Mozilla - the tools we
were using in the applications in question.  This looks like a
reasonable starting point to define a canonical normalisation.

-- 
Nick Kew

Site Valet - the mark of Quality on the Web.
<URL:http://valet.webthing.com/>

Received on Wednesday, 3 April 2002 12:40:43 UTC