- From: Nick Kew <nick@webthing.com>
- Date: Wed, 3 Apr 2002 18:40:36 +0100 (BST)
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- cc: <www-annotation@w3.org>, <w3c-wai-er-ig@w3.org>, HTML WG <w3c-html-wg@w3.org>
On Wed, 3 Apr 2002, Steven Pemberton wrote: > Secondly, an observation: most HTML documents are seriously broken. Trying > to create a robust mapping from broken HTML to XML is a minefield we do not > wish to step on. We are well aware of that, but it affects the practice, not the principle, of XPointers into HTML. > Therefore the answer to the question "what should an XPointer into HTML look > like?" is a very loud "it depends". Indeed. It depends on defining a canonical normalisation of HTML. If we can do that, we're fine. This has come up in ER and in Annotea. I wasn't involved in Annotea at the time of its introduction, so I won't comment on that now. The subject arose in ER because we wanted an xpointer-like mechanism available in Valet reports for client software - specifically Jim's stuff. Jim and I called them "Fuzzy Xpointers". We haven't properly formalised them, but rather we have an empirical working model: * Construct an XML normalisation of the HTML * Use a (simplified) Xpointer into that We called them "Fuzzy Xpointers". The mechanism works provided our respective parsers make compatible HTML->XML normalisations. After a couple of iterations, we were able to make it work with OpenSP, MSXML and Mozilla - the tools we were using in the applications in question. This looks like a reasonable starting point to define a canonical normalisation. -- Nick Kew Site Valet - the mark of Quality on the Web. <URL:http://valet.webthing.com/>
Received on Wednesday, 3 April 2002 12:40:42 UTC