Re: Using XPointer with HTML

So one approach the RE group could take is to define a document namespace
which is in fact defined as the Tidied version of something, with a
result defined for the case where Tidy just gives up.

A variation is to annotate a given document with an annotation type of
"valid XML representation so we know what the xpointers refer to" or
something, and make XPointers refer to that (and define it, also, as the
result of applying Tidy or something, so the actual thing can be
autogenerated). Anyone want to make a server that does this?
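
A rough sketch of what the core of such a service might look like, in
Python. Everything named here is an assumption for illustration: it presumes
an HTML Tidy binary called "tidy" on the PATH, it treats Tidy's exit status 2
(errors) as "Tidy gave up", and it uses ElementTree's limited XPath subset as
a stand-in for a real XPointer processor. The function names run_tidy and
resolve are made up.

```python
import subprocess
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def run_tidy(html):
    """Return Tidy's XHTML output as bytes, or None when Tidy gives up.

    Assumes a `tidy` executable is installed. Numeric entities are
    requested so the result parses without loading the XHTML DTD.
    """
    proc = subprocess.run(
        ["tidy", "-quiet", "-asxml", "-numeric", "-utf8"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # Assumption: status 0 is clean, 1 is warnings only, 2 means errors.
    if proc.returncode >= 2:
        return None
    return proc.stdout


def resolve(html, path, normalise=run_tidy):
    """Evaluate `path` against the normalised form of `html`.

    `path` is an ElementTree path relative to the root <html> element,
    with the prefix `h` bound to the XHTML namespace. Returns a list of
    matching elements, or None as the defined "no canonical form" result.
    """
    xhtml = normalise(html)
    if xhtml is None:
        return None
    root = ET.fromstring(xhtml)
    return root.findall(path, {"h": XHTML_NS})
```

So resolve(source, "h:body/h:p[2]") would pick out the second paragraph of
the Tidied document, and a caller that gets None back knows there is no
agreed representation to point into. The normalise argument is only there so
that something other than Tidy can be swapped in.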

chaals

On Wed, 10 Apr 2002, Steven Pemberton wrote:

  From: "Nick Kew" <nick@webthing.com>
  > > Therefore the answer to the question "what should an XPointer into HTML
  > > look like?" is a very loud "it depends".
  >
  > Indeed.  It depends on defining a canonical normalisation of HTML.
  > If we can do that, we're fine.

  And what I said is: that is a minefield onto which we [the HTML working
  group] do not want to step.

  Real-world HTML documents are jokingly called "tag soup" for a reason. You
  take a goodly collection of HTML tags, stir them up, put them into a file,
  and publish it on the web. <style> elements before the <html> tag; <title>s
  outside the <head>; misspelled closing tags; misspelled opening tags; <ul>s
  with no enclosed <li>s; <li>s outside <ul>s. Imagine a combination of tags,
  and you will find a document that contains that combination. Even Tidy throws up
  its hands sometimes, and instructs you to go back and change the source
  file!

  Finding a canonical normalisation of real-world HTML documents is not
  something the HTML WG feels inclined to spend its scarce time on.

  Best wishes,

  Steven Pemberton



-- 
Charles McCathieNevile    http://www.w3.org/People/Charles  phone: +61 409 134 136
W3C Web Accessibility Initiative     http://www.w3.org/WAI  fax: +33 4 92 38 78 22
Location: 21 Mitchell street FOOTSCRAY Vic 3011, Australia
(or W3C INRIA, Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France)

Received on Wednesday, 10 April 2002 10:52:00 UTC