- From: Leonard R. Kasday <kasday@acm.org>
- Date: Fri, 03 Nov 2000 15:19:55 -0500
- To: w3c-wai-er-ig@w3.org
- Message-Id: <4.3.2.7.2.20001103095206.00e5c400@pop3.concentric.net>
I like Al's sugggestion regarding how we represent documents we want accessibility statements to point to: >Usually this winds up with some intermediate choice; you don't go all the way >back to flat text but agree to a scheme that works if a certain repair >strategy >works, and wash your hands of the documents that don't clean up by the >application of this repair method; they are just beyond what you can exchange >references about under the convention so defined. We've been talking about HTML and XHTML, but I think we also need to talk about representing - style sheets - javascript (well, ecmascipt). There's some innocuous javascipt and some that can hurt accessibility. We need to be able to point to what we're talking about. - scripting in server side pages. These are scripts that appear in pages that the server interprets and that the user never directly sees. We may want to point to those scripts as well, These scripts can be in various languages, e.g. visual basic, perl, php, etc. Now, one way to point into documents is XPath and XPointer, which, as we discused last time, only works for XML. So what to do? If the document is already XHTML, we can point with XPointer and no further work. However, - there can be many different XPointers that point to the same place. E.g. you can point to the third image, or the image with src="foo.gif". This is a slight implementation complication if we use two different tools and they use different representations, two Xpointer specs that look very different can point to the same thing. But it's straightforward to match them up in principle so I'm not going to worry too much about this for now... If the document is HTML, we can process it to create XHTML. But there's another complication: even if the HTML is correct, there's more than one valid XHTML document into which it can be converted. For example, in the following valid HTML <P> <BR> <P> The BR can be be inside or outside the scope of the preceding P Could be transformed to <p> </p> <br/> <p> </p> or <p> <br/> </p> <p> </p> So there's no one to one correspondence of HTML to XHTML. Similarly, a given XHTML can correspond to more than one HTML. This leads to the question: If we point to a location in an XHTML document that was derived from a known HTML document, does it uniquely correspond define the point in the HTML document? Off hand I'd say yes, but given the many-to-many correspondence between HTML and XHTML, it's not obvious to me. I'm especially concerned if the XHTML came from the HTML via heuristics. This concern escalates if the HTML was not valid to begin with. ----------- So getting back to Al's suggestion: we need some intermediate representation that applies to reasonably valid, if not completely valid, documents. Here's a couple of proposals. The first proposal is "Chunk Markup Language" or CML. Basically, we parse the input into a series of chunks that describes the code. It is NOT an attempt to do a real parse tree. Just a way to describe what's there; and a way that will remain valid even if there's some (though not all) errors in the original. For example, <A href=quib.html> <img src=foo.gif alt=bar> </A> would be represented as <tag name="A" <attr name="href" value="quib.html" /> </tag> <tag name="img"> <attr name="src" value="foo.gif" /> <attr name="alt" value="bar" /> </tag> <endtag name="A" /> Note that this applies to documents that messed up nesting, e.g. <B><I> </B> </I> No need to apply Tidy type heuristics. And there's a simple one to one correspondence between chunks in the XML and things in the original... even if the original is invalid. It's straightforward to recover the original, at least without whitespace and other details. But those can be added. see below. --------------- Another way would be to create XHTML, but to mark the closing tag that were added in the conversion. This makes it straightforward to go back to the oiginal HTML. e.g. <P> Hello <BR> World <P> might be transformed into <p> Hello <br/> <added/> </p> World <p> <added/> </p> (we can't wrap the closing tag in <added></p></added> because that would be invalid XML) Now to recover the original HTML (at least without the whitespace and some other details) we just remove the closing tags marked by <added/> and change <tag/> to <tag> This gets hairier if the original wasn't syntactically correct and you're e.g. switching around closing tags... ------ In either case, If we want to recover the exact appearance of the original code, we can add further tags, e.g. <whitespace value="sssstttssssn" /> <!-- s is space, t is tab, n is newline --> and further tags for upper lower case. Awkward to put in markings for use of quotes in variable names though... the first method's better for that. ------ Now, which of these notations is more convenient for software that uses XPointer? ------ also there's the whole other issue of how we talk about documents that are not HTML, e.g. css style sheets, or parts of pges like javascipt. Xpointer can deal with those as character strings, but it would be good to do something higher level... Anyway that's my thoughts so far... -- Leonard R. Kasday, Ph.D. Institute on Disabilities/UAP and Dept. of Electrical Engineering at Temple University (215) 204-2247 (voice) (800) 750-7428 (TTY) http://astro.temple.edu/~kasday mailto:kasday@acm.org Chair, W3C Web Accessibility Initiative Evaluation and Repair Tools Group http://www.w3.org/WAI/ER/IG/ The WAVE web page accessibility evaluation assistant: http://www.temple.edu/inst_disabilities/piat/wave/
Received on Friday, 3 November 2000 15:20:34 UTC