- From: Leonard R. Kasday <kasday@acm.org>
- Date: Fri, 03 Nov 2000 15:19:55 -0500
- To: w3c-wai-er-ig@w3.org
- Message-Id: <4.3.2.7.2.20001103095206.00e5c400@pop3.concentric.net>
I like Al's suggestion regarding how we represent documents we want
accessibility statements to point to:
>Usually this winds up with some intermediate choice; you don't go all the way
>back to flat text but agree to a scheme that works if a certain repair
>strategy
>works, and wash your hands of the documents that don't clean up by the
>application of this repair method; they are just beyond what you can exchange
>references about under the convention so defined.
We've been talking about HTML and XHTML, but I think we also need to talk
about representing
- style sheets
- javascript (well, ECMAScript). There's some innocuous javascript and some
that can hurt accessibility. We need to be able to point to what we're
talking about.
- scripting in server side pages. These are scripts that appear in pages
that the server interprets and that the user never directly sees. We may
want to point to those scripts as well. These scripts can be in various
languages, e.g. Visual Basic, Perl, PHP, etc.
Now, one way to point into documents is XPath and XPointer, which, as we
discussed last time, only work for XML. So what to do?
If the document is already XHTML, we can point with XPointer and no further
work. However,
- there can be many different XPointers that point to the same place. E.g.
you can point to the third image, or the image with src="foo.gif". This is
a slight implementation complication: if we use two different tools and they
use different representations, two XPointer specs that look very different
can point to the same thing. But it's straightforward to match them up in
principle, so I'm not going to worry too much about this for now...
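
As a rough illustration of two different-looking pointers resolving to the same node, here is a minimal Python sketch. It uses the limited XPath subset in the standard library's xml.etree.ElementTree rather than a real XPointer processor, and the sample document (three images under a body element) is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the third image is also the one with src="foo.gif".
doc = ET.fromstring(
    "<body>"
    '<img src="a.gif" alt="a"/>'
    '<img src="b.gif" alt="b"/>'
    '<img src="foo.gif" alt="bar"/>'
    "</body>"
)

# Two expressions that look different...
by_position = doc.find("./img[3]")
by_attribute = doc.find("./img[@src='foo.gif']")

# ...but resolve to the very same element object.
same_target = by_position is by_attribute
```

Matching up two pointers then amounts to resolving both against the same document and comparing the resulting nodes, rather than comparing the pointer strings.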
If the document is HTML, we can process it to create XHTML. But there's
another complication: even if the HTML is correct, there's more than one
valid XHTML document into which it can be converted.
For example, in the following valid HTML
<P> <BR> <P>
The BR can be inside or outside the scope of the preceding P, so this
could be transformed to
<p> </p> <br/> <p> </p>
or
<p> <br/> </p> <p> </p>
So there's no one to one correspondence of HTML to XHTML. Similarly, a
given XHTML can correspond to more than one HTML. This leads to the question:
If we point to a location in an XHTML document that was derived from a
known HTML document, does it uniquely define the corresponding point in the
HTML document? Off hand I'd say yes, but given the many-to-many
correspondence between HTML and XHTML, it's not obvious to me. I'm
especially concerned if the XHTML came from the HTML via heuristics.
This concern escalates if the HTML was not valid to begin with.
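
To make the concern concrete, here is a small Python sketch showing that the path to the BR differs between the two candidate XHTML conversions given above. The helper name path_to and the wrapping body element are assumptions added for the example; the paths are simple child-index paths, not full XPointers.

```python
import xml.etree.ElementTree as ET


def path_to(root, tag):
    """Return a child-index path such as 'p[1]/br[1]' to the first element named tag."""
    def walk(node, prefix):
        counts = {}
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            step = prefix + ["%s[%d]" % (child.tag, counts[child.tag])]
            if child.tag == tag:
                return step
            found = walk(child, step)
            if found:
                return found
        return None

    steps = walk(root, [])
    return "/".join(steps) if steps else None


# The two conversions of <P> <BR> <P>, wrapped in a body element.
outside = ET.fromstring("<body><p> </p> <br/> <p> </p></body>")
inside = ET.fromstring("<body><p> <br/> </p> <p> </p></body>")

print(path_to(outside, "br"))  # br[1]
print(path_to(inside, "br"))   # p[1]/br[1]
```

So a pointer written against one conversion does not name the same path in the other, even though both came from the same HTML.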
-----------
So getting back to Al's suggestion: we need some intermediate
representation that applies to reasonably valid, if not completely valid,
documents. Here's a couple of proposals.
The first proposal is "Chunk Markup Language" or CML.
Basically, we parse the input into a series of chunks that describe the
code. It is NOT an attempt to build a real parse tree, just a way to
describe what's there, and a way that will remain valid even if there are
some (though not all) errors in the original.
For example,
<A href=quib.html> <img src=foo.gif alt=bar> </A>
would be represented as
<tag name="A">
<attr name="href" value="quib.html" />
</tag>
<tag name="img">
<attr name="src" value="foo.gif" />
<attr name="alt" value="bar" />
</tag>
<endtag name="A" />
Note that this also applies to documents with messed up nesting, e.g. <B><I>
</B> </I>. No need to apply Tidy-type heuristics. And there's a simple one
to one correspondence between chunks in the XML and things in the
original... even if the original is invalid.
It's straightforward to recover the original, at least without whitespace
and other details. But those can be added; see below.
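
A minimal sketch of such a chunker, assuming Python's standard html.parser module as the tokenizer. The names ChunkParser and to_chunks are made up for the example. Note two departures from the notation above: html.parser lowercases tag and attribute names, so "A" comes out as "a", and text between tags is simply skipped here.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class ChunkParser(HTMLParser):
    """Collect a flat list of CML-style chunks; no tree is built."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        lines = ["<tag name=%s>" % quoteattr(tag)]
        for name, value in attrs:
            # value is None for attributes written without "=value"
            lines.append(
                "  <attr name=%s value=%s />" % (quoteattr(name), quoteattr(value or ""))
            )
        lines.append("</tag>")
        self.chunks.append("\n".join(lines))

    def handle_endtag(self, tag):
        # End tags are recorded in document order, whether or not they nest properly.
        self.chunks.append("<endtag name=%s />" % quoteattr(tag))


def to_chunks(html):
    parser = ChunkParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


for chunk in to_chunks("<A href=quib.html> <img src=foo.gif alt=bar> </A>"):
    print(chunk)
```

Because the parser only reports tokens in order, input such as <B><I></B></I> yields four chunks with no repair step in between.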
---------------
Another way would be to create XHTML, but to mark the closing tags that were
added in the conversion. This makes it straightforward to go back to the
original HTML, e.g.
<P> Hello <BR> World <P>
might be transformed into
<p> Hello <br/> <added/> </p> World <p> <added/> </p>
(we can't wrap the closing tag in <added></p></added> because that would be
invalid XML)
Now to recover the original HTML (at least without the whitespace and some
other details) we just remove the closing tags marked by <added/> and
change <tag/> to <tag>
This gets hairier if the original wasn't syntactically correct and you're
e.g. switching around closing tags...
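
The recovery step for the well-behaved case can be sketched in a few lines of Python. This is a regex-based illustration only, assuming that each <added/> marker is immediately followed (after optional whitespace) by the closing tag it flags, and that attribute values never contain "<" or ">". The function name strip_added is hypothetical.

```python
import re


def strip_added(xhtml):
    """Undo the marked conversion: drop added closing tags, turn <tag/> into <tag>."""
    # Remove each <added/> marker together with the closing tag that follows it.
    html = re.sub(r"<added\s*/>\s*</[A-Za-z][A-Za-z0-9]*\s*>", "", xhtml)
    # Turn empty-element syntax <tag ... /> back into <tag ...>.
    html = re.sub(r"<([A-Za-z][^<>]*?)\s*/>", r"<\1>", html)
    return html


recovered = strip_added("<p> Hello <br/> <added/> </p> World <p> <added/> </p>")
# Collapse whitespace runs, since exact whitespace is not preserved by this scheme.
print(" ".join(recovered.split()))  # <p> Hello <br> World <p>
```

The result matches the original up to letter case and whitespace, which is where the extra tags discussed next come in.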
------
In either case,
If we want to recover the exact appearance of the original code, we can add
further tags, e.g.
<whitespace value="sssstttssssn" /> <!-- s is space, t is tab, n is
newline -->
and further tags for upper/lower case.
Awkward to put in markings for use of quotes in variable names though...
the first method's better for that.
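
For the whitespace tag, a minimal Python sketch of the encoding and its inverse, assuming only the three letters given in the comment above (s, t, n). Any other whitespace character, such as a carriage return, is not covered and raises KeyError here. The function names are invented for the example.

```python
# Mapping taken from the comment in the example: s is space, t is tab, n is newline.
_ENCODE = {" ": "s", "\t": "t", "\n": "n"}
_DECODE = {letter: char for char, letter in _ENCODE.items()}


def encode_ws(run):
    """Encode a run of whitespace as the value of a <whitespace/> tag."""
    return "".join(_ENCODE[ch] for ch in run)


def decode_ws(value):
    """Rebuild the original whitespace run from the encoded value."""
    return "".join(_DECODE[ch] for ch in value)


original = "    \t\t\t    \n"
encoded = encode_ws(original)
print('<whitespace value="%s" />' % encoded)
```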
------
Now, which of these notations is more convenient for software that uses
XPointer?
------
Also, there's the whole other issue of how we talk about documents that are
not HTML, e.g. CSS style sheets, or parts of pages like javascript. XPointer
can deal with those as character strings, but it would be good to do
something higher level...
Anyway that's my thoughts so far...
--
Leonard R. Kasday, Ph.D.
Institute on Disabilities/UAP and Dept. of Electrical Engineering at Temple
University
(215) 204-2247 (voice) (800) 750-7428 (TTY)
http://astro.temple.edu/~kasday mailto:kasday@acm.org
Chair, W3C Web Accessibility Initiative Evaluation and Repair Tools Group
http://www.w3.org/WAI/ER/IG/
The WAVE web page accessibility evaluation assistant:
http://www.temple.edu/inst_disabilities/piat/wave/
Received on Friday, 3 November 2000 15:20:34 UTC