- From: Al Gilman <asgilman@iamdigex.net>
- Date: Tue, 31 Oct 2000 10:56:51 -0500
- To: "Leonard R. Kasday" <kasday@acm.org>, w3c-wai-er-ig@w3.org
At 09:24 AM 2000-10-31 -0500, Leonard R. Kasday wrote:
>At 02:14 PM 10/30/00 -0500, Al Gilman wrote:
>> >I think you can actually write Xpointer / Xpath expressions to select bits of
>> >text (I think I saw that it could be used in XSLT). But you should ask an
>> >expert.
>
>I think the ability to point into text is critical for additional reasons,
>besides things like language, misspellings, and missing tags in the text.
>
>- an accessibility assertion could apply to a part of CSS or Javascript
>inside a web page. These are typically (and preferably) inside comments...
>
>so we need to point to text inside comments! Can we do this with Xpointer?
>
>- If in the future we deal with server-side scripting in, e.g., asp
>or php files, and we want to test the asp or php file before it gets
>processed into HTML by the server, we are again pointing inside a
>programming language whose parse tree would not be exposed to XPATH constructs.
>
>- what if the original page is illegal HTML?

This is the crux of the matter.

The community of interoperating tools [as resolved in last telecon] needs to have an agreement or convention as to what to do here. Using 'tidy' to define the canonicalization that you will employ for references into imperfect HTML is in a sense a hack; but it gets you a normal form that you can then manipulate without fear for those pages it copes with, and it copes with a lot more pages than those that will pass validation before processing. But 'tidy' is not the generic issue; the generic issue is how badly broken the HTML can be while we still try to understand one another when discussing it.

>How can we point to the
>illegal bits in a tidyfied version? Seems to me we'd have to cast the
>whole page, or at least a portion of the page, into CDATA to talk about it.

No; that is not necessary.

>
>These are leading me in the direction of just considering the page one big
>text string against which we make XML or RDF statements.
>If we do that, then whether we point by interspersing the comments in the
>string or by pointing into the string is largely an implementation detail,
>it seems to me, since it would be straightforward to convert from one to another.

Yes; the way you exchange mutually understood pointers into something that is not valid HTML is to drop back to some class of text string that you trust all the documents you want to talk about actually conform to, and to use an indexing scheme that works there.

Usually this winds up with some intermediate choice: you don't go all the way back to flat text, but agree to a scheme that works if a certain repair strategy works, and wash your hands of the documents that don't clean up under that repair method; they are simply beyond what you can exchange references about under the convention so defined.

There are i18n issues about character counting in text; I would have to send you back to the i18n people for an explanation of what they are. Other pitfalls of character counting stem from the fact that changes to the character string are permissible as a result of transport. Communicating repair tools that have dereferenced the same URI-reference are not necessarily holding bit-for-bit replicas in their local storage, or even character-for-character replicas. This is why XML Signature (q.v.) involves a canonicalization transformation, IIRC.

One point is that the XML Working Group is firmly committed to the idea that something that is not well-formed XML is pure garbage, and they will not define processing methods that step into that void. It is not necessarily practical for a community of HTML repair tools to be quite so strictly orthodox in what they will eat.

So what I am saying is that you may wish to re-engineer X-Pointer just a bit, to add some repair capability or robustness to unorthodox markup. You have a task requirement that invites a more permissive standard than the XML community would agree to define.
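To make the "drop back to a text string and agree on an indexing scheme" convention concrete, here is a minimal Python sketch. It assumes a hypothetical canonicalization (CRLF and CR to LF, then Unicode NFC), counts offsets in Unicode code points, and carries a SHA-256 digest so a tool can tell when its copy falls outside the convention. The names `canonicalize`, `TextPointer`, `make_pointer` and `resolve` are illustrative only; this is not XPointer and not any W3C-defined scheme.

```python
import hashlib
import unicodedata
from dataclasses import dataclass


def canonicalize(raw: str) -> str:
    """Hypothetical canonical form: LF-only line endings, then Unicode NFC.

    Transport may change line endings or composition, so both sides apply
    the same transform before counting anything.
    """
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    return unicodedata.normalize("NFC", text)


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TextPointer:
    digest: str  # SHA-256 of the canonical text the offsets were counted in
    start: int   # inclusive offset, counted in Unicode code points
    end: int     # exclusive offset, counted in Unicode code points


def make_pointer(raw: str, needle: str) -> TextPointer:
    """Point at the first occurrence of `needle` in the canonical text."""
    text = canonicalize(raw)
    target = canonicalize(needle)
    start = text.find(target)
    if start < 0:
        raise ValueError("needle not found in canonical text")
    return TextPointer(_digest(text), start, start + len(target))


def resolve(raw: str, ptr: TextPointer) -> str:
    """Return the pointed-to text, refusing copies outside the convention."""
    text = canonicalize(raw)
    if _digest(text) != ptr.digest:
        raise ValueError("canonical text differs from the pointer's source")
    return text[ptr.start:ptr.end]


if __name__ == "__main__":
    # Same page as held by two tools: CRLF vs LF, composed vs decomposed "é".
    sender_copy = "<p>caf\u00e9</p>\r\n<!-- a { color: red } -->\r\n"
    receiver_copy = "<p>cafe\u0301</p>\n<!-- a { color: red } -->\n"
    ptr = make_pointer(sender_copy, "a { color: red }")
    print(resolve(receiver_copy, ptr))  # a { color: red }
```

The digest check is one way of "washing your hands" of copies that do not reduce to the same canonical text; a real convention would have to pin down the canonicalization and the counting unit far more carefully than this.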
But the mission statement for X-Path and/or X-Pointer (oops, I should have been saying both) is very close to the problem you need to solve.

Part of the problem is my bias regarding the implementation of software in this area. I tend to be biased toward this group coming up with rapid-prototype reference implementations of techniques that we can then sell off into the format specifications because we have working models. The commercial implementers of the formats can come up with the efficient implementations.

So we don't make 'tidy' the definition of our pointer scheme. What we say is that you can point into a repaired image of a document if you document the repair with a standard diff. Then 'tidy' can be replaced within the agreed architecture; the standard is for the diff, not the repair. This is costly in compute cycles, but efficient in calendar months of programming. It lets us build working systems with standard DIFF and standard XML tools and libraries, and capitalize on all the work Raggett has poured into 'tidy' over the years.

>
>Please take the above (possibly inflammatory) statements as just MHO for now.

Temperate and amiable as ever, Lenny. And well framed, to boot.

Al

>
>Len
>--
>Leonard R. Kasday, Ph.D.
>Institute on Disabilities/UAP and Dept. of Electrical Engineering at Temple
>University
>(215) 204-2247 (voice) (800) 750-7428 (TTY)
>http://astro.temple.edu/~kasday mailto:kasday@acm.org
>
>Chair, W3C Web Accessibility Initiative Evaluation and Repair Tools Group
>http://www.w3.org/WAI/ER/IG/
>
>The WAVE web page accessibility evaluation assistant:
>http://www.temple.edu/inst_disabilities/piat/wave/
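The "standardize the diff, not the repair" idea can be sketched with Python's standard `difflib`: the repair is recorded as a unified diff, and an offset into the repaired image is mapped back to the original through the diff's matching blocks. This is a minimal sketch under assumptions: pointers are plain character offsets, the repaired string stands in for output from a tool like tidy, and the function names `repair_record` and `to_original_offset` are hypothetical.

```python
import difflib
from typing import Optional


def repair_record(original: str, repaired: str) -> str:
    """Document a repair as a standard unified diff (the exchanged artifact)."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        repaired.splitlines(keepends=True),
        fromfile="original",
        tofile="repaired",
    ))


def to_original_offset(original: str, repaired: str, pos: int) -> Optional[int]:
    """Map a character offset in the repaired image back to the original.

    Returns None when `pos` falls in text the repair inserted or replaced,
    i.e. text with no counterpart in the original document.
    """
    matcher = difflib.SequenceMatcher(None, original, repaired, autojunk=False)
    for tag, i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 <= pos < j2:
            return i1 + (pos - j1) if tag == "equal" else None
    return None


if __name__ == "__main__":
    original = "<p>Hello<br>world\n"       # not well-formed XML: unclosed <p>, <br>
    repaired = "<p>Hello<br/>world</p>\n"  # stand-in for a tidy-style repair
    print(repair_record(original, repaired))
    pos = repaired.index("world")          # pointer into the repaired image
    print(to_original_offset(original, repaired, pos))  # 12
```

Offsets that land in text the repair inserted (here the added `</p>`) map to `None`, which is one way of marking what the original simply does not contain. Any repair tool could be swapped in, since only the diff is exchanged.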
Received on Tuesday, 31 October 2000 10:29:00 UTC