
31 Jan 2002 chat re: PageValet and EARL, "fuzzy HTML pointers"

From: Wendy A Chisholm <wendy@w3.org>
Date: Fri, 01 Feb 2002 13:02:11 -0500
Message-Id: <5.1.0.14.2.20020201125747.04bda470@localhost>
To: w3c-wai-er-ig@w3.org
[13:08] <nick_kew> hi, I've been thinking about Page Valet+earl
[13:09] <nick_kew> I'm concerned about how to identify testSubjects
<wendy> what's your concern?
[13:10] <nick_kew> Basic testSubject should be the page being evaluated, right?
<wendy> yes, a URI
[13:11] <nick_kew> Now, that's fine for isValid, or Passes-WCAG-AA
[13:12] <nick_kew> But for the detailed output of Page Valet, it's losing lots of information
<wendy> you mean, you want to say ImageX is missing alt-text.
[13:12] <sbp> can you provide an example?
[13:12] <sbp> ah
[13:12] <sbp> well, then you have to use XPointer, or a line/column count
[13:12] <sbp> or map the DOM
[13:13] <nick_kew> Precisely.  If I comment on ImageX, I want to identify ImageX.
[13:13] <sbp> O.K.
<wendy> and then what ImageX fails is wcag1.0#text-equiv
[13:13] <sbp> so how are you identifying it in the PV output?
[13:13] <nick_kew> Now, that works fine *if* I make testSubject an internal anchor in the Normalised source generated by Valet
[13:15] <nick_kew> Is that something I can reasonably do?
[13:15] <JibberJim> I don't think so as no-one else will have your normalised source.
[13:15] <JibberJim> Can you construct XPointers?
[13:16] <JibberJim> (which hopefully won't differ between normalised and non-normalised source.)
[13:16] <nick_kew> XPointers into what?  It's no use unless the original is well-formed XML
[13:16] <nick_kew> - which most Web docs aren't
[13:16] <sbp> er... I presume that you're addressing it by column and line number, right?
[13:16] <nick_kew> No, sbp
[13:16] <sbp> so what are you using?
[13:17] <sbp> or is it indistinct?
[13:17] <nick_kew> The normalised source comes out as
[13:17] <nick_kew> <sp element name="h1" id="e11"> .... </sp:element>
[13:17] <nick_kew> and the tests have an IDREF to the element
[13:18] <nick_kew> <sp:element name="h1" id="e11"> .... </sp:element>
[13:18] <nick_kew> (typo correction)
[13:19] <sbp> right. so: [ a earl:WebContent; :partOf [ earl:reprOf <http://blargh.org/> ]; a :Element; :name "h1"; :id "e11" ] .
[13:19] <nick_kew> One solution to this would be an XHTML+EARL approach, analogous to XHTML+XML
[13:19] <sbp> you can probably borrow :partOf from one of Tim's namespaces...
[13:19] <sbp> XHTML+EARL?
[13:19] <nick_kew> <html> ......
[13:19] <JibberJim> But Sean, that would be Valet specific, I couldn't translate the e11 into an h1 in my tool.
[13:20] <nick_kew> <div id="earlreport">
[13:20] <nick_kew> <rdf:RDF ...>
[13:20] <sbp> Valet specific: well, it must get converted into something less specific
[13:20] <nick_kew> <earl:Assertor...>
[13:20] <sbp> less specific to SV, that is
[13:20] <sbp> why embed the EARL in the XHTML? You can't say "this bit of RDF talks about the element in which it is embedded"
[13:21] <sbp> you can point to the element using line number, XPointer, DOM, or whatever, though
[13:21] <sbp> I feel as if I'm missing something rather important
[13:22] <nick_kew> - so the testsubject can be an IDREF within the report
[13:22] <nick_kew> Can't.
[13:22] <nick_kew> Most web docs aren't well-formed XML
[13:22] <sbp> you can still use line/column on them, though
[13:22] <JibberJim> I think we need a fuzzy kind of HTMLPointer
[13:22] * sbp browses the http://valet.webthing.com/page/wcag.cgi?url=http://www.w3.org/&fmt=XML&wcag=0 source
[13:23] <nick_kew> Can't.  Don't have that info available in the normalised source
[13:23] <sbp> well, we'd need a canonical HTML parser to do that
[13:23] <JibberJim> surely an xpointer like thing pointing to the basic structure and attributes would be possible?
[13:24] <sbp> if you ran it through HTML Tidy...
[13:24] <JibberJim> html/body/image[src='moo.gif']
[13:24] <sbp> assuming near-decent SGML, yeah
[13:24] <nick_kew> <font><p><blockquote><b>this<i>and</b>that></font> ...
[13:24] <sbp> but what if there's some odd SGML error barfing the validator? You can only use line/column in that case
[13:25] <sbp> if you don't have that information, I don't know what you expect to do
[13:25] <nick_kew> The XML (and HTML) report formats include full normalised source, precisely so as to have a reference point for the messages
[13:25] <JibberJim> How about the EARL in that case being "this is really bad" and not worry about specifics in the EARL.
[13:26] <sbp> since the output IDs are obviously case-by-case specific, they seem rather pointless
[13:26] <sbp> full normalized source: good
[13:26] <JibberJim> The HTMLPointer could have a confidence level against it?
[13:27] <nick_kew> I want a testSubject referencing somewhere in the normalised source, as opposed to the original source
[13:27] <sbp> so do it by ID
[13:27] <sbp> you have IDs for every element and attribute!
[13:27] <JibberJim> That would be no good nick_kew - as other people won't have your normalising parser available so will not be able to swap.
[13:28] <nick_kew> It can work if the normalised source is part of the same report
[13:28] <nick_kew> - provided this can be represented in a way other tools will understand
[13:29] <sbp> SP doesn't seem particularly set up to have other tools understand it
[13:29] <nick_kew> yes, sbp, I have IDs.  But original docs don't
[13:29] <sbp> what exactly do you want to identify within the EARL report?
[13:31] <nick_kew> When I report an issue, I want the report to identify exactly what element in the original doc it's complaining about
[13:31] <JibberJim> Wendy - http://jibbering.com/snufkin/ has a working Annotea client now - just looking at putting EARL in, but I'm not sure of the point!
[13:31] <sbp> hello?
<wendy> not sure of the point?
[13:32] * sbp is having network problems...
[13:32] <JibberJim> Well, We can't query the EARL using the Annotea interface, we just get given it, I think a more generic EARL database would be better.
<wendy> hmm. i thought the point of http://www.w3.org/2001/08/AnnoteaOxygenDemo.html
<wendy> was to query using the annotea interface...
[13:33] <sbp> the element in the original document: and therefore, you need some way to identify that element as it is in the original document. As I have said, that is limited to line/column (possibly a range), or XML DOM/XPointer. If it's not XML, you'll be wanting to use line/column. If you don't have that information, then what else can you do?
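The line/column approach sbp describes can be sketched with Python's standard html.parser, which tolerates tag soup and exposes a position for each start tag through getpos(). This is only an illustrative sketch, not how Page Valet works; the names MissingAltLocator and find_missing_alt are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltLocator(HTMLParser):
    """Collect (line, column, src) for every <img> start tag with no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            # getpos() gives a 1-based line number and a 0-based column offset.
            line, column = self.getpos()
            self.missing_alt.append((line, column, attributes.get("src")))


def find_missing_alt(source):
    """Return positions of alt-less images in the original, un-normalised source."""
    locator = MissingAltLocator()
    locator.feed(source)
    locator.close()
    return locator.missing_alt
```

Because the positions refer to the original bytes rather than to a parser's normalised tree, two tools with different normalisers can still agree on them, provided both see the same source text.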
[13:33] <JibberJim> I like the idea of an HTMLPointer that's fuzzy.
[13:33] <sbp> I like the idea too
[13:35] <sbp> http://www.w3.org/2001/08/AnnoteaOxygenDemo is rather odd
[13:35] <JibberJim> Fuzzy is probably accurate enough, you only need to find the nearest matching in the parse tree, which isn't too hard with DOM (that I use for such things)
<wendy> how would a fuzzy htmlpointer work?
[13:35] <JibberJim> Just like XPointer, but wouldn't be accurate :-)
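One way to read "find the nearest matching in the parse tree" is to score each candidate element path against the pointer and keep the best one, with the score doubling as the confidence level suggested earlier. The sketch below assumes paths are plain slash-separated step strings and uses difflib from the standard library as the similarity measure; nearest_match is a hypothetical helper, and the similarity measure is a stand-in, not something proposed in the chat.

```python
from difflib import SequenceMatcher


def split_steps(path):
    """Split 'html/body/img' into ['html', 'body', 'img'] (no '/' inside predicates)."""
    return [step for step in path.strip("/").split("/") if step]


def nearest_match(pointer, candidate_paths):
    """Return (confidence, path) for the candidate most similar to the pointer.

    Confidence is difflib's ratio over the step lists: 1.0 means identical
    paths, lower values mean more steps had to be skipped or differ.
    """
    target = split_steps(pointer)
    scored = [
        (SequenceMatcher(None, target, split_steps(path)).ratio(), path)
        for path in candidate_paths
    ]
    return max(scored, key=lambda pair: pair[0])
```

A consumer could then refuse to act on a match whose confidence falls below some threshold of its own choosing.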
[13:35] <sbp> well, you generally know how to parse an HTML document: some end elements are implied, and there are empty elements hither and thither...
[13:36] <sbp> but if you couldn't parse it, what would be the point of HTML?
<wendy> is there any way to use hash to get this stuff? 
[13:36] <sbp> you could just extract the bit of content in question, make sure that it's unique, and then use that
[13:36] <sbp> if not unique, give the index number
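sbp's suggestion (quote the content itself, and add an index when the quote is not unique) can be sketched as below. The pointer layout (a dict with snippet, index and unique keys) and the function names are assumptions made for illustration only.

```python
def occurrences(source, snippet):
    """Return the start offsets of every occurrence of snippet in source."""
    positions = []
    offset = source.find(snippet)
    while offset != -1:
        positions.append(offset)
        offset = source.find(snippet, offset + 1)
    return positions


def make_pointer(source, snippet, offset):
    """Describe the occurrence of snippet that starts at offset."""
    positions = occurrences(source, snippet)
    if offset not in positions:
        raise ValueError("snippet does not occur at the given offset")
    return {
        "snippet": snippet,
        "index": positions.index(offset),  # 0-based occurrence number
        "unique": len(positions) == 1,
    }


def resolve_pointer(source, pointer):
    """Return the offset the pointer refers to, or None if it no longer resolves."""
    positions = occurrences(source, pointer["snippet"])
    if pointer["index"] < len(positions):
        return positions[pointer["index"]]
    return None
```

This needs no parser at all, so it works on documents that are not well-formed, at the cost of copying markup into the report.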
<wendy> al had an interesting idea at the F2F:
<wendy> it was an svg and xforms interface.
[13:37] <nick_kew> harumph.  Element id is index number, but how useful is it
<wendy> you'd have the source of the doc and mixed w/it EARL. using xforms, you would query each element
<wendy> for its bit of earl.
[13:38] <JibberJim> it's not safe as we don't know how many implied elements your parser has put in - or do you?
<wendy> he was using svg i think for the visual presentation.
[13:38] <sbp> .google XForms implementations
[13:38] <xena> XForms implementations: http://www.w3.org/2000/04/xforms-pressrelease
[13:38] <sbp> .google real XForms implementations
[13:38] <xena> real XForms implementations: http://lists.w3.org/Archives/Public/www-forms/2000Jun/0035.html&e=921
[13:38] <nick_kew> Jim, I know that, you may not if your parser's different
[13:38] <nick_kew> That's why I'm not happy with it!
[13:39] <JibberJim> but if you know how many you included, and what index it is, you know the real index in the source?
[13:40] <nick_kew> No, my parser doesn't know whether implied things (like <html>, <head> and <body>) are there or implied
<wendy> doesn't the dom give us a "fuzzy html pointer?"
[13:40] <nick_kew> visval will put in <head> if necessary, thereby upping the element count
[13:40] <JibberJim> We have wellformed HTMLPointers, but only on our normalised source.
[13:41] <nick_kew> If your parser doesn't, then we're not compatible
[13:41] <nick_kew> Yep, jim, that's exactly it
[13:41] <JibberJim> it depends if the Fuzzy HTMLPointers can survive different normalisations 
[13:42] <JibberJim> Is Open SP's normalisation compatible with IE's normalisation.
[13:42] <nick_kew> I've no idea what IE's normalisation is
[13:42] <JibberJim> It's very close to OpenSP's actually!
[13:42] <JibberJim> For HTML4 at any rate.
[13:43] <sbp> normalization differences: hence the "fuzzy" in "fuzzy HTMLPointers"...
[13:43] <nick_kew> Hmmm .. maybe a bit of trial-and-error?
[13:43] <JibberJim> I reckon that in practice they'll be accurate enough.
[13:43] <nick_kew> sbp, the pointers won't be fuzzy.  Just right or wrong
[13:44] <nick_kew> So I can say testSubject = http://www.foo.org/subject.html#e42
[13:44] <JibberJim> I'm using Fuzzy in the sense you'll say 
[13:44] <JibberJim> html/body/table/tbody/tr/td/img[src='moomin.gif']
[13:44] <JibberJim> and I'll have
[13:44] <JibberJim> html/body/table/tr/td/img[src='moomin.gif']
[13:45] <JibberJim> but it won't take us much to find the same image in that instance.
[13:46] <nick_kew> Jim, will you have html/body stuff if the source omits those elements?
[13:46] <JibberJim> I would in my parsers.
[13:46] <JibberJim> but as we know they are optional in HTML, someone implementing Fuzzy HTMLPointers would know it should look without those.
[13:47] <JibberJim> As in if it doesn't have an HTML, your parser probably stuck it in. Same as if you don't find an HTML, you know it's optional so stick it and try again.
[13:48] <nick_kew> hmmmmm..
[13:48] <JibberJim> We know what elements are optional, therefore we know what elements may or not be needed in the Fuzzy Pointer.
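The idea that optional elements can be ignored when comparing pointers can be sketched as follows. It assumes the simple slash-separated path notation used in the chat and treats html, head, body and tbody as optional, since HTML 4.01 lets both their start and end tags be omitted. The function names are hypothetical, and positional indexes (such as tr[1]) are compared literally, which may be wrong when a parser's inserted tbody changes sibling counts.

```python
# Elements whose start and end tags may both be omitted in HTML 4.01.
OPTIONAL_ELEMENTS = {"html", "head", "body", "tbody"}


def element_name(step):
    """Return the element name of a step, dropping any [predicate]."""
    return step.split("[", 1)[0].lower()


def strip_optional(path):
    """Split a path into steps and drop the steps a parser may have implied.

    Limitation: predicate values containing '/' are not handled.
    """
    steps = [step for step in path.strip("/").split("/") if step]
    return [step for step in steps if element_name(step) not in OPTIONAL_ELEMENTS]


def same_target(path_a, path_b):
    """True when two paths differ only by optional (possibly implied) elements."""
    return strip_optional(path_a) == strip_optional(path_b)
```

With this, the two moomin.gif paths quoted above compare as equal even though only one parser inserted tbody.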
[13:49] <nick_kew> But that still leaves us with
[13:49] <nick_kew>  html/body/table/tr/td
[13:50] <nick_kew> - which table
[13:50] <nick_kew> -which tr 
[13:50] <nick_kew> ...
[13:50] <JibberJim> Well that's the XPointer syntax - do you know XPointer?
[13:50] <nick_kew> With the help of a reference 
[13:51] *** sbp has quit IRC (Homer: 20 dollars? I wanted a peanut!)
[13:51] <JibberJim> Xpointers in Annotea seem to work reasonably well, and they are using Amaya to enter them, and IE to read them back... (for me.)
[13:51] <nick_kew> I'm just thinking this is looking significantly harder than referencing the normalised source by ID
[13:52] <JibberJim> yes, but that's specific to you!
[13:52] <nick_kew> Well, for everything Page Valet has done hitherto, that's not been an issue
<wendy> chris ridpath asked these questions when he was working on a-prompt. he kept wanting to create a version of the HTML w/EARL embedded.
<wendy> are there arguments against doing that?
[13:54] <JibberJim> fuzzyPointer( /html/body/table[2]/tr[1]/td[0]/img[src='moomin.gif'] )
[13:55] <nick_kew> That's what I want to do, wendy.
<wendy> are jim and sbp arguing against that?
[13:55] <JibberJim> the [src='moomin.gif'] isn't XPointer I don't think.
<wendy> jim - it would be "img/@src='moomin.gif'" (if i remember correctly...)
[13:56] <JibberJim> yeah that sounds likely!
<wendy> or img[@src='moomin.gif']" depending on what you're doing.
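For well-formed XHTML, the bracketed attribute predicate can be tried with the limited XPath subset in Python's xml.etree.ElementTree. The sample document and the find_images helper below are made up for illustration. Note that positions in XPath start at 1, so an index of 0 would need adjusting before a path like the fuzzyPointer example could be evaluated this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample; real-world HTML would need normalising first.
SAMPLE = """<html><body>
<table><tr><td><img src="moomin.gif"/></td></tr></table>
<table><tr><td><img src="snufkin.gif" alt="Snufkin"/></td></tr></table>
</body></html>"""


def find_images(root, src):
    """Select img elements by their src attribute, relative to the html root."""
    return root.findall(f"body/table/tr/td/img[@src='{src}']")


root = ET.fromstring(SAMPLE)

# Attribute predicate: which image, regardless of which table it is in.
by_attribute = find_images(root, "moomin.gif")

# Positional predicates: second table, first row, first cell (1-based).
by_position = root.find("body/table[2]/tr[1]/td[1]/img")
```

The attribute form survives reordering of tables better than the positional form, which is one reason it suits a pointer that has to tolerate differences between parsers.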
[13:56] <nick_kew> I think I'll try that anyway
[13:56] <JibberJim> What HTML source and EARL combined?
[13:57] <nick_kew> - maybe do other things too
[13:57] <nick_kew> yep, Jim
[13:57] <JibberJim> I wouldn't be able to take that and combine it with my report though :-(
<wendy> it seems you ought to be able to say, "earl:testsubject [bunch of html code] passes checkpointx"
<wendy> or something like that, right?
<wendy> then, it's still in EARL, but you don't have to worry about ids.
<wendy> i guess the issue then is order of the html
[13:58] <nick_kew> Well, I'm thinking in practical terms.  Do what I can first, then extend it to do what you want ...
<wendy> and replication - if bits of code have more than one issue...
<wendy> oh well.
<wendy> i'm sure you'll come up with something interesting nick
<wendy> it certainly is an issue.
[13:58] <nick_kew> wendy, can do that easily, but it's going to generate horrendously big reports
<wendy> i'm glad you're pushing it. this is the type of feedback we need to stabilize the language.
<wendy> make it robust and good.
[13:59] <JibberJim> Do you not like the Fuzzy HTML Pointers?
<wendy> yes, it would be horrendous.
<wendy> fuzzy is good. :)
[13:59] <nick_kew> Jim, need to think through that one
[13:59] <JibberJim> Okay, I'll let you think...
[13:59] <nick_kew> duh
<wendy> should we post the log of this to the list?
[14:00] <nick_kew> if you like
[14:36] <JibberJim> Wendy?
<wendy> yes?
[14:36] <JibberJim> I've got some EARL into the Annotation database...
[14:37] <JibberJim> What do you want me to do with it now?
<wendy> :)
<wendy> can it be viewed through the annotea client?
[14:37] <JibberJim> Erm.  Mine (because it's IE, and doesn't understand the application/rdf+xml MIME type) doesn't at the moment.
[14:38] <JibberJim> I don't have any other annotea clients to know.
[14:38] <JibberJim> If you've got Amaya handy...?
<wendy> i do have amaya handy.
[14:38] <JibberJim> Then look at my annotation on http://jibbering.com/earl/ and see what you get.
* wendy tries
<wendy> i need to select an annotation server - help?
[14:40] <JibberJim> http://annotest.w3.org/annotations
[14:40] <JibberJim> Assuming you've got a username and password?
<wendy> jim - ok. i got access to the server (an automatic process!) - i get an error:
<wendy> there were some orphan annotations. You may see them with the Links view.
<wendy> however, nothing shows up in the links view.
<wendy> an amaya problem? wendy configuration problem? or jim's annotation problem?
<wendy> ah. apparently the problem was me.
<wendy> i do get that message, but i now also see links.
[15:05] <JibberJim> Links?
[15:05] <JibberJim> What links? and what message do you get?
<wendy> 1. i select "load annotations" in amaya
<wendy> 2. a window pops up that says, "Annotation load: there were some orphan annotations. You may see them with the Links view."
<wendy> 3. from the "view" menu i select "show links"
<wendy> 4. in the links view I see some links. one is marked as an annotation. another is marked as an annotation with a question mark.  then, there is a list of other links.
<wendy> ---
[15:07] <JibberJim> And this is on my url? http://jibbering.com/earl/  ?
<wendy> yes
[15:07] <JibberJim> but it only has one annotation on it!
[15:08] <JibberJim> and it shouldn't be orphaned; it was applied to the HTML element!
<wendy> i don't have the latest version of amaya...i should probably upgrade.
<wendy> when selecting the annotation, instead of viewing it it saved it to disk. odd.
<wendy> let me upgrade and see what happens.
<wendy> hold on a few mins.
[15:09] <JibberJim> That's because it's sending the incorrect MIME-type for the EARL wendy.
<wendy> ah.
* wendy looks at what was saved
[15:10] <JibberJim> I'm not sure if it's a bug in my RDF, or a bug in the Annotea server, I need an RDF expert :-)
[15:11] <JibberJim> well maybe not an expert, just someone better than me...
<wendy> ah. yep - it's downloading your rdf.
<wendy> did you send it by sbp?
[15:12] <JibberJim> Yes, he told me that syntax.
<wendy> hmmmm. what about ralph or ericp?
[15:13] <JibberJim> And the RDF validator, does what I expect, I think it might be a bug in the Annotea server.
[15:14] <JibberJim> Shall I post to the Annotation list about it and the problem?
<wendy> hmm. dunno. have you talked w/eric and ralph? do you want me to?
<wendy> yes! do that.
<wendy> although...
[15:15] <JibberJim> I've never met them other than in passing.
<wendy> let me upgrade first to make sure there isn't anything wrong w/my client.
[15:16] <JibberJim> They're definitely sending the wrong mime-type, the question is more if they are sending the wrong mime-type because I've sent them the wrong one or not!
<wendy> :)
<wendy> ok. confirmed that there is an issue. definitely raise it w/the mailing list.
[15:24] <JibberJim> So what do you think my browser should do with the earl it gets?
<wendy> well...having not used your browser yet (eeks, will do soon)...
<wendy> for a start, how about showing the earl?
[15:24] <JibberJim> It's just IE.
<wendy> it would be cool to use sean's earl to xhtml transform to display it.
[15:25] <JibberJim> url?
[15:25] <JibberJim> It's in python though - my browser doesn't talk python...
<wendy> fyi - all earl imps that I know of i've listed at: http://www.w3.org/WAI/ER/#earl
<wendy> http://lists.w3.org/Archives/Public/w3c-wai-er-ig/2001Sep/0012.html
<wendy> no, it's xslt not python

-- 
wendy a chisholm
world wide web consortium 
web accessibility initiative
seattle, wa usa
/--
Received on Friday, 1 February 2002 13:01:41 GMT