Re: Adding more information about TestSubject from Nils Ulltveit-Moe on 2005-04-11 (public-wai-ert@w3.org from April 2005)

From: Nils Ulltveit-Moe <nils@u-moe.no>
Date: Mon, 11 Apr 2005 20:34:59 +0200
To: Charles McCathieNevile <charles@sidar.org>, public-wai-ert@w3.org
Cc: Nils Ulltveit-Moe <nils@u-moe.no>, Mikael Snaprud <Mikael.Snaprud@hia.no>
Message-Id: <1113244499.23865.355.camel@moe-ulltveit-moe.com>
Hi Charles,

On Mon, 11.04.2005 at 23:45 +1000, Charles McCathieNevile wrote:
> Yep. Having looked at Annotea, do you think we can simply directly use  
> some of their properties?

Use case discussion:
--------------------

A first discussion may be whether EARL would be used
in a push or a pull scenario, or if this matters.

Annotea uses a push protocol to store HTML annotations and RDF metadata
using HTTP POST, and the Annotea server replies with RDF code pointing
to the newly created annotation.
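
A minimal sketch of what the push step could look like from a client,
using only the Python standard library. The server URL, the content type
and the function name are assumptions made for illustration, not taken
from the Annotea specification; the request is only built, not sent.

```python
import urllib.request

# Hypothetical endpoint, used for illustration only.
ANNOTEA_SERVER = "http://annotea.example.org/Annotation"


def build_push_request(rdf_xml: str,
                       server_url: str = ANNOTEA_SERVER) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST request carrying RDF metadata."""
    body = rdf_xml.encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        # Assumed content type for the RDF/XML payload.
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


request = build_push_request(
    "<r:RDF xmlns:r='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>"
)
# Sending would be urllib.request.urlopen(request); the reply body would
# then be the RDF that points to the newly created annotation.
```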

I had earlier thought of EARL as a pull protocol that returns RDF data
in response to an accessibility assessment request.

In a fully distributed large scale accessibility assessment scenario,
one might also envision an EARL push scenario, where some distributed
accessibility checker has responsibility for some part of the web, and
at regular intervals checks some URLs, and then posts the results to
some main EARL repository. 

A practical push scenario may be one where web sites are required by law
to be accessible, e.g. for public transport services. The government can
require that accessibility assessment data be delivered to e.g. the
national statistics authorities at regular intervals as a condition for
running the business. The government need not be bothered with performing
the tests itself, since that can be done with any accessibility monitoring
tool or service the company prefers, as long as the tool or service
complies with the government's regulations.

The main use case for including the page source may be to help web
designers trace the problem on pages that are required by law to be
accessible and that change frequently, e.g. the front page of a public
services news site.

I do not think public officials would be interested in storing these
pages, so the delivered EARL would only refer to the cached version in
the local assessment tool. However, the locally cached version might be
EARL with HTML embedded, depending on how the tool was implemented.

For static pages, it would probably not make sense to embed the source
document.

Note that even if it is useful to store the pages being assessed, as
evidence backing the accessibility claims and as proof that the
accessibility monitoring tool operated correctly, the downloaded pages,
and probably also the EARL data, have limited value after a period of
time. They can probably be deleted once the national statistics
authorities have extracted the key data they are interested in.

If a public body is doing the accessibility assessments, then
the problem concerning copyright might be solved by placing the
responsibility for doing the large scale accessibility checks on the
national libraries and national archives, which are required by law to
archive public material, and which have legal exemptions from copyright
law for doing this. Information about national web archiving institutions
can be found here: http://www.nla.gov.au/padi/topics/92.html

One example of a system designed to archive parts of the internet is
the WayBack machine, http://www.archive.org/.


Technical outline:
------------------

HTML may be embedded according to the Annotea RDF namespace for
describing HTTP headers (http://www.w3.org/1999/xx/http#).

A pointer to where the fault occurred in the original document (test
subject) is still needed if the HTML is embedded in the EARL. It would
be natural to have a local pointer of some kind (XPointer, fuzzy pointer
or line number) that indexes into the embedded document (i.e. a local
reference not including the host name).
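
As a small illustration of the simplest of these options, a line number,
here is a sketch of resolving such a local pointer against an embedded
copy of the page. The function name and the sample page are hypothetical;
XPointer or fuzzy pointers would need a real resolver instead.

```python
def resolve_line_pointer(embedded_source: str, line_number: int) -> str:
    """Return the 1-indexed line of the embedded document that a pointer names."""
    lines = embedded_source.splitlines()
    if not 1 <= line_number <= len(lines):
        raise IndexError(f"line {line_number} is outside the embedded document")
    return lines[line_number - 1]


# Hypothetical cached copy of an assessed page.
cached_page = "<html>\n<body>\n<img src='logo.png'>\n</body>\n</html>"

# A fault reported on line 3 resolves against the cached copy,
# without any host name being involved.
faulty_line = resolve_line_pointer(cached_page, 3)
```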

This might work similarly to the <a:body> element of Annotea, which
either refers to the annotation body or encloses the embedded HTML
resource.

Annotea example:

Embedded HTML:

  <a:body>
   <r:Description>
    <h:ContentType>text/html</h:ContentType>
    <h:ContentLength>289</h:ContentLength>
    <h:Body r:parseType="Literal">
     <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
       <title>Ralph's Annotation</title>
      </head>
      <body>
       <p>This is an <em>important</em> concept; see
       <a href="http://serv1.example.com/other/page.html">other page</a>.</p>
      </body>
     </html>
    </h:Body>
   </r:Description>
  </a:body>

Reference to HTML body in repository:
<a:body
r:resource="http://annotea.example.org/Annotation/body/3ACF6D756"/>

(in the use case above, the EARL with the embedded HTML is stored in the
local assessment tool, and the link above is passed on to public
officials.)


To do this in EARL, one would need to be able to differentiate between
the original test subject and the cached version embedded in EARL by
using two different references.

I have not had time to work out how the resulting EARL would look
in detail.
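
Purely as an illustration of keeping the two references apart, and not
as a proposed EARL vocabulary, the distinction could be modelled like
this. All class, field and method names are hypothetical, and the URLs
are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubjectReferences:
    """Two separate references for one assessed page (names are hypothetical)."""
    original_uri: str                    # the test subject on the live web
    cached_uri: Optional[str] = None     # copy kept by the local assessment tool
    local_pointer: Optional[str] = None  # e.g. a line number into the cached copy

    def for_officials(self) -> dict:
        """References only, with no embedded page source."""
        return {"original": self.original_uri, "cached": self.cached_uri}


refs = SubjectReferences(
    original_uri="http://www.example.org/news/",
    cached_uri="http://tool.example.org/cache/3ACF6D756",
    local_pointer="line=3",
)
summary = refs.for_officials()
```

The pointer stays with the local tool, since it only makes sense together
with the cached copy it indexes into.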

My conclusion is that having an option to include the source document
may be useful in some cases, mostly for web designers and accessibility
tool vendors, to verify the functionality of the tool on dynamic pages
and to understand what fault the tool reported. It is probably not so
important for public bodies.

Best regards,
-- 
Nils Ulltveit-Moe <nils@u-moe.no>
Received on Monday, 11 April 2005 18:31:13 UTC