Re: Building a test corpus for ER tools

>>  * Set up a new EARL server, based on the Annotea server.  Ensure that
>>    it properly tracks changes in pages, as this is more important to
>>    us than to Annotea.
>
>Tracking changes isn't an Annotea issue certainly, but we need some way of
>looking at changes, I've been trying to find time to look at this from my
>own link maintenance needs but haven't got anywhere really yet. (link
>monitoring in usenet FAQs after I got caught out with pages changing
>subjects entirely without changing urls) This is an important issue I
>think, and one that whilst we've often touched on in meetings haven't
>fully researched, although again real world experience is likely needed
>which a corpus would also give (perhaps it would be a good idea to cache
>pages we evaluate somewhere).

I think tracking changes in web pages is more of a TestSubject issue (see
the thread above).  It should be the EARL-producing client's responsibility
to compute a hash of the page contents (or whatever) that will uniquely
identify a page's content with a high probability - the EARL database just
needs to be able to tell the difference between two hashes if you ask for a
specific one from a certain date.
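
A minimal sketch of what the client side could do, assuming the page is
already fetched as bytes.  The function name and the choice of SHA-256 are
my own assumptions for illustration; nothing in EARL specifies either.

```python
import hashlib


def content_fingerprint(page_bytes: bytes) -> str:
    """Return a hex digest identifying this exact page content."""
    # SHA-256 is an assumed choice; any hash with a low collision
    # probability would serve the same purpose.
    return hashlib.sha256(page_bytes).hexdigest()


# Two made-up snapshots of the same URL taken on different dates.
old = content_fingerprint(b"<html><body>Circle test</body></html>")
new = content_fingerprint(b"<html><body>Something else entirely</body></html>")

# The database only has to compare the stored digests.
page_changed = old != new
```

The client would send the digest along with its assertions, so the
database never has to store or re-fetch the page itself.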

>One thing before this is an agreement on a single namespace to use, to
>make the evaluation tools simpler. I'd propose either rapidly agreeing on
>a 1.0 namespace or formalising the 1.0-test.

One problem with the current namespace (this has probably already been
discussed, forgive me if I'm repeating anything) is that there are too many
ways to say the same thing.  Example: earl:passes vs. earl:validity
earl:Pass; earl:confidence earl:Certain.  If someone asserts a Likely Pass
we should be able to query for any passing state and return that along with
plain old "passes."  Algae queries can't do that because they don't "know"
what "passes" is supposed to represent.  I'm also worried by the 0.95
examples allowing:
:Joe earl:asserts { :SVGTool :passMedium :CircleTest }
As far as I can tell (correct me if I'm wrong here), Algae will think
that's different from
:Joe earl:asserts {rdf:type :Assertion;
                   rdf:subject :SVGTool;
                   rdf:predicate :passMedium;
                   rdf:object :CircleTest;}

So either we need to make EARL stricter, so that there is only one way to
say anything and we always know the exact relationship between the triples
we want to query against each other (as in Annotea), or the database needs
to look up the canonical form of any statement and convert to it when
something is submitted.
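
The second option, converting on submission, might look roughly like the
sketch below.  Only the earl:passes equivalence comes from the example
earlier in this message; the dict layout, the function name and
:SquareTest are hypothetical, and a real server would work on triples
rather than Python dicts.

```python
# Assumed shorthand table.  Only the earl:passes entry is taken from the
# message; further shorthands would be added the same way.
SHORTHANDS = {
    "earl:passes": {
        "earl:validity": "earl:Pass",
        "earl:confidence": "earl:Certain",
    },
}


def canonicalize(subject, test, predicate=None, properties=None):
    """Return the long form of an assertion as a plain dict."""
    canonical = {"subject": subject, "test": test}
    if predicate in SHORTHANDS:
        # Expand the shorthand into explicit validity/confidence values.
        canonical.update(SHORTHANDS[predicate])
    else:
        # Already in long form: keep the properties as submitted.
        canonical.update(properties or {})
    return canonical


a = canonicalize(":SVGTool", ":CircleTest", predicate="earl:passes")
b = canonicalize(
    ":SVGTool",
    ":SquareTest",  # hypothetical second test
    properties={"earl:validity": "earl:Pass", "earl:confidence": "earl:Likely"},
)

# Once everything is stored in one form, "any passing state" is one check.
passing = [x for x in (a, b) if x.get("earl:validity") == "earl:Pass"]
```

Here both the plain "passes" and the Likely Pass end up in passing,
which is the query the current namespace makes awkward.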

It'd also be kind of cool to have a database that allows queries to go
"triple-surfing" so you could find the relationship between any two bits of
triples by walking up a chain of identical subject-object pairs. It seems
like something that would be potentially really slow, but has probably
already been implemented somewhere.
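
For what it's worth, here is a rough sketch of the idea as a breadth-first
search over an in-memory list of triples, following each triple's object
to the triples that have it as their subject.  The function name and the
:A1 node are made up for illustration; this is not how Algae or any
existing store does it.

```python
from collections import deque


def find_chain(triples, start, goal):
    """Return a list of triples linking start to goal, or None."""
    # Index triples by subject so each hop is a dictionary lookup.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((s, p, o))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for triple in outgoing.get(node, []):
            nxt = triple[2]  # the object becomes the next subject
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None


triples = [
    (":Joe", "earl:asserts", ":A1"),  # :A1 is a hypothetical assertion node
    (":A1", "rdf:subject", ":SVGTool"),
    (":A1", "rdf:predicate", ":passMedium"),
    (":A1", "rdf:object", ":CircleTest"),
]
chain = find_chain(triples, ":Joe", ":CircleTest")
```

Because each node is visited at most once, the search is linear in the
number of triples once the index is built, so it may be less slow than
feared, at least for a store that fits in memory.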

Nadia

Received on Thursday, 7 March 2002 06:07:12 UTC