Re: ISSUE-71: annotatorsRef from Yves Savourel on 2013-01-21 (public-multilingualweb-lt-comments@w3.org from January 2013)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 21 Jan 2013 07:33:34 -0700
To: <public-multilingualweb-lt-comments@w3.org>
Message-ID: <assp.073319af06.assp.0733c5f76a.00be01cdf7e4$4ad09440$e071bcc0$@com>

>> Updating the annotatorsRef also puts a big burden on the implementer:
>> the easiest way to do it is to set it for the modified node, and then 
>> you can end-up with a document full of meaningless annotatorsRef 
>> (set on parents and immediately overridden).
>
> I understand the issue, but do you have a solution in mind? Just trying 
> to move the discussion forward.

I think the issue of possibly having a lot of useless annotatorsRef attributes dangling around after an update can be solved by a simple cleanup of the tree once in a while: you traverse the tree and remove any redundant attributes. So I think that part is not a big issue.
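
To make the cleanup idea concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It assumes that annotatorsRef values are inherited per data category, and it only handles the simplest kind of redundancy: a "category|annotator" pair that repeats exactly what the node already inherits from its ancestors. The function names (parse_pairs, attr_name, cleanup) are made up for this illustration.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
ITS_ATTR = "{%s}annotatorsRef" % ITS_NS


def parse_pairs(value):
    """Split 'cat1|tool1 cat2|tool2' into {'cat1': 'tool1', 'cat2': 'tool2'}."""
    pairs = {}
    for token in value.split():
        category, _, annotator = token.partition("|")
        pairs[category] = annotator
    return pairs


def attr_name(element):
    """annotatorsRef is unprefixed on its:* elements, its:-prefixed elsewhere."""
    if element.tag.startswith("{%s}" % ITS_NS):
        return "annotatorsRef"
    return ITS_ATTR


def cleanup(element, inherited=None):
    """Drop pairs that only repeat what the node inherits (assumed semantics)."""
    if not isinstance(element.tag, str):
        return  # skip comments and processing instructions
    inherited = dict(inherited or {})
    name = attr_name(element)
    if name in element.attrib:
        local = parse_pairs(element.attrib[name])
        # Keep only the pairs that actually change the inherited value.
        kept = {c: a for c, a in local.items() if inherited.get(c) != a}
        if kept:
            element.set(name, " ".join(f"{c}|{a}" for c, a in kept.items()))
        else:
            del element.attrib[name]
        inherited.update(local)
    for child in element:
        cleanup(child, inherited)
```

Calling cleanup(root) on the parsed document rewrites the attributes in place. Detecting a value that is set on a parent and then overridden on every descendant before it is ever used would need a second pass, which this sketch does not attempt.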

The most problematic part is what to do with LQI and Provenance (especially Provenance, where annotatorsRef is required) when you have entries coming from different annotators on the same node (question ii from Chase & Kevin).

I don't have a solution for that.

Just to illustrate the problem:

Let's say we have an initial document marked up by Tool1:

<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
 <prolog>
  <its:rules version="2.0">
   <its:withinTextRule selector="//its:span" withinText="yes"/>
  </its:rules>
  <its:locQualityIssues xml:id="lqi-1">
   <its:locQualityIssue locQualityIssueComment="Should be Rome"/>
  </its:locQualityIssues>
 </prolog>
 <body>
  <para id="p1"><its:span annotatorsRef="lq-issue|Tool1" 
   locQualityIssuesRef="#lqi-1">nome</its:span> is the capital city of Italy.</para>
 </body>
</doc>

Then Tool2 reads the document, preserves the existing annotations, and adds its own. There is currently no way for the result to record which tool generated which of the two entries:

<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
 <prolog>
  <its:rules version="2.0">
   <its:withinTextRule selector="//its:span" withinText="yes"/>
  </its:rules>
  <its:locQualityIssues xml:id="lqi-1">
   <its:locQualityIssue locQualityIssueComment="Should be Rome"/>
   <its:locQualityIssue locQualityIssueComment="Should start with a capital" locQualityIssueType="grammar"/>
  </its:locQualityIssues>
 </prolog>
 <body>
  <para id="p1"><its:span annotatorsRef="lq-issue|Tool2" 
   locQualityIssuesRef="#lqi-1">nome</its:span> is the capital city of Italy.</para>
 </body>
</doc>

It's probably not a huge problem with LQI, but I'm guessing it's more critical with Provenance where the annotator is an important part of the information.

The only solution I can think of is really not nice: add an attribute to <locQualityIssue> and <provenanceRecord> that identifies the annotator and overrides any annotatorsRef value. But that starts to make things really complicated, and they are already quite difficult to understand.
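
As a rough illustration of what that override would mean for a consumer, here is a small Python sketch. The per-entry attribute name "annotatorRef" is purely hypothetical (nothing like it exists in the draft), and the annotators function is made up for this example. The first entry carries the hypothetical attribute, the second falls back to the annotator inherited through annotatorsRef.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
# Hypothetical attribute name, invented for this illustration only.
ENTRY_ANNOTATOR = "annotatorRef"

STANDOFF = """
<its:locQualityIssues xmlns:its="http://www.w3.org/2005/11/its" xml:id="lqi-1">
 <its:locQualityIssue locQualityIssueComment="Should be Rome" annotatorRef="Tool1"/>
 <its:locQualityIssue locQualityIssueComment="Should start with a capital"
  locQualityIssueType="grammar"/>
</its:locQualityIssues>
"""


def annotators(standoff, inherited):
    """Return (comment, annotator) per entry; the per-entry attribute wins."""
    return [
        (issue.get("locQualityIssueComment"),
         issue.get(ENTRY_ANNOTATOR, inherited))
        for issue in standoff.findall("{%s}locQualityIssue" % ITS_NS)
    ]


# "Tool2" stands for the value resolved from annotatorsRef on the referencing node.
result = annotators(ET.fromstring(STANDOFF.strip()), inherited="Tool2")
```

Even in this tiny form it shows the cost: every consumer has to check two places to know who produced an entry.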


cheers,
-yves

Received on Monday, 21 January 2013 14:34:06 UTC