RE: issue-68 from an annotation representation point of view, with potential implications for annotatorsRef and standoff markup from Pablo Nieto Caride on 2013-01-29 (public-multilingualweb-lt@w3.org from January 2013)

From: Pablo Nieto Caride <pablo.nieto@linguaserve.com>
Date: Tue, 29 Jan 2013 12:53:20 +0100
To: 'Tadej Štajner' <tadej.stajner@ijs.si>, "'Felix Sasaki'" <fsasaki@w3.org>
Cc: "'Phil Ritchie'" <philr@vistatec.ie>, 'Mārcis Pinnis' <marcis.pinnis@Tilde.lv>, "'Yves Savourel'" <ysavourel@enlaso.com>, <public-multilingualweb-lt@w3.org>, 'Artūrs Vasiļevskis' <arturs.vasilevskis@Tilde.lv>
Message-ID: <056a01cdfe17$3bf27220$b3d75660$@linguaserve.com>

Hi Felix, Tadej, all,

Apologies in advance for the long mail.
I don't know whether pointing to just one external stand-off unit was a feature or an oversight. In any case, what I don't like about the current approach can be seen in this example:

<body>
<p>Some text</p>
<p>Some text</p>
</body>

After annotation by an MT system, we could have this:
<body>
<p its-provenance-records-ref="#pr1">This paragraph was translated from the machine.</p>
<p its-provenance-records-ref="#pr1">This paragraph was translated from the machine.</p>
</body>
And this:
<its:provenanceRecords xml:id="pr1" xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:provenanceRecord toolRef="http://www.cngl.ie/mlwlt-dev/" />
</its:provenanceRecords>
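To make the current (forward) mechanism concrete: a processor reads its-provenance-records-ref, drops the leading '#', and looks up the stand-off block with that xml:id. Below is a rough Python sketch of that lookup with the standard library's ElementTree. It assumes the stand-off block has already been extracted as a standalone XML string, and resolve_ref is a hypothetical helper, not part of any ITS tool.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Assumption: the stand-off block is available on its own as an XML string.
STANDOFF = """<its:provenanceRecords xml:id="pr1" xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:provenanceRecord toolRef="http://www.cngl.ie/mlwlt-dev/" />
</its:provenanceRecords>"""

def resolve_ref(ref, blocks):
    """Hypothetical helper: follow a '#id' reference to the records of the matching block."""
    block = blocks[ref.lstrip("#")]
    return [dict(rec.attrib) for rec in block.findall(f"{{{ITS_NS}}}provenanceRecord")]

root = ET.fromstring(STANDOFF)
blocks = {root.get(XML_ID): root}  # index stand-off blocks by their xml:id

# Both paragraphs carry its-provenance-records-ref="#pr1", so both resolve to the same block.
for ref in ["#pr1", "#pr1"]:
    print(resolve_ref(ref, blocks))
```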

If the second p element is then revised in a content editor, we would have this:
<body>
<p its-provenance-records-ref="#pr1">This paragraph was translated from the machine.</p>
<p its-provenance-records-ref="#pr2">This text was translated from the machine and revised.</p>
</body>
And this:
<script id=pr1 type=application/its+xml>
<its:provenanceRecords xml:id="pr1" xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:provenanceRecord toolRef="http://www.cngl.ie/mlwlt-dev/" />
</its:provenanceRecords>       
</script>
<script id=pr2 type=application/its+xml>
<its:provenanceRecords xml:id="pr2" xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:provenanceRecord revPerson="John Smith" />
<its:provenanceRecord toolRef="http://www.cngl.ie/mlwlt-dev/" />
</its:provenanceRecords>
</script>

The problem here is that you have to split the original provenanceRecords and create a new one (with the corresponding script element), which duplicates stand-off records. With a combination of several MT systems and revisers, the document could end up unreadable for a human being.
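A rough Python sketch of that split, using a hypothetical in-memory model: the element IDs p1/p2, the block-ID counter, and add_record are all made up for illustration. Attaching a record to one element whose block is shared forces a copy of the whole block:

```python
import copy
import itertools

# Hypothetical model: element id -> stand-off block id, and block id -> list of records.
refs = {"p1": "pr1", "p2": "pr1"}
blocks = {"pr1": [{"toolRef": "http://www.cngl.ie/mlwlt-dev/"}]}
_new_ids = itertools.count(2)  # naive ID generator, good enough for this illustration

def add_record(element, record):
    """Attach a record to one element, copying its block first if other elements share it."""
    block_id = refs[element]
    shared = sum(1 for b in refs.values() if b == block_id) > 1
    if shared:
        new_id = f"pr{next(_new_ids)}"
        blocks[new_id] = copy.deepcopy(blocks[block_id])  # duplicate every existing record
        refs[element] = new_id                            # repoint only this element
        block_id = new_id
    blocks[block_id].insert(0, record)  # newest record first, as in the example above

add_record("p2", {"revPerson": "John Smith"})
print(refs)
print(blocks)
```

After the call, the toolRef record is stored twice, once in pr1 and once in pr2, and every further revision of a shared block repeats that duplication.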

Inverting the reference mechanism between the annotated element and the local stand-off element would avoid the flood of markup, and it would also solve the problem of multiple annotators for Provenance and LQI. It goes without saying, though, that it is not easy to implement either. I also see a problem with the number of references, since the IDs of the elements must be unique. For instance, given the previous example, you would eventually have something like this:
<p id="pr1">This paragraph was translated from the machine.</p>
<p id="pr2">This text was translated from the machine and revised.</p>
And this:
<its:provenanceRecords xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:provenanceRecord ref="pr2" revPerson="John Smith" />
<its:provenanceRecord ref="pr1 pr2" toolRef="http://www.cngl.ie/mlwlt-dev/" />
</its:provenanceRecords>

But what if you have hundreds of elements? You would end up with a reference like this:
<its:provenanceRecord ref="pr1 pr2 ... prn" toolRef="http://www.cngl.ie/mlwlt-dev/" />
Or maybe I'm missing something.
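For comparison, a sketch of how a processor might handle the inverted approach: each record carries a space-separated ref list (an IDREFS-style value, assumed here), and the processor inverts it into a per-element index. The data reuses the example above; index_by_element and the 500-element case are hypothetical.

```python
from collections import defaultdict

# Records as in the inverted proposal: each one lists the element IDs it applies to.
records = [
    {"ref": "pr2", "revPerson": "John Smith"},
    {"ref": "pr1 pr2", "toolRef": "http://www.cngl.ie/mlwlt-dev/"},
]

def index_by_element(records):
    """Hypothetical helper: invert record -> element references into element -> records."""
    index = defaultdict(list)
    for rec in records:
        data = {k: v for k, v in rec.items() if k != "ref"}
        for element_id in rec["ref"].split():
            index[element_id].append(data)
    return dict(index)

print(index_by_element(records))

# A record shared by n elements needs n tokens in its ref value (hypothetical n = 500).
shared = {"ref": " ".join(f"pr{i}" for i in range(1, 501)),
          "toolRef": "http://www.cngl.ie/mlwlt-dev/"}
print(len(shared["ref"].split()))  # 500
```

Lookup per element stays cheap, but the ref value of a widely shared record grows linearly with the number of elements it covers, which is the growth described above.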

I don't have any problem with leaving things as they are. IMO they cover the requirements, and changing implementations and parsers at this stage could be troublesome for some implementers.

Cheers,
Pablo.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>Hi, Felix, Phil,
>Maybe 'tanRefs' was misleading. The intention was to point to an its:textAnalysisAnnotations element, which could in turn contain several its:textAnalysisAnnotation elements that all describe the same fragment. Is this valid usage of its:textAnalysisAnnotations, or was it only meant to be a container for the individual rules? I was looking at this example for inspiration:
>http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/xml/EX-locQualityIssue-local-2.xml

>Alternatively, allowing multiple values would work equally well; then we could point to individual textAnalysisAnnotation statements.
>-- Tadej

On 29. 01. 2013 10:41, Felix Sasaki wrote:
> Thanks, Phil. Tadej, was the intention of its:tanRefs at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0212.html
>
> to have several pointers, e.g. allow for
> its:tanRefs="tan1 tan2 tan3"
> or just one, that is only "tan1"?
>
> Best,
>
> Felix
>
>
> On 29.01.13 10:34, Phil Ritchie wrote:
>> All
>>
>> @Felix: "But while doing that a question on the LQI/Provenance
>> implementers: is it a feature that you point to just one external
>> standoff unit, or an oversight, and could it be several ones?"
>>
>> My current thinking is that a stand-off block stores many annotations
>> for one segment. This is because if several segments are linked to one
>> stand-off block and one of those segments needs another unique issue
>> registered against it, you have to copy the stand-off, add the unique
>> annotation and change the reference IDs so that the segment with the
>> additional annotation is linked to the copied stand-off.
>> Complex.
>>
>> Another argument for pointing to a single stand-off is that, although
>> the "classification" attributes of the markup might be identical (e.g.
>> loc-quality-issue-type="style" loc-quality-issue-severity="75"), each
>> may have a different loc-quality-issue-comment to highlight the
>> specific nature of the error.
>>
>> Hmm. The benefit of the id being on the segment/element and the 
>> idRefs being on the stand-off really comes into its own if you want 
>> to have multiple annotations across many data categories for the same 
>> segment/element.
>>
>> <span id="loaded">blah</span>
>>
>> <its:prov ref="loaded"...
>> <its:locQualityIssues ref="loaded"...
>> <its:textAnalysis ref="loaded"
>> (on the train, I know this is not valid markup.)
>>
>> Phil
>>
>>
>>
>> On 28 Jan 2013, at 19:57, "Felix Sasaki" <fsasaki@w3.org> wrote:
>>
>>> But while doing that a question on the LQI/Provenance implementers:
>>> is it a feature that you point to just one external standoff unit,
>>> or an oversight, and could it be several ones?
>>
>>
>>
>>
>
>

Received on Tuesday, 29 January 2013 11:53:49 UTC