RE: [xliff] ITS scope with sm/em

Hi Fredrik, all,

> This can be solved by lowering the <pc> into an <sc/>,<ec/> pair.

That is a good point for that example, and a solution that should work most of the time.

But I believe we will have at least some cases of overlapping annotations.

As an example, below is the result of two text analysis Web services that detected two entities, one "Port Metro Vancouver" and the
other "City of Vancouver", based on the content "Port Metro of Vancouver City". So we end up with "Vancouver" being shared by the
two--otherwise distinct--annotation spans.

<sm id="m1" type="dbp:entity" ref="http://www.wikidata.org/wiki/Q1187234"/>Port Metro of <sm id="m2" type="oc:entity/City"
value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver"/>Vancouver<em startRef="m1"/> City<em startRef="m2"/>

One of the annotations could be set to an <mrk>, but that would leave the other as an <sm/>/<em/> pair.
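
To make the overlap concrete, here is a minimal sketch in Python of how a tool might detect that two annotation spans partially
overlap, so that they cannot both be expressed as nested <mrk> elements. It is only an illustration: the names marker_spans,
partially_overlap and SAMPLE are hypothetical, and the regular expression is a simplification that only handles <sm/> and <em/>
written as in the example above, not general XLIFF inline content.

```python
import re

# Simplified pattern (an assumption for this sketch): matches <sm id="..." .../>
# and <em startRef="..."/> markers only. Real XLIFF content needs an XML parser.
MARKER = re.compile(
    r'<(sm)\b[^>]*?\bid="([^"]+)"[^>]*/>'
    r'|<(em)\b[^>]*?\bstartRef="([^"]+)"[^>]*/>'
)

def marker_spans(content):
    """Return {id: (start, end)} as offsets into the text with markers removed."""
    spans = {}      # completed spans, keyed by the id of the <sm/>
    opened = {}     # start offsets of spans whose <em/> has not been seen yet
    text_len = 0    # length of plain text seen so far
    pos = 0         # position in content just after the previous marker
    for m in MARKER.finditer(content):
        text_len += m.start() - pos
        pos = m.end()
        if m.group(1):                      # <sm id="..."/>
            opened[m.group(2)] = text_len
        else:                               # <em startRef="..."/>
            ref = m.group(4)
            spans[ref] = (opened.pop(ref), text_len)
    return spans

def partially_overlap(a, b):
    """True when the spans intersect but neither one contains the other."""
    (s1, e1), (s2, e2) = sorted([a, b])
    return s1 < s2 < e1 < e2

# The example from this message, with both markers closed by <em/>.
SAMPLE = (
    '<sm id="m1" type="dbp:entity" ref="http://www.wikidata.org/wiki/Q1187234"/>'
    'Port Metro of '
    '<sm id="m2" type="oc:entity/City" value="City of Vancouver" '
    'ref="http://en.wikipedia.org/wiki/Vancouver"/>'
    'Vancouver<em startRef="m1"/> City<em startRef="m2"/>'
)

if __name__ == "__main__":
    spans = marker_spans(SAMPLE)
    print(spans)
    print(partially_overlap(spans["m1"], spans["m2"]))
```

When partially_overlap is true for a pair of spans, at most one of the two can become an <mrk>. The other has to stay as
<sm/>/<em/>.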

And the point I was trying to make for Felix is that such an annotation, unlike a Translate data category for example, cannot be
decomposed into several <mrk> because the ITS information (here it would be some Text Analysis data) applies only to the complete
span, not to its parts.

In other words we cannot do:

<mrk id="m1" type="dbp:entity" ref="http://www.wikidata.org/wiki/Q1187234">Port Metro of <mrk id="m2" type="oc:entity/City"
value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver">Vancouver</mrk></mrk><mrk id="m2bis" type="oc:entity/City"
value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver"> City</mrk>

because "City" should not be associated alone with the ITS data.

Sure, a tool could detect that two consecutive <mrk> with the same ITS information should be seen as a single one, but that is not
an ITS processing expectation.

I'm not sure what transformation would resolve this problem.

Cheers,
-ys

Received on Sunday, 12 October 2014 12:32:15 UTC