RE: [xliff] ITS scope with sm/em

Hi Felix, all,

> Not sure - the limitation of not allowing for overlap in ITS is 
> shared with general XML and HTML. The reason is that if you constrain yourself 
> to hierarchical structures, hierarchy based queries become possible - like in 
> CSS selectors or XPath - and simple processing like styling based on nesting.
> NIF allows you to describe all kinds of relations, but you cannot query 
> hierarchies in NIF.
> I probably don't know yet how the overlap issue is solved in XLIFF or other XML 
> markup languages. I know of a typical set of solutions, see 
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html
> What does XLIFF do about this, and how does XLIFF then deal with the "query overlap 
> + hierarchies at the same time" challenge?

That's a good question. My guess is that we have not run into that type of requirement yet.

But my other thought is that XLIFF doesn't necessarily have to deal with "query overlap + hierarchies", as it is just an exchange
format. The application importing the documents does, and it is not always an XML-based one.

For example, the current version of the Okapi XLIFF2 library has a unit.getAnnotatedSpans() method that returns a list of all spans
of content delimited by markers (mrk, sm/em), and it works across overlaps and even across segments.
Another method, unit.getTranslateStateEndings(), provides the translate state at the end of each segment or ignorable part in the
unit. This makes it possible to do things such as generating an HTML view of the content where translatable and non-translatable parts
(including overlapping ones) are styled differently.
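
To make the idea concrete, here is a minimal sketch in Python of how a non-XML application can resolve overlapping sm/em pairs
once the content is held as a flat token list. This is not the Okapi API: the token layout, the Span type and the two function
names are assumptions made up for illustration only.

```python
from typing import NamedTuple


class Span(NamedTuple):
    marker_id: str
    kind: str
    start: int  # offset in the plain text, inclusive
    end: int    # offset in the plain text, exclusive


def annotated_spans(tokens):
    """Pair each ("sm", id, kind) with its ("em", id) and return text offsets.

    Tokens are either plain strings or marker tuples (a hypothetical layout).
    Pairing is done by id, so the spans are free to overlap.
    """
    open_markers = {}
    spans = []
    pos = 0
    for tok in tokens:
        if isinstance(tok, str):
            pos += len(tok)
        elif tok[0] == "sm":
            _, marker_id, kind = tok
            open_markers[marker_id] = (kind, pos)
        elif tok[0] == "em":
            kind, start = open_markers.pop(tok[1])
            spans.append(Span(tok[1], kind, start, pos))
    return spans


def flatten(text, spans):
    """Cut the text at every span boundary and list the ids covering each run."""
    cuts = sorted({0, len(text), *(s.start for s in spans), *(s.end for s in spans)})
    runs = []
    for a, b in zip(cuts, cuts[1:]):
        active = sorted(s.marker_id for s in spans if s.start <= a and b <= s.end)
        runs.append((text[a:b], active))
    return runs


# Two annotations that overlap on "three ".
tokens = ["One ", ("sm", "m1", "term"), "two ", ("sm", "m2", "comment"),
          "three ", ("em", "m1"), "four", ("em", "m2"), " five"]
text = "".join(t for t in tokens if isinstance(t, str))
spans = annotated_spans(tokens)
runs = flatten(text, spans)
```

Each run carries the set of annotations active over it, so an HTML view can wrap every run in its own element and style it
from that set, without the markup ever needing to nest.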

I suppose what I'm trying to say is that we have to be careful to not always assume applications using XLIFF have the constraints
and advantages of XML applications.

But I'm getting away from the topic of discussion. In this specific case we are just looking at how the overlapping ITS annotations
could be processed with pure ITS engines: The answer is that the various transformation steps Fredrik and you came up with
(including the global rules for the last remaining overlaps) can probably do it.


> ...
> That looks like it indeed. The next step would be test input files and 
> test output I guess? 

+1


Cheers,
-yves

Received on Monday, 13 October 2014 12:05:19 UTC