Re: Issue-55: XLIFF mapping - Terminology and termInfoPointer from Dr. David Filip on 2013-02-20 (public-multilingualweb-lt@w3.org from February 2013)

From: Dr. David Filip <David.Filip@ul.ie>
Date: Wed, 20 Feb 2013 18:21:57 +0000
To: Yves Savourel <ysavourel@enlaso.com>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <CANw5LKnVqMd6QoM_gjvmh9hQn47x-a5V+=Sq4+N+OEokqzrpeQ@mail.gmail.com>
Thanks Yves, inline again.

Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie


On Wed, Feb 20, 2013 at 12:38 PM, Yves Savourel <ysavourel@enlaso.com> wrote:
>>> --- Can we put other ITS data categories in that same <mrk> too?
>>> -> why not?
>>
>> I understand that the mrk is taken exclusively for term if
>> the mtype="term" and I think this is OK.
>> Generally I am not opposed to using mrk for multiple functions.
>> But we should be using core repertoire for encoding ITS stuff
>> whenever possible to nurture general interoperability and not
>> enforce support for ITS-specific constructs to make use of the
>> metadata. So for me being able to use a generic method is more
>> important than making mrk generally usable for encoding more ITS
>> categories at the same time.
>
> That's nice. But on the original document side we may have combinations of ITS data categories on the same element. We can split them into several <mrk>, but then again, the round-trip becomes complicated: we change the markup of the original content, with all the side effects I've mentioned already.

Yves, my basic assumption is that merging assumes full extractor
knowledge. It is indeed out of XLIFF scope to ensure a fully compatible
re-merge without full extractor knowledge.

If the extractor "knows" that several markers were extracted from one
span, or if it "knows" that a terminology marker originates from a
structural element, it should be able to repopulate the ITS info onto
the right locations.
I think there is a bigger issue with categories that are not
introduced at authoring time, but come instead during the
localization cycle, such as mt-confidence or LQ-related stuff.
mt-confidence should be less problematic as it should always occur
structurally; it does not make sense at sub-segment level IMHO.
But the LQ stuff can appear on any spans, structural elements and
combinations. You are basically unable to return this info after a
roundtrip without intimate source format knowledge, such as what type
of span is allowed in the given XML vocabulary at the level where you
are returning the LQ info.

I think that we should look into logical equivalence in a few
prominent source formats, to be able to solve this bunch of issues.
We can for instance say that
<p its-term="yes">Prague</p> is logically equivalent to <p><span
its-term="yes">Prague</span></p> and that the latter is encouraged.
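
To make the equivalence concrete, here is a minimal Python sketch of how
an extractor could normalize the structural form into the encouraged
inline form before mapping to XLIFF. The function name push_term_inline
is hypothetical, and the sketch assumes a single well-formed element
carrying the HTML5-style its-term attribute; it is an illustration, not
part of any agreed mapping.

```python
import xml.etree.ElementTree as ET


def push_term_inline(elem_xml: str) -> str:
    """Move its-term from a structural element onto an embedded <span>.

    Hypothetical helper: assumes elem_xml is one well-formed element.
    """
    elem = ET.fromstring(elem_xml)
    term = elem.attrib.pop("its-term", None)
    if term is None:
        # Nothing to normalize; return the element unchanged.
        return ET.tostring(elem, encoding="unicode")

    # Wrap the whole content (text and child elements) in a new span.
    span = ET.Element("span", {"its-term": term})
    span.text = elem.text
    children = list(elem)
    for child in children:
        elem.remove(child)
    span.extend(children)

    elem.text = None
    elem.append(span)
    return ET.tostring(elem, encoding="unicode")


print(push_term_inline('<p its-term="yes">Prague</p>'))
```

An extractor that records that it performed this rewrite could undo it
on merge, which is the "full extractor knowledge" assumption above.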

>
>
>>> --- How do we express its:term='no'?
>>> Is it even needed in XLIFF?
>>
>> I do not think it is needed. Term='no' can either be ignored on
>> extraction or, if we insist on having it, we can introduce mtype='x-its-Term-No'
>> or similar.
>> This is similar to the translate solution: we chose <mrk mtype="protected">...</mrk>
>> and the verbose <mrk mtype="x-its-Translate-Yes">...</mrk> for the opposite value.
>
> I tend to agree. But that would mean XLIFF would have no way, in a round-trip, to "un-termify" a text that was defined as a term in the original. That's fine by me. But is it by all?

I tend to think that un-termification during the process can be a
valid use case [even if minor compared to the main success scenario].
Why shouldn't we go for mtype='x-its-Term-No' to un-termify, while
sticking to mtype='term' for the main success scenario?

Sophisticated people who need to un-termify will be able to cope with
the custom value, while the vast majority of mainstream use cases
will be handled within the core vocabulary with mtype='term'.

I see that these solutions are not elegant, but the advantage is using
as much of the core vocabulary as possible.
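
As a rough illustration of the proposal, the following Python sketch
maps an ITS term value onto an XLIFF 1.2 <mrk>. The helper name
term_to_mrk is hypothetical, and 'x-its-Term-No' is only the custom
value floated in this thread, not an agreed one.

```python
import xml.etree.ElementTree as ET


def term_to_mrk(text: str, is_term: bool) -> str:
    """Wrap text in <mrk>, using core mtype='term' where possible.

    Hypothetical helper: 'x-its-Term-No' is the custom value proposed
    in this discussion for explicit un-termification.
    """
    mtype = "term" if is_term else "x-its-Term-No"
    mrk = ET.Element("mrk", {"mtype": mtype})
    mrk.text = text
    return ET.tostring(mrk, encoding="unicode")


print(term_to_mrk("Prague", True))
print(term_to_mrk("Prague", False))
```

A tool that only understands the core vocabulary still handles the
first case, and can safely ignore the second.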
>
>
>>> --- Do we want to have a <source>/<target>-level terminology info?
>>
>> I do not think that terminology on structural level is a strong enough use case.
>>
>>> If no: then what do we do with something like <html:p its-term='yes'>word</html:p>?
>>
>> I do not think that paragraphs should be systematically considered as being
>> possible terms. If a paragraph happens to consist of a single term, I think that
>> it is an exception, and even the authors should be encouraged to use an
>> embedded span for encoding this rather than say that the whole paragraph is a term.
>> I believe that terminology generally and typically appears 'inline'
>> and that we should be concentrating on this as the main success scenario.
>
> Fine by me. That solves a lot of issues.
> But then we need to convey this in the BP note going along with the mapping.
> Regardless of what we decide, there will be people who will be using term on structural elements. So it has to be very clear to them that the XLIFF mapping does not allow this.

Above I mused about full extractor knowledge and logical equivalence of
notation; this is also relevant here.
>
> Shouldn't we start an actual Note document rather than a simple wiki table for the mapping?
>
I agree that we should start a note document and grandfather the
mapping page. Is there a template?
Still, I would try to stick to the principles captured on the mapping
page as much as possible, at least until it is made obsolete by the
note content.
>
> Cheers,
> -yves
>
>
Received on Wednesday, 20 February 2013 18:23:11 UTC