Potential for synergies on the implementation and "data" level: MQM and ITS 2.0 from Felix Sasaki on 2013-06-07 (public-i18n-its-ig@w3.org from June 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 07 Jun 2013 21:34:51 +0200
To: "Arle Lommel (Arle.Lommel@dfki.de)" <Arle.Lommel@dfki.de>, Aljoscha Burchardt <aljoscha.burchardt@dfki.de>, kim_harris@textform.com
CC: public-i18n-its-ig@w3.org
Message-ID: <51B235DB.8050404@w3.org>
Hi Arle, Aljoscha, Kim, with CC to the W3C i18n ITS Interest Group,

there is now a great opportunity to build synergies between the 
QTLaunchPad Multidimensional Quality Metrics (MQM) and ITS 2.0.

Some background for the ITS IG members who don't know MQM: The EC-funded 
QTLaunchPad project is developing a unified and customizable, 
multidimensional framework for translation quality assessment built around 
metrics of fluency, accuracy, and end-user adequacy.

Some background for all in this thread: so far the relation between the 
MQM model and ITS 2.0 is rather general, see e.g. the description of 
next week's Localization World FEISGILTT event
http://www.localizationworld.com/lwlon2013/feisgiltt/accepted.html
"A further point of contact (of MQM) is with the ITS 2.0 specification, 
which provides a mechanism to refer to the quality expectations outlined 
in an STS and to integrate them into a standard, QTLaunchPad-compatible 
mechanism that enables quality to be addressed in any tool that 
implements ITS 2.0’s quality markup. "

By "rather general" I mean that MQM has not yet been integrated with 
ITS 2.0 on a detailed, "implementation" level. Instead, some activities 
have happened in parallel, for example:


1) specifying MQM types and ITS 2.0 types
http://www.w3.org/TR/its20/#lqissue-typevalues
As I understand Arle, there is an informal mapping (which is in flux), 
but no formal relation has been defined, that is, nothing implementable 
as an automatic conversion. Since MQM is more expressive than ITS 2.0, 
such a formal mapping would certainly lose information, but an exact 
description of what is lost would be very valuable.

2) specifying a serialization of MQM and of ITS 2.0 quality issue 
markup. ITS 2.0 has a mechanism to serialize one or more localization 
quality issues for the same span of text, see
http://www.w3.org/TR/its20/#EX-locQualityIssue-local-2
As I understand Arle, MQM requires annotating potentially overlapping 
quality issues. This cannot be done with ITS 2.0 markup, that is, the 
ITS 2.0 markup cannot be reused for all of MQM markup.

3) The ITS 2.0 specification links to an informal mapping of existing 
tools to ITS 2.0 types
http://www.w3.org/International/its/wiki/Tool_specific_mappings
As I understand Arle, MQM is working on a similar mapping, taking 
detailed feedback from LSPs into account.
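
To make point 1 concrete, a formal mapping could be sketched roughly as 
below. This is only an illustration: the MQM issue names and the 
helper functions are hypothetical and do not come from the MQM model. 
Only the ITS 2.0 values ("mistranslation", "omission", "misspelling", 
"grammar", "uncategorized") are taken from the normative type list. The 
sketch converts an MQM type to an ITS 2.0 type and reports where several 
MQM types collapse into one ITS 2.0 type, which is the information loss 
mentioned above.

```python
# Hypothetical MQM issue names (assumption, for illustration only),
# mapped to values from the ITS 2.0 normative type list.
MQM_TO_ITS = {
    "accuracy/mistranslation": "mistranslation",
    "accuracy/omission": "omission",
    "fluency/spelling": "misspelling",
    "fluency/grammar/agreement": "grammar",
    "fluency/grammar/word-order": "grammar",
}


def to_its(mqm_type):
    """Convert an MQM type to an ITS 2.0 type.

    Unknown MQM types fall back to the ITS 2.0 value "uncategorized".
    """
    return MQM_TO_ITS.get(mqm_type, "uncategorized")


def lossy_groups(mapping):
    """Return the ITS 2.0 types that more than one MQM type maps to.

    Each such group marks a distinction that is lost in the conversion.
    """
    groups = {}
    for mqm_type, its_type in mapping.items():
        groups.setdefault(its_type, []).append(mqm_type)
    return {its: sorted(mqm) for its, mqm in groups.items() if len(mqm) > 1}


if __name__ == "__main__":
    print(to_its("fluency/spelling"))
    print(lossy_groups(MQM_TO_ITS))
```

A report like the one from lossy_groups could serve as the "exact 
description of what's lost" once the real MQM types are filled in.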


There might be other areas; if you see any, please let me know.

Now, if we resolve 1-3, or at least describe for implementers how MQM 
and ITS 2.0 relate, we can
- avoid confusing implementers about why there are two ways to express 
localization issue information, and instead explain the differences in 
detail;
- get implementers to actually implement both MQM and ITS 2.0. The ITS 
2.0 localization quality issue data category is currently being 
implemented in three tools:
http://www.w3.org/TR/2013/WD-mlw-metadata-us-impl-20130307/#Quality_Check
http://www.w3.org/TR/2013/WD-mlw-metadata-us-impl-20130307/#Harnessing_ITS_2.0_Metadata_to_Improve_the_Human_Review_Process
http://www.languagetool.org/

We may be able to convince the ITS 2.0 implementers to integrate tooling 
for MQM in their tools as well. This would be a big success for both 
efforts.


I have written this mail to start a conversation, so that we get 
feedback from all stakeholders. In addition, and to move this forward, I 
have a concrete suggestion, based on discussions I have already had with 
Arle and Aljoscha: AFAIK both MQM and ITS 2.0 will be presented at 
TCWorld this year. We could take this as a milestone for fixing the 
relation on an implementation level, and include examples of each in 
the other's presentation. What do you think?

Btw., two ITS 2.0 localization quality issue implementers, Yves Savourel 
and Phil Ritchie, will be at LocWorld next week too. So perhaps you can 
already touch base there?


As input from the ITS 2.0 side, here is a summary of the relevant material:

- Localization Quality Issue definition http://www.w3.org/TR/its20/#lqissue
- Normative type values http://www.w3.org/TR/its20/#lqissue-typevalues
- Non-normative mappings to tools 
http://www.w3.org/International/its/wiki/Tool_specific_mappings
- ITS 2.0 Localization Quality issue in the ITS 2.0 test suite
** input files 
https://github.com/finnle/ITS-2.0-Testsuite/tree/master/its2.0/inputdata/locqualityissue
** output files 
https://github.com/finnle/ITS-2.0-Testsuite/tree/master/its2.0/expected/locqualityissue
** output files in XLIFF (just informative, not set in stone) 
https://github.com/finnle/ITS-2.0-Testsuite/tree/master/its2.0/xliffsamples/inputdata/locqualityissue
** an output of the XML input files in RDF, using the RDF "NIF" vocabulary
https://github.com/finnle/ITS-2.0-Testsuite/tree/master/its2.0/nif-conversion/expected 

I am mentioning NIF here since it provides a solution to the overlapping 
representation issue that I had mentioned above.
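
The idea behind such a stand-off solution can be sketched as follows. 
This is a minimal illustration, not the NIF vocabulary itself: the Issue 
class, the sample text and the comments are hypothetical, and the 
"#char=begin,end" fragment form is assumed here as the offset-based 
identifier style. Because each issue refers to the text by character 
offsets instead of wrapping it in inline markup, two issues can overlap 
without breaking element nesting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """A quality issue attached to a character range (end is exclusive)."""
    start: int
    end: int
    its_type: str
    comment: str


def fragment(issue):
    """Build an offset-based fragment identifier (assumed '#char=b,e' form)."""
    return f"#char={issue.start},{issue.end}"


def overlaps(a, b):
    """True if the two issues share at least one character."""
    return a.start < b.end and b.start < a.end


# Hypothetical sample data: "quick brown" and "brown fox" overlap on "brown".
text = "The quick brown fox jumps."
issues = [
    Issue(4, 15, "style", "hypothetical style issue"),
    Issue(10, 19, "mistranslation", "hypothetical accuracy issue"),
]

if __name__ == "__main__":
    for issue in issues:
        print(fragment(issue), issue.its_type, repr(text[issue.start:issue.end]))
    print("overlapping:", overlaps(issues[0], issues[1]))
```

Inline ITS 2.0 markup would need the two spans to nest or be disjoint; 
with offsets, that restriction does not apply.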

It would now be interesting to see the latest MQM model and some example 
files here.

Finally, let me mention that this mailing list is not for the 
development of ITS 2.0. It is an open list for discussing issues like 
the one in this mail. In the coming months we will also use regular 
phone calls to discuss topics like this.

Best,

Felix
Received on Friday, 7 June 2013 19:35:22 UTC