- From: Arle Lommel <arle.lommel@dfki.de>
- Date: Mon, 10 Jun 2013 11:54:09 +0200
- To: Phil Ritchie <philr@vistatec.ie>
- Cc: "Arle Lommel" <arle.lommel@dfki.de>, "public-i18n-its-ig@w3.org" <public-i18n-its-ig@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, "kim_harris@textform.com" <kim_harris@textform.com>, "Hans Uszkoreit" <uszkoreit@dfki.de>, "Aljoscha Burchardt" <aljoscha.burchardt@dfki.de>
- Message-Id: <19C37BE4-8D74-4972-8126-1E03DCDCB898@dfki.de>
I'm not entirely certain about using offset markup, although that is how XLIFF would handle it. We considered it, and one of our internal staff here at DFKI strongly recommended it, but our worry is that offsets depend on the content not changing at all (or on a tool updating them whenever changes are made), and those are risky dependencies. But maybe there are ways to mitigate that.

-Arle

On 2013 Jun 10, at 11:47, Phil Ritchie <philr@vistatec.ie> wrote:

> Having worked on the NIF mapping, I think the way it encodes the character offsets of the text which is the subject of the metadata is very elegant.
>
> e.g. its-loc-quality-issue-char-offsets="9,18"
>
> However, I guess anything like that is out of the question at this stage, because it'd require a change to the ITS spec.
>
> Phil
>
> On 10 Jun 2013, at 10:27, "Arle Lommel" <arle.lommel@dfki.de> wrote:
>
>> Hi all,
>>
>> One of the issues Felix and I discussed for improving compatibility between Multidimensional Quality Metrics (MQM) (the QTLaunchPad quality system originally derived from ITS 2.0) and ITS 2.0 is the following:
>>
>> We need a way to mark up overlapping spans. For example, if you have the following HTML5 segment:
>>
>> <p>Fifteen <em>relays is</em> involved in the operation.</p>
>>
>> Which should be
>>
>> <p><em>Fifteen relays</em> are involved in the operation.</p>
>>
>> You have two issues:
>>
>> The markup is misplaced (ITS 2.0 markup and MQM markup, misplaced, which is a subtype of markup)
>> There is an agreement error (ITS 2.0 grammar and MQM agreement, which is a subtype of grammar)
>>
>> The mapping from MQM to ITS 2.0 is clear here, but we need a way to mark up the overlapping spans.
>> So far we have internally used something like this:
>>
>> <p>Fifteen <mqm-startIssue type="markup, misplaced" id="1" /><em>relays <mqm-startIssue type="agreement" id="2" />is</em><mqm-endIssue id="1" /> involved<mqm-endIssue id="2" /> in the operation.</p>
>>
>> We want a good path to interoperability with ITS, so we need a way to put the following information into the document on overlapping spans using local markup:
>>
>> its-loc-quality-issue-type="grammar" itsx-mqm-issue-type="agreement" its-loc-quality-comment="should be 'relays are'" (etc…)
>>
>> Any suggestions for how to handle this use case? We want to make it as easy as possible to use MQM and ITS together: MQM provides mechanisms for greater granularity while still remaining compatible with ITS, and ITS provides a way to share MQM data with other systems at a common granularity.
>>
>> Right now we are working to ensure that ITS 2.0 will be fully conformant to MQM (with a few simple mappings for things like issue type names) and that MQM will have a clean mapping to ITS 2.0. (Note as well that MQM will provide ways to define quality profiles and to handle some things not covered by ITS, such as sharing scoring methods and possible data category selections, so MQM adds significant capability to ITS 2.0. It isn't just an alternative, but rather a broader way of handling some details that are out of scope for ITS 2.0.)
>>
>> I'll write more later, but if anyone has good ideas for how to handle the overlapping spans in an ITS 2.0-friendly way, please make suggestions.
>>
>> Best,
>>
>> Arle
>
> ************************************************************
> VistaTEC Ltd. Registered in Ireland 268483.
> Registered Office, VistaTEC House, 700, South Circular Road,
> Kilmainham. Dublin 8. Ireland.
>
> The information contained in this message, including any accompanying
> documents, is confidential and is intended only for the addressee(s).
> The unauthorized use, disclosure, copying, or alteration of this
> message is strictly forbidden. If you have received this message in
> error please notify the sender immediately.
> ************************************************************
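For anyone weighing the two approaches in this thread, here is a minimal Python sketch that converts milestone markup like Arle's internal example (with both end markers written as empty elements) into character-offset pairs of the kind Phil describes, and then shows why offsets need maintenance. The helper names `milestones_to_offsets` and `rebase` are hypothetical and not part of ITS 2.0, MQM, or NIF. The sketch assumes 0-based, end-exclusive offsets counted over the tag-stripped text with entities left undecoded, uses a regular expression rather than a real HTML parser, and deliberately rejects edits that touch an annotated span instead of guessing how that span should grow.

```python
import re

# Hypothetical helpers for illustration only; not part of ITS 2.0, MQM, or NIF.
# A regex tag scanner is an assumption; a real tool would use an HTML parser.
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)([^>]*?)(/?)>")
ATTR = re.compile(r'([A-Za-z_][\w:-]*)\s*=\s*"([^"]*)"')

EXAMPLE = ('<p>Fifteen <mqm-startIssue type="markup, misplaced" id="1" />'
           '<em>relays <mqm-startIssue type="agreement" id="2" />is</em>'
           '<mqm-endIssue id="1" /> involved<mqm-endIssue id="2" />'
           ' in the operation.</p>')


def milestones_to_offsets(markup):
    """Strip all tags and return (plain_text, issues).

    issues maps each milestone id to (type, start, end): 0-based,
    end-exclusive character offsets into plain_text. Entities are not
    decoded, so offsets count raw characters.
    """
    parts, pos, last = [], 0, 0
    open_issues, issues = {}, {}
    for m in TAG.finditer(markup):
        chunk = markup[last:m.start()]  # text between the previous tag and this one
        parts.append(chunk)
        pos += len(chunk)
        last = m.end()
        attrs = dict(ATTR.findall(m.group(3)))
        if m.group(2) == "mqm-startIssue":
            open_issues[attrs["id"]] = (attrs.get("type", ""), pos)
        elif m.group(2) == "mqm-endIssue":
            issue_type, start = open_issues.pop(attrs["id"])
            issues[attrs["id"]] = (issue_type, start, pos)
    parts.append(markup[last:])
    if open_issues:
        raise ValueError(f"unclosed issues: {sorted(open_issues)}")
    return "".join(parts), issues


def rebase(issues, edit_start, edit_end, new_len):
    """Re-anchor offsets after plain_text[edit_start:edit_end] is replaced
    by new_len characters. Spans touched by the edit are rejected rather
    than guessed, since their new extent is ambiguous."""
    delta = new_len - (edit_end - edit_start)
    out = {}
    for key, (issue_type, start, end) in issues.items():
        if end <= edit_start:      # span entirely before the edit: unchanged
            out[key] = (issue_type, start, end)
        elif start >= edit_end:    # span entirely after the edit: shift it
            out[key] = (issue_type, start + delta, end + delta)
        else:                      # edit falls inside or across the span
            raise ValueError(f"edit overlaps issue {key}; offsets need review")
    return out


if __name__ == "__main__":
    text, issues = milestones_to_offsets(EXAMPLE)
    for key, (issue_type, start, end) in issues.items():
        print(key, issue_type, (start, end), repr(text[start:end]))
    # Replace "Fifteen" (7 chars) with "Twenty-one" (10 chars).
    edited = "Twenty-one" + text[7:]
    print("stale:", repr(edited[8:17]))
    fixed = rebase(issues, 0, 7, len("Twenty-one"))
    print("rebased:", repr(edited[fixed["1"][1]:fixed["1"][2]]))
```

Running it should report issue 1 (markup, misplaced) at offsets 8 to 17 ("relays is") and issue 2 (agreement) at 15 to 26 ("is involved"); the two spans overlap on "is", which is exactly what nested inline markup cannot express. After the edit, the stale pair 8,17 selects "ne relays" until `rebase` shifts it, which illustrates the dependency on tools keeping offsets up to date.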
Received on Monday, 10 June 2013 09:54:39 UTC