Re: idValue requirement updated from David Lewis on 2012-05-01 (public-multilingualweb-lt@w3.org from May 2012)

From: David Lewis <dave.lewis@cs.tcd.ie>
Date: Tue, 01 May 2012 21:54:00 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <4FA04D68.9080102@cs.tcd.ie>
Hi Yves,
I think you are right about me thinking more along the lines of XLIFF id 
rather than XLIFF resname, but perhaps not exactly in the way you 
characterise it.

I am thinking in terms of an id that can be used to track the progress 
of a specific segment in the content documents (let's park the use of 
multi-segment translation units in XLIFF for the moment) against the 
corresponding XLIFF id.  However, I'm specifically concerned with the 
round trip use cases where the document may pass from a CMS to an XLIFF 
cycle and back again several times. The use cases I see for this are 
driven by the need for more continuous translation, pipelined at the 
granularity of the segment rather than the document, in place of one-off 
hand-overs of documents between processes. Possible use cases might be:

1) A document is having its source revised and is being translated at 
the same time. Readiness of different elements is signalled in the 
document using the readiness/processTrigger data category, which is 
monitored by an LSP which provides updates of segments to be translated 
based on these flags and distributes translations using XLIFF. 
Consistent mapping between all segments and XLIFF translation unit ids 
is required to ensure that new, modified and deleted trans-units are 
correctly updated and kept in sequence.

2) Translations from one LSP may be undergoing monolingual review 
through direct access to the target on the CMS, while selected 
bilingual translation review is being conducted in parallel by another 
LSP. Feedback from both reviews may need to be routed back to the 
translating LSP, so document element-to-XLIFF mappings would need to 
be reliably maintained for the two sets of XLIFF ids operated by two 
different LSPs.
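
To make use case 1 a little more concrete, here is a rough sketch of how
a stable segment id lets a process work out which trans-units are new,
modified or deleted between two passes over the same document. The
function and type names (diff_segments, SegmentDiff) and the sample ids
are illustrative assumptions, not part of XLIFF or of any proposed data
category.

```python
from typing import Dict, List, NamedTuple


class SegmentDiff(NamedTuple):
    added: List[str]
    modified: List[str]
    deleted: List[str]


def diff_segments(previous: Dict[str, str], current: Dict[str, str]) -> SegmentDiff:
    """Compare two {segment id: source text} snapshots of one document.

    Dicts keep insertion order, so each list follows document order.
    """
    added = [sid for sid in current if sid not in previous]
    modified = [
        sid for sid in current
        if sid in previous and current[sid] != previous[sid]
    ]
    deleted = [sid for sid in previous if sid not in current]
    return SegmentDiff(added, modified, deleted)


# Hypothetical snapshots taken before and after a source revision.
previous = {"s1": "Hello world.", "s2": "Press the button.", "s3": "Goodbye."}
current = {"s1": "Hello world.", "s2": "Press the red button.", "s4": "See you soon."}
diff = diff_segments(previous, current)
```

This only works if the ids survive each CMS-to-XLIFF round trip; if ids
were regenerated on every extraction, every segment would look both
deleted and added.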

In these sorts of use cases, where there is ongoing round-tripping 
between the CMS and TMS/XLIFF, the need for consistent mapping 
between the source document on the CMS and the versions LSPs have may 
soften the assumption that clients won't be willing to add additional 
elements to the document on the CMS. One can imagine that any augmented 
versions of the content documents would live on a 'staging' CMS while they 
are subject to preparation, translation and review, but prior to publication.

So, this implies a need for an id that is indeed relevant just to the 
localization process, but that nevertheless needs to support a 
persistent mapping between CMS element and trans-unit id, potentially 
over several CMS-TMS roundtrips. The difference from resname, as I 
understand it, is that resname is optional and in a sense best effort - 
if you can't map a trans-unit back to a particular element in the 
source, you can still try and translate the string, you just lose some 
contextual info. So it doesn't have the requirement to comprehensively 
*maintain a mapping between all* trans-units and source content 
elements in the way I think the above use cases require.

Hope that explains the requirement I had in mind a bit more clearly.

Finally, I'm not sure that in any of these cases we are talking about an 
explicit id data category, are we?

Would the implementation in fact be rules for generating and maintaining 
the mapping between source elements and XLIFF ids? Very speculatively, 
these could be expressed as some cascading rules for using: 1st) 
existing ids if present; 2nd) combined rules using ids and element names, 
as in your updated text; 3rd) if allowed, new ids in existing elements; 4th) 
if allowed, new elements with specific ids; 5th) some sort of external 
hashing pointer (e.g. 
http://nlp2rdf.org/nif-1-0#toc-nif-recipe-context-hash-based-uris) ; 
6th) some sort of character count-based pointer (e.g. 
http://nlp2rdf.org/nif-1-0#toc-nif-recipe-offset-based-uris).  It would 
be a ruleset applicable to the document that we would need to record.
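
As a very rough illustration of such a cascade, the sketch below tries
rules 1, 2, 3 and a simplified form of rule 5 in order for one element.
Everything in it is an assumption made for illustration: the function
name resolve_id, the "ancestor-id/element[n]" and "loc-n" id formats, and
the plain SHA-1 text hash, which is not the NIF context-hash recipe.
Rules 4 and 6 are left out.

```python
import hashlib
import itertools
import xml.etree.ElementTree as ET


def resolve_id(elem, parents, new_ids=None):
    """Return (rule, id) for elem, trying each rule in order."""
    # Rule 1: reuse an id that is already on the element.
    if elem.get("id"):
        return "existing", elem.get("id")
    # Rule 2: combine the nearest ancestor id with the element name and
    # its 1-based position among same-named elements under that ancestor.
    ancestor = parents.get(elem)
    while ancestor is not None:
        if ancestor.get("id"):
            position = list(ancestor.iter(elem.tag)).index(elem) + 1
            return "combined", f"{ancestor.get('id')}/{elem.tag}[{position}]"
        ancestor = parents.get(ancestor)
    # Rule 3: if new ids are allowed, write one into the element.
    if new_ids is not None:
        elem.set("id", f"loc-{next(new_ids)}")
        return "new-id", elem.get("id")
    # Rule 5 (simplified): fall back to a hash of the element text.
    text = "".join(elem.itertext())
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
    return "hash", f"hash-{digest}"


# Hypothetical source document with a mix of identified and bare elements.
doc = ET.fromstring(
    '<doc>'
    '<p id="intro">Welcome.</p>'
    '<section id="sec1"><p>First.</p><p>Second.</p></section>'
    '<p>Orphan.</p>'
    '</doc>'
)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in doc.iter() for child in parent}
paragraphs = list(doc.iter("p"))
ids = [resolve_id(p, parents) for p in paragraphs]
```

Returning the rule name alongside the id is one way to record which
part of the ruleset produced each mapping. Passing something like
itertools.count(1) as new_ids switches on rule 3 for elements that
would otherwise fall through to the hash.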

cheers,
Dave




On 01/05/2012 15:20, Yves Savourel wrote:
> I guess what we need to clarify is what are the requirements of the ID value we are discussing.
>
> To me it should be:
> - unique at least within the document
> - the value should be the same in new versions of the document
>
> That's because the type of tasks I would use it for are tasks across versions of the same document.
>
> But Dave, you are maybe thinking of something different: how to get an ID valid for a given document during its localization cycle. In other words, a value that doesn't need to survive after the document is done.
>
> In other words you are thinking XLIFF 'id' and I'm thinking XLIFF 'resname'.
>
> Cheers,
> -ys
Received on Tuesday, 1 May 2012 20:54:32 UTC