Re: Missing requirment - content format/type from David Lewis on 2012-05-01 (public-multilingualweb-lt@w3.org from May 2012)

From: David Lewis <dave.lewis@cs.tcd.ie>
Date: Tue, 01 May 2012 16:20:04 +0100
To: Arle Lommel <arle.lommel@gmail.com>
CC: Yves Savourel <ysavourel@enlaso.com>, public-multilingualweb-lt@w3.org
Message-ID: <4F9FFF24.8060802@cs.tcd.ie>

Hi Arle, Yves, Christian,

In general yes. Given our time limitations and the 'small and simple'
quality of ITS 1.0, we may need some 'complexity cap' on some of these
proposed requirements, to weigh against their desirability. Again, the
most pragmatic test of that balance is implementation commitments.

Where we are faced with complexity at this stage, it's important to 
refocus on use cases before diving too deep into technical discussion.

So to try and flesh out the requirements for context a bit more, what 
are the use cases, as we understand them, for using a context data 
category?

I can think of:
1) to provide some extra meta-data information to the human 
translator/posteditor when making translation decisions
2) to drive a WYSIWYG, 'in-context' CAT tool
3) to provide some relevant hints to an MT engine on the best 
translation to select

Are there other use cases we should consider for context?

For (1), are there good examples of what context meta-data has been 
found useful for translators? Would the XLIFF restype represent a useful 
established set?

For (2), this does seem very complex, especially to do in a generic 
manner. For HTML5 it would be possible to dynamically render the 
HTML5 skeleton file and the on-going translation. Doing this generically 
for XML is a lot more complex, since you also need to know how the 
content is rendered, e.g. into HTML or PDF, to present an accurate 
WYSIWYG view.

For (3), there would be fewer relevant properties. Perhaps we only need 
to know the maximum target string length, rather than whether the text 
is in a menu or pop-up box etc., since an MT engine wouldn't process 
these meaningfully; it's just weighting options.
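
As a rough illustration of (3), assuming the only context handed to the
MT side is a maximum target length: the sketch below picks the
best-scoring candidate that fits. The function name, the (text, score)
pairs and the sample strings are all made up for the example and do not
come from any real MT engine or from ITS.

```python
def pick_translation(candidates, max_length=None):
    """Return the best-scoring candidate text that fits max_length.

    candidates: list of (text, score) pairs (hypothetical MT output).
    max_length: maximum allowed target length in characters, or None
                when no length constraint is known.
    Returns None when no candidate fits.
    """
    fitting = [
        (text, score)
        for text, score in candidates
        if max_length is None or len(text) <= max_length
    ]
    if not fitting:
        return None
    # Among the candidates that fit, keep the one with the highest score.
    return max(fitting, key=lambda pair: pair[1])[0]


# Hypothetical candidates for a short UI string.
options = [("Aide en ligne", 0.8), ("Aide", 0.6)]
unconstrained = pick_translation(options)
constrained = pick_translation(options, max_length=5)
```

With no limit the higher-scoring "Aide en ligne" wins; with a limit of
5 characters only "Aide" fits, so it is selected instead.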

cheers,
Dave


On 01/05/2012 15:53, Arle Lommel wrote:
> Hi all.
>
> Actually the mapping is a little complex. formatType, as it was described by whoever proposed it, was an indication of the *kind* of item something is (e.g., it is a subtitle), but it applies whether or not something is shown in any broader context.
>
> Context was initially proposed by Christian Lieske and I've talked to him about it but not had a chance to write it up. He envisions a very complex model of different kinds of contexts. One of them actually does correspond to formatType, but the others do not. So we probably can collapse the two categories with formatType as one item in context.
>
> At the same time, I think there is a conceptual value to having simpler data categories rather than complex ones. This is an issue we need to discuss. When we hit something like processTrigger or context (as I describe it), these are hugely complex items. When we see a data category with twenty sub-categories, it starts looking pretty complex to implement. Even if it is just a psychological barrier, we do need to consider how the perceived complexity of these things will be received.
>
> -Arle
>
> Sic scripsit Yves Savourel in May 1, 2012 ad 03:05 :
>
>> Hi Dave, all,
>>
>>> The text we currently have in the requirements for
>>> the content data category seems very similar to, albeit
>>> a subset of, the values defined for restype in XLIFF.
>> I assume you mean the 'context' (not 'content') data category:
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#context
>>
>> You are right indeed, 'context' seems to be what would be mapped to XLIFF's 'restype', not 'formatType'.
>>
>> (which leaves us with the question of what exactly is the 'formatType' data category?)
>>
>>
>>> Yves, I'm not very familiar with this feature in XLIFF,
>>> could you provide a bit more background on where
>>> it is used and by whom?
>> Essentially that metadata (restype) identifies the type of "resource" where the text is used in the original format. This is easily generated for software-type formats like Windows RC, etc.
>>
>> It is used, for instance, to provide a basic context to the translator (e.g. a "help" button and a "help" label on a menu bar don't translate the same in French in some cases; or the translation of a title or caption may require different casing rules in some languages, etc.)
>>
>> It is also used to help leverage translations. If two entries with the same source text have no unique names (resname), then the restype value, if present, may help in guessing which translation should be used.
>>
>> In my experience, early on this was mostly used for software formats. Nowadays the difference between UI and document-type files is much fuzzier, and many XML-based formats could provide the element name (or a mapping of the element name to a more standard list) as good context information. HTML5 is certainly a good example of a more "modern" UI format.
>>
>> As for the list of values: I suppose, like for 'domain' or other metadata, we could start with a list of basic values, and have a mechanism to allow its customization/expansion. Ideally we would have the same solution for all lists. I'm just not sure what exactly is the best solution:
>>
>> - x- values: its-context="x-mywidget"
>>
>> - namespace-like values: its-context="myns:mywidget"
>>
>> - an extra attribute to extend the main attribute: its-context="string" its-context-ext="mywidget"
>>
>> - something else?
>>
>> Cheers,
>> -yves
>>
>>
>>
>>
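
To make the three extension options Yves lists above a bit more
concrete, here is a minimal sketch of how a consumer might tell them
apart. Only the attribute names (its-context, its-context-ext) and the
sample values are taken from the thread; the function name, the returned
tuple and the scheme labels are invented for illustration, and none of
this is defined by ITS.

```python
def classify_context(attrs):
    """Classify an its-context value under the three proposed schemes.

    attrs: dict of attribute name -> value for one element.
    Returns (scheme, base, custom), or None if its-context is absent.
    """
    value = attrs.get("its-context")
    if value is None:
        return None

    # Option 3: a separate attribute extends the main one.
    ext = attrs.get("its-context-ext")
    if ext is not None:
        return ("ext-attribute", value, ext)

    # Option 1: private values marked with an "x-" prefix.
    if value.startswith("x-"):
        return ("x-prefix", None, value[2:])

    # Option 2: namespace-like values of the form "prefix:name".
    if ":" in value:
        prefix, name = value.split(":", 1)
        return ("namespace", prefix, name)

    # Anything else is treated as a value from the basic list.
    return ("standard", value, None)


examples = [
    {"its-context": "x-mywidget"},
    {"its-context": "myns:mywidget"},
    {"its-context": "string", "its-context-ext": "mywidget"},
]
results = [classify_context(e) for e in examples]
```

One thing the sketch shows: only the third option keeps a standard
fallback value ("string") alongside the custom one, so a consumer that
ignores the extension still has something to work with.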
Received on Tuesday, 1 May 2012 15:20:27 UTC