Re: [ACTION-79]Consider consolidation of status-related data categories and process trigger from David Lewis on 2012-05-07 (public-multilingualweb-lt@w3.org from May 2012)

From: David Lewis <dave.lewis@cs.tcd.ie>
Date: Mon, 07 May 2012 13:51:26 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <4FA7C54E.4020803@cs.tcd.ie>
Hi Pedro, Guys,
Following the previous discussion on the proposal for consolidation 
around these data categories I have now made the following changes to 
the requirements document.

Pedro, as discussed on Friday's call, could you and any other interested
parties examine these changes and flag any issues on this thread?

1) I have updated processTrigger and changed its name to 'readiness' as
previously discussed
http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#readiness

2) I have moved the need for a process model to a new requirement to 
reflect its relevance to several of the other data categories, including 
readiness, progress-indicator and provenance, and its need for further
careful consideration:
http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model

3) As part of this consolidation I have removed the data categories of:
processTrigger, cacheStatus, legalStatus, processState, 
proofreadingState and revision state

4) I've updated the data category tables and the related interests 
accordingly

5) I've highlighted issues (in bold below) to consider about the 
following properties of the removed processTrigger that are no longer 
present (as recorded in the notes for the readiness data category)

  * /contentType/, values: MIME or custom values - This indicates the
    format or type of the content, so that the right filter or
    normalization rules and the subsequent processes can be applied.
    For example, to express HTML we could use: “contentType:
    text/html” - *consider consolidation with formatType or languageResource*
  * /sourceLang/ – value: standard ISO 639 value - this value indicates
    the source language for the current translation requested. It is
    different from the sourceLanguage (provenance) data category, since
    sourceLanguage indicates the language of the original source text,
    whereas sourceLang indicates the current source language to be used
    for the translation, which can differ from the original source -
    *this should be considered as an attribute for provenance*
  * /contentResultSource/ – value: yes / no. Indicates whether the
    localisation chain needs to give back the original - *is this
    necessary as an attribute here or as a separate attribute?*
  * /contentResultTarget/ – value: monolingual, multilingual; indicates
    whether the resulting translation, in the case of several target
    languages, should be delivered in several monolingual content files
    or in a single multilingual content file - *this would require a more
    general purpose return file indicator*
  * /pivotLang/ - value: standard ISO value. Indicates the intermediate
    language, in case one is needed. Two examples: 1) Going from a source
    language to two language variants (e.g. into Brazilian and European
    Portuguese), it is more cost-effective to translate into one first
    (this first variant being a "pivot" language) and to revise it later
    into the second variant; 2) Going from one language to another via an
    intermediate language (e.g. from Maltese into English and from
    English into Irish, because no direct Maltese-into-Irish translation
    is available) - *consider consolidation with source language, i.e.
    it is an attribute of the source language*

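As a rough sketch of how the consolidated readiness data category might be
represented, using the properties proposed earlier in this thread
(requested-process, process-ref, ready-at, revised, priority, complete-by),
something like the following could work. The class name, the default process
names and the validation rules are assumptions for illustration only; none of
them come from the requirements document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical default value set; the thread leaves the real set open.
DEFAULT_PROCESSES = {"translation", "proofreading", "revision", "legal-approval"}


@dataclass
class Readiness:
    requested_process: str                 # name of the next process requested
    ready_at: datetime                     # when the content is (or will be) ready
    revised: bool = False                  # True if previously marked ready for this process
    priority: str = "low"                  # kept simple: "high" or "low"
    complete_by: Optional[datetime] = None  # target date-time for completing the process
    process_ref: Optional[str] = None      # pointer to an external set of process definitions

    def __post_init__(self) -> None:
        if self.priority not in ("high", "low"):
            raise ValueError("priority must be 'high' or 'low'")
        # Assumption: without a process_ref, only the default value set is allowed.
        if self.process_ref is None and self.requested_process not in DEFAULT_PROCESSES:
            raise ValueError("unknown process and no process_ref given")

    def is_ready(self, now: datetime) -> bool:
        """Return True once the ready-at time has been reached."""
        return now >= self.ready_at
```

A value outside the default set is accepted only when a process-ref is
supplied, which reflects the point that interoperating implementations need to
share the same set of process values.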

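Similarly, the progress-indicator discussed further down this thread (an
integer from 0 to 100, measured as the segments or words processed as a
proportion of the total) could be computed roughly as below. The function
name and the choice to round down are assumptions, not something agreed on
the list.

```python
def progress_indicator(processed: int, total: int) -> int:
    """Return progress as an integer from 0 to 100.

    processed and total count segments (or words) in a document or job.
    Rounding down means 100 is only reported when everything is processed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= processed <= total:
        raise ValueError("processed must be between 0 and total")
    return (processed * 100) // total
```

Because only a ratio is reported, the value stays agnostic to how many
process stages the LSP actually runs.
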
Regards,
Dave

On 04/05/2012 01:46, David Lewis wrote:
> Hi Moritz, guys,
> I added this progress-indicator data category to the requirements:
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#progress-indicator
>
> Regards,
> Dave
>
> On 28/04/2012 22:11, David Lewis wrote:
>> Hi Moritz,
>> I moved this onto this separate thread related to the relevant 
>> consolidation action.
>>
>> I think there are two different data categories here.
>>
>> What you describe is a progress indicator. This would be a common 
>> feature on a lot of CMS-based and crowdsourced translation tools. It 
>> would be measured as the number of segments (or perhaps words) of a
>> document (or a group of documents representing a job) that have been
>> processed as a proportion of the total that need to be processed.
>>
>> The other, which is what the current text for 'process state' 
>> (http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#process_state) 
>> specifies, is an indication of which point in a process sequence has
>> currently been reached. As discussed, this could be covered by the 
>> processTrigger/readiness data category we are discussing.
>>
>> Moritz, does this distinction match with your view here? If so then 
>> we could introduce a new 'progress-indicator' data category 
>> requirement, and then continue discussing the consolidation of 
>> 'process state' with processTrigger/readiness.
>>
>> thanks,
>> Dave
>>
>>
>> On 27/04/2012 18:40, Moritz Hellwig wrote:
>>> Hello,
>>>
>>> I might make this a separate thread, but since we are already 
>>> talking about processState here...
>>>
>>> There were quite a lot of requests from our editorial team to have 
>>> something like
>>>
>>> processIndicator
>>> Values integer, 0 to 100
>>>
>>> Zero would be "LSP process not begun"-ish, 100 would be "Completed".
>>>
>>> There are - from our point of view - considerable advantages:
>>> A) we can show a process progress indicator (in whichever visual 
>>> representation) that does not require an understanding of what the 
>>> actual process phase is on the MT side.
>>> B) the indicator can be agnostic to the number of processes / stages 
>>> on the side of the LSP. If you run a hundred separate processes or 
>>> feedback loops: fine by me.
>>>
>>> This would be beneficial for e.g. content creators who are 
>>> unfamiliar with the language technology, its processes and so on. 
>>> Also, it would allow us to build dashboards and generate reports
>>> e.g. to show and sort by progression & keep better track of 
>>> multilingual projects.
>>>
>>> Any thoughts?
>>>
>>> Cheers,
>>> Moritz
>>>
>>> Sent from my iPhone
>>>
>>> On 27.04.2012, at 01:14, "David Lewis" <dave.lewis@cs.tcd.ie> wrote:
>>>
>>>> Pedro,
>>>> Yes, the redundancy of process state is one outcome of what I'm 
>>>> proposing here.
>>>>
>>>> The key difference is that the proposal is that the data category 
>>>> indicates the next process that should be performed, rather than 
>>>> indicating the current process in operation. The motivation is that 
>>>> the readiness to undergo a new process step is more useful to a
>>>> document in a CMS than knowing the current process that is operating
>>>> on it.
>>>>
>>>> Complementary to this, provenance indicates that a process is 
>>>> completed, and associated with this records useful information 
>>>> needed to monitor correct or efficient process operation, perhaps 
>>>> as needed to monitor a service level agreement.
>>>>
>>>> Neither process trigger nor provenance, however, actually aims to
>>>> control process flow. This is a complex topic which therefore is
>>>> probably out of scope.
>>>>
>>>> What we do need, however, is a way of defining the values to use
>>>> for referencing processes, i.e. from both the 'request-process' and 
>>>> the process reference in provenance. For this we may want both a 
>>>> default set in the standard, and a way of unambiguously defining 
>>>> these for a particular business case. The key thing in any one case 
>>>> of interoperability is that the interoperating implementations 
>>>> exchange and understand the _same set_ of process values.
>>>>
>>>> Let's keep the discussion going on the list,
>>>> Dave
>>>>
>>>> On 26/04/2012 15:29, Pedro L. Díez Orzas wrote:
>>>>>
>>>>> Hi David,
>>>>>
>>>>> I need to consider this more carefully.
>>>>>
>>>>> But, what I see is that *process state* is perhaps redundant
>>>>> with proofreading state or revision state, since these can be
>>>>> values of process state: proofread, revised, reviewed,
>>>>> translated, localized…
>>>>>
>>>>> Best,
>>>>>
>>>>> Pedro
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
>>>>> *Sent:* Thursday, 26 April 2012 1:52
>>>>> *To:* public-multilingualweb-lt@w3.org
>>>>> *Subject:* Re: [all] Discussion on proposed metadata categories:
>>>>> approvalStatus
>>>>>
>>>>> Hi Moritz,
>>>>> I think you make a very good general point here. It may be a bit 
>>>>> too open ended to specify data categories that hardwire the 
>>>>> completion of a specific step. We would run into the same issues 
>>>>> we have with defining the different process values as we discussed 
>>>>> around process trigger. Also, it's not clear to me that all the status
>>>>> flags suggested for current steps, e.g. legal approval, really
>>>>> need to be separated from other steps.
>>>>>
>>>>> I think therefore we could generalise this as part of the process 
>>>>> trigger data category as you suggest. This could allow us to 
>>>>> consolidate *approvalStatus*, *cacheStatus*, *legalStatus*,
>>>>> *proofreading state* and *revision state* (and delegate the
>>>>> definition of these steps to data values rather than individual
>>>>> data categories). We can address *cacheStatus*, and at the same
>>>>> time generalise it to other processes than just translation, by
>>>>> including the time stamp and a revision flag.
>>>>>
>>>>> Also, I think the priority data category should be included here, 
>>>>> as translation could consist of many different processes in 
>>>>> combination, so its semantics are dependent on which one. At the
>>>>> same time we may also be interested in defining priorities even 
>>>>> for non translation activities, such as review.
>>>>>
>>>>> *requested-process* (which has the name of the next process requested)
>>>>>
>>>>> *process-ref* (which may allow us to point to an external set of
>>>>> process definitions used for processRequested if the default value 
>>>>> set is not used)
>>>>>
>>>>> *ready-at* (defines the time the content is ready for the process, 
>>>>> it could be some time in the past, or some time in the future - 
>>>>> this supports part of the cacheStatus function)
>>>>>
>>>>> *revised* (yes/no - indicates whether this is a different version of
>>>>> content that was previously marked as ready for the declared process)
>>>>>
>>>>> *priority* (I think for now we should keep this simple and just 
>>>>> have values high/low )
>>>>>
>>>>> *complete-by* (provides a target date-time for completing the process)
>>>>>
>>>>> Any thoughts on this suggestion? Pedro, Ryan, Moritz, Des, I think
>>>>> this impacts on data categories you have an interest in.
>>>>>
>>>>> Also, DavidF, Pedro, Ryan, do you think this makes *process state* 
>>>>> redundant? As a status flag, are we more interested in what process
>>>>> to do next, rather than which one is finished? At the same time
>>>>> the provenance data category could tell us which processes have 
>>>>> already finished operating on the content.
>>>>>
>>>>> cheers,
>>>>> Dave
>>>>>
>>>>>
>>>>> On 24/04/2012 11:11, Moritz Hellwig wrote:
>>>>>
>>>>> to identify publication process metadata which might also be 
>>>>> relevant for the LSP. I ran into a couple of questions though.
>>>>>
>>>>> I’ll use approvalStatus as an example (from the requirements 
>>>>> document):
>>>>>
>>>>> >> approvalStatus
>>>>>
>>>>> >> Information about the status of the content in a formal approval 
>>>>> workflow
>>>>>
>>>>> >> Indicates whether the content has been approved for release
>>>>>
>>>>> >> Possible values:
>>>>>
>>>>> >>>> yes
>>>>>
>>>>> >>>> no
>>>>>
>>>>> Approval can have many values which are rarely only “release 
>>>>> yes|no” and they can be client/application-specific. However, none 
>>>>> of these statuses seem to be relevant to the LSP, as they only 
>>>>> precede or succeed the LSP’s processes.
>>>>>
>>>>
>>
>
Received on Monday, 7 May 2012 12:52:00 UTC