RE: [ACTION-79]Consider consolidation of status-related data categories and process trigger from Pedro L. Díez Orzas on 2012-05-16 (public-multilingualweb-lt@w3.org from May 2012)

From: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
Date: Wed, 16 May 2012 19:22:44 +0200
To: "'David Lewis'" <dave.lewis@cs.tcd.ie>
Cc: <public-multilingualweb-lt@w3.org>
Message-ID: <BF5C186DFB57400DBFE4B06E95FBF519@newlas.local>
Hi Dave:

I tried to reach you today by Skype. Maybe we can talk tomorrow
before the meeting. I can see that what you consolidated (data model:
ready-to-process, process-ref, ready-at, revised, priority, complete-by)
covers what was in:

*        processRequested. Encodes the actions or workflow item requested

*        dateRequest –value: date and time 

*        dateDelivery –value: date and time 

*        priority –value: numeric. 

Later, you mention: "DaveL: consideration needs to be given to process-specific
attributes in the original version of processTrigger. This is
because in the above form the next process is made explicit, so assumptions
related to these process-specific attributes no longer apply."

I have tried to understand what you said, and I would make the following comments
on yours:

o       contentType, values: MIME or custom values - This indicates the
format or type of the content, in order to apply the right filter or
normalization rules and the subsequent processes. For example, to express
HTML we could use: “contentType: text/html”. - consider consolidation with
formatType or languageResource

o       contentTypeVersion values: version value - This indicates the
version of the contentType. For example, to express HTML 3.0

PEDRO>> It has nothing to do with formatType as it is now (provides
information about the format or service for which the content is produced
(e.g., subtitles, spoken text)) or with languageResource (identifies what
language resource(s) are to be used for translation memory…). It is critical
for the correct processing of, for instance, every XML content field.
contentType and contentTypeVersion are intended to express whether, for
example, the content is in HTML 4.2, InDesign CS4 or plain text.

o       sourceLang – value: standard ISO 639 value - this value indicates
the source language for the current translation requested. It is different
from the sourceLanguage (provenance) data category, since that indicates
the language the original source text was in, whereas sourceLang indicates the
current source language to be used for the translation, which can be different
from the original source - this should be considered as an attribute for
provenance

o       targetLangs - values: standard ISO values. This value indicates the
target languages for the current translation requested. (Arle: how do we
delimit the list? Is it comma separated?) 

o       pivotLang - value: standard ISO value. Indicates the intermediate
language, where one is needed. Two examples: 1) going from a source
language into two language variants (e.g. into Brazilian and European
Portuguese), it is more cost-effective to translate into one first (this first
variant being a "pivot" language) and to revise later into the second variant;
2) going from one language to another via an intermediate language (e.g. from
Maltese into English and from English into Irish, because no direct
Maltese-to-Irish translation is available). - consider consolidation with
source language, i.e. it is an attribute of the source language

PEDRO >> sourceLang: it is true that this has a lot to do with provenance,
but for the moment source language in provenance is defined as "provides
information concerning what language the original source text was in", not
the source language for the current translation (regardless of the language
in which the text was originally written). Also, pivotLang is not an attribute
of the source language, but another language to go through to get to the
target languages. And for any localisation process I would definitely include
target languages. I do not understand why this was excluded, so I am
probably missing something.

o       contentResultSource – value: yes / no. Indicates whether the
localisation chain needs to give back the original - is this necessary as an
attribute here or as a separate attribute?

o       contentResultTarget – value: monolingual, multilingual; indicates whether
the resulting translation, in cases with several target languages, should
be delivered in several monolingual content files or in a single
multilingual content file - this would require a more general-purpose return
file indicator

PEDRO >> I think these last two can fit into “the next process” which, as you
mention, is made explicit.

 

In any case, I would prefer to talk about this tomorrow.

 

All the best,

Pedro

 

  _____  

From: David Lewis [mailto:dave.lewis@cs.tcd.ie]
Sent: Tuesday, 8 May 2012 3:00
To: "Pedro L. Díez Orzas"
Cc: public-multilingualweb-lt@w3.org
Subject: Re: [ACTION-79]Consider consolidation of status-related data
categories and process trigger

 

Hi Pedro,
Sorry, I didn't yet fill in the details of how I thought this might work for
cache status, which would simply be:

*	The original content is not saved in the cache (i.e., it is new or
has been updated): (re)translation is needed 

the source document or element would have attributes:

ready-to-process  = cache-source
ready-at = <the time at which it would be ready to cache>

*	The translated content is not saved in the cache (i.e., it has not
been previously translated or has expired): translation is needed 

the translation document or element would have attributes:

ready-to-process = cache-target
ready-at = <the time at which it would be ready to cache>

*	Neither the original nor the translated page are saved in the cache:
both need to be cached 

you could either have both the above, or in cases where the source and
target are in the same file use:

ready-to-process = cache-source-and-target
ready-at = <the time at which it would be ready to cache>

Note that there is also a revised flag that could be used if useful.
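The cache cases above can be sketched in Python as a mapping from cache state to readiness attributes. This is a minimal illustration, not part of the proposal: the function `cache_readiness`, its boolean inputs and the choice to return one attribute set per document or element to be marked are assumptions; only the attribute names and values come from this message, and the revised flag is not modelled.

```python
from datetime import datetime, timezone
from typing import Dict, List


def cache_readiness(source_cached: bool, target_cached: bool,
                    same_file: bool, ready_at: datetime) -> List[Dict[str, str]]:
    """Hypothetical mapping from cache state to readiness attributes.

    Returns one attribute set per document or element that must be marked;
    an empty list means nothing needs to be (re)cached.
    """
    stamp = ready_at.isoformat()
    # Both uncached and held in one file: a single combined value suffices.
    if not source_cached and not target_cached and same_file:
        return [{"ready-to-process": "cache-source-and-target", "ready-at": stamp}]
    marks: List[Dict[str, str]] = []
    if not source_cached:
        # Original is new or updated: mark the source document or element.
        marks.append({"ready-to-process": "cache-source", "ready-at": stamp})
    if not target_cached:
        # Translation missing or expired: mark the translation document or element.
        marks.append({"ready-to-process": "cache-target", "ready-at": stamp})
    return marks


# Neither version is cached and both live in the same file.
print(cache_readiness(False, False, True,
                      datetime(2012, 5, 8, 3, 0, tzinfo=timezone.utc)))
```

When source and target sit in separate files, the same call with `same_file=False` yields two attribute sets, matching the "you could either have both the above" option.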

So, if I understand this correctly, I think the readiness attributes would
provide equivalent metadata. However, if you think this is a distinct use
case, i.e. implementors who would implement cacheStatus are only interested
in that specific functionality and would be unlikely to also implement
a more general readiness data category, then we should definitely be
considering a separate data category.

cheers,
Dave


On 07/05/2012 18:32, Pedro L. Díez Orzas wrote: 

Hi Dave,

 

I will look at it very carefully as soon as I can, since these are really
major changes, but a priori I do not understand why cacheStatus should be
consolidated and removed, since for me this is completely different metadata
from processTrigger, processStatus or any other “status”, and it answers
completely different requirements.

 

As I explained in the notes and definition of cacheStatus, this metadata is
not for the localization chain or any other localisation process, but for
real-time translation systems and their caching needs. In this respect I would
restore it as it was (if you want, it could be called just “cache”, without
“status”), and I am sorry for any confusion I may have caused about it.

 

Best,

Pedro

 

__________________________________

 

Pedro L. Díez Orzas

Executive Chairman/CEO

Linguaserve Internacionalización de Servicios, S.A.

Tel.: +34 91 761 64 60
Fax: +34 91 542 89 28 

E-mail: pedro.diez@linguaserve.com

www.linguaserve.com <http://www.linguaserve.com/> 

 


 

"According to the provisions set forth in articles 21 and 22 of Law 34/2002
of July 11 regarding Information Society and eCommerce Services, we will
store and use your personal data with the sole purpose of marketing the
products and services offered by LINGUASERVE INTERNACIONALIZACIÓN DE
SERVICIOS, S.A. If you do not wish your personal data to be stored and
handled, or you do not wish to receive further information regarding
products and services offered by our company, please e-mail us to
clients@linguaserve.com. Your request will be processed immediately."

 ____________________________________

 

 

  _____  

From: David Lewis [mailto:dave.lewis@cs.tcd.ie]
Sent: Monday, 7 May 2012 14:51
To: public-multilingualweb-lt@w3.org
Subject: Re: [ACTION-79]Consider consolidation of status-related data
categories and process trigger

 

Hi Pedro, Guys,
Following the previous discussion on the proposal for consolidation around
these data categories I have now made the following changes to the
requirements document.

Pedro, as discussed on Friday's call, could you and any other interested
parties examine these changes and flag any issues on this thread?

1) I have updated processTrigger and changed its name to 'readiness', as
previously discussed:
http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#readiness

2) I have moved the need for a process model to a new requirement, to reflect
its relevance to several of the other data categories, including readiness,
progress-indicator and provenance, and its need for further careful
consideration:
http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model

3) As part of this consolidation I have removed the following data categories:
processTrigger, cacheStatus, legalStatus, processState, proofreadingState
and revision state

4) I've updated the data category tables and the related interests
accordingly

5) I've highlighted issues (in bold below) to consider about the following
properties of the removed processTrigger that are no longer present (as
recorded in the notes for the readiness data category)

*	contentType, values: MIME or custom values - This indicates the
format or type of the content, in order to apply the right filter or
normalization rules and the subsequent processes. For example, to express
HTML we could use: “contentType: text/html”. - consider consolidation with
formatType or languageResource

>> I do not agree, unless formatType really refers to the computer format and
not, as now, to the format or service for which the content is produced
(e.g., subtitles, spoken text).

*	sourceLang – value: standard ISO 639 value - this value indicates
the source language for the current translation requested. It is different
from the sourceLanguage (provenance) data category, since that indicates
the language the original source text was in, whereas sourceLang indicates the
current source language to be used for the translation, which can be different
from the original source - this should be considered as an attribute for
provenance
*	contentResultSource – value: yes / no. Indicates whether the
localisation chain needs to give back the original - is this necessary as an
attribute here or as a separate attribute?
*	contentResultTarget – value: monolingual, multilingual; indicates whether
the resulting translation, in cases with several target languages, should
be delivered in several monolingual content files or in a single
multilingual content file - this would require a more general-purpose return
file indicator
*	pivotLang - value: standard ISO value. Indicates the intermediate
language, where one is needed. Two examples: 1) going from a source
language into two language variants (e.g. into Brazilian and European
Portuguese), it is more cost-effective to translate into one first (this first
variant being a "pivot" language) and to revise later into the second variant;
2) going from one language to another via an intermediate language (e.g. from
Maltese into English and from English into Irish, because no direct
Maltese-to-Irish translation is available). - consider consolidation with
source language, i.e. it is an attribute of the source language
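The two pivotLang examples above can be illustrated with a short Python sketch. The helper `translation_legs` is hypothetical: sourceLang, targetLangs and pivotLang are the attribute names from the processTrigger proposal, the ISO 639-1 codes are illustrative, and the routing rule (translate once into the pivot, then go from the pivot to every other target) is an assumption rather than something the proposal specifies.

```python
from typing import List, Optional, Tuple


def translation_legs(source: str, targets: List[str],
                     pivot: Optional[str] = None) -> List[Tuple[str, str]]:
    """Hypothetical: list the (from, to) steps for one translation request."""
    if pivot is None or pivot == source:
        # No pivot: translate directly from the source into each target.
        return [(source, t) for t in targets]
    # Go through the pivot language once...
    legs = [(source, pivot)]
    for t in targets:
        if t != pivot:
            # ...then produce every remaining target from the pivot.
            legs.append((pivot, t))
    return legs


# Maltese into Irish via English (the second example).
print(translation_legs("mt", ["ga"], pivot="en"))
# Into two Portuguese variants, using Brazilian Portuguese as the pivot.
print(translation_legs("en", ["pt-BR", "pt-PT"], pivot="pt-BR"))
```

In the Portuguese case the second step is a revision into the other variant rather than a fresh translation; the sketch does not distinguish the two kinds of step.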


Regards,
Dave

On 04/05/2012 01:46, David Lewis wrote: 

Hi Moritz, guys,
I added this progress-indicator data category to the requirements:
http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#progress-indicator

Regards,
Dave

On 28/04/2012 22:11, David Lewis wrote: 

Hi Moritz,
I moved this onto this separate thread related to the relevant consolidation
action.

I think there are two different data categories here.

What you describe is a progress indicator. This would be a common feature of
many CMS-based and crowdsourced translation tools. It would be measured
as the number of segments (or perhaps words) of a document (or of a group of
documents representing a job) that have been processed, as a proportion of the
total that need to be processed.
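A minimal Python sketch of this measure, reported on the integer 0 to 100 scale from Moritz's processIndicator suggestion. The function names are hypothetical, and two choices are assumptions: rounding down (so 100 means every unit is done) and weighting a job by unit count when aggregating its documents.

```python
from typing import Iterable, Tuple


def progress_indicator(processed: int, total: int) -> int:
    """Share of units (segments or words) processed, as an integer 0-100."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= processed <= total:
        raise ValueError("processed must lie between 0 and total")
    # Floor division: 100 is reported only when every unit is processed.
    return processed * 100 // total


def job_progress(documents: Iterable[Tuple[int, int]]) -> int:
    """Aggregate (processed, total) pairs for a group of documents forming a job."""
    pairs = list(documents)
    return progress_indicator(sum(p for p, _ in pairs),
                              sum(t for _, t in pairs))


# One finished 10-segment document plus an untouched 30-segment one.
print(job_progress([(10, 10), (0, 30)]))
```

Because the value is a plain proportion, it stays agnostic to how many process stages the LSP runs internally.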

The other, which is what the current text for 'process state'
(http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#process_state)
specifies, is an indication of which point in a process sequence
has currently been reached. As discussed, this could be covered by the
processTrigger/readiness data category we are discussing.

Moritz, does this distinction match with your view here? If so then we could
introduce a new 'progress-indicator' data category requirement, and then
continue discussing the consolidation of 'process state' with
processTrigger/readiness.

thanks,
Dave


On 27/04/2012 18:40, Moritz Hellwig wrote: 

Hello,

 

I might make this a separate thread, but since we are already talking about
processState here...

 

There were quite a lot of requests from our editorial team to have something
like

 

processIndicator 

Values integer, 0 to 100

 

Zero would be "LSP process not begun"-ish, 100 would be "Completed". 

 

There are - from our point of view - considerable advantages:

A) we can show a process progress indicator (in whichever visual
representation) that does not require an understanding of what the actual
process phase is on the MT side. 

B) the indicator can be agnostic to the number of processes / stages on the
side of the LSP. If you run a hundred separate processes or feedback loops:
fine by me.

 

This would be beneficial for, e.g., content creators who are unfamiliar with
the language technology, its processes and so on. Also, it would allow us to
build dashboards and generate reports, e.g. to show and sort by progression
and keep better track of multilingual projects.

 

Any thoughts?

 

Cheers,

Moritz

Sent from my iPhone


On 27.04.2012, at 01:14, "David Lewis" <dave.lewis@cs.tcd.ie> wrote:

Pedro,
Yes, the redundancy of process state is one outcome of what I'm proposing
here.

The key difference is that, under this proposal, the data category indicates
the next process that should be performed, rather than the current process
in operation. The motivation is that, for a document in a CMS, knowing it is
ready to undergo a new process step is more useful than knowing which process
is currently operating on it.

Complementary to this, provenance indicates that a process has been completed
and, associated with this, records useful information needed to monitor
correct or efficient process operation, perhaps as needed to monitor a service
level agreement.

However, neither process trigger nor provenance actually aims to control
process flow. This is a complex topic, which is therefore probably out of
scope.

What we do need, however, is a way of defining the values to use for
referencing processes, i.e. from both 'request-process' and the process
reference in provenance. For this we may want both a default set in the
standard and a way of unambiguously defining these for a particular
business case. The key thing in any one case of interoperability is that the
interoperating implementations exchange and understand the _same set_ of
process values.
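One way to picture this requirement is a receiver-side check, sketched here in Python. Everything in it is an assumption for illustration: the contents of the default set, the function name `is_understood`, and the use of a URL-like process reference as a key into value sets the receiver already knows.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical default value set; the thread only says the standard may
# define one, not what it would contain.
DEFAULT_PROCESSES: FrozenSet[str] = frozenset({"translate", "review", "proofread"})


def is_understood(process: str, process_ref: Optional[str],
                  known_sets: Dict[str, FrozenSet[str]]) -> bool:
    """Can a receiving implementation interpret this process value?

    Without a process reference the default set applies; with one, the
    receiver must already know the externally defined set it names.
    """
    if process_ref is None:
        return process in DEFAULT_PROCESSES
    vocabulary = known_sets.get(process_ref)
    return vocabulary is not None and process in vocabulary


# A business-specific set published at a hypothetical URL.
custom = {"http://example.com/processes": frozenset({"legal-approval"})}
print(is_understood("legal-approval", "http://example.com/processes", custom))
```

Two implementations interoperate, in this sketch, only when both resolve the same process reference to the same value set.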

let's keep the discussion going on the list,
Dave

On 26/04/2012 15:29, Pedro L. Díez Orzas wrote: 

Hi David,

 

I need to consider this more carefully. 

 

But what I see is that process state is perhaps redundant with
proofreading state or revision state, since these can be values of process
state: proofread, revised, reviewed, translated, localized…

 

Best,

Pedro

 


  _____  


From: David Lewis [mailto:dave.lewis@cs.tcd.ie]
Sent: Thursday, 26 April 2012 1:52
To: public-multilingualweb-lt@w3.org
Subject: Re: [all] Discussion on proposed metadata categories: approvalStatus

 

Hi Moritz,
I think you make a very good general point here. It may be a bit too
open-ended to specify data categories that hardwire the completion of a
specific step. We would run into the same issues we have with defining the
different process values, as we discussed around process trigger. Also, it's
not clear to me that all the status flags suggested for current steps, e.g.
legal approval, really need to be separated from other steps.

I think, therefore, we could generalise this as part of the process trigger
data category, as you suggest. This could allow us to consolidate
approvalStatus, cacheStatus, legalStatus, proofreading state and revision
state (and delegate the definition of these steps to data values rather than
individual data categories). We can address cacheStatus, and at the same time
generalise it to processes other than translation, by including a time stamp
and a revision flag.

Also, I think the priority data category should be included here, since
translation could consist of many different processes in combination, so its
semantics depend on which process is meant. At the same time, we may also be
interested in defining priorities for non-translation activities, such
as review.

requested-process (which has the name of the next process requested)

process-ref (which may allow us to point to an external set of process
definitions used for processRequested if the default value set is not used)

ready-at (defines the time the content is ready for the process; it could be
some time in the past or some time in the future - this supports part of the
cacheStatus function)

revised (yes/no - indicates whether this is a different version of content that
was previously marked as ready for the declared process)

priority (I think for now we should keep this simple and just have the values
high/low)

complete-by (provides a target date-time for completing the process)
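The attribute list above can be read as a small record type. The Python sketch below is a hypothetical illustration, not the proposed specification: the class name `Readiness`, the yes/no serialisation of revised, ISO 8601 timestamps, treating priority as optional, and the readiness test are all assumptions, while the attribute names follow this message.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Readiness:
    """Hypothetical container for the readiness attributes proposed above."""
    requested_process: str                  # name of the next process requested
    ready_at: datetime                      # may lie in the past or the future
    process_ref: Optional[str] = None       # external process definitions, if any
    revised: bool = False                   # new version of previously ready content?
    priority: Optional[str] = None          # "high" or "low" when present
    complete_by: Optional[datetime] = None  # target completion date-time

    def __post_init__(self) -> None:
        if self.priority is not None and self.priority not in ("high", "low"):
            raise ValueError("priority must be 'high' or 'low'")

    def is_ready(self, now: datetime) -> bool:
        # Content is ready once the ready-at time has been reached.
        return now >= self.ready_at

    def to_attributes(self) -> Dict[str, str]:
        # Serialise to attribute/value pairs, omitting unset optional fields.
        attrs = {
            "requested-process": self.requested_process,
            "ready-at": self.ready_at.isoformat(),
            "revised": "yes" if self.revised else "no",
        }
        if self.process_ref is not None:
            attrs["process-ref"] = self.process_ref
        if self.priority is not None:
            attrs["priority"] = self.priority
        if self.complete_by is not None:
            attrs["complete-by"] = self.complete_by.isoformat()
        return attrs


# "translate" is an illustrative process value, not a defined one.
r = Readiness("translate", datetime(2012, 5, 1, 9, 0), priority="high")
print(r.to_attributes())
```

Keeping process_ref optional mirrors the idea that a default value set applies unless an external set of process definitions is named.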

Any thoughts on this suggestion? Pedro, Ryan, Moritz, Des, I think this
affects data categories you have an interest in.

Also, DavidF, Pedro, Ryan, do you think this makes process state redundant?
As a status flag, are we more interested in what process to do next than in
which one has finished? At the same time, the provenance data category
could tell us which processes have already finished operating on the
content.

cheers,
Dave


On 24/04/2012 11:11, Moritz Hellwig wrote: 

to identify publication process metadata which might also be relevant for
the LSP. I ran into a couple of questions though.

 

I’ll use approvalStatus as an example (from the requirements document):

>> approvalStatus 

>> Information about the status of the content in a formal approval workflow

>> Indicates whether the content has been approved for release 

>> Possible values:

>>>> yes

>>>> no

 

Approval can have many values which are rarely only “release yes|no” and
they can be client/application-specific. However, none of these statuses
seem to be relevant to the LSP, as they only precede or succeed the LSP’s
processes.
Received on Wednesday, 16 May 2012 17:24:42 UTC