Re: [ACTION-79]Consider consolidation of status-related data categories and process trigger from Felix Sasaki on 2012-05-09 (public-multilingualweb-lt@w3.org from May 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 9 May 2012 08:46:40 +0200
To: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
Cc: David Lewis <dave.lewis@cs.tcd.ie>, public-multilingualweb-lt@w3.org
Message-ID: <CAL58czq-BM97HN8CcN7mRNH8hQ_ncJyvu-9OBZfWtJO0zfr=tA@mail.gmail.com>
Pedro, all,

I am wondering if this discussion could benefit from input from an HTTP
expert. I have the feeling that the existing HTTP headers might be
sufficient to realize this requirement. Do you mind if I bring Yves Lafon
http://www.w3.org/People/all#ylafon
into the loop?
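
As a rough illustration of how existing HTTP headers might cover part of this, the sketch below maps the cached yes/no value from Pedro's quoted proposal onto the standard Cache-Control response header. The function name cache_headers and the one-hour default lifetime are assumptions made for illustration only; nothing here has been agreed in the thread.

```python
from typing import Dict


def cache_headers(cached: bool, max_age_seconds: int = 3600) -> Dict[str, str]:
    """Map a hypothetical 'cached' yes/no value onto a Cache-Control header."""
    if not cached:
        # Pages that must never be stored (private areas, transactions).
        return {"Cache-Control": "no-store"}
    # Pages that a cache may keep and reuse for a limited time.
    return {"Cache-Control": f"max-age={max_age_seconds}"}


private_page = cache_headers(cached=False)
translated_page = cache_headers(cached=True, max_age_seconds=600)
```

The scope (source, target, both) and the timestamp have no direct counterpart in this sketch, which is one reason input from an HTTP expert would be useful.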

Felix

2012/5/8 Pedro L. Díez Orzas <pedro.diez@linguaserve.com>

> Dear Dave,
>
> First of all, thank you for the consolidation task, which is hard, complex
> and “risky business” :-).
>
> I would like to distinguish between cacheStatus and the rest.
>
> About this specific case of cache status, I think I now understand the
> confusion. In your mail in the thread “Re: targetPointer Requirement
> update” (08/05/2012 13:49), you mention “*ii) a realtime translation
> workflow, where content is put on a cache (I prefer perhaps a term like
> 'staging server' to avoid confusion with 'web cache')*”. However, the
> data category cacheStatus is not intended for content that is *staged* or
> *hidden* on the client's server, but for the source content, the translated
> content, or both, on the side of the real-time translation server. Actually,
> I did not consider the *staging server* in this, and it should probably be
> handled in the way you suggest in your mail. The confusion was certainly my
> fault, since I described it as:
>
>    - The original content is not saved in the cache (i.e., it is new or
>    has been updated): (re)translation is needed
>    - The translated content is not saved in the cache (i.e., it has not
>    been previously translated or has expired): translation is needed
>    - Neither the original nor the translated page is saved in the cache:
>    both need to be cached
>
> It refers not to the client side or the CMS, but to the Real Time
> Translation System (RTTS), which actually generates the web cache. For
> example, the timestamp value is not set by the client, as in ready-at = <the
> time at which it would be ready to cache>, but by the RTTS when it does the
> caching. In that respect, the client indicates in the final HTML web page
> the values that say whether a page or a part of a page needs to be cached or
> not, and whether this applies to the source, the target or both:
>
>    - cached - values: yes, no
>    - scope - values: source, target, both
>    - timestamp - date and time
>
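
The three values listed above can be sketched as a small record. This is only an illustration under stated assumptions: the class name CacheStatus, the helper stamp_when_cached and the use of a UTC datetime are hypothetical and not part of the proposal. The one behaviour taken from the text is that the timestamp is set by the RTTS at caching time, not by the client.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

VALID_SCOPES = ("source", "target", "both")


@dataclass(frozen=True)
class CacheStatus:
    """Hypothetical in-memory form of the three cacheStatus values."""

    cached: bool                          # "yes" / "no" in the proposal
    scope: str                            # "source", "target" or "both"
    timestamp: Optional[datetime] = None  # filled in by the RTTS, not the client

    def __post_init__(self) -> None:
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")


def stamp_when_cached(status: CacheStatus, when: datetime) -> CacheStatus:
    """Return a copy carrying the time at which the RTTS cached the content."""
    if not status.cached:
        # cached = no: the content is not to be cached, so it gets no timestamp.
        raise ValueError("content is marked as not cacheable")
    return replace(status, timestamp=when)


page = CacheStatus(cached=True, scope="target")
stamped = stamp_when_cached(page, datetime(2012, 5, 9, 6, 46, tzinfo=timezone.utc))
```
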
> In this scenario, the source pages (or parts of pages) are always
> translated in real time, and the translated pages (or parts) can be added
> to the cache to speed up future accesses. Some pages, however, not only do
> not need to be cached but must not be cached at all (for example, pages in
> private areas, or transactional pages of an e-commerce process or a bank).
>
> I cannot say with 100% certainty whether *implementors who would implement
> cacheStatus are specifically only interested in that functionality and would
> be unlikely to also implement a more general readiness data category*, but
> even if it is 50% I would keep it separate, in the same way as others in the
> “Internationalization” section. It is really multilingualWebCache metadata
> in the pages that the end user navigates.
>
> I hope this helps, and I will try to answer the rest before Thursday’s
> meeting.
>
> Best,
>
> Pedro
>
>  ------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Tuesday, 8 May 2012 3:00
> *To:* "Pedro L. Díez Orzas"
> *CC:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data
> categories and process trigger
>
> Hi Pedro,
> Sorry, I didn't yet fill in the details of how I thought this might work
> for cache status, which would simply be:
>
>    - The original content is not saved in the cache (i.e., it is new or
>    has been updated): (re)translation is needed
>
> the source document or element would have the attributes:
>
> ready-to-process = cache-source
> ready-at = <the time at which it would be ready to cache>
>
>    - The translated content is not saved in the cache (i.e., it has not
>    been previously translated or has expired): translation is needed
>
> the translation document or element would have the attributes:
>
> ready-to-process = cache-target
> ready-at = <the time at which it would be ready to cache>
>
>    - Neither the original nor the translated page is saved in the cache:
>    both need to be cached
>
> you could either have both of the above or, in cases where the source and
> target are in the same file, use:
>
> ready-to-process = cache-source-and-target
> ready-at = <the time at which it would be ready to cache>
>
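
The three cases above amount to a lookup from what needs caching to a ready-to-process value. A minimal sketch follows, assuming the keys source, target and both (taken from the scope values in Pedro's reply) and an ISO 8601 string for ready-at. The function name readiness_attributes and the timestamp format are illustrative assumptions, not something the thread specifies.

```python
from typing import Dict

# ready-to-process values as written in the message above.
READY_TO_PROCESS = {
    "source": "cache-source",
    "target": "cache-target",
    "both": "cache-source-and-target",
}


def readiness_attributes(scope: str, ready_at: str) -> Dict[str, str]:
    """Build the pair of readiness attributes for one cache case."""
    if scope not in READY_TO_PROCESS:
        raise ValueError(f"unknown scope: {scope!r}")
    return {
        "ready-to-process": READY_TO_PROCESS[scope],
        "ready-at": ready_at,  # assumed ISO 8601; the thread leaves the format open
    }


attrs = readiness_attributes("both", "2012-05-08T13:49:00Z")
```
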
> Note that there is also a revised flag there that could be used if useful.
>
> So, if I understand this right, I think the readiness attributes would
> provide equivalent metadata. However, if you think this is a distinct use
> case, i.e. implementors who would implement cacheStatus are
> specifically only interested in that functionality and would be unlikely to
> also implement a more general readiness data category, then we should
> definitely consider a separate data category.
>
> cheers,
> Dave
>
>
> On 07/05/2012 18:32, Pedro L. Díez Orzas wrote:
>
> Hi Dave,
>
> I will look at it very carefully as soon as I can, since these are really
> major changes, but a priori I do not understand why cacheStatus should be
> consolidated and removed, since for me this is completely different metadata
> from processTrigger, processStatus or the other “status” categories, and it
> answers completely different requirements.
>
> As I explained in the notes and definition of cacheStatus, this metadata
> is not for the localization chain or any other localisation process, but for
> real-time translation systems and their caching needs. In this respect I
> would put it back as it was (if you want, it can be called just “cache”,
> without “status”), and I am sorry for any confusion I may have caused.
>
> Best,
>
> Pedro
>
> Pedro L. Díez Orzas
> Executive President/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com
>
> "According to the provisions set forth in articles 21 and 22 of Law
> 34/2002 of July 11 regarding Information Society and eCommerce
> Services, we will store and use your personal data with the sole purpose of
> marketing the products and services offered by LINGUASERVE
> INTERNACIONALIZACIÓN DE SERVICIOS, S.A. If you do not wish your personal
> data to be stored and handled, or you do not wish to receive further
> information regarding products and services offered by our company, please
> e-mail us to clients@linguaserve.com. Your request will be processed
> immediately."
>
>  ------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Monday, 7 May 2012 14:51
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data
> categories and process trigger
>
> Hi Pedro, Guys,
> Following the previous discussion on the proposal for consolidation around
> these data categories I have now made the following changes to the
> requirements document.
>
> Pedro, as discussed on Friday's call, could you and any other interested
> parties examine these changes and flag any issues on this thread?
>
> 1) I have updated processTrigger and changed its name to 'readiness', as
> previously discussed
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#readiness
>
> 2) I have moved the need for a process model to a new requirement to
> reflect its relevance to several of the other data categories, including
> readiness, progress-indicator and provenance, and its need for further
> careful consideration:
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model
>
> 3) As part of this consolidation I have removed the data categories of:
> processTrigger, cacheStatus, legalStatus, processState, proofreadingState
> and revision state
>
> 4) I've updated the data category tables and the related interests
> accordingly
>
> 5) I've highlighted issues (in bold below) to consider about the following
> properties of the removed processTrigger that are no longer present (as
> recorded in the notes for the readiness data category)
>
>    - *contentType*, values: MIME or custom values - This indicates the
>    format or type of the content, so that the right filter or normalization
>    rules, and the subsequent processes, can be applied. For
>    example, to express HTML we could use “contentType: text/html”: *consider
>    consolidation with formatType or languageResource*
>
> >> I do not agree, unless formatType really refers to the computer format
> and not, as now, to the format or service for which the content is produced
> (e.g., subtitles, spoken text)
>
>    - *sourceLang* – value: standard ISO 639 value - this value indicates
>    the source language for the translation currently requested. It is
>    different from the sourceLanguage (provenance) data category, since that
>    indicates the language of the original source text, whereas sourceLang
>    indicates the current source language to be used for the translation,
>    which can be different from the original source - *this should be
>    considered as an attribute for provenance*
>    - *contentResultSource* – value: yes / no. Indicates whether the
>    localisation chain needs to give back the original - *is this
>    necessary as an attribute here or as a separate attribute*
>    - *contentResultTarget* – value: monolingual, multilingual; indicates
>    whether the resulting translation, in the case of several target
>    languages, should be delivered in several monolingual content files or in
>    a single multilingual content file - *this would require a more general
>    purpose return file indicator*
>    - *pivotLang* - value: standard ISO value. Indicates the intermediate
>    language in case one is needed. Two examples: 1) Going from a source
>    language to two language variants (e.g. into Brazilian and European
>    Portuguese), it is more cost-effective to translate into one first (this
>    first variant being a "pivot" language) and to revise it later into the
>    second variant; 2) Going from one language to another via an intermediate
>    language (e.g. from Maltese into English and from English into Irish,
>    because no direct Maltese-to-Irish translation is available). - *consider
>    consolidation with source language, i.e. it is an attribute of the source
>    language*
>
>
> Regards,
> Dave
>
> On 04/05/2012 01:46, David Lewis wrote:
>
> Hi Moritz, guys,
> I added this progress-indicator data category to the requirements:
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#progress-indicator
>
> Regards,
> Dave
>
> On 28/04/2012 22:11, David Lewis wrote:
>
> Hi Moritz,
> I moved this onto this separate thread related to the relevant
> consolidation action.
>
> I think there are two different data categories here.
>
> What you describe is a progress indicator. This would be a common feature
> on a lot of CMS-based and crowdsourced translation tools. It would be
> measured as the number of segments (or perhaps words) of a document (or a
> group of documents representing a job) that have been processed, as a
> proportion of the total that need to be processed.
>
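
The measurement described above can be sketched as a simple ratio. The 0 to 100 integer range comes from Moritz's processIndicator proposal quoted further down; the function name, the rounding down and the error handling are assumptions made for illustration.

```python
def progress_indicator(processed: int, total: int) -> int:
    """Return progress as an integer from 0 to 100.

    Rounds down, so 100 is only reported once every segment is processed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= processed <= total:
        raise ValueError("processed must be between 0 and total")
    return (processed * 100) // total


not_started = progress_indicator(0, 40)
one_third = progress_indicator(1, 3)
finished = progress_indicator(40, 40)
```

The same function works whether the unit is segments or words, and whether the total covers one document or a whole job, as long as both arguments use the same unit.
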
> The other, which is what the current text for 'process state' (
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#process_state)
> specifies, is an indication of which point in a process sequence has
> currently been reached. As discussed, this could be covered by the
> processTrigger/readiness data category we are discussing.
>
> Moritz, does this distinction match with your view here? If so then we
> could introduce a new 'progress-indicator' data category requirement, and
> then continue discussing the consolidation of 'process state' with
> processTrigger/readiness.
>
> thanks,
> Dave
>
>
> On 27/04/2012 18:40, Moritz Hellwig wrote:
>
> Hello,
>
> I might make this a separate thread, but since we are already talking
> about processState here...
>
> There were quite a lot of requests from our editorial team to have
> something like
>
> processIndicator
>
> Values integer, 0 to 100
>
> Zero would be "LSP process not begun"-ish, 100 would be "Completed".
>
> There are - from our point of view - considerable advantages:
>
> A) we can show a process progress indicator (in whichever visual
> representation) that does not require an understanding of what the actual
> process phase is on the MT side.
>
> B) the indicator can be agnostic to the number of processes / stages on
> the side of the LSP. If you run a hundred separate processes or feedback
> loops: fine by me.
>
> This would be beneficial for e.g. content creators who are unfamiliar with
> the language technology, its processes and so on. Also, it would allow us
> to build dashboards and generate reports, e.g. to show and sort by
> progress and keep better track of multilingual projects.
>
> Any thoughts?
>
> Cheers,
>
> Moritz
>
>
> On 27.04.2012, at 01:14, "David Lewis" <dave.lewis@cs.tcd.ie> wrote:
>
>  Pedro,
> Yes, the redundancy of process state is one outcome of what I'm proposing
> here.
>
> The key difference is that the proposal is that the data category
> indicates the next process that should be performed, rather than indicating
> the current process in operation. The motivation is that the readiness to
> undergo a new process step is more useful to a document in a CMS than
> knowing the current state that is operating on it.
>
> Complementary to this, provenance indicates that a process is completed,
> and associated with this records useful information needed to monitor
> correct or efficient process operation, perhaps as needed to monitor a
> service level agreement.
>
> Neither process trigger nor provenance, however, actually aims to control
> process flow. This is a complex topic, which is therefore probably out of
> scope.
>
> What we do need, however, is a way of defining the values to use for
> referencing processes, i.e. from both the 'request-process' and the process
> reference in provenance. For this we may want both a default set in the
> standard, and a way of unambiguously defining these for a particular
> business case. The key thing in any one case of interoperability is that
> the interoperating implementations exchange and understand the _same set_
> of process values.
>
> Let's keep the discussion going on the list,
> Dave
>
> On 26/04/2012 15:29, Pedro L. Díez Orzas wrote:
>
> Hi David,
>
> I need to consider this more carefully.
>
> But what I see is that *process state* is perhaps redundant with
> proofreading state or revision state, since these can be values of process
> state: proofread, revised, reviewed, translated, localized…
>
> Best,
>
> Pedro
>
>  ------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Thursday, 26 April 2012 1:52
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [all] Discussion on proposed metadata categories:
> approvalStatus
>
> Hi Moritz,
> I think you make a very good general point here. It may be a bit too
> open-ended to specify data categories that hardwire the completion of a
> specific step. We would run into the same issues with defining the different
> process values that we discussed around process trigger. Also, it's not clear
> to me that all the status flags suggested for current steps, e.g. legal
> approval, really need to be separated from other steps.
>
> I think therefore we could generalise this as part of the process trigger
> data category, as you suggest. This could allow us to consolidate
> *approvalStatus*, *cacheStatus*, *legalStatus*, *proofReading state* and
> *revision state* (and delegate the definition of these steps to data values
> rather than individual data categories). We can address *cacheStatus*, and at
> the same time generalise it to processes other than just translation, by
> including the time stamp and a revision flag.
>
> Also, I think the priority data category should be included here, as
> translation could consist of many different processes in combination, so its
> semantics depend on which one. At the same time, we may also be
> interested in defining priorities even for non-translation activities, such
> as review.
>
> *requested-process* (which has the name of the next process requested)
>
> *process-ref* (which may allow us to point to an external set of process
> definitions used for requested-process if the default value set is not used)
>
> *ready-at* (defines the time the content is ready for the process; it
> could be some time in the past or some time in the future - this supports
> part of the cacheStatus function)
>
> *revised* (yes/no - indicates whether this is a different version of content
> that was previously marked as ready for the declared process)
>
> *priority* (I think for now we should keep this simple and just have
> values high/low)
>
> *complete-by* (provides a target date-time for completing the process)
>
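
The six attributes listed above could be held in a record such as the following. This is a sketch under stated assumptions, not a proposed syntax: the class name Readiness, the Python field types, the "low" default for priority and the process name "translation" used in the example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Readiness:
    """Hypothetical container for the proposed readiness attributes."""

    requested_process: str                  # name of the next process requested
    ready_at: datetime                      # may lie in the past or the future
    process_ref: Optional[str] = None       # pointer to external process definitions
    revised: bool = False                   # yes/no in the proposal
    priority: str = "low"                   # "high" or "low"; default is an assumption
    complete_by: Optional[datetime] = None  # target date-time for completion

    def __post_init__(self) -> None:
        if self.priority not in ("high", "low"):
            raise ValueError(f"unknown priority: {self.priority!r}")

    def is_ready(self, now: datetime) -> bool:
        """True once the ready-at time has been reached."""
        return now >= self.ready_at


request = Readiness(
    requested_process="translation",  # illustrative value only
    ready_at=datetime(2012, 4, 26, 1, 52),
    priority="high",
)
ready_next_day = request.is_ready(datetime(2012, 4, 27))
ready_day_before = request.is_ready(datetime(2012, 4, 25))
```
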
> Any thoughts on this suggestion? Pedro, Ryan, Moritz, Des, I think this
> impacts on data categories you have an interest in.
>
> Also, DavidF, Pedro, Ryan, do you think this makes *process state*
> redundant? As a status flag, are we more interested in which process to do
> next, rather than which one is finished? At the same time, the provenance
> data category could tell us which processes have already finished operating
> on the content.
>
> cheers,
> Dave
>
>
> On 24/04/2012 11:11, Moritz Hellwig wrote:
>
> to identify publication process metadata which might also be relevant for
> the LSP. I ran into a couple of questions though.
>
> I’ll use approvalStatus as an example (from the requirements document):
>
> >> approvalStatus
>
> >> Information about the status of the content in a formal approval
> workflow
>
> >> Indicates whether the content has been approved for release
>
> >> Possible values:
>
> >>>> yes
>
> >>>> no
>
> Approval can have many values which are rarely only “release yes|no” and
> they can be client/application-specific. However, none of these statuses
> seem to be relevant to the LSP, as they only precede or succeed the LSP’s
> processes.
>



-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 9 May 2012 06:47:09 UTC