
Re: [ACTION-79]Consider consolidation of status-related data categories and process trigger

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Thu, 10 May 2012 01:33:14 +0100
Message-ID: <4FAB0CCA.8090208@cs.tcd.ie>
To: "Pedro L. Díez Orzas" <pedro.diez@linguaserve.com>
CC: 'Arle Lommel' <arle.lommel@dfki.de>, 'Felix Sasaki' <fsasaki@w3.org>, public-multilingualweb-lt@w3.org
Hi Pedro,
Yes, I think that I misunderstood your intention with this data category 
- my apologies.

I was thinking of it as a client side cache, just to help consistent 
version management on the client side. But if I understand this 
correctly, you are talking about a cache associated with the MT service, 
which is useful so that the same string from a source document doesn't 
need to be translated again by the MT engine, just looked up from the cache 
- is that right?

I'm of course very happy to discuss this further to make sure we address 
the use case correctly.

cheers,
Dave

On 09/05/2012 10:53, Pedro L. Díez Orzas wrote:
>
> Dear Felix, Arle, Dave, all,
>
> Of course, any expert contribution is always welcome, so go ahead if 
> you think it is helpful.
>
> Nevertheless, I completely agree with Arle and again, as far as we 
> know, the HTTP headers provide information about caching on the client 
> server side (so they could be used for the case that Dave pointed out 
> about the /staging server/ on the client side), but they do not provide 
> metadata to indicate to an external real time translation _and 
> publication_ system whether it has to cache certain web content, or a 
> part of the page, after it has been translated and published in real 
> time. This is content metadata to be used by an external system to 
> cache content from the client server at several content levels.
>
> In any case, we are already spending a lot of time on this, when this 
> metadata is meant for emerging technologies that not many people use 
> yet (but we are convinced they will), and that we can manage directly 
> with our clients (we are actually doing so with real-life clients).
>
> I consider it more productive to go ahead with discussions about other 
> data categories (like processTrigger or readiness, or others) that are 
> much more widely used (in localization chains, for instance). So if you 
> think this "cacheStatus" is not clear enough, let's drop it. (The 
> category name is probably not the best; it should express something 
> like "indicator from clients of whether the web content has to be 
> cached by real time translation and publication systems". I think I 
> have already explained our position on this sufficiently.)
>
> Best,
>
> Pedro
>
> ------------------------------------------------------------------------
>
> *From:* Arle Lommel [mailto:arle.lommel@dfki.de]
> *Sent:* Wednesday, 09 May 2012 10:28
> *To:* Felix Sasaki
> *CC:* Pedro L. Díez Orzas; David Lewis; public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data 
> categories and process trigger
>
> Hi Felix,
>
> I think that with the intended scenario Pedro proposed the HTTP 
> headers would not be granular enough. The cacheStatus could apply as 
> far down as the segment level, although the more likely scenario is 
> for it to apply at either the document level or the equivalent of the 
> DITA topic level. Since a web page could potentially pull multiple 
> topics into one place, the document itself would have a mix of cache 
> statuses depending on the cache status of the objects it references. 
> Perhaps Pedro can clarify, but even if that is the case, I don't think 
> it would hurt for Yves to get involved, so I'd say to go for bringing 
> him in.
>
> -Arle
>
> Felix Sasaki wrote on May 9, 2012 at 08:46:
>
>
>
> Pedro, all,
>
> I am wondering if this discussion could benefit from input of an HTTP 
> expert. I have the feeling that the existing HTTP headers might be 
> sufficient to realize this requirement. Do you mind if I take Yves Lafon
>
> http://www.w3.org/People/all#ylafon
>
> into the loop?
>
> Felix
>
> 2012/5/8 Pedro L. Díez Orzas <pedro.diez@linguaserve.com 
> <mailto:pedro.diez@linguaserve.com>>
>
> Dear Dave,
>
> First of all, thank you for the consolidation task, which is hard, 
> complex and "risky business" :-).
>
> I would like to distinguish between cacheStatus and the rest.
>
> About this specific case of cache status, I probably now understand 
> the confusion. In your mail in the thread "Re: targetPointer 
> Requirement update", dated 08/05/2012 13:49, you mention "/ii) a 
> realtime translation workflow, where content is put on a cache (I 
> prefer perhaps a term like 'staging server' to avoid confusion with 
> 'web cache')/". However, the data category cacheStatus is not intended 
> for the /staging/ or /hidden/ content on the client server, but 
> for the source, the translation, or both, on the side of the real time 
> translation server. Actually, I did not consider the /staging server/ 
> in this, and it should probably be handled in the way you suggest in 
> your mail. Certainly the confusion was my fault when I described it as:
>
>   * The original content is not saved in the cache (i.e., it is new or
>     has been updated): (re)translation is needed
>
>   * The translated content is not saved in the cache (i.e., it has not
>     been previously translated or has expired): translation is needed
>
>   * Neither the original nor the translated page are saved in the
>     cache: both need to be cached
>
> This refers not to the client side or CMS, but to the Real Time 
> Translation System (RTTS), which actually generates the web cache. For 
> example, the timestamp value is not set by the client, as in ready-at 
> = <the time at which it would be ready to cache>, but by the RTTS when 
> it does the caching. The client, in turn, indicates in the final HTML 
> web page whether a page or a part of a page needs to be cached or not, 
> and whether this applies to the source, the target or both:
>
>   * cached - values: yes, no;
>   * scope - values: source, target, both
>   * timestamp - date and time
>
> In this scenario, the source pages (or parts of pages) are always 
> translated in real time, and the translated pages (or parts) can be 
> added to the cache to speed up future accesses. Some pages, however, 
> not only do not need to be cached but must not be cached at all 
> (for example pages in private areas, or transactional pages of an 
> e-commerce process or a bank...).
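The three fields and the "must not be cached" rule described above can be sketched roughly as follows. This is only an illustration under assumptions: the `CacheStatus` class, `should_cache` and `mark_cached` are hypothetical names, and treating yes/no as a boolean is a convenience, not something the proposal specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CacheStatus:
    """Hypothetical in-memory form of the cacheStatus metadata."""
    cached: bool                          # "cached" field: yes -> True, no -> False
    scope: str = "both"                   # "scope" field: "source", "target" or "both"
    timestamp: Optional[datetime] = None  # written by the RTTS when it caches


def should_cache(status: CacheStatus, side: str) -> bool:
    """Return True if the RTTS may cache the given side ("source" or "target")."""
    if side not in ("source", "target"):
        raise ValueError(f"unknown side: {side!r}")
    if not status.cached:
        # cached = no: the content must not be cached (e.g. private areas)
        return False
    return status.scope in (side, "both")


def mark_cached(status: CacheStatus, now: Optional[datetime] = None) -> CacheStatus:
    """Return a copy stamped with the time at which the RTTS cached the content."""
    stamp = now or datetime.now(timezone.utc)
    return CacheStatus(status.cached, status.scope, stamp)
```

A page in a private area would carry `cached = no`, so `should_cache` returns False for both sides regardless of the scope value.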
>
> I cannot tell 100% whether /implementors who would implement the 
> cacheStatus are specifically only interested in that functionality and 
> would be unlikely to also implement a more general readiness data 
> category/, but even if it is 50% I would keep it separate, in the 
> same way as the others in the "Internationalization" section. It is 
> really multilingualWebCache metadata in the pages that the end user 
> navigates.
>
> I hope this helps, and I will try to answer the rest before Thursday's 
> meeting.
>
> Best,
>
> Pedro
>
> ------------------------------------------------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Tuesday, 08 May 2012 3:00
> *To:* "Pedro L. Díez Orzas"
> *CC:* public-multilingualweb-lt@w3.org
>
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data 
> categories and process trigger
>
> Hi Pedro,
> Sorry, I didn't yet fill in the details of how I thought this might 
> work for cache status, which would simply be:
>
>   * The original content is not saved in the cache (i.e., it is new or
>     has been updated): (re)translation is needed
>
> the source document or element would have attribute:
>
> ready-to-process  = cache-source
> ready-at = <the time at which it would be ready to cache>
>
>   * The translated content is not saved in the cache (i.e., it has not
>     been previously translated or has expired): translation is needed
>
> the translation document or element would have attributes:
>
> ready-to-process = cache-target
> ready-at = <the time at which it would be ready to cache>
>
>   * Neither the original nor the translated page are saved in the
>     cache: both need to be cached
>
> you could either have both the above, or in cases where the source and 
> target are in the same file use:
>
> ready-to-process = cache-source-and-target
> ready-at = <the time at which it would be ready to cache>
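The mapping from the three cache situations to the readiness attributes can be sketched as below. It is a minimal illustration, not part of the proposal: the function name `readiness_for_cache` is hypothetical, the `ready_at` value is passed through as an opaque string because no time format is fixed here, and the combined value is used whenever both sides are missing, which assumes source and target live in the same file.

```python
from typing import Dict, Optional


def readiness_for_cache(source_cached: bool, target_cached: bool,
                        ready_at: str) -> Optional[Dict[str, str]]:
    """Map the cache state to the readiness attributes listed above.

    ready_at is copied unchanged into the result; its format is an
    assumption left to the caller.
    """
    if not source_cached and not target_cached:
        value = "cache-source-and-target"
    elif not source_cached:
        value = "cache-source"
    elif not target_cached:
        value = "cache-target"
    else:
        return None  # both sides are already cached, nothing to request
    return {"ready-to-process": value, "ready-at": ready_at}
```

When source and target are held in separate documents, the same function could be called once per document, so that each carries only its own `cache-source` or `cache-target` value.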
>
> Note, there is a revised flag there that could also be used if useful
>
> So, if I understand this right I think the readiness attributes would 
> provide equivalent meta-data. However, if you think this is a distinct 
> use case, i.e. implementors who would implement the cacheStatus are 
> specifically only interested in that functionality and would be 
> unlikely to also implement a more general readiness data category, 
> then definitely we should be considering a separate data category.
>
> cheers,
> Dave
>
>
> On 07/05/2012 18:32, Pedro L. Díez Orzas wrote:
>
> Hi Dave,
>
> I will look at it very carefully as soon as I can, since these are 
> really major changes, but a priori I do not understand why 
> cacheStatus should be consolidated and removed, since for me this is 
> completely different metadata from processTrigger, processStatus or 
> other "status" categories, and it answers completely different 
> requirements.
>
> As I explained in the notes and definition of cacheStatus, this 
> metadata is not for the localization chain or any other localisation 
> process, but for real time translation systems and their caching 
> needs. In this respect I would put it back as it was (if you want, it 
> can be called only "cache", without "status"), and sorry for any 
> confusion I may have caused about it.
>
> Best,
>
> Pedro
>
>
> ------------------------------------------------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Monday, 07 May 2012 14:51
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data 
> categories and process trigger
>
> Hi Pedro, Guys,
> Following the previous discussion on the proposal for consolidation 
> around these data categories I have now made the following changes to 
> the requirements document.
>
> Pedro, as discussed on Friday's call, could you and any other 
> interested parties examine these changes and flag any issues on 
> this thread.
>
> 1) I have updated processTrigger and changed its name to 'readiness' as 
> previously discussed
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#readiness
>
> 2) I have moved the need for a process model to a new requirement to 
> reflect its relevance to several of the other data categories, 
> including readiness, progress-indicator and provenance, and its need 
> for further careful consideration:
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model
>
> 3) As part of this consolidation I have removed the data categories of:
> processTrigger, cacheStatus, legalStatus, processState, 
> proofreadingState and revision state
>
> 4) I've updated the data category tables and the related interests 
> accordingly
>
> 5) I've highlighted issues (in bold below) to consider about the 
> following properties of the removed processTrigger that are no longer 
> present (as recorded in the notes for the readiness data category)
>
>   * /contentType/, values: MIME or custom values - This indicates the
>     format or the type of the content, in order to
>     apply the right filter or normalization rules, and the subsequent
>     processes. For example, to express HTML we could use
>     "contentType: text/html": *consider consolidation with formatType
>     or languageResource*
>
> >> I do not agree, unless formatType really refers to the computer 
> format and not, as it does now, to the format or service for which the 
> content is produced (e.g., subtitles, spoken text)
>
>   * /sourceLang/ -- value: standard ISO 639 value - this value
>     indicates the source language for the current translation
>     requested. It is different from the sourceLanguage (provenance)
>     Data Category, since that indicates the language of the original
>     source text, whereas sourceLang indicates the current source
>     language to be used for the translation, which can be different
>     from the original source - *this should be considered as an
>     attribute for provenance*
>   * /contentResultSource/ --value: yes / no. Indicates the format if
>     the Localisation chain needs to give back the original - *is this
>     necessary as an attribute here or as a separate attribute*
>   * /contentResultTarget/ -- value: monolingual, multilingual;
>     indicates if the resulting translation, in the cases of several
>     target languages, should be delivered in several monolingual
>     content files or in a single multilingual content file *this would
>     require a more general purpose return file indicator*
>   * /pivotLang/ - value: standard ISO value. Indicates the
>     intermediate language in case one is needed. Two examples: 1)
>     Going from a source language to two language variants (e.g. into
>     Brazilian and European Portuguese), it is more cost-effective to
>     translate into one first (this first variant being a "pivot"
>     language) and to revise it later into the second variant; 2) Going
>     from one language to another via an intermediate language (e.g.
>     from Maltese into English and from English into Irish, because
>     there is no direct Maltese into Irish translation available). -
>     *consider consolidation with source language, i.e. it is an
>     attribute of the source language*
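The second pivotLang example (Maltese to Irish via English) amounts to splitting one translation request into two hops. A minimal sketch, assuming a hypothetical helper `translation_steps` and ISO 639-1 codes as the language values:

```python
from typing import List, Optional, Tuple


def translation_steps(source: str, target: str,
                      pivot: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return the (from, to) language pairs needed for one request.

    Without a pivot, or when the pivot equals either end, the request
    is a single direct translation step.
    """
    if pivot is None or pivot in (source, target):
        return [(source, target)]
    return [(source, pivot), (pivot, target)]
```

For example, `translation_steps("mt", "ga", pivot="en")` yields the two hops Maltese to English and English to Irish, while omitting the pivot yields the single direct pair.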
>
>
> Regards,
> Dave
>
> On 04/05/2012 01:46, David Lewis wrote:
>
> Hi Moritz, guys,
> I added this progress-indicator data category to the requirements:
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#progress-indicator
>
> Regards,
> Dave
>
> On 28/04/2012 22:11, David Lewis wrote:
>
> Hi Moritz,
> I moved this onto this separate thread related to the relevant 
> consolidation action.
>
> I think there are two different data categories here.
>
> What you describe is a progress indicator. This would be a common 
> feature on a lot of CMS-based and crowdsourced translation tools. It 
> would be measured as the number of segments (or perhaps words) of a 
> document (or a group of documents representing a job) that have been 
> processed as a proportion of the total that need to be processed.
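The measure described here, combined with the integer scale of 0 to 100 requested in the message quoted further down, could be computed along these lines. The function name `progress_indicator` and the choice to round down are assumptions made for illustration; rounding down keeps 100 reserved for fully completed work.

```python
def progress_indicator(processed: int, total: int) -> int:
    """Return progress as an integer from 0 to 100.

    processed and total count segments (or words) in a document or job.
    Integer division rounds down, so 100 is only reached when every
    unit has been processed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= processed <= total:
        raise ValueError("processed must be between 0 and total")
    return (processed * 100) // total
```

Because only the two counts are needed, the value stays independent of how many process stages the LSP runs internally.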
>
> The other, which is what the current text for 'process state' 
> (http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#process_state) 
> specifies, is an indication of  which point in a process sequence has 
> currently been reached. As discussed, this could be covered by the 
> processTrigger/readiness data category we are discussing.
>
> Moritz, does this distinction match with your view here? If so then we 
> could introduce a new 'progress-indicator' data category requirement, 
> and then continue discussing the consolidation of 'process state' with 
> processTrigger/readiness.
>
> thanks,
> Dave
>
>
> On 27/04/2012 18:40, Moritz Hellwig wrote:
>
> Hello,
>
> I might make this a separate thread, but since we are already talking 
> about processState here...
>
> There were quite a lot of requests from our editorial team to have 
> something like
>
> processIndicator
>
> Values integer, 0 to 100
>
> Zero would be "LSP process not begun"-ish, 100 would be "Completed".
>
> There are - from our point of view - considerable advantages:
>
> A) we can show a process progress indicator (in whichever visual 
> representation) that does not require an understanding of what the 
> actual process phase is on the MT side.
>
> B) the indicator can be agnostic to the number of processes / stages 
> on the side of the LSP. If you run a hundred separate processes or 
> feedback loops: fine by me.
>
> This would be beneficial for e.g. content creators who are unfamiliar 
> with the language technology, its processes and so on. Also, it would 
> allow us to build dashboards and generate reports e.g. to show and 
> sort by progression & keep better track of multilingual projects.
>
> Any thoughts?
>
> Cheers,
>
> Moritz
>
> Sent from my iPhone
>
>
> On 27.04.2012, at 01:14, "David Lewis" <dave.lewis@cs.tcd.ie 
> <mailto:dave.lewis@cs.tcd.ie>> wrote:
>
>> Pedro,
>> Yes, the redundancy of process state is one outcome of what I'm 
>> proposing here.
>>
>> The key difference is that the proposal is that the data category 
>> indicates the next process that should be performed, rather than 
>> indicating the current process in operation. The motivation is that 
>> the readiness to undergo a new process step is more useful to a 
>> document in a CMS than knowing the current state that is operating 
>> on it.
>>
>> Complementary to this, provenance indicates that a process is 
>> completed, and associated with this records useful information needed 
>> to monitor correct or efficient process operation, perhaps as needed 
>> to monitor a service level agreement.
>>
>> Neither process trigger nor provenance, however, actually aims to 
>> control process flow. This is a complex topic which therefore is 
>> probably out of scope.
>>
>> What we do need however, is a way of defining  the values to use for 
>> referencing processes, i.e. from both the 'request-process' and the 
>> process reference in provenance. For this we may want both a default 
>> set in the standard, and a way of unambiguously defining these for a 
>> particular business case. The key thing in any one case of 
>> interoperability is that the interoperating implementations exchange 
>> and understand the _same set_ of process values.
>>
>> let's keep the discussion going on the list,
>> Dave
>>
>> On 26/04/2012 15:29, Pedro L. Díez Orzas wrote:
>>
>> Hi David,
>>
>> I need to consider this more carefully.
>>
>> But, what I see is that *process state* is perhaps redundant 
>> with proofreading state or revision state, since these can be values 
>> of process state: proofread, revised, reviewed, translated, localized...
>>
>> Best,
>>
>> Pedro
>>
>> ------------------------------------------------------------------------
>>
>> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
>> *Sent:* Thursday, 26 April 2012 1:52
>> *To:* public-multilingualweb-lt@w3.org
>> *Subject:* Re: [all] Discussion on proposed metadata categories: 
>> approvalStatus
>>
>> Hi Moritz,
>> I think you make a very good general point here. It may be a bit too 
>> open ended to specify data categories that hardwire the completion of 
>> a specific step. We would run into the same issues with 
>> defining the different process values that we discussed around process 
>> trigger. Also, it's not clear to me that all the status flags 
>> suggested for current steps, e.g. legal approval, really need to be 
>> separated from other steps.
>>
>> I think therefore we could generalise this as part of the process 
>> trigger data category as you suggest. This could allow us to 
>> consolidate *approvalStatus*, *cacheStatus*, *legalStatus*, 
>> *proofReading state* and *revision state* (and delegate the 
>> definition of these steps to data values rather than individual data 
>> categories). We can address *cacheStatus*, and at the same time 
>> generalise it to other processes than just translation, by including 
>> the time stamp and a revision flag.
>>
>> Also, I think the priority data category should be included here, as 
>> translation could consist of many different processes in combination, 
>> so its semantics depend on which one is meant. At the same time we may 
>> also be interested in defining priorities even for non-translation 
>> activities, such as review.
>>
>> *requested-process* (which has the name of the next process requested)
>>
>> *process-ref* (which may allow us to point to an external set of 
>> process definitions used for requested-process if the default value 
>> set is not used)
>>
>> *ready-at* (defines the time the content is ready for the process; it 
>> could be some time in the past, or some time in the future - this 
>> supports part of the cacheStatus function)
>>
>> *revised* (yes/no - indicates whether this is a different version of 
>> content that was previously marked as ready for the declared process)
>>
>> *priority* (I think for now we should keep this simple and just have 
>> values high/low )
>>
>> *complete-by* (provides a target date-time for completing the process)
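Taken together, the six attributes listed above could be held in a record like the following. This is a sketch under assumptions: the `Readiness` class and its `is_ready` method are hypothetical, the default priority of "low" is an arbitrary choice, and the check that complete-by does not precede ready-at is an added sanity rule rather than something stated in the proposal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Readiness:
    """Hypothetical container for the proposed readiness attributes."""
    requested_process: str                  # requested-process: next process
    ready_at: datetime                      # ready-at: past or future time
    process_ref: Optional[str] = None       # process-ref: external definitions
    revised: bool = False                   # revised: yes -> True, no -> False
    priority: str = "low"                   # priority: "high" or "low"
    complete_by: Optional[datetime] = None  # complete-by: target date-time

    def __post_init__(self) -> None:
        if self.priority not in ("high", "low"):
            raise ValueError(f"priority must be 'high' or 'low', got {self.priority!r}")
        if self.complete_by is not None and self.complete_by < self.ready_at:
            raise ValueError("complete_by must not be earlier than ready_at")

    def is_ready(self, now: datetime) -> bool:
        """Content is ready once the ready-at time has been reached."""
        return now >= self.ready_at
```

A consumer such as a CMS could poll `is_ready` with the current time to decide when to hand the content to the requested process.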
>>
>> Any thoughts on this suggestion? Pedro, Ryan, Moritz, Des, I think 
>> this impacts on data categories you have an interest in.
>>
>> Also, DavidF, Pedro, Ryan, do you think this makes *process state* 
>> redundant? As a status flag, are we more interested in what process to 
>> do next, rather than which one is finished? At the same time the 
>> provenance data category could tell us which processes have already 
>> finished operating on the content.
>>
>> cheers,
>> Dave
>>
>>
>> On 24/04/2012 11:11, Moritz Hellwig wrote:
>>
>> to identify publication process metadata which might also be relevant 
>> for the LSP. I ran into a couple of questions though.
>>
>> I'll use approvalStatus as an example (from the requirements document):
>>
>> >> approvalStatus
>>
>> >> Information about the status of the content in a formal approval 
>> workflow
>>
>> >> Indicates whether the content has been approved for release
>>
>> >> Possible values:
>>
>> >>>> yes
>>
>> >>>> no
>>
>> Approval can have many values which are rarely only "release yes|no" 
>> and they can be client/application-specific. However, none of these 
>> statuses seem to be relevant to the LSP, as they only precede or 
>> succeed the LSP's processes.
>>
>
>
> -- 
> Felix Sasaki
>
> DFKI / W3C Fellow
>
Received on Thursday, 10 May 2012 00:33:50 UTC
