Re: [ACTION-79]Consider consolidation of status-related data categories and process trigger from Felix Sasaki on 2012-05-09 (public-multilingualweb-lt@w3.org from May 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 9 May 2012 12:15:28 +0200
To: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
Cc: Arle Lommel <arle.lommel@dfki.de>, David Lewis <dave.lewis@cs.tcd.ie>, public-multilingualweb-lt@w3.org
Message-ID: <CAL58czq4nX-5ygt0s+RqshKTJhy_VR-_kzwQrEj28ZFTuFeuiQ@mail.gmail.com>
Dear Pedro,

2012/5/9 Pedro L. Díez Orzas <pedro.diez@linguaserve.com>

> Dear Felix, Arle, Dave, all,
>
> Of course, any expert contribution is always welcome, so go ahead if you
> think it is helpful.
>


It might help us avoid "re-inventing the wheel"; see below.


>
> Nevertheless, I completely agree with Arle and again, as far as we know,
> the HTTP headers provide information about caching on the client's server
> side (so they could be used for the case that Dave pointed out about the
> *staging server* on the client side), but they do not provide metadata to
> indicate to an external real-time translation *and publication* system
> whether it has to cache certain web content, or a part of the page, after
> it has been translated and published in real time.
>

There might be mechanisms in HTTP to pass that information to the external
system - that's what I would like to confirm with Yves.
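
For reference, a minimal Python sketch of how an intermediary, such as an external translation system, could read the standard HTTP `Cache-Control` response header to decide whether it may store a response. The function name and the simplified rules are assumptions for illustration only; whether page-level headers like these cover the use case discussed in this thread is exactly the open question.

```python
def may_store_in_shared_cache(cache_control: str) -> bool:
    """Simplified check (assumption: only 'no-store' and 'private' are considered).

    A shared cache must not store a response whose Cache-Control header
    carries the 'no-store' or 'private' directive.
    """
    directives = set()
    for part in cache_control.split(","):
        # Keep only the directive name, dropping any "=value" argument.
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return not ({"no-store", "private"} & directives)
```

Note that these headers apply to a whole HTTP response, not to individual parts of a page.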


> This is content metadata to be used by an external system, at several
> content levels, to decide what to cache from the client server.
>
> In any case, we are already spending a lot of time on this, when this
> metadata is intended for emerging technologies that not many people use
> yet (but we are convinced they will), and that we can manage directly with
> our clients (we are actually doing so with real-life clients).
>
> I consider it more profitable to go ahead with discussions about other data
> categories (like processTrigger or readiness, or others) that are much more
> widely applicable (localization chains, for instance). So if you think this
> “cacheStatus” (a category name that is probably not the best; it should
> rather be something that expresses “indicator from clients whether the web
> content has to be cached by real-time translation and publication
> systems”) is not clear enough (I think I have already explained our
> position on this sufficiently), let’s drop it.
>

To decide whether to drop it or not, I propose that we consider these questions:

1) Will there be people / companies implementing this (at least two)?
2) Are there people and clients in the group regarding this as important?
3) Do we have time to discuss the details?

I am not sure yet about 1). Your input on this is a good indicator that it
is important, i.e. 2). But as you said, we are spending a lot of time on the
discussion, and other data categories might be more important in terms of
1), 2) and 3).

So should we drop this and spend the time on other things? Whatever we
decide, I want to be sure that you, Pedro, are really happy with the
decision.

Felix

> Best,
>
> Pedro
>
>  ------------------------------
>
> *From:* Arle Lommel [mailto:arle.lommel@dfki.de]
> *Sent:* Wednesday, 9 May 2012 10:28
> *To:* Felix Sasaki
> *Cc:* Pedro L. Díez Orzas; David Lewis; public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data
> categories and process trigger
>
> Hi Felix,
>
> I think that with the intended scenario Pedro proposed the HTTP headers
> would not be granular enough. The cacheStatus could apply as far down as
> the segment level, although the more likely scenario is for it to apply at
> either the document level or the equivalent of the DITA topic level. Since
> a web page could potentially pull multiple topics into one place, the
> document itself would have a mix of cache statuses depending on the cache
> status of the objects it references. Perhaps Pedro can clarify, but even if
> that is the case, I don't think it would hurt for Yves to get involved, so
> I'd say to go for bringing him in.
>
> -Arle
>
> Felix Sasaki wrote on May 9, 2012 at 08:46:
>
> Pedro, all,
>
> I am wondering if this discussion could benefit from input from an HTTP
> expert. I have the feeling that the existing HTTP headers might be
> sufficient to realize this requirement. Do you mind if I bring Yves Lafon
>
> http://www.w3.org/People/all#ylafon
>
> into the loop?
>
> Felix
>
> 2012/5/8 Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
>
> Dear Dave,
>
> First of all, thank you for the consolidation task, which is hard, complex
> and “risky business” :-).
>
> I would like to distinguish between cacheStatus and the rest.
>
> About this specific case of cache status, I probably now understand the
> confusion. In your mail in the thread “Re: targetPointer Requirement
> update” (08/05/2012 13:49), you mention “*ii) a realtime translation
> workflow, where content is put on a cache (I prefer perhaps a term like
> 'staging server' to avoid confusion with 'web cache')*”. However, the
> data category cacheStatus is not intended for content that is *staged* or
> *hidden* on the client server, but for the source, the translation or both
> on the side of the real-time translation server. Actually, I did not
> consider the *staging server* in this, and it should probably be handled in
> the way you suggest in your mail. The confusion was certainly my fault,
> since I described it as:
>
>    - The original content is not saved in the cache (i.e., it is new or
>    has been updated): (re)translation is needed
>
>    - The translated content is not saved in the cache (i.e., it has not
>    been previously translated or has expired): translation is needed
>
>    - Neither the original nor the translated page is saved in the cache:
>    both need to be cached
>
> This refers not to the client side or CMS, but to the Real Time Translation
> System (RTTS), which actually generates the web cache. For example, the
> timestamp value is not set by the client, as in ready-at = <the
> time at which it would be ready to cache>, but by the RTTS when it does the
> caching. In that respect, the client indicates in the final HTML web page
> whether a page or a part of a page needs to be cached or not, and whether
> this applies to the source, the target or both:
>
>    - cached - values: yes, no
>    - scope - values: source, target, both
>    - timestamp - date and time
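
The three values listed above could be modelled roughly as follows. This is only a sketch: the class name `CacheStatus`, the helper `parts_to_cache` and the Python types are hypothetical, and the thread does not define any concrete syntax for the data category.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CacheStatus:
    cached: bool                          # "yes" / "no", set by the client
    scope: str                            # "source", "target" or "both"
    timestamp: Optional[datetime] = None  # set by the RTTS when it caches


def parts_to_cache(status: CacheStatus) -> set:
    """Return which versions of the content the RTTS should cache."""
    if not status.cached:
        return set()
    if status.scope == "both":
        return {"source", "target"}
    if status.scope in ("source", "target"):
        return {status.scope}
    raise ValueError(f"unknown scope: {status.scope!r}")
```

A page in a private area would carry `cached` = no, so nothing is stored regardless of `scope`.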
>
> In this scenario, the source pages (or parts of pages) are always
> translated in real time, and the translated pages (or parts) can be added
> to the cache to speed up future accesses. However, some pages not only do
> not need to be cached but must never be cached (for example, pages in
> private areas, or transactional pages of an e-commerce process or a bank).
>
> I cannot tell with 100% certainty whether *implementors who would implement
> the cacheStatus are specifically only interested in that functionality and
> would be unlikely to also implement a more general readiness data
> category*, but even if the likelihood is 50%, I would keep it separate, in
> the same way as the others in the “Internationalization” section. It is
> really multilingualWebCache metadata in the pages that end users navigate.
>
> I hope this helps, and I will try to answer the rest before Thursday’s
> meeting.
>
> Best,
>
> Pedro
>
>  ------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Tuesday, 8 May 2012 3:00
> *To:* "Pedro L. Díez Orzas"
> *Cc:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data
> categories and process trigger
>
> Hi Pedro,
> Sorry, I didn't yet fill in the details of how I thought this might work
> for cache status, which would simply be:
>
>    - The original content is not saved in the cache (i.e., it is new or
>    has been updated): (re)translation is needed
>
> the source document or element would have attributes:
>
> ready-to-process = cache-source
> ready-at = <the time at which it would be ready to cache>
>
>    - The translated content is not saved in the cache (i.e., it has not
>    been previously translated or has expired): translation is needed
>
> the translation document or element would have attributes:
>
> ready-to-process = cache-target
> ready-at = <the time at which it would be ready to cache>
>
>    - Neither the original nor the translated page is saved in the cache:
>    both need to be cached
>
> you could either have both of the above or, in cases where the source and
> target are in the same file, use:
>
> ready-to-process = cache-source-and-target
> ready-at = <the time at which it would be ready to cache>
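
The mapping above, from the proposed `ready-to-process` values to what should be cached, could be sketched like this. The function name `cache_scope` and the rule that nothing is cached before `ready-at` are assumptions made for illustration; the thread does not specify how a consumer should treat a `ready-at` time in the future.

```python
from datetime import datetime

# Values proposed in the message above, mapped to the content they cover.
READY_TO_PROCESS_SCOPE = {
    "cache-source": {"source"},
    "cache-target": {"target"},
    "cache-source-and-target": {"source", "target"},
}


def cache_scope(ready_to_process: str, ready_at: datetime, now: datetime) -> set:
    """Return the parts to cache, or an empty set if not (yet) applicable."""
    scope = READY_TO_PROCESS_SCOPE.get(ready_to_process, set())
    # Assumption: content only becomes cacheable once ready-at has passed.
    return set(scope) if now >= ready_at else set()
```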
>
> Note that there is also a revised flag that could be used if useful.
>
> So, if I understand this right, I think the readiness attributes would
> provide equivalent metadata. However, if you think this is a distinct use
> case, i.e. implementors who would implement the cacheStatus are
> specifically only interested in that functionality and would be unlikely to
> also implement a more general readiness data category, then definitely we
> should be considering a separate data category.
>
> cheers,
> Dave
>
>
> On 07/05/2012 18:32, Pedro L. Díez Orzas wrote:
>
> Hi Dave,
>
> I will look at it very carefully as soon as I can, since these are really
> major changes, but a priori I do not understand why cacheStatus should be
> consolidated and removed, since for me this is completely different
> metadata from processTrigger, processStatus or other “status” categories,
> and it answers completely different requirements.
>
> As I explained in the notes and definition of cacheStatus, this metadata
> is not for the localization chain or any other localisation process, but
> for real-time translation systems and their caching needs. In this respect
> I would restore it as it was (if you want, it can be called just “cache”,
> without “status”), and I am sorry for any confusion I may have caused.
>
> Best,
>
> Pedro
>
> Pedro L. Díez Orzas
> Presidente Ejecutivo/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com
>
>  ------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Monday, 7 May 2012 14:51
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-79]Consider consolidation of status-related data
> categories and process trigger
>
> Hi Pedro, Guys,
> Following the previous discussion on the proposal for consolidation around
> these data categories I have now made the following changes to the
> requirements document.
>
> Pedro, as discussed on Friday's call, could you and any other interested
> parties examine these changes and flag any issues on this thread?
>
> 1) I have updated processTrigger and changed its name to 'readiness' as
> previously discussed
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#readiness
>
> 2) I have moved the need for a process model to a new requirement to
> reflect its relevance to several of the other data categories, including
> readiness, progress-indicator and provenance, and its need for further
> careful consideration:
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model
>
> 3) As part of this consolidation I have removed the data categories of:
> processTrigger, cacheStatus, legalStatus, processState, proofreadingState
> and revision state
>
> 4) I've updated the data category tables and the related interests
> accordingly
>
> 5) I've highlighted issues (in bold below) to consider about the following
> properties of the removed processTrigger that are no longer present (as
> recorded in the notes for the readiness data category)
>
>    - *contentType*, values: MIME or custom values - This indicates the
>    format or type of the content, so that the right filter or
>    normalization rules and the subsequent processes can be applied. For
>    example, to express HTML we could use “contentType: text/html” -
>    *consider consolidation with formatType or languageResource*
>
> >> I do not agree, unless formatType really refers to the computer format
> and not, as it does now, to the format or service for which the content is
> produced (e.g., subtitles, spoken text)
>
>    - *sourceLang* - value: standard ISO 639 value - this value indicates
>    the source language for the current translation requested. It is
>    different from the sourceLanguage (provenance) data category, since that
>    indicates the language of the original source text, whereas sourceLang
>    indicates the current source language to be used for the translation,
>    which can be different from the original source - *this should be
>    considered as an attribute for provenance*
>    - *contentResultSource* - value: yes / no. Indicates whether the
>    localisation chain needs to give back the original - *is this
>    necessary as an attribute here or as a separate attribute?*
>    - *contentResultTarget* - value: monolingual, multilingual; indicates
>    whether the resulting translation, in the case of several target
>    languages, should be delivered in several monolingual content files or
>    in a single multilingual content file - *this would require a more
>    general purpose return file indicator*
>    - *pivotLang* - value: standard ISO value. Indicates the intermediate
>    language in case one is needed. Two examples: 1) Going from a source
>    language to two language variants (e.g. into Brazilian and European
>    Portuguese), it is more cost-effective to translate into one first (this
>    first variant being a "pivot" language) and to revise it later into the
>    second variant; 2) Going from one language to another via an
>    intermediate language (e.g. from Maltese into English and from English
>    into Irish, because no direct Maltese-to-Irish translation is
>    available). - *consider consolidation with source language, i.e. it is
>    an attribute of the source language*
>
>
> Regards,
> Dave
>
> On 04/05/2012 01:46, David Lewis wrote:
>
> Hi Moritz, guys,
> I added this progress-indicator data category to the requirements:
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#progress-indicator
>
> Regards,
> Dave
>
> On 28/04/2012 22:11, David Lewis wrote:
>
> Hi Moritz,
> I moved this onto this separate thread related to the relevant
> consolidation action.
>
> I think there are two different data categories here.
>
> What you describe is a progress indicator. This would be a common feature
> of many CMS-based and crowdsourced translation tools. It would be
> measured as the number of segments (or perhaps words) of a document (or a
> group of documents representing a job) that have been processed, as a
> proportion of the total that need to be processed.
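
As a small illustration of the measure described above, the following sketch computes the proportion of processed segments and expresses it on the 0 to 100 integer scale that Moritz suggests in his message quoted further down. The function name and the choice to round down are assumptions, not part of any proposal in the thread.

```python
def progress_indicator(processed: int, total: int) -> int:
    """Share of processed segments (or words) as an integer from 0 to 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= processed <= total:
        raise ValueError("processed must be between 0 and total")
    # Assumption: round down, so 100 is reported only when everything is done.
    return (processed * 100) // total
```

Rounding down means a job with 199 of 200 segments processed still reports 99 rather than 100.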
>
> The other, which is what the current text for 'process state' (
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#process_state)
> specifies, is an indication of which point in a process sequence has
> currently been reached. As discussed, this could be covered by the
> processTrigger/readiness data category we are discussing.
>
> Moritz, does this distinction match with your view here? If so then we
> could introduce a new 'progress-indicator' data category requirement, and
> then continue discussing the consolidation of 'process state' with
> processTrigger/readiness.
>
> thanks,
> Dave
>
>
> On 27/04/2012 18:40, Moritz Hellwig wrote:
>
> Hello,
>
> I might make this a separate thread, but since we are already talking
> about processState here...
>
> There were quite a lot of requests from our editorial team to have
> something like
>
> processIndicator
>
> Values: integer, 0 to 100
>
> Zero would be "LSP process not begun"-ish, 100 would be "Completed".
>
> There are - from our point of view - considerable advantages:
>
> A) we can show a process progress indicator (in whichever visual
> representation) that does not require an understanding of what the actual
> process phase is on the MT side.
>
> B) the indicator can be agnostic to the number of processes / stages on
> the side of the LSP. If you run a hundred separate processes or feedback
> loops: fine by me.
>
> This would be beneficial for e.g. content creators who are unfamiliar with
> the language technology, its processes and so on. Also, it would allow us
> to build dashboards and generate reports, e.g. to show and sort by
> progress and keep better track of multilingual projects.
>
> Any thoughts?
>
> Cheers,
>
> Moritz
>
>
> On 27.04.2012, at 01:14, "David Lewis" <dave.lewis@cs.tcd.ie> wrote:
>
>  Pedro,
> Yes, the redundancy of process state is one outcome of what I'm proposing
> here.
>
> The key difference is that, under the proposal, the data category
> indicates the next process that should be performed, rather than the
> process currently in operation. The motivation is that the readiness to
> undergo a new process step is more useful for a document in a CMS than
> knowing the current process that is operating on it.
>
> Complementary to this, provenance indicates that a process is completed,
> and associated with this records useful information needed to monitor
> correct or efficient process operation, perhaps as needed to monitor a
> service level agreement.
>
> Neither process trigger nor provenance, however, actually aims to control
> process flow. This is a complex topic and therefore probably out of
> scope.
>
> What we do need, however, is a way of defining the values to use for
> referencing processes, i.e. from both the 'request-process' and the process
> reference in provenance. For this we may want both a default set in the
> standard, and a way of unambiguously defining these for a particular
> business case. The key thing in any one case of interoperability is that
> the interoperating implementations exchange and understand the _same set_
> of process values.
>
> Let's keep the discussion going on the list,
> Dave
>
> On 26/04/2012 15:29, Pedro L. Díez Orzas wrote:
>
> Hi David,
>
> I need to consider this more carefully.
>
> But what I see is that *process state* is perhaps redundant with
> proofreading state or revision state, since these can be values of process
> state: proofread, revised, reviewed, translated, localized…
>
> Best,
>
> Pedro
>
>  ------------------------------
>
> *From:* David Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Thursday, 26 April 2012 1:52
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [all] Discussion on proposed metadata categories:
> approvalStatus
>
> Hi Moritz,
> I think you make a very good general point here. It may be a bit too
> open-ended to specify data categories that hardwire the completion of a
> specific step. We would run into the same issues with defining the
> different process values that we discussed around process trigger. Also,
> it's not clear to me that all the status flags suggested for current steps,
> e.g. legal approval, really need to be separated from other steps.
>
> I think therefore we could generalise this as part of the process trigger
> data category as you suggest. This could allow us to consolidate
> *approvalStatus*, *cacheStatus*, *legalStatus*, *proofreading state* and
> *revision state* (and delegate the definition of these steps to data values
> rather than individual data categories). We can address *cacheStatus*, and
> at the same time generalise it to other processes than just translation, by
> including the time stamp and a revision flag.
>
> Also, I think the priority data category should be included here, as
> translation could consist of many different processes in combination, so
> its semantics depend on which one is meant. At the same time we may also be
> interested in defining priorities even for non-translation activities, such
> as review.
>
> *requested-process* (which has the name of the next process requested)
>
> *process-ref* (which may allow us to point to an external set of process
> definitions used for requested-process if the default value set is not used)
>
> *ready-at* (defines the time the content is ready for the process; it
> could be some time in the past, or some time in the future - this supports
> part of the cacheStatus function)
>
> *revised* (yes/no - indicates whether this is a different version of content
> that was previously marked as ready for the declared process)
>
> *priority* (I think for now we should keep this simple and just have
> values high/low)
>
> *complete-by* (provides a target date-time for completing the process)
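
The six attributes listed above could be grouped into a single record along these lines. This is a sketch only: the class name `Readiness`, the Python types, the default values and the `is_ready` helper are all assumptions, since the thread names the attributes but defines no serialization or defaults.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Readiness:
    requested_process: str                  # name of the next process requested
    ready_at: datetime                      # when the content is ready for it
    process_ref: Optional[str] = None       # external set of process definitions
    revised: bool = False                   # yes/no: new version of marked content
    priority: str = "low"                   # "high" or "low"; default is assumed
    complete_by: Optional[datetime] = None  # target date-time for completion

    def __post_init__(self) -> None:
        if self.priority not in ("high", "low"):
            raise ValueError("priority must be 'high' or 'low'")

    def is_ready(self, now: datetime) -> bool:
        """True once the ready-at time has been reached."""
        return now >= self.ready_at
```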
>
> Any thoughts on this suggestion? Pedro, Ryan, Moritz, Des, I think this
> impacts data categories you have an interest in.
>
> Also, DavidF, Pedro, Ryan, do you think this makes *process state*
> redundant? As a status flag, are we more interested in which process to do
> next, rather than which one is finished? At the same time the provenance
> data category could tell us which processes have already finished operating
> on the content.
>
> cheers,
> Dave
>
>
> On 24/04/2012 11:11, Moritz Hellwig wrote:
>
> to identify publication process metadata which might also be relevant for
> the LSP. I ran into a couple of questions though.
>
> I’ll use approvalStatus as an example (from the requirements document):
>
> >> approvalStatus
>
> >> Information about the status of the content in a formal approval
> workflow
>
> >> Indicates whether the content has been approved for release
>
> >> Possible values:
>
> >>>> yes
>
> >>>> no
>
> Approval can have many values which are rarely only “release yes|no” and
> they can be client/application-specific. However, none of these statuses
> seem to be relevant to the LSP, as they only precede or succeed the LSP’s
> processes.
>
> --
> Felix Sasaki
> DFKI / W3C Fellow



-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 9 May 2012 10:16:02 UTC