- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sat, 19 Jan 2013 08:50:58 +0100
- To: "Lieske, Christian" <christian.lieske@sap.com>
- CC: "joerg@bioloom.de" <joerg@bioloom.de>, "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
- Message-ID: <50FA5062.6040305@w3.org>
Hi Christian, all,

this is still a personal response, but from what you write below:

[
I understand your point. I guess that slightly different assumptions/views on MT-related processes exist. The Use Cases above from my point of view all pertain to “single engine” scenarios.
]

I think you express that the current formulation of "Domain" is useful for some MT-related processes, but not for all. So I'm inclined to reject the comment as a "new feature request to address new usage scenarios", for reasons of timing, among others; for the consequences, see the "later" tracker product. See also
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0124.html

Best,

Felix

On 18.01.13 16:50, Lieske, Christian wrote:
>
> Hi Felix, Jörg, all,
>
> Please find some of my thoughts (CL>CL>) on the reply below.
>
> Cheers,
>
> Christian
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Thursday, 17 January 2013 18:26
> *To:* Lieske, Christian
> *Cc:* joerg@bioloom.de; public-multilingualweb-lt-comments@w3.org
> *Subject:* Re: [Issue-75] - Domain
>
> Hi Christian, Jörg, all,
>
> co-chair hat on: I think the idea of "adding domain information" is clear, and Pablo said it could be useful for his customer, and Yves said it could be useful for XLIFF mapping.
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0053.html
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0059.html
> So we can move this topic to the next stage: who from the implementers for domain
> http://htmlpreview.github.com/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html
> would implement local domain, and who thinks (this question is important too) that this is worth a delay?
>
> Co-chair hat off, and replying to your proposal at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0087.html
> (replying here so that we have only one thread)
>
> [
>
> CL>>>>> I understand the point. My suggestion would be to refine the requirement for the revised domainMapping that I sketched: the information about the target environment/engine is optional.
> CL>>>>> Thus, you could have the following:
> CL>>>>> <its:domainRule ...
> CL>>>>> domainMapping=
> CL>>>>> 'MT-engine-X,"automotive auto, medical medicine, 'criminal law' law, 'property law' law"',
> CL>>>>> 'TM-system-Y,"automotive X, 'criminal law' L, 'property law' law"'
> CL>>>>> "automotive Z, 'criminal law' C, 'property law' law"' <---- here is the change (no info about the target environment/engine)
> CL>>>>> />
> CL>>>>>
> CL>>>>> Aside: I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.
>
> ]
>
> Making the engine information optional doesn't solve the problem I described:
> - domainMapping expresses "choose MT-engine-X"
>
> CL>CL> This is not what I had in mind as semantics for the first parameter of a list item in the revised “domainMapping”. To me, the semantics was “If you pass through MT-engine-X, then work with the following domain information”.
>
> - it also expresses "map the domain 'automotive' to 'auto'"
> - later in the workflow there are several engines available: MT-engine-X, MT-engine-Y
> - only MT-engine-Y knows about 'auto', so the "choose MT-engine-X" information from domainMapping disturbs the workflow
>
> Wrt 'I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.': so far it was helpful for starting work on three implementations (if I count correctly) using domain information in MT workflows.
> See
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Machine_Translation
> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Online_MT_System_Internationalization
> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Segmente_Machine_Translation
>
> It even has a benefit not to specify the engine: content can be prepared for processing by all these services. Since there is no need to accommodate "engine" information, the content can choose freely which engine works best - based purely on domain information.
>
> CL>CL> I understand your point. I guess that slightly different assumptions/views on MT-related processes exist. The Use Cases above from my point of view all pertain to “single engine” scenarios.
>
> CL>CL> In this kind of scenario it is not really necessary to provide information “this is for engine X”. In “multi-engine” scenarios, the situation is different. In order to see why, one first needs to acknowledge that at least two flavors of “multi-engine” scenarios exist: multi-engine in pipeline (e.g. first X, then Y for anything below a confidence of 0.5) vs. multi-engine exclusive (e.g. X for domain “financials”, Y for domain “health”). In both scenarios, you need a mechanism to specify which domain information is for engine X, and which is for engine Y.
>
>
> So my questions to you, Christian, and to at least the above three implementers would be: do you see implementers processing domain who would be willing to contribute to testing the engine information? If not (again co-chair hat on) we don't have a use case in the group, it seems, and can't bring such a feature through the standardization process.
>
> Best,
>
> Felix
>
> On 17.01.13 16:07, Lieske, Christian wrote:
> > Hi Jörg, Felix, all,
> >
> > Unfortunately, I still don't understand, the current draft doesn't have provisions for
> >
> > CL>> Global: <its:domainRule selector="/h:html/h:body" its-domain="financials">
> > CL>> Local: <em its-domain="financials">IMF</em>
> >
> > If we don't have these provisions, we may end up with the messy situation/solution that Jörg sketches.
> >
> > Cheers,
> > Christian
> >
> > -----Original Message-----
> > From: Jörg Schütz [mailto:joerg@bioloom.de]
> > Sent: Wednesday, 16 January 2013 15:28
> > To: public-multilingualweb-lt-comments@w3.org
> > Cc: public-multilingualweb-lt-comments@w3.org
> > Subject: Re: [Issue-75] - Domain
> >
> > Hi Felix, Christian, and all,
> >
> > ITS should not be hijacked to take over the role of a workflow engine or similar application because there might be several consumers of ITS information...
> >
> > @Christian > [Could you provide one or two examples/proofs for this?]
> >
> > Here is an outline of my idea (which potentially also hijacks ITS to some extent):
> >
> > Possible ITS Application Scenario to Extend the "Domain" Data Category
> >
> > (1) Use (general) domain pointing for the broad classification of your content (global reach), i.e. employ the domain data category.
> > (2) In cases where (1) is either too general (broad), or you want to further classify only parts of your content (local reach), use the disambiguation data category. This includes the further classifying of a sequence of strings which do not represent what usually is called a term (domain-specific vocabulary) or a multi-word unit (mwu).
> > (3) For the term and mwu case use the terminology data category.
> >
> > Case (3) is applied as described in the ITS 2.0 specification; always consider linking to an appropriate authoritative internal or external terminology resource or ontology (e.g. Cyc, Snomed, MeSH, etc.) on which both producer and consumer have agreed (in this sense ITS is also part of a contract).
> >
> > In this scenario, case (2) is a bit trickier because "officially" disambiguation is also applied to meaningful string sequences, i.e. a word or a mwu, as in the terminology case, but now we extend this data category to arbitrary elements, for example an entire paragraph, with the restriction that the attributes disambigConfidence and particularly disambigGranularity have a broader meaning such as the conceptual association to a domain's root element or to certain upper model elements.
> >
> > HTML Example (local)
> > ...
> > <p><span its-disambig-confidence="0.9"
> >
> > its-disambig-class-ref="http://snowowl.sample.com/SNOMED_CT_Concept/Pharmaceutical_Product">
> > Ambroxol has mucolytic and local-anaesthetic pharmacological effects
> > </span>.
> > </p>
> > ...
> >
> > Note: In this example, only the disambigClassRef attribute is used to account for the "broader" employment of the data category.
> >
> > This use case scenario might sound like a bootstrap paradox... but this is one possibility of using ITS 2.0 ... ;-)
> >
> > All the best -- Jörg
> >
> > On Jan 16, 2013, at 14:23 (CET), Felix Sasaki wrote:
> > On 16.01.13 12:15, Lieske, Christian wrote:
> > Hi Felix, Pablo, all,
> >
> > Please find some of my thoughts on the reply below.
> >
> > Cheers,
> > Christian
> >
> > -----Original Message-----
> > From: Felix Sasaki [mailto:fsasaki@w3.org]
> > Sent: Wednesday, 16 January 2013 08:07
> > To: Pablo Nieto Caride
> > Cc: Lieske, Christian; public-multilingualweb-lt-comments@w3.org
> > Subject: Re: [Issue-75] - Domain
> >
> > (trying to minimize the number of mails, hence replying to several aspects in this mail)
> >
> > Hi Christian, Pablo, all,
> >
> > at Christian: you write at
> > http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0034.html
> >
> > that 2b of your comment is resolved. How about 2a? If you are not satisfied with the replies in this thread, could you propose a change to the spec?
> >
> > CL>> Currently, I consider 2a as being unresolved.
> > CL>> Addressing 2a (capture the information "This is for component X") to me does not appear to be straightforward, since
> > CL>> you would need to accommodate an additional piece of information. One could imagine representations such as
> > CL>> <its:domainRule ...
> > CL>> domainMapping=
> > CL>> 'MT-engine-X,"automotive auto, medical medicine, 'criminal law' law, 'property law' law"',
> > CL>> 'TM-system-Y,"automotive X, 'criminal law' L, 'property law' law"'
> > CL>> />
> >
> > Such a specification of the engine could lead to conflicting information: MT-engine-X has a module for automotive. If however the engine is not mentioned in a domain mapping, but a different one (which does not have the automotive module): which one to choose?
> > It looks like what you add as information (= choosing the engine) is something one would do after the domain mapping, not at the same time. Otherwise you may run into the conflict described above.
> >
> > CL>> This, however, is not in line with the current normative text on "domain".
> >
> > Wrt your proposal below (add a note about 2b to the spec): sure, do you want to draft something? The same for 2a (if you don't have a specific solution in mind, stating the issue might already be helpful).
> >
> > CL>> How about the following additional paragraph for the first note in (http://www.w3.org/TR/2012/WD-its20-20121206/#domain) for 2b?
> > CL>>
> > CL>> "domainMapping" even allows "domain" systems/hierarchies to be encoded. domainMapping="FIN, 'A A-1 A-1-X'" could for example be used to capture the following information:
> >
> > Would it be OK to re-formulate that sentence above like this:
> > [
> > the domainMapping attribute does not itself specify how to encode "domain" systems/hierarchies. An application using domainMapping hence is free to work with application-specific hierarchies to capture information like:
> > ]
> >
> > It seems this is more in line with the language tag example: it is saying that applications can do things that are underspecified on purpose.
> > CL>> a. There exists a domain system that includes domains (e.g. A), sub-domains (e.g. A-1), and sub-subdomains (e.g. A-1-X)
> > CL>> b. Prefer the lowest level in the system (e.g. work with an MT engine for A-1-X if available, otherwise work with one for A-1 or even A if available)
> > CL>>
> > CL>> This "power to encode and to interpret" is similar to matching of language tags, see http://tools.ietf.org/html/rfc4647#section-3.2.
> > CL>> "Language tag matching is a tool, and does not by itself specify a complete procedure for the use of language tags ...
> > CL>> The matching specification itself makes clear that there are many aspects that are left out for actually using language tags. But having no matching at all would mean even less interoperability, hence the "imperfect" matching scheme.
> >
> > Best,
> >
> > Felix
> >
> >
> > Wrt 1 (local domain): would this also be relevant for other implementers of domain (asking again)?
> >
> > About this one: we have Pablo and Yves saying in separate mails this might be of interest - enough to get through the W3C process. But is it worth another last call period?
> >
> > Best,
> >
> > Felix
> >
> >
> > Best,
> >
> > Felix
> >
> > On 15.01.13 19:32, Pablo Nieto Caride wrote:
> > Hi all,
> >
> > Felix, I think that a local domain could be interesting, at least WP4 client would be happy with that, I don't know what the others think.
> >
> > Christian, regarding the domain mapping I think that Yves and Felix are right, you can implement your own mapping, you can adapt it to a specific MT if you want. As for the example <its:domainRule selector="/h:html/h:body" ... domainMapping="FIN, 'A A-1 A1-A1X'"/>, I'm certain MT systems can manage the precedence by themselves.
> >
> > Cheers,
> > Pablo.
> > Hi,
> >
> > I wonder if it would be a good idea to add the scenario I have provided (domain "system") and Felix' information on how to approach it (namely similar to language tag matching) to one of the "notes" that are currently in place in the "domain" section.
> > Best regards,
> > Christian
> >
> > -----Original Message-----
> > From: christian.lieske@sap.com
> > Sent: Tuesday, 15 January 2013 08:10
> > To: 'Felix Sasaki'; public-multilingualweb-lt-comments@w3.org
> > Subject: RE: [Issue-75] - Domain
> >
> > Hi Felix,
> >
> > I follow your line of thought related to the similarities between "domainMapping" and matching of language tags. Thus, it would be OK for me to consider 2.b of
> > http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0022.html
> > closed.
> >
> > Cheers,
> > Christian
> >
> > -----Original Message-----
> > From: Felix Sasaki [mailto:fsasaki@w3.org]
> > Sent: Monday, 14 January 2013 19:27
> > To: public-multilingualweb-lt-comments@w3.org
> > Subject: Re: [Issue-75] - Domain
> >
> > Hi Christian, Yves, all,
> >
> > On 14.01.13 16:52, Yves Savourel wrote:
> > Hi Christian, all,
> >
> >
> > CL>> It seems as if I didn't manage to make my point about this aspect of "domain" clear.
> > CL>> Let me try to provide a remedy by adding to my original comment:
> > CL>> Something like its-domain="financials" could not just be imagined
> > CL>> to work in a global rule (e.g. instead of a pointer); in addition, a local use of "domain"
> > CL>> could be imagined
> > CL>> Global: <its:domainRule selector="/h:html/h:body" its-domain="financials">
> > CL>> Local: <em its-domain="financials">IMF</em>
> >
> > So (if I'm getting this right) you'd like a way to override the domain for spans of content? (Since the Dublin Core in HTML doesn't let you do that (the subject is defined at the document level)).
> >
> > I think one of the reasons I heard early on was that today it would be difficult to make that distinction at the MT level. But I suppose MT engine selection is not the only application for domain. Maybe others have additional reasons why we don't have a local domain?
> > Given the implementation-driven approach we have taken so far I would ask: is there an implementation on the horizon that would process local domain?
> >
> > CL>> Why do you think that the scenario that I sketch (multiple domain
> > CL>> "systems" used in a processing chain) implies that a standard exists?
> > CL>> I would rather think that the implication is the other way round:
> > CL>> Since there is no standard, there is a need to accommodate heterogeneity.
> >
> > I agree, but so far that has not been part of the scope of ITS.
> >
> >
> > CL>> I guess your point is valid in the sense that one could go for
> > CL>> something like <its:domainRule selector="/h:html/h:body" ...
> > CL>> domainMapping="FIN, 'A A-1 A1-A1X'"/>.
> > CL>> However, this would require that additional information would have
> > CL>> to be captured elsewhere (so that for example, the precedence
> > CL>> 'A > A-1 > A1-A1X' could be captured).
> >
> > ITS doesn't prescribe what the right part of the mapping must be or how it should be used.
> > It's really just a way to allow user-defined mechanisms to be connected to the input metadata.
> > I suppose it is also beyond the scope of ITS.
> > As I understand Christian, he does not ask to prescribe a mapping, but "to accommodate heterogeneity": allow people to formulate their own mapping.
> >
> > I think we do that: we don't make the usage of the mapping attribute mandatory. It is an optional attribute. If "our" mapping algorithm doesn't respond to a specific mapping approach, everybody can implement their own mapping.
> >
> > This is similar to matching of language tags, see
> > http://tools.ietf.org/html/rfc4647#section-3.2
> > "Language tag matching is a tool, and does not by itself specify a complete procedure for the use of language tags. Such procedures are intimately tied to the application protocol in which they occur."
> > The matching specification itself makes clear that there are many aspects that are left out for actually using language tags. But having no matching at all would mean even less interoperability, hence the "imperfect" matching scheme.
> >
> > Best,
> >
> > Felix
> >
> > cheers,
> > -yves
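
For illustration only, the two ideas discussed in this thread (a domainMapping list of pairs, and an application-specific "prefer the most specific level" lookup over A, A-1, A-1-X) could be sketched roughly as follows. This is a minimal sketch, not anything the ITS 2.0 draft prescribes: the function names parse_domain_mapping and pick_most_specific are hypothetical, and the parser assumes that quoted values never contain a comma.

```python
import shlex


def parse_domain_mapping(value):
    """Parse a domainMapping string into {content_domain: consumer_value}.

    Assumption: pairs are separated by commas and quoted values contain
    no commas. shlex handles values quoted because they contain spaces.
    """
    mapping = {}
    for pair in value.split(","):
        parts = shlex.split(pair)
        if len(parts) != 2:
            raise ValueError(f"expected two values per mapping, got {pair!r}")
        source, target = parts
        mapping[source] = target
    return mapping


def pick_most_specific(hierarchy, available):
    """Return the most specific level that a consumer supports, else None.

    `hierarchy` is ordered from general to specific, e.g. A, A-1, A-1-X.
    This fallback is an application-level convention (loosely comparable
    to language tag lookup), not something domainMapping itself defines.
    """
    for level in reversed(hierarchy):
        if level in available:
            return level
    return None


mapping = parse_domain_mapping(
    "automotive auto, medical medicine, 'criminal law' law, 'property law' law"
)
choice = pick_most_specific(["A", "A-1", "A-1-X"], {"A", "A-1"})
```

Here the engine choice happens after the mapping has been read, which matches the point made above that selecting an engine is a separate, later step in the workflow.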
Received on Saturday, 19 January 2013 07:51:26 UTC