- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 17 Jan 2013 18:26:05 +0100
- To: "Lieske, Christian" <christian.lieske@sap.com>
- CC: "joerg@bioloom.de" <joerg@bioloom.de>, "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
- Message-ID: <50F8342D.2040502@w3.org>
Hi Christian, Jörg, all,

co-chair hat on: I think the idea of "adding domain information" is clear. Pablo said it could be useful for his customer, and Yves said it could be useful for XLIFF mapping.

http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0053.html
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0059.html

So we can move this topic to the next stage: who among the implementers of domain

http://htmlpreview.github.com/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html

would implement local domain, and who thinks (this question is important too) that this is worth a delay?

Co-chair hat off, and replying to your proposal at

http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0087.html

(replying here so that we have only one thread)

[
CL>>>>> I understand the point. My suggestion would be to refine the requirement for the revised domainMapping that I sketched: the information about the target environment/engine is optional.
CL>>>>> Thus, you could have the following:
CL>>>>> <its:domainRule ...
CL>>>>> domainMapping=
CL>>>>> 'MT-engine-X,"automotive auto, medical medicine, 'criminal law' law, 'property law' law"',
CL>>>>> 'TM-system-Y,"automotive X, 'criminal law' L, 'property law' law"'
CL>>>>> "automotive Z, 'criminal law' C, 'property law' law"' <---- here is the change (no info about the target environment/engine)
CL>>>>> />
CL>>>>>
CL>>>>> Aside: I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.
]

Making the engine information optional doesn't solve the problem I described:

- domainMapping expresses "choose MT-engine-X"
- it also expresses "map the domain 'automotive' to 'auto'"
- later in the workflow there are several engines available: MT-engine-X, MT-engine-Y
- only MT-engine-Y knows about 'auto', so the "choose MT-engine-X" information from domainMapping disturbs the workflow

Regarding 'I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.': so far it was helpful for starting work on three implementations (if I count correctly) using domain information in MT workflows. See

http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Machine_Translation
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Online_MT_System_Internationalization
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Segmente_Machine_Translation

Not specifying the engine even has a benefit: content can be prepared for processing by all these services. Since there is no need to accommodate "engine" information, a consumer of the content can freely choose whichever engine works best, based purely on domain information.

So my question to you, Christian, and to at least the three implementers above would be: do you see implementers processing domain who would be willing to contribute to testing the engine information? If not (again co-chair hat on), it seems we don't have a use case in the group and can't bring such a feature through the standardization process.
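To make the "domain first, engine later" argument concrete, here is a minimal Python sketch. It is not taken from the ITS 2.0 draft or from any of the implementations mentioned. It assumes that mapping pairs are comma-separated with no commas inside quoted values, that the left side is matched case-insensitively, and that unmapped domains pass through unchanged. The ENGINES table and both function names are hypothetical.

```python
import shlex

# Hypothetical capability table: which workflow-specific domains each engine handles.
ENGINES = {
    "MT-engine-X": {"medicine"},
    "MT-engine-Y": {"auto", "law"},
}


def parse_domain_mapping(value):
    """Parse "automotive auto, 'criminal law' law" into {source: target}."""
    mapping = {}
    for pair in value.split(","):  # assumption: no commas inside quoted values
        parts = shlex.split(pair)  # handles both '...' and "..." quoting
        if len(parts) != 2:
            raise ValueError(f"expected a 'source target' pair, got {pair!r}")
        source, target = parts
        mapping[source.lower()] = target  # assumption: left side is case-insensitive
    return mapping


def engines_for(content_domain, mapping, engines=ENGINES):
    """Return the engines that know the mapped domain; the workflow picks one later."""
    # Assumption: a domain without a mapping entry is passed through unchanged.
    target = mapping.get(content_domain.lower(), content_domain)
    return sorted(name for name, domains in engines.items() if target in domains)
```

With the mapping string from the quoted proposal and this made-up table, engines_for("automotive", ...) yields only MT-engine-Y, so a hard-wired "choose MT-engine-X" stored in the same attribute would contradict what the workflow can actually do.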
Best,

Felix

On 17.01.13 16:07, Lieske, Christian wrote:
> Hi Jörg, Felix, all,
>
> Unfortunately, I still don't understand, the current draft doesn't have provisions for
>
> CL>> Global: <its:domainRule selector="/h:html/h:body" its-domain="financials">
> CL>> Local: <em its-domain="financials">IMF</em>
>
> If we don't have these provisions, we may end up with the messy situation/solution that Jörg sketches.
>
> Cheers,
> Christian
>
> -----Original Message-----
> From: Jörg Schütz [mailto:joerg@bioloom.de]
> Sent: Wednesday, 16 January 2013 15:28
> To: public-multilingualweb-lt-comments@w3.org
> Cc: public-multilingualweb-lt-comments@w3.org
> Subject: Re: [Issue-75] - Domain
>
> Hi Felix, Christian, and all,
>
> ITS should not be hijacked to take over the role of a workflow engine or
> similar application because there might be several consumers of ITS information...
>
> @Christian
> [Could you provide one or two examples/proofs for this?]
>
> Here is an outline of my idea (which potentially also hijacks ITS to
> some extent):
>
> Possible ITS Application Scenario to Extend the "Domain" Data Category
>
> (1) Use (general) domain pointing for the broad classification of your
> content (global reach), i.e. employ the domain data category.
> (2) In cases where (1) is either too general (broad), or you want to
> further classify only parts of your content (local reach), use the
> disambiguation data category. This includes the further classifying of a
> sequence of strings which do not represent what usually is called a term
> (domain-specific vocabulary) or a multi-word unit (mwu).
> (3) For the term and mwu case use the terminology data category.
>
> Case (3) is applied as described in the ITS 2.0 specification; always
> consider linking to an appropriate authoritative internal or external
> terminology resource or ontology (e.g. Cyc, Snomed, MeSH, etc.) on which
> both producer and consumer have agreed (in this sense ITS is also
> part of a contract).
>
> In this scenario, case (2) is a bit trickier because "officially"
> disambiguation is also applied to meaningful string sequences, i.e. a
> word or a mwu, as in the terminology case, but now we extend this data
> category to arbitrary elements, for example an entire paragraph, with the
> restriction that the attributes disambigConfidence and particularly
> disambigGranularity have a broader meaning such as the conceptual
> association to a domain's root element or to certain upper model elements.
>
> HTML Example (local)
> ...
> <p><span its-disambig-confidence="0.9"
> its-disambig-class-ref="http://snowowl.sample.com/SNOMED_CT_Concept/Pharmaceutical_Product">
> Ambroxol has mucolytic and local-anaesthetic pharmacological effects
> </span>.
> </p>
> ...
>
> Note: In this example, only the disambigClassRef attribute is used to
> account for the "broader" employment of the data category.
>
> This use case scenario might sound like a bootstrap paradox... but this
> is one possibility of using ITS 2.0 ... ;-)
>
> All the best -- Jörg
>
> On Jan 16, 2013, at 14:23 (CET), Felix Sasaki wrote:
>> On 16.01.13 12:15, Lieske, Christian wrote:
>>> Hi Felix, Pablo, all,
>>>
>>> Please find some of my thoughts on the reply below.
>>>
>>> Cheers,
>>> Christian
>>>
>>> -----Original Message-----
>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>> Sent: Wednesday, 16 January 2013 08:07
>>> To: Pablo Nieto Caride
>>> Cc: Lieske, Christian; public-multilingualweb-lt-comments@w3.org
>>> Subject: Re: [Issue-75] - Domain
>>>
>>> (trying to minimize the number of mails, hence replying to several
>>> aspects in this mail)
>>>
>>> Hi Christian, Pablo, all,
>>>
>>> at Christian: you write at
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0034.html
>>>
>>> that 2b of your comment is resolved. How about 2a? If you are not
>>> satisfied with the replies in this thread, could you propose a change to
>>> the spec?
>>>
>>> CL>> Currently, I consider 2a as being unresolved.
>>> CL>> Addressing 2a (capture the information "This is for component X")
>>> to me does not appear to be straightforward, since
>>> CL>> you would need to accommodate an additional piece of information.
>>> One could imagine representations such as
>>> CL>> <its:domainRule ...
>>> CL>> domainMapping=
>>> CL>> 'MT-engine-X,"automotive auto, medical medicine,
>>> 'criminal law' law, 'property law' law"',
>>> CL>> 'TM-system-Y,"automotive X, 'criminal law' L,
>>> 'property law' law"'
>>> CL>> />
>> Such a specification of the engine could lead to conflicting information:
>> MT-engine-X has a module for automotive. If however the engine is not
>> mentioned in a domain mapping, but a different one (which does not have
>> the automotive module): which one to choose?
>> It looks like what you add as information (= choosing the engine) is
>> something one would do after the domain mapping, not at the same time.
>> Otherwise you may run into the conflict described above.
>>
>>> CL>> This, however, is not in line with the current normative text on
>>> "domain".
>>>
>>> Wrt your proposal below (add a note about 2b to the spec): sure, do
>>> you want to draft something? The same for 2a (if you don't have a
>>> specific solution in mind, stating the issue might already be helpful).
>>>
>>> CL>> How about the following additional paragraph for the first note
>>> in (http://www.w3.org/TR/2012/WD-its20-20121206/#domain) for 2b?
>>> CL>>
>>> CL>> "domainMapping" even allows "domain" systems/hierarchies to be
>>> encoded. domainMapping="FIN, 'A A-1 A-1-X'" could for example be used
>>> to capture the following information:
>> Would it be OK to re-formulate that sentence above like this:
>> [
>> the domainMapping attribute does not itself specify how to encode
>> "domain" systems/hierarchies.
>> An application using domainMapping hence is
>> free to work with application specific hierarchies to capture
>> information like:
>> ]
>>
>> It seems this is more in line with the language tag example: it is
>> saying that applications can do things that are on purpose underspecified.
>>> CL>> a. There exists a domain system that includes domains (e.g. A),
>>> sub-domains (e.g. A-1), and sub-subdomains (e.g. A-1-X)
>>> CL>> b. Prefer the lowest level in the system (e.g. work with an MT
>>> engine for A-1-X if available, otherwise work with one for A-1 or even
>>> A if available)
>>> CL>>
>>> CL>> This "power to encode and to interpret" is similar to matching of
>>> language tags, see http://tools.ietf.org/html/rfc4647#section-3.2.
>>> CL>> "Language tag matching is a tool, and does not by itself specify
>>> a complete procedure for the use of language tags ...
>>> CL>> The matching specification itself makes clear that there are many
>>> CL>> aspects that are left out for actually using language tags. But
>>> having no matching at all would be even less interoperability, hence
>>> the "imperfect" matching scheme.
>> Best,
>>
>> Felix
>>
>>> Wrt 1 (local domain): would this also be relevant for other
>>> implementers of domain (asking again)?
>> About this one: we have Pablo and Yves saying in separate mails this
>> might be of interest - enough to get through the W3C process. But is it
>> worth another last call period?
>>
>> Best,
>>
>> Felix
>>
>>> Best,
>>>
>>> Felix
>>>
>>> On 15.01.13 19:32, Pablo Nieto Caride wrote:
>>>> Hi all,
>>>>
>>>> Felix, I think that a local domain could be interesting, at least WP4
>>>> client would be happy with that, I don't know what the others think.
>>>>
>>>> Christian, regarding the domain mapping I think that Yves and Felix
>>>> are right, you can implement your own mapping, you can adapt it to
>>>> specific MT if you want, as for the example <its:domainRule
>>>> selector="/h:html/h:body" ...
>>>> domainMapping="FIN, 'A A-1 A1-A1X'"/>,
>>>> I certain MT Systems can manage the precedence by themselves.
>>>>
>>>> Cheers,
>>>> Pablo.
>>>> Hi,
>>>>
>>>> I wonder if it would be a good idea to add the scenario I have provided
>>>> (domain "system") and Felix' information on how to approach it
>>>> (namely similar to language tag matching) to one of the "notes" that
>>>> currently are in place in the "domain" section.
>>>> Best regards,
>>>> Christian
>>>>
>>>> -----Original Message-----
>>>> From: christian.lieske@sap.com
>>>> Sent: Tuesday, 15 January 2013 08:10
>>>> To: 'Felix Sasaki'; public-multilingualweb-lt-comments@w3.org
>>>> Subject: RE: [Issue-75] - Domain
>>>>
>>>> Hi Felix,
>>>>
>>>> I follow your line of thought related to the similarities between
>>>> "domainMapping" and matching of language tags. Thus, it would be OK
>>>> for me to consider 2.b of
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0022.html
>>>> closed.
>>>>
>>>> Cheers,
>>>> Christian
>>>>
>>>> -----Original Message-----
>>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>>> Sent: Monday, 14 January 2013 19:27
>>>> To: public-multilingualweb-lt-comments@w3.org
>>>> Subject: Re: [Issue-75] - Domain
>>>>
>>>> Hi Christian, Yves, all,
>>>>
>>>> On 14.01.13 16:52, Yves Savourel wrote:
>>>>> Hi Christian, all,
>>>>>
>>>>>
>>>>> CL>> It seems as if I didn't manage to make my point about this aspect of
>>>>> "domain" clear.
>>>>> CL>> Let me try to provide a remedy by adding to my original
>>>>> comment:
>>>>> CL>> Something like its-domain="financials" could not just be imagined
>>>>> CL>>to work in a global rule (e.g.
>>>>> CL>> instead of a pointer); in
>>>>> addition, a local use of "domain"
>>>>> CL>> could be imagined
>>>>> CL>> Global: <its:domainRule selector="/h:html/h:body"
>>>>> its-domain="financials">
>>>>> CL>> Local: <em its-domain="financials">IMF</em>
>>>>>
>>>>> So (If I'm getting this right) you'd like a way to override the
>>>>> domain for spans of content? (Since the Dublin Core in HTML doesn't
>>>>> let you do that (the subject is defined at the document level)).
>>>>>
>>>>> I think one of the reasons I heard early on was that today it would
>>>>> be difficult to make that distinction at the MT level. But I suppose
>>>>> MT engine selection is not the only application for domain. Maybe
>>>>> others have additional reasons why we don't have a local domain?
>>>> Given the implementation driven approach we have taken so far I would
>>>> ask: is there an implementation on the horizon that would process
>>>> local domain?
>>>>
>>>>> CL>> Why do you think that the scenario that I sketch (multiple domain
>>>>> CL>> "systems" used in a processing chain) implies that a standard
>>>>> exists?
>>>>> CL>> I would rather think that the implication is the other way round:
>>>>> CL>> Since there is no standard, there is a need to accommodate
>>>>> heterogeneity.
>>>>>
>>>>> I agree, but so far that has not been part of the scope of ITS.
>>>>>
>>>>>
>>>>> CL>> I guess your point is valid in the sense that one could go for
>>>>> CL>> something like <its:domainRule selector="/h:html/h:body" ...
>>>>> CL>> domainMapping="FIN, 'A A-1 A1-A1X'"/>.
>>>>> CL>> However, this would require that additional information would have
>>>>> CL>> to be captured elsewhere (so that for example, the precedence
>>>>> CL>> 'A > A-1 > A1-A1X' could be captured).
>>>>>
>>>>> ITS doesn't prescribe what the right part of the mapping must be or
>>>>> how it should be used.
>>>>> It's really just a way to allow user-defined mechanisms to be
>>>>> connected to the input metadata.
>>>>> I suppose it is also beyond the scope of ITS.
>>>> As I understand Christian he does not ask to prescribe a mapping, but
>>>> "to accommodate heterogeneity": allow people to formulate their own
>>>> mapping.
>>>>
>>>> I think we do that: we don't make the usage of the mapping attribute
>>>> mandatory. It is an optional attribute. If "our" mapping algorithm
>>>> doesn't respond to a specific mapping approach, everybody can implement
>>>> his own mapping.
>>>>
>>>> This is similar to matching of language tags, see
>>>> http://tools.ietf.org/html/rfc4647#section-3.2
>>>> "Language tag matching is a tool, and does not by itself specify a
>>>> complete procedure for the use of language tags. Such procedures are
>>>> intimately tied to the application protocol in which they occur."
>>>> The matching specification itself makes clear that there are many
>>>> aspects that are left out for actually using language tags. But having
>>>> no matching at all would be even less interoperability, hence the
>>>> "imperfect" matching scheme.
>>>>
>>>> Best,
>>>>
>>>> Felix
>>>>
>>>>> cheers,
>>>>> -yves
>>>>>
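The sub-domain scenario discussed in the thread (prefer an engine for A-1-X, otherwise fall back to A-1, then A) can be sketched as an application-level fallback, loosely modelled on the truncation idea behind language-tag lookup in RFC 4647. This is only an illustration of what an application is free to do on top of domainMapping. The hyphen-separated domain syntax and the function name lookup_domain are assumptions, and nothing here is prescribed by ITS.

```python
def lookup_domain(domain, available):
    """Return the most specific available domain for a hyphenated domain code.

    Tries the full code first, then drops the last hyphen-separated part
    repeatedly (A-1-X, then A-1, then A). Returns None if nothing matches.
    """
    parts = domain.split("-")  # assumption: hyphens separate hierarchy levels
    while parts:
        candidate = "-".join(parts)
        if candidate in available:
            return candidate
        parts.pop()  # fall back to the next broader level
    return None
```

For example, lookup_domain("A-1-X", {"A", "A-1"}) falls back to "A-1", and a workflow with only an engine for "A" would still get "A". How precedence is actually handled stays with the consuming application, as in the language tag case.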
Received on Thursday, 17 January 2013 17:26:32 UTC