Re: [Issue-75] - Domain

Hi Christian, Jörg, all,

co-chair hat on: I think the idea of "adding domain information" is 
clear, and Pablo said it could be useful for his customer, and Yves said 
it could be useful for XLIFF mapping.
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0053.html
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0059.html
So we can move this topic to the next stage: which of the implementers 
of domain listed at
http://htmlpreview.github.com/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html
would implement local domain, and who thinks (this question is important 
too) that this is worth a delay?

Co-chair hat off, and replying to your proposal at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0087.html
(replying here so that we have only one thread)

[

CL>>>>> I understand the point. My suggestion would be to refine the requirement for the revised domainMapping that I sketched: the information about the target environment/engine is optional.
CL>>>>> Thus, you could have the following:
CL>>>>> <its:domainRule ...
CL>>>>>  domainMapping=
CL>>>>>   'MT-engine-X,"automotive auto, medical medicine, 'criminal law' law, 'property law' law"',
CL>>>>>    'TM-system-Y,"automotive X, 'criminal law' L, 'property law' law"',
CL>>>>>   "automotive Z, 'criminal law' C, 'property law' law"  <---- here is the change (no info about the target environment/engine)
CL>>>>> />
CL>>>>> 
CL>>>>> Aside: I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.

]

Making the engine information optional doesn't solve the problem I 
described:
- domainMapping expresses "choose MT-engine-X"
- it also expresses "map the domain 'automotive' to 'auto'"
- later in the workflow there are several engines available: 
MT-engine-X, MT-engine-Y
- only MT-engine-Y knows about 'auto', so the "choose MT-engine-X" 
information from domainMapping disturbs the workflow
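
To make the conflict above concrete, here is a minimal Python sketch. The 
engine capabilities and both function names are invented for 
illustration; nothing like this is defined in the ITS 2.0 draft.

```python
# Hypothetical workflow state: the target-domain labels each engine knows.
AVAILABLE_ENGINES = {
    "MT-engine-X": {"medicine", "law"},
    "MT-engine-Y": {"auto", "law"},
}

def pinned_engine_usable(pinned_engine, mapped_domain, engines=AVAILABLE_ENGINES):
    """Return True if the engine named in the mapping knows the mapped domain."""
    return mapped_domain in engines.get(pinned_engine, set())

def engines_for(mapped_domain, engines=AVAILABLE_ENGINES):
    """Return the names of all available engines that know the mapped domain."""
    return sorted(name for name, domains in engines.items() if mapped_domain in domains)

# The mapping says "choose MT-engine-X" and "automotive -> auto", but in this
# assumed workflow only MT-engine-Y knows 'auto'.
print(pinned_engine_usable("MT-engine-X", "auto"))
print(engines_for("auto"))
```

Under these assumptions the pinned engine cannot serve the mapped domain, 
while an engine the mapping never mentions can.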

With regard to 'I am a bit unsure how realistic the scenario "specify 
domainMapping without knowing the engine/environment" is.': so far this 
scenario has been helpful for starting work on three implementations (if 
I count correctly) using domain information in MT workflows. See

http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Machine_Translation
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Online_MT_System_Internationalization
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Segmente_Machine_Translation

Not specifying the engine even has a benefit: content can be prepared 
for processing by all of these services. Since there is no need to 
accommodate "engine" information, whoever processes the content can 
choose freely which engine works best, based purely on domain information.
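
As a rough sketch of that engine-free approach (in Python): parse a 
domainMapping value that carries no engine information, then let the 
workflow pick an engine afterwards. The parser assumes comma-separated 
"source target" pairs with quotes around values containing spaces, and no 
commas inside quoted values. The engine table and the function names are 
hypothetical.

```python
import shlex

def parse_domain_mapping(value):
    """Turn "automotive auto, 'criminal law' law" into a dict.

    Assumption: no commas occur inside quoted values.
    """
    mapping = {}
    for pair in value.split(","):
        parts = shlex.split(pair)  # honours quotes around values with spaces
        if len(parts) != 2:
            raise ValueError(f"expected 'source target', got {pair!r}")
        source, target = parts
        mapping[source] = target
    return mapping

def choose_engine(content_domain, mapping, engines):
    """Return the first engine (in dict order) that knows the mapped domain, or None."""
    target = mapping.get(content_domain)
    for name, domains in engines.items():
        if target in domains:
            return name
    return None

mapping = parse_domain_mapping(
    "automotive auto, medical medicine, 'criminal law' law, 'property law' law"
)
# Hypothetical engines available later in the workflow.
engines = {"MT-engine-X": {"medicine"}, "MT-engine-Y": {"auto", "law"}}
print(choose_engine("automotive", mapping, engines))
```

The engine is chosen only after the mapping has been applied, so the 
content itself never has to name one.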

So my question to you, Christian, and to at least the three 
implementers above would be: do you see implementers processing domain 
who would be willing to contribute to testing the engine information? If 
not (again co-chair hat on), it seems we don't have a use case in the 
group and can't bring such a feature through the standardization process.

Best,

Felix

On 17.01.13 16:07, Lieske, Christian wrote:
> Hi Jörg, Felix, all,
>
> Unfortunately, I still don't understand, the current draft doesn't have provisions for
>
> CL>>    Global: <its:domainRule selector="/h:html/h:body" its-domain="financials">
> CL>>    Local: <em its-domain="financials">IMF</em>
>
> If we don't have these provisions, we may end up with the messy situation/solution that Jörg sketches.
>
> Cheers,
> Christian
>
> -----Original Message-----
> From: Jörg Schütz [mailto:joerg@bioloom.de]
> Sent: Wednesday, 16 January 2013 15:28
> To: public-multilingualweb-lt-comments@w3.org
> Cc: public-multilingualweb-lt-comments@w3.org
> Subject: Re: [Issue-75] - Domain
>
> Hi Felix, Christian, and all,
>
> ITS should not be hijacked to take over the role of a workflow engine or
> similar application because there might be several consumers of ITS information...
>
> @Christian > [Could you provide one or two examples/proofs for this?]
>
> Here is an outline of my idea (which potentially also hijacks ITS to
> some extent):
>
> Possible ITS Application Scenario to Extend the "Domain" Data Category
>
> (1) Use (general) domain pointing for the broad classification of your
> content (global reach), i.e. employ the domain data category.
> (2) In cases where (1) is either too general (broad), or you want to
> further classify only parts of your content (local reach), use the
> disambiguation data category. This includes the further classifying of a
> sequence of strings which do not represent what usually is called a term
> (domain-specific vocabulary) or a multi-word unit (mwu).
> (3) For the term and mwu case use the terminology data category.
>
> Case (3) is applied as described in the ITS 2.0 specification; always
> consider linking to an appropriate authoritative internal or external
> terminology resource or ontology (e.g. Cyc, Snomed, MeSH, etc.) on which
> both producer and consumer have agreed (in this sense ITS is also
> part of a contract).
>
> In this scenario, case (2) is a bit trickier because "officially"
> disambiguation is also applied to meaningful string sequences, i.e. a
> word or a mwu, as in the terminology case, but now we extend this data
> category to arbitrary elements, for example an entire paragraph, with the
> restriction that the attributes disambigConfidence and particularly
> disambigGranularity have a broader meaning such as the conceptual
> association to a domain's root element or to certain upper model elements.
>
> HTML Example (local)
> ...
> <p><span its-disambig-confidence="0.9"
>   
> its-disambig-class-ref="http://snowowl.sample.com/SNOMED_CT_Concept/Pharmaceutical_Product">
>      Ambroxol has mucolytic and local-anaesthetic pharmacological effects
>      </span>.
> </p>
> ...
>
> Note: In this example, only the disambigClassRef attribute is used to
> account for the "broader" employment of the data category.
>
> This use case scenario might sound like a bootstrap paradox... but this
> is one possibility of using ITS 2.0 ... ;-)
>
> All the best -- Jörg
>
> On Jan 16, 2013, at 14:23 (CET), Felix Sasaki wrote:
>> On 16.01.13 12:15, Lieske, Christian wrote:
>>> Hi Felix, Pablo, all,
>>>
>>> Please find some of my thoughts on the reply below.
>>>
>>> Cheers,
>>> Christian
>>>
>>> -----Original Message-----
>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>> Sent: Wednesday, 16 January 2013 08:07
>>> To: Pablo Nieto Caride
>>> Cc: Lieske, Christian; public-multilingualweb-lt-comments@w3.org
>>> Subject: Re: [Issue-75] - Domain
>>>
>>> (trying to minimize the number of mails, hence replying to several
>>> aspects in this mail)
>>>
>>> Hi Christian, Pablo, all,
>>>
>>> at Christian: you write at
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0034.html
>>>
>>> that 2b of your comment is resolved. How about 2a? If you are not
>>> satisfied with the replies in this thread, could you propose a change to
>>> the spec?
>>>
>>> CL>> Currently, I consider 2a as being unresolved.
>>> CL>> Addressing 2a (capture the information "This is for component X")
>>> to me does not appear to be straightforward, since
>>> CL>> you would need to accommodate an additional piece of information.
>>> One could imagine representations such as
>>> CL>>     <its:domainRule ...
>>> CL>>        domainMapping=
>>> CL>>            'MT-engine-X,"automotive auto, medical medicine,
>>> 'criminal law' law, 'property law' law"',
>>> CL>>             'TM-system-Y,"automotive X, 'criminal law' L,
>>> 'property law' law"'
>>> CL>>      />
>> Such a specification of the engine could lead to conflicting information:
>> MT-engine-X has a module for automotive. If however the engine is not
>> mentioned in a domain mapping, but a different one (which does not have
>> the automotive module): which one to choose?
>> It looks like what you add as information (= choosing the engine) is
>> something one would do after the domain mapping, not at the same time.
>> Otherwise you may run into the conflict described above.
>>
>>> CL>> This, however, is not in line with the current normative text on
>>> "domain".
>>>
>>> Wrt to your proposal below (add a note about 2b to the spec): sure, do
>>> you want to draft something? The same for 2a (if you don't have a
>>> specific solution in mind, stating the issue might already be helpful).
>>>
>>> CL>> How about the following additional paragraph for the first note
>>> in (http://www.w3.org/TR/2012/WD-its20-20121206/#domain) for 2b?
>>> CL>>
>>> CL>> "domainMapping" even allows "domain" systems/hierarchies to be
>>> encoded. domainMapping="FIN, 'A A-1 A-1-X'" could for example be used
>>> to capture the following information:
>> Would it be OK to re-formulate that sentence above like this:
>> [
>> the domainMapping attribute does not itself specify how to encode
>> "domain" systems/hierarchies. An application using domainMapping hence is
>> free to work with application-specific hierarchies to capture
>> information like:
>> ]
>>
>> It seems this is more in line with the language tag example: it is
>> saying that applications can do things that are on purpose underspecified.
>>> CL>> a. There exists a domain system that includes domains (e.g. A),
>>> sub-domains (e.g. A-1), and sub-subdomains (e.g. A-1-X)
>>> CL>> b. Prefer the lowest level in the system (e.g. work with an MT
>>> engine for A-1-X if available, otherwise work with one for A-1 or even
>>> A if available)
>>> CL>>
>>> CL>> This "power to encode and to interpret" is similar to matching of
>>> language tags, see http://tools.ietf.org/html/rfc4647#section-3.2.
>>> CL>> "Language tag matching is a tool, and does not by itself specify
>>> a  complete procedure for the use of language tags ...
>>> CL>> The matching specification itself makes clear that there are many
>>> CL>> aspects that are left out for actually using language tags. But
>>> having no matching at all would mean even less interoperability, hence
>>> the "imperfect" matching scheme.
>> Best,
>>
>> Felix
>>
>>> Wrt to 1 (local domain): would this also be relevant for other
>>> implementers of domain (asking again)?
>> About this one: we have Pablo and Yves saying in separate mails this
>> might be of interest - enough to get through the w3c process. But is it
>> worth another last call period?
>>
>> Best,
>>
>> Felix
>>
>>> Best,
>>>
>>> Felix
>>>
>>> On 15.01.13 19:32, Pablo Nieto Caride wrote:
>>>> Hi all,
>>>>
>>>> Felix, I think that a local domain could be interesting, at least WP4
>>>> client would be happy with that, I don't know what the others think.
>>>>
>>>> Christian, regarding the domain mapping I think that Yves and Felix
>>>> are right, you can implement your own mapping, you can adapt it to
>>>> specific MT if you want, as for the example <its:domainRule
>>>> selector="/h:html/h:body" ... domainMapping="FIN, 'A A-1 A1-A1X'"/>,
>>>> I certain MT Systems can manage the precedence by themselves.
>>>>
>>>> Cheers,
>>>> Pablo.
>>>> Hi,
>>>>
>>>> I wonder if it would be good idea to add the scenario I have provided
>>>> (domain "system") and Felix' information on how to approach it
>>>> (namely similar to language tag matching) to one of the "notes" that
>>>> currently are in place in the "domain" section.
>>>> Best regards,
>>>> Christian
>>>>
>>>> -----Original Message-----
>>>> From: christian.lieske@sap.com
>>>> Sent: Tuesday, 15 January 2013 08:10
>>>> To: 'Felix Sasaki'; public-multilingualweb-lt-comments@w3.org
>>>> Subject: RE: [Issue-75] - Domain
>>>>
>>>> Hi Felix,
>>>>
>>>> I follow your line of thought related to the similarities between
>>>> "domainMapping" and matching of language tags. Thus, it would be OK
>>>> for me to consider 2.b of
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0022.html
>>>> closed.
>>>>
>>>> Cheers,
>>>> Christian
>>>>
>>>> -----Original Message-----
>>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>>> Sent: Monday, 14 January 2013 19:27
>>>> To: public-multilingualweb-lt-comments@w3.org
>>>> Subject: Re: [Issue-75] - Domain
>>>>
>>>> Hi Christian, Yves, all,
>>>>
>>>> On 14.01.13 16:52, Yves Savourel wrote:
>>>>> Hi Christian, all,
>>>>>
>>>>>
>>>>> CL>> It seems as if I didn't manage to make my point about this aspect of
>>>>> "domain" clear.
>>>>> CL>> Let me try to provide a remedy by adding to my original
>>>>> comment:
>>>>> CL>> Something like its-domain="financials" could not just be imagined
>>>>> CL>>to work in  a global rule (e.g. instead of a pointer); in
>>>>> addition, a local use of "domain"
>>>>> CL>> could be imagined
>>>>> CL>>    Global: <its:domainRule selector="/h:html/h:body"
>>>>> its-domain="financials">
>>>>> CL>>    Local: <em its-domain="financials">IMF</em>
>>>>>
>>>>> So (If I'm getting this right) you'd like a way to override the
>>>>> domain for spans of content? (Since the Dublin Core in HTML doesn't
>>>>> let you do that (the subject is defined at the document level)).
>>>>>
>>>>> I think one of the reasons I heard early on was that today it would
>>>>> be difficult to make that distinction at the MT level. But I suppose
>>>>> MT engine selection is not the only application for domain. Maybe
>>>>> others have additional reason why we don't have a local domain?
>>>> Given the implementation driven approach we have made so far I would
>>>> ask: is there an implementation on the horizon that would process
>>>> local domain?
>>>>
>>>>> CL>> Why do you think that the scenario that I sketch (multiple domain
>>>>> CL>> "systems" used in a processing chain) implies that a standard
>>>>> exists?
>>>>> CL>> I would rather think that the implication is the other way round:
>>>>> CL>> Since there is no standard, there is a need to accommodate
>>>>> heterogeneity.
>>>>>
>>>>> I agree, but so far that has not been part of the scope of ITS.
>>>>>
>>>>>
>>>>> CL>> I guess your point is valid in the sense that one could go for
>>>>> CL>> something like <its:domainRule selector="/h:html/h:body" ...
>>>>> CL>> domainMapping="FIN, 'A A-1 A1-A1X'"/>.
>>>>> CL>> However, this would require that additional information would have
>>>>> CL>> to be captured elsewhere (so that for example, the precedence
>>>>> CL>> 'A > A-1 > A1-A1X' could be captured).
>>>>>
>>>>> ITS doesn't prescribe what the right part of the mapping must be or
>>>>> how it should be used.
>>>>> It's really just a way to allow user-defined mechanisms to be
>>>>> connected to the input metadata.
>>>>> I suppose it is also beyond the scope of ITS.
>>>> As I understand Christian, he does not ask to prescribe a mapping, but
>>>> "to accommodate heterogeneity": allow people to formulate their own
>>>> mapping.
>>>>
>>>> I think we do that: we don't make the usage of the mapping attribute
>>>> mandatory. It is an optional attribute. If "our" mapping algorithm
>>>> doesn't respond to a specific mapping approach, everybody can implement
>>>> his own mapping.
>>>>
>>>> This is similar to matching of language tags, see
>>>> http://tools.ietf.org/html/rfc4647#section-3.2
>>>> "Language tag matching is a tool, and does not by itself specify a
>>>> complete procedure for the use of language tags.  Such procedures are
>>>> intimately tied to the application protocol in which they occur."
>>>> The matching specification itself makes clear that there are many
>>>> aspects that are left out for actually using language tags. But having
>>>> no matching at all would mean even less interoperability, hence the
>>>> "imperfect" matching scheme.
>>>>
>>>> Best,
>>>>
>>>> Felix
>>>>
>>>>> cheers,
>>>>> -yves
>>>>>

Received on Thursday, 17 January 2013 17:26:32 UTC