- From: Lieske, Christian <christian.lieske@sap.com>
- Date: Mon, 21 Jan 2013 10:30:05 +0100
- To: Felix Sasaki <fsasaki@w3.org>
- CC: "joerg@bioloom.de" <joerg@bioloom.de>, "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
- Message-ID: <8EA44C66E2911C4AB21558F4720695DC60D82196B6@DEWDFECCR01.wdf.sap.corp>
Dear Felix, all,
Please find some thoughts on Felix’ reply below.
Best regards,
Christian
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Saturday, 19 January 2013 08:51
To: Lieske, Christian
Cc: joerg@bioloom.de; public-multilingualweb-lt-comments@w3.org
Subject: Re: [Issue-75] - Domain
Hi Christian, all,
this is still a personal response, but from what you write below:
[I understand your point. I guess that slightly different assumptions/views on MT-related processes exist. The use cases above from my point of view all pertain to “single engine” scenarios.]
I think you express that the current formulation of "Domain" is useful for some MT related processes, but not for all.
CL>> That’s correct.
So I'm inclined to reject the comment as a "new feature request to address new usage scenarios" (see the "later" tracker product), for reasons of timing among others; see also
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0124.html
CL>> I could understand this. In order to make readers of the ITS 2.0 document aware of the “single engine” focus/restriction, one could consider an informative note.
Best,
Felix
On 18.01.13 16:50, Lieske, Christian wrote:
Hi Felix, Jörg, all,
Please find some of my thoughts (CL>CL>) on the reply below.
Cheers,
Christian
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Thursday, 17 January 2013 18:26
To: Lieske, Christian
Cc: joerg@bioloom.de; public-multilingualweb-lt-comments@w3.org
Subject: Re: [Issue-75] - Domain
Hi Christian, Jörg, all,
co-chair hat on: I think the idea of "adding domain information" is clear, and Pablo said it could be useful for his customer, and Yves said it could be useful for XLIFF mapping.
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0053.html
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0059.html
So we can move this topic to the next stage: who from the implementers for domain
http://htmlpreview.github.com/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html
would implement local domain, and who thinks (this question is important too) that this is worth a delay?
Co-chair hat off, and replying to your proposal at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0087.html
(replying here so that we have only one thread)
[
CL>>>>> I understand the point. My suggestion would be to refine the requirement for the revised domainMapping that I sketched: the information about the target environment/engine is optional.
CL>>>>> Thus, you could have the following:
CL>>>>> <its:domainRule ...
CL>>>>> domainMapping=
CL>>>>> 'MT-engine-X,"automotive auto, medical medicine, 'criminal law' law, 'property law' law"',
CL>>>>> 'TM-system-Y,"automotive X, 'criminal law' L, 'property law' law"',
CL>>>>> '"automotive Z, 'criminal law' C, 'property law' law"' <---- here is the change (no info about the target environment/engine)
CL>>>>> />
CL>>>>>
CL>>>>> Aside: I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.
]
Making the engine information optional doesn't solve the problem I described:
- domainMapping expresses "choose MT-engine-X"
CL>CL> This is not what I had in mind as semantics for the first parameter of a list item in the revised “domainMapping”. To me, the semantics was “If you pass through MT-engine-X, then work with the following domain information”.
- it also expresses "map the domain 'automotive' to 'auto'"
- later in the workflow there are several engines available: MT-engine-X, MT-engine-Y
- only MT-engine-Y knows about 'auto', so the "choose MT-engine-X" information from domainMapping disturbs the workflow
With regard to 'I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is': so far it was helpful for starting work on three implementations (if I count correctly) using domain information in MT workflows. See
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Machine_Translation
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Online_MT_System_Internationalization
http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Segmente_Machine_Translation
There is even a benefit in not specifying the engine: content can be prepared for processing by all these services. Since there is no need to accommodate "engine" information, the workflow can freely choose which engine works best, based purely on domain information.
CL>CL> I understand your point. I guess that slightly different assumptions/views on MT-related processes exist. The use cases above from my point of view all pertain to “single engine” scenarios.
CL>CL> In this kind of scenario it is not really necessary to provide information “this is for engine X”. In “multi-engine” scenarios, the situation is different. In order to see why, one first needs to
CL>CL> acknowledge that at least two flavors of “multi-engine” scenarios exist: multi-engine in pipeline (e.g. first X, then for anything below a confidence of 0.5 Y) vs. multi-engine exclusive (e.g. X for domain “financials”,
CL>CL> Y for domain “health”). In both scenarios, you need a mechanism to specify which domain information is for engine X, and which is for engine Y.
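To make the two "multi-engine" flavors concrete, here is a minimal Python sketch. It is only an illustration of the distinction drawn above: the engine names, the domain labels, and all helper names are hypothetical, and nothing here is part of ITS 2.0. Only the 0.5 confidence threshold is taken from the example in the mail.

```python
# Hypothetical sketch: engine names, domain labels, and helper names are
# invented for illustration; only the 0.5 threshold echoes the mail.

# Per-engine domain information: which label each engine expects.
DOMAIN_INFO = {
    "X": {"financials": "FIN"},
    "Y": {"financials": "F-01", "health": "MED"},
}

def domain_for(engine, source_domain):
    """Return the label that `engine` expects for `source_domain`, if any."""
    return DOMAIN_INFO.get(engine, {}).get(source_domain)

def route_exclusive(source_domain, engine_by_domain):
    """Multi-engine exclusive: one engine per domain."""
    return engine_by_domain.get(source_domain)

def route_pipeline(text, engines, translate, threshold=0.5):
    """Multi-engine pipeline: try engines in order and stop at the first
    result whose confidence reaches the threshold."""
    result = None
    for engine in engines:
        result = translate(engine, text)  # returns (translation, confidence)
        if result[1] >= threshold:
            break
    return result

def fake_translate(engine, text):
    """Stand-in for a real MT call; the confidences are made up."""
    confidence = {"X": 0.3, "Y": 0.8}[engine]
    return (f"[{engine}] {text}", confidence)

print(route_exclusive("health", {"financials": "X", "health": "Y"}))
print(route_pipeline("IMF report", ["X", "Y"], fake_translate))
print(domain_for("Y", "financials"))
```

In both flavors the lookup in DOMAIN_INFO is keyed by engine first, which is the "which domain information is for engine X, and which is for engine Y" mechanism the mail asks for.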
So my questions to you, Christian, and to at least the three implementers above would be: do you see implementers processing domain who would be willing to contribute to testing the engine information? If not (again co-chair hat on), it seems we don't have a use case in the group and can't bring such a feature through the standardization process.
Best,
Felix
On 17.01.13 16:07, Lieske, Christian wrote:
Hi Jörg, Felix, all,
Unfortunately, I still don't understand why the current draft doesn't have provisions for
CL>> Global: <its:domainRule selector="/h:html/h:body" its-domain="financials">
CL>> Local: <em its-domain="financials">IMF</em>
If we don't have these provisions, we may end up with the messy situation/solution that Jörg sketches.
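A minimal Python sketch of how a consumer might resolve such a local override is given below. Note the assumptions: a local its-domain attribute is only the proposal under discussion, not something the current draft defines, and the function name, the sample markup, and the "news" default are invented for illustration. The global rule is reduced to a plain default value rather than a selector-based rule.

```python
import xml.etree.ElementTree as ET

def resolve_domain(root, element, global_domain=None):
    """Return the nearest local 'its-domain' value on the element or one of
    its ancestors; fall back to the globally assigned domain otherwise.
    The local attribute is hypothetical (the proposal in this thread)."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        value = node.get("its-domain")
        if value is not None:
            return value
        node = parents.get(node)
    return global_domain

doc = ET.fromstring(
    '<html><body><p>The <em its-domain="financials">IMF</em> met '
    "<span>today</span>.</p></body></html>"
)
em = doc.find(".//em")
span = doc.find(".//span")

print(resolve_domain(doc, em, global_domain="news"))
print(resolve_domain(doc, span, global_domain="news"))
```

The design choice here is "nearest ancestor wins", which mirrors how other local ITS markup is usually inherited; whether a local domain should inherit this way is exactly what a spec change would have to decide.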
Cheers,
Christian
-----Original Message-----
From: Jörg Schütz [mailto:joerg@bioloom.de]
Sent: Wednesday, 16 January 2013 15:28
To: public-multilingualweb-lt-comments@w3.org
Cc: public-multilingualweb-lt-comments@w3.org
Subject: Re: [Issue-75] - Domain
Hi Felix, Christian, and all,
ITS should not be hijacked to take over the role of a workflow engine or
similar application because there might be several consumers of ITS information...
@Christian > [Could you provide one or two examples/proofs for this?]
Here is an outline of my idea (which potentially also hijacks ITS to
some extent):
Possible ITS Application Scenario to Extend the "Domain" Data Category
(1) Use (general) domain pointing for the broad classification of your
content (global reach), i.e. employ the domain data category.
(2) In cases where (1) is either too general (broad), or you want to
further classify only parts of your content (local reach), use the
disambiguation data category. This includes the further classifying of a
sequence of strings which do not represent what usually is called a term
(domain-specific vocabulary) or a multi-word unit (mwu).
(3) For the term and mwu case use the terminology data category.
Case (3) is applied as described in the ITS 2.0 specification; always
consider linking to an appropriate authoritative internal or external
terminology resource or ontology (e.g. Cyc, Snomed, MeSH, etc.) on which
both producer and consumer have agreed (in this sense ITS is also
part of a contract).
In this scenario, case (2) is a bit trickier because "officially"
disambiguation is also applied to meaningful string sequences, i.e. a
word or a mwu, as in the terminology case, but now we extend this data
category to arbitrary elements, for example an entire paragraph, with the
restriction that the attributes disambigConfidence and particularly
disambigGranularity have a broader meaning such as the conceptual
association to a domain's root element or to certain upper model elements.
HTML Example (local)
...
<p><span its-disambig-confidence="0.9"
its-disambig-class-ref="http://snowowl.sample.com/SNOMED_CT_Concept/Pharmaceutical_Product">
Ambroxol has mucolytic and local-anaesthetic pharmacological effects
</span>.
</p>
...
Note: In this example, only the disambigClassRef attribute is used to
account for the "broader" employment of the data category.
This use case scenario might sound like a bootstrap paradox... but this
is one possibility of using ITS 2.0 ... ;-)
All the best -- Jörg
On Jan 16, 2013, at 14:23 (CET), Felix Sasaki wrote:
On 16.01.13 12:15, Lieske, Christian wrote:
Hi Felix, Pablo, all,
Please find some of my thoughts on the reply below.
Cheers,
Christian
-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Wednesday, 16 January 2013 08:07
To: Pablo Nieto Caride
Cc: Lieske, Christian; public-multilingualweb-lt-comments@w3.org
Subject: Re: [Issue-75] - Domain
(trying to minimize the number of mails, hence replying to several
aspects in this mail)
Hi Christian, Pablo, all,
at Christian: you write at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0034.html
that 2b of your comment is resolved. How about 2a? If you are not
satisfied with the replies in this thread, could you propose a change to
the spec?
CL>> Currently, I consider 2a as being unresolved.
CL>> Addressing 2a (capture the information "This is for component X")
to me does not appear to be straightforward, since
CL>> you would need to accommodate an additional piece of information.
One could imagine representations such as
CL>> <its:domainRule ...
CL>> domainMapping=
CL>> 'MT-engine-X,"automotive auto, medical medicine,
'criminal law' law, 'property law' law"',
CL>> 'TM-system-Y,"automotive X, 'criminal law' L,
'property law' law"'
CL>> />
Such a specification of the engine could lead to conflicting information:
MT-engine-X has a module for automotive. If however the engine is not
mentioned in a domain mapping, but a different one (which does not have
the automotive module): which one to choose?
It looks like what you add as information (= choosing the engine) is
something one would do after the domain mapping, not at the same time.
Otherwise you may run into the conflict described above.
CL>> This, however, is not in line with the current normative text on
"domain".
With regard to your proposal below (add a note about 2b to the spec): sure, do
you want to draft something? The same for 2a (if you don't have a
specific solution in mind, stating the issue might already be helpful).
CL>> How about the following additional paragraph for the first note
in (http://www.w3.org/TR/2012/WD-its20-20121206/#domain) for 2b?
CL>>
CL>> "domainMapping" even allows "domain" systems/hierarchies to be
encoded. domainMapping="FIN, 'A A-1 A-1-X'" could for example be used
to capture the following information:
Would it be OK to re-formulate that sentence above like this:
[
the domainMapping attribute does not itself specify how to encode
"domain" systems/hierarchies. An application using domainMapping hence is
free to work with application-specific hierarchies to capture
information like:
]
It seems this is more in line with the language tag example: it is
saying that applications can do things that are on purpose underspecified.
CL>> a. There exists a domain system that includes domains (e.g. A),
sub-domains (e.g. A-1), and sub-subdomains (e.g. A-1-X)
CL>> b. Prefer the lowest level in the system (e.g. work with an MT
engine for A-1-X if available, otherwise work with one for A-1 or even
A if available)
CL>>
CL>> This "power to encode and to interpret" is similar to matching of
language tags, see http://tools.ietf.org/html/rfc4647#section-3.2.
CL>> "Language tag matching is a tool, and does not by itself specify
a complete procedure for the use of language tags ...
CL>> The matching specification itself makes clear that there are many
CL>> aspects that are left out for actually using language tags. But
having no matching at all would mean even less interoperability, hence
the "imperfect" matching scheme.
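The "prefer the lowest level" behaviour in (b) can be sketched in a few lines of Python. This is only one possible application-specific interpretation, in the spirit of the lookup scheme for language tags: the function name and the engine identifiers are hypothetical, and the hierarchy is assumed to be listed from the broadest to the most specific domain, as in 'A A-1 A-1-X'.

```python
def pick_engine(hierarchy, engines):
    """Pick the engine for the most specific domain that has one.

    hierarchy: domains ordered from broadest to most specific,
               e.g. ["A", "A-1", "A-1-X"] (assumed ordering).
    engines:   dict mapping a domain label to an engine identifier.
    Returns None when no level of the hierarchy has an engine.
    """
    for domain in reversed(hierarchy):
        if domain in engines:
            return engines[domain]
    return None

hierarchy = "A A-1 A-1-X".split()

# No engine for A-1-X, so the lookup falls back to A-1.
print(pick_engine(hierarchy, {"A": "engine-general", "A-1": "engine-sub"}))
# An engine for A-1-X exists, so it wins.
print(pick_engine(hierarchy, {"A": "engine-general", "A-1-X": "engine-leaf"}))
```

As with language tag matching, the fallback order is a choice made by the application; the attribute value itself only carries the labels.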
Best,
Felix
With regard to 1 (local domain): would this also be relevant for other
implementers of domain (asking again)?
About this one: we have Pablo and Yves saying in separate mails this
might be of interest - enough to get through the w3c process. But is it
worth another last call period?
Best,
Felix
On 15.01.13 19:32, Pablo Nieto Caride wrote:
Hi all,
Felix, I think that a local domain could be interesting, at least WP4
client would be happy with that, I don't know what the others think.
Christian, regarding the domain mapping I think that Yves and Felix
are right, you can implement your own mapping, you can adapt it to
specific MT if you want, as for the example <its:domainRule
selector="/h:html/h:body" ... domainMapping="FIN, 'A A-1 A1-A1X'"/>,
I am certain MT systems can manage the precedence by themselves.
Cheers,
Pablo.
Hi,
I wonder if it would be a good idea to add the scenario I have provided
(domain "system") and Felix' information on how to approach it
(namely similar to language tag matching) to one of the "notes" that
are currently in place in the "domain" section.
Best regards,
Christian
-----Original Message-----
From: christian.lieske@sap.com
Sent: Tuesday, 15 January 2013 08:10
To: 'Felix Sasaki'; public-multilingualweb-lt-comments@w3.org
Subject: RE: [Issue-75] - Domain
Hi Felix,
I follow your line of thought related to the similarities between
"domainMapping" and matching of language tags. Thus, it would be OK
for me to consider 2.b of
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0022.html
closed.
Cheers,
Christian
-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Monday, 14 January 2013 19:27
To: public-multilingualweb-lt-comments@w3.org
Subject: Re: [Issue-75] - Domain
Hi Christian, Yves, all,
On 14.01.13 16:52, Yves Savourel wrote:
Hi Christian, all,
CL>> It seems as if I didn't manage to make my point about this aspect of
"domain" clear.
CL>> Let me try to provide a remedy by adding to my original
comment:
CL>> Something like its-domain="financials" could not just be imagined
CL>> to work in a global rule (e.g. instead of a pointer); in
addition, a local use of "domain"
CL>> could be imagined
CL>> Global: <its:domainRule selector="/h:html/h:body"
its-domain="financials">
CL>> Local: <em its-domain="financials">IMF</em>
So (If I'm getting this right) you'd like a way to override the
domain for spans of content? (Since the Dublin Core in HTML doesn't
let you do that (the subject is defined at the document level)).
I think one of the reasons I heard early on was that today it would
be difficult to make that distinction at the MT level. But I suppose
MT engine selection is not the only application for domain. Maybe
others have additional reasons why we don't have a local domain?
Given the implementation driven approach we have made so far I would
ask: is there an implementation on the horizon that would process
local domain?
CL>> Why do you think that the scenario that I sketch (multiple domain
CL>> "systems" used in a processing chain) implies that a standard
exists?
CL>> I would rather think that the implication is the other way round:
CL>> Since there is no standard, there is a need to accommodate
heterogeneity.
I agree, but so far that has not been part of the scope of ITS.
CL>> I guess your point is valid in the sense that one could go for
CL>> something like <its:domainRule selector="/h:html/h:body" ...
CL>> domainMapping="FIN, 'A A-1 A1-A1X'"/>.
CL>> However, this would require that additional information would have
CL>> to be captured elsewhere (so that for example, the precedence
CL>> 'A > A-1 > A1-A1X' could be captured).
ITS doesn't prescribe what the right part of the mapping must be or
how it should be used.
It's really just a way to allow user-defined mechanisms to be
connected to the input metadata.
I suppose it is also beyond the scope of ITS.
As I understand Christian, he does not ask to prescribe a mapping, but
"to accommodate heterogeneity": allow people to formulate their own
mapping.
I think we do that: we don't make the usage of the mapping attribute
mandatory. It is an optional attribute. If "our" mapping algorithm
doesn't respond to a specific mapping approach, everybody can implement
his own mapping.
This is similar to matching of language tags, see
http://tools.ietf.org/html/rfc4647#section-3.2
"Language tag matching is a tool, and does not by itself specify a
complete procedure for the use of language tags. Such procedures are
intimately tied to the application protocol in which they occur."
The matching specification itself makes clear that there are many
aspects that are left out for actually using language tags. But having
no matching at all would mean even less interoperability, hence the
"imperfect" matching scheme.
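To illustrate how little the mapping itself prescribes, here is a minimal Python sketch that turns a domainMapping value into a lookup table. It assumes the comma-separated pair form used in the automotive example elsewhere in this thread (a source value, then a target value, with values containing spaces wrapped in quotes). The function name is hypothetical, and the naive split on commas would break on values that themselves contain commas.

```python
import shlex

def parse_domain_mapping(value):
    """Parse 'left right, left right, ...' into a dict {left: right}.

    Assumption: each comma-separated item holds exactly two values, and a
    value containing spaces is delimited by ' or " characters.
    """
    mapping = {}
    for item in value.split(","):
        parts = shlex.split(item)  # honours both quote styles
        if len(parts) != 2:
            raise ValueError(f"expected a pair, got: {item.strip()!r}")
        source, target = parts
        mapping[source] = target
    return mapping

value = "automotive auto, medical medicine, 'criminal law' law, 'property law' law"
mapping = parse_domain_mapping(value)
print(mapping)

# What the right-hand values mean is left to the consuming application.
print(mapping.get("criminal law"))
```

Everything after this step (hierarchies, precedence, engine selection) is application logic layered on top of the parsed pairs.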
Best,
Felix
cheers,
-yves
Received on Monday, 21 January 2013 09:30:36 UTC