- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 21 Sep 2012 10:28:16 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czqAQLyUqrusRM0_Ov=MyxpSktaoJw5hjUi-1W9-Av8zfQ@mail.gmail.com>
Hi all again,

to give a concrete example of the envisaged solution with a separate tool block, I have created some examples. The attached zip file contains:

- tool-info.xsd: OLIF XML Schema fragment for specifying tool information.
- tool-info-its.xsd: a modification of tool-info.xsd. To be able to make use of existing tool descriptions, I renamed elements so that they are not specific to terminology extraction, left out various elements, and moved "WorkflowProcessInfo" into the tool info.
- xml.xsd: needed for XML attributes.
- tool-its-info-example.xml: an example file which contains tool specifications for Enrycher, two MT tools and Okapi. All tool specifications allow for identifying the relevant data categories. In that way it becomes explicit that e.g. a certain MT tool is relevant for mt-confidence.
- The tool specifications have "id" attributes, e.g. "t-2" for the "bing" translator. Yves' requirement of referring to tool info from a piece of XLIFF could be realized by referring to the ID attribute.
- If we want to refer to this from HTML inline, we could again have an its-* block with a span element, e.g. <span its-tool-name="enrycher">...</span>.

Comments welcome. The above can also serve as a way to do action-194 "Work on issue-42, provide examples and template for various data categories".

Best,

Felix

2012/9/21 Felix Sasaki <fsasaki@w3.org>

> One correction to the below:
>
> "The question is maybe: do you need XPath to specify that in XLIFF, and
> would it be OK for this information to be orthogonal? If not, there is no
> need to influence selection precedence."
>
> This should have been "If this is the case, there is no need to influence
> selection precedence."
>
> Also, apologies for repeating the "orthogonal proposal" so often :)
>
> Best,
>
> Felix
>
> 2012/9/21 Felix Sasaki <fsasaki@w3.org>
>
>> Hi Yves, Dave,
>>
>> thanks for your feedback.
>> We are probably stuck, since I still won't
>> agree with partial overriding and also not with combining data categories.
>> A few comments below.
>>
>> 2012/9/21 Yves Savourel <ysavourel@enlaso.com>
>>
>>> Hi Felix, all,
>>>
>>> The solution of some tool information being declared outside the usual
>>> selector mechanism may work for one set of instances of a given data
>>> category, but not several. For example:
>>>
>>> In an XLIFF document I'd like to use the its:mtConfidenceScore attribute
>>> for each <m:match> element that holds a translation candidate for an entry
>>> (This is a likely real-life case, not just a random example). You can have
>>> multiple matches per entry, and they are very likely to be from different
>>> engines. Having a document-level tool information does not work. In this
>>> case we do need to have the tool information per entry.
>>>
>>
>> Tool information is really orthogonal to data categories IMO - for the
>> content itself (like in MT) and for ITS annotations you may want to
>> express: who produced this? For ITS annotations, this is urgently needed
>> for disambiguation. But actually each data category can be produced by a
>> tool, and it would be useful to capture that information in an orthogonal
>> manner IMO.
>>
>> The question is maybe: do you need XPath to specify that in XLIFF, and
>> would it be OK for this information to be orthogonal? If not, there is no
>> need to influence selection precedence.
>>
>>> Looking again at the "partial override" solution I don't think pointers
>>> are a problem. They just tell where to get the information to apply to the
>>> node, as far as overriding goes it's no different than setting directly the
>>> information.
>>>
>>
>> We can continue the discussion on partial override, but I would suggest
>> to stop it. I can "promise" - as said before - that I would (formally)
>> object against this, and this is very unlikely to change.
>> The backwards
>> compatibility, the ambiguity wrt the intention of the data category author
>> (e.g. "is the 'alert' type of a locnote intended or not?"), and esp. the
>> constraints about pieces of information are an issue. Such constraints are
>> a different beast than pointers or standoff markup. With constraints I mean
>> what we say e.g. with loc quality issues: "exactly one of the following,
>> none or one of the following" or in other areas we say "optionally". With
>> these mutually exclusive and other options the complexity rises: if there
>> are two mutually exclusive items, one at a node and one inherited, which
>> one takes precedence? Sure there can be answers, but these are data
>> category specific and much more complex than "if a value doesn't exist on a
>> node take the inherited one".
>>
>> With partial inheritance I would need to re-engineer my implementation,
>> and very likely I wouldn't do that but rather drop the implementation
>> completely. Even if that is no theoretical issue as you had mentioned in a
>> mail before, I think it is a valid concern.
>>
>> So my proposal to move forward would be to re-iterate the orthogonal
>> character of tool information: it seems tool information is really the only
>> case where the partial overriding or the data category combination (see
>> comment below) have a strong case.
>>
>> Wrt Dave's proposal from
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0124.html
>>
>> [
>>
>> However, if we relax this assumption in a controlled way we can simply
>> avoid partial override by designing certain data categories to be used in
>> _combination_ (the subtle difference to a single data category
>> being 'compound').
>> In this event we can then either just live with the fact
>> that there may be one or two data categories that may impart conformance
>> individually even though they are not useful by themselves, or we add them
>> as a specific exclusion to the single-data-category-for-conformance rules.
>> ]
>>
>> In my experience, if you introduce a feature once (combination of data
>> categories), people will develop use cases over time to have it in other
>> areas too. So that will influence conformance a lot and in essence destroy a
>> basic design principle from ITS 1.0: "It adopts the use of data categories
>> to define *discrete* units of functionality".
>>
>> So we are probably stuck.
>>
>> My proposal to move forward would be to see if we agree on the orthogonal
>> character of the tool information and define it for all data categories, as
>> a separate piece of information and with the effect of a separate
>> conformance clause, but with no effect on selection mechanisms. If we have
>> that agreement we can explore how to accommodate Yves' requirement to attach
>> the information not to a whole document, but to parts of it. Maybe even
>> XPath is not needed for that. At least that is what Declan and Tadej said
>> for mtConfidence and disambiguation on the call.
>>
>> Best,
>>
>> Felix
>>
>>> The case of the stand-off markup is specific (so far) to Localization
>>> Quality Issue. I haven't thought yet about what that implies for the
>>> "partial override" but it's likely that there are ways to specify what is
>>> done in those cases.
>>> The bottom line is that all those local/global/standoff attributes
>>> specify information and are applied in a given order: we've got to be able
>>> to know if the information ABC exists or not when we apply the next rule,
>>> and therefore be able to keep the current value or override it depending on
>>> whether the next rule redefines that information or not.
>>>
>>> Cheers,
>>> -ys
>>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>
> --
> Felix Sasaki
> DFKI / W3C Fellow

--
Felix Sasaki
DFKI / W3C Fellow
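
A minimal sketch of the ID-based referencing idea from the top of this message: each annotated entry carries only a reference such as "t-2", and the tool details are looked up in a separate tool block. The element and attribute names in the block (toolInfo, tool, dataCategory), the "t-1" ID for Enrycher, its "disambiguation" data category and the helper function names are assumptions made for illustration; they are not taken from tool-info-its.xsd or tool-its-info-example.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool block. Only the "t-2"/"bing" pairing and the
# "mt-confidence" data category come from the message; all element and
# attribute names here are illustrative assumptions.
TOOL_BLOCK = """
<toolInfo>
  <tool id="t-1" name="enrycher">
    <dataCategory>disambiguation</dataCategory>
  </tool>
  <tool id="t-2" name="bing">
    <dataCategory>mt-confidence</dataCategory>
  </tool>
</toolInfo>
"""


def load_tools(xml_text):
    """Index the tool descriptions by their id attribute."""
    root = ET.fromstring(xml_text)
    return {
        tool.get("id"): {
            "name": tool.get("name"),
            "data_categories": [dc.text for dc in tool.findall("dataCategory")],
        }
        for tool in root.findall("tool")
    }


def resolve_tool(tools, tool_ref, data_category):
    """Return the referenced tool if it declares data_category, else None."""
    tool = tools.get(tool_ref)
    if tool is None or data_category not in tool["data_categories"]:
        return None
    return tool


tools = load_tools(TOOL_BLOCK)
# An entry (e.g. one translation candidate) that points at tool "t-2".
producer = resolve_tool(tools, "t-2", "mt-confidence")
print(producer["name"])
```

Because the reference sits on the individual entry, two translation candidates from different engines can each point at their own tool description, and the lookup does not touch the selection or inheritance rules of the data category itself.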
Attachments
- application/zip attachment: tool-its-info.zip
Received on Friday, 21 September 2012 08:28:41 UTC