- From: Felix Sasaki <fsasaki@w3.org>
- Date: Mon, 8 Oct 2012 22:41:09 +0200
- To: Dave Lewis <dave.lewis@cs.tcd.ie>
- Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
- Message-ID: <CAL58czpZ74EjMZSbi-GE-EhL=q3dZgXUMUF=PNamDnxnGjiArA@mail.gmail.com>
Hi Dave, all,

On Monday, 8 October 2012, Dave Lewis wrote:
> Hi Felix, Yves, guys,
> I think we need to look at this on a case-by-case basis, hence this
> slightly long post.
>
> The data categories that Felix identifies for consideration are all
> essentially provenance-related, i.e. they record the outcomes of some
> process on the textual content of the document. So they are most likely to
> be applied with local rules (often requiring a span insertion).
>
> However, in some cases global rules may be useful for applying rules to
> several nodes at once. For instance using translation provenance revision
> agent global rule to say all nodes with class="legal" were postedited by
> translator A and all with class="technical" by translator B. Exceptions
> (e.g. due to re-postediting) can then be easily implemented by local
> selection overrides.
>
> However, this provenance characteristic means they are more likely to be
> annotating a static document, i.e. a document submitted to the localisation
> workflow (including disambiguation on the source). Therefore the ability of
> global rules to provide a convenient way of annotating sets of nodes with a
> single ITS declaration is not so vulnerable to changes in the document
> structure that render the selectors inaccurate.
>
> In contrast, the internationalisation/instructional data categories
> (Domain, LocaleFilter, external resource, target pointer, preserve space,
> allowed characters, storage size, id value) are applied in a setting where
> the document may still be undergoing structural and content change, so using
> global rules to assign values to specific sets of nodes is not good practice
> there.
>
> I can therefore see global rules for convenient node selection (and for
> attribute selection, as Yves points out) as being useful for
> QualityIssue, Quality Precis, transRevisionAgentProvenance,
> transAgentProvenance and standoff provenance.
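The class-based global rule with local overrides described above can be sketched in Python. This is a minimal illustration under stated assumptions, not ITS processor code: the fragment, ids and agent names are made up, ElementTree's limited XPath subset stands in for full XPath selectors, and a hypothetical `data-revagent` attribute stands in for local provenance markup. It follows the ITS precedence order: among global rules the last matching rule wins, and local markup overrides any global rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML-like fragment; class values, ids and agent names are
# illustrative only.
DOC = """<body>
  <p class="legal" id="p1">Terms apply.</p>
  <p class="legal" id="p2" data-revagent="translator-C">Liability is limited.</p>
  <p class="technical" id="p3">Press the reset button.</p>
</body>"""

# Global rules as (selector, revision agent) pairs. ElementTree only supports
# a limited XPath subset, which is enough for class-based selection here.
GLOBAL_RULES = [
    (".//p[@class='legal']", "translator-A"),
    (".//p[@class='technical']", "translator-B"),
]

# Hypothetical stand-in for local ITS provenance markup on a single node.
LOCAL_ATTR = "data-revagent"


def revision_agents(xml_text):
    """Map each node id to its revision agent, applying ITS-style precedence."""
    root = ET.fromstring(xml_text)
    agents = {}
    # Global rules are applied in order, so a later matching rule overrides an
    # earlier one for the same node.
    for selector, agent in GLOBAL_RULES:
        for node in root.findall(selector):
            agents[node.get("id")] = agent
    # Local annotations take precedence over anything a global rule assigned.
    for node in root.iter():
        local = node.get(LOCAL_ATTR)
        if local is not None:
            agents[node.get("id")] = local
    return agents


print(revision_agents(DOC))
# {'p1': 'translator-A', 'p2': 'translator-C', 'p3': 'translator-B'}
```

Two class-based rules cover all three paragraphs and only p2 needs a local exception; a per-id selector would instead need one rule per paragraph.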
>
> I don't see global rules as useful in this respect for disambiguation,
> text analysis and mtConfidence score. These will always be applied to
> specific terms or segments, and may involve values generated with some
> context awareness, so the same words/phrases won't have the same values.
> They will also often not have an existing selectable element enclosing
> them, making global selectors less useful (apart from perhaps the XLIFF
> case).
>
> I find the issue of using global rules for associating ITS semantics with
> existing markup a bit more difficult to ponder, as my knowledge of that
> existing markup is perhaps incomplete. However, I'd observe that these
> provenance-related data categories are applied in processes within
> LSPs or clients expecting results from LSPs. So with the exception of XLIFF
> cases, rather than adding the complexity of global rules, we could take a
> more assertive stance that implementations should adopt ITS, rather than
> support co-existence with other (as yet unknown) markup. Though I don't
> necessarily advocate this, I'd point out that it might have more chance of
> success than for the 'internationalization' data categories, where we have
> to deal with variety and volatility in markup caused by people who won't
> be easily persuaded by pleas for standardization to support downstream
> localization processes.
>
> The result of this line of reasoning would be to actually _remove_ the
> pointer attributes from global rules _and_ local selectors for
> QualityIssue, Quality Precis, transRevisionAgentProvenance,
> transAgentProvenance and standoff provenance, as well as for
> disambiguation, text analysis and mtConfidence, even while we otherwise
> keep global rules for the first group to support convenient node selection.

Just for clarification: what do you mean by "local selectors" in "_remove_ the pointer attributes from global rules _and_ local selectors"?

Also, regarding "support convenient node selection": for "quality issue", for example, we currently have this example of a global rule:

<its:locQualityIssueRule selector="//span[@id='q1']"
  locQualityIssueType="typographical"
  locQualityIssueComment="Sentence without capitalization"
  locQualityIssueSeverity="50"/>

Is it really convenient to have such a rule? You would need masses of them, one for each "id" attribute value. Who would ever write such rules?

About your argument for having global rules: "For instance using translation provenance revision agent global rule to say all nodes with class="legal" were postedited by translator A and all with class="technical" by translator B. Exceptions (e.g. due to re-postediting) can then be easily implemented by local selection overrides."

Is it realistic that the postedited chunks are easily identified by certain class attributes? If yes, then the "chunk" mechanism has a use case, but you could also have editing support that automatically inserts the related local markup in each chunk touched.

My main point is maybe: will we get people to write such rules, except in the "per document format" case Yves mentioned? It is nice to have the mechanism, but writing global rules is a huge effort and needs quite some knowledge of XPath etc.

Regarding ITS and XLIFF, you wrote "So with the exception of XLIFF cases": that case meant global rules + pointers, no? Or how would you realize XLIFF+ITS then?

Best,

Felix

> Apologies for the long post,
> Dave
>
> On 04/10/2012 12:42, Felix Sasaki wrote:
> > Hi Yves, all again ...
> >
> > I thought about this again, esp. your argument "The capability to map
> > existing markup in other vocabulary that have the same functionality as the
> > ITS data categories".
> >
> > For ITS 1.0, we specified best practices for such mappings, e.g. for
> > XHTML:
> > http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/#its-plus-xhtml10
> > http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/EX-relating-its-plus-xhtml-1.xml
> >
> > However, we only used data categories with small, fixed sets of values:
> > translate, term, dir, withintext.
> > Now, the data categories we are discussing here mostly have open
> > value sets. Only QualityIssue has "qualityIssueType" and Disambiguation
> > "disambigType", but these are not really useful without other, "open"
> > value fields / attributes.
> >
> > So it seems that mapping to existing vocabularies with global rules for
> > QualityIssue, Quality Precis, Disambiguation, mtConfidence, text analysis
> > annotation and provenance only makes sense with pointer attributes, as here:
> > http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-global-2
> >
> > So there may be these options:
> > 1) remove the non-pointer attributes, e.g. "locQualityIssueComment", and
> > keep the pointer attributes, e.g. "locQualityIssueCommentPointer"
> > 2) keep everything as is
> > 3) remove global rules completely.
> >
> > Thoughts? In the end the question is: who would implement 1) or 2), or
> > is everybody happy with 3)?
> >
> > Felix
> >
> > 2012/10/2 Felix Sasaki <fsasaki@w3.org>
> > > Hi Yves, all,
> > >
> > > good points. It seems that both aspects may be interrelated. You could
> > > resolve the "title" attribute issue by first converting HTML to XLIFF.
> > > Here you make use of the mtConfidence score at a "target" element (I
> > > think). Of course using ITS mtConfidence score at XLIFF "target" doesn't
> > > work (your second point). But maybe that's not an important use case,
> > > only for locQualityIssueRef?
> > >
> > > About "we don't care" ... it seems that the main format for adding this
> > > metadata is XLIFF. So there is no need to have an XPath-based mechanism
> > > to accommodate many formats, only a mapping from ITS metadata to XLIFF.
> > > A table could do too ...
> > >
> > > Felix
> > >
> > > 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
> > > > Hi Felix, all,
> > > >
> > > > That makes sense for the information part: all the data categories of
> > > > the second list basically need to be specified locally to be truly
> > > > usable.

--
Felix Sasaki
DFKI / W3C Fellow
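Option 1 in the 4 October message (keep only pointer attributes) can be illustrated with a small Python sketch. It assumes a hypothetical existing vocabulary in which an `issue` element's `note` attribute already holds the comment. The rule reuses attribute names from this thread (`locQualityIssueType`, `locQualityIssueCommentPointer`), but the resolver is a toy: it understands only `@attribute` or child-element pointers, not the full relative XPath an ITS processor would evaluate against each selected node.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose own (non-ITS) markup already records quality
# issues; the element and attribute names here are illustrative only.
DOC = """<doc>
  <para>This is <issue note="Sentence without capitalization">some text</issue>.</para>
  <para>See <issue note="Double space between words">the  manual</issue>.</para>
</doc>"""

# A pointer-style global rule: the selector picks the nodes once, the fixed
# value applies to all of them, and the *Pointer value is a relative path
# evaluated against each selected node.
RULE = {
    "selector": ".//issue",
    "locQualityIssueType": "typographical",
    "locQualityIssueCommentPointer": "@note",
}


def resolve_pointer(node, pointer):
    """Toy pointer resolver: '@name' reads an attribute of the node; any other
    value is treated as an ElementTree path to a child whose text is used."""
    if pointer.startswith("@"):
        return node.get(pointer[1:])
    child = node.find(pointer)
    return child.text if child is not None else None


def apply_pointer_rule(xml_text, rule):
    """Return one annotation dict per node matched by the rule's selector."""
    root = ET.fromstring(xml_text)
    annotations = []
    for node in root.findall(rule["selector"]):
        annotations.append({
            "text": node.text,
            "type": rule["locQualityIssueType"],
            "comment": resolve_pointer(node, rule["locQualityIssueCommentPointer"]),
        })
    return annotations


for annotation in apply_pointer_rule(DOC, RULE):
    print(annotation)
```

Because the pointer is resolved relative to each selected node, this single rule maps every existing annotation. The per-id rule quoted earlier in the thread needs one rule per id value.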
Received on Monday, 8 October 2012 20:41:34 UTC