- From: Dave Lewis <dave.lewis@cs.tcd.ie>
- Date: Mon, 08 Oct 2012 14:08:18 +0100
- To: public-multilingualweb-lt@w3.org
- Message-ID: <5072D042.4070006@cs.tcd.ie>
Hi Felix, Yves, guys,

I think we need to look at this on a case-by-case basis, hence this slightly long post.

The data categories that Felix identifies for consideration are all essentially provenance-related, i.e. they record the outcomes of some process on the textual content of the document. So they are most likely to be applied with local rules (often requiring a span insertion). However, in some cases global rules may be useful for applying annotations to several nodes at once. For instance, a translation provenance revision agent global rule could say that all nodes with class="legal" were postedited by translator A and all with class="technical" by translator B. Exceptions (e.g. due to re-postediting) can then be easily implemented by local overrides.

This provenance characteristic also means they are more likely to be annotating a static document, i.e. a document submitted to the localisation workflow (including disambiguation on the source). Therefore the ability of global rules to provide a convenient way of annotating sets of nodes with a single ITS declaration is not so vulnerable to changes in the document structure that render the selectors inaccurate.

In contrast, the internationalisation/instructional data categories (Domain, LocaleFilter, external resource, target pointer, preserve space, allowed characters, storage size, id value) are applied in a setting where the document may still be under structural and content change, so using global rules to assign values to specific sets of nodes is not good practice there.

I can therefore see the use of global rules for convenient node selection (and for attribute selection, as Yves points out) as being useful for QualityIssue, Quality Precis, transRevisionAgentProvenance, transAgentProvenance and standoff provenance. I don't see global rules as useful in this respect for disambiguation, text analysis and mtconfidence score.
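To make the class="legal" / class="technical" case concrete, here is a rough sketch in Python of how a processor might resolve a revision agent from global rules and then let local annotations win. It is only an illustration under stated assumptions: the `rev-agent` attribute, the `GLOBAL_RULES` table and the `resolve_rev_agents` helper are hypothetical names, not ITS 2.0 syntax, and the selectors use the limited XPath subset of Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "rev-agent" stands in for a local
# provenance annotation and is NOT the real ITS attribute name.
DOC = """<doc>
  <p id="p1" class="legal">Terms of use.</p>
  <p id="p2" class="legal" rev-agent="translator C">Liability clause.</p>
  <p id="p3" class="technical">Installation steps.</p>
  <p id="p4">Unannotated text.</p>
</doc>"""

# Hypothetical global rules: (selector, revision agent) pairs.
GLOBAL_RULES = [
    (".//*[@class='legal']", "translator A"),
    (".//*[@class='technical']", "translator B"),
]


def resolve_rev_agents(root, rules):
    """Return {element id: revision agent}, with local values taking precedence."""
    agents = {}
    # Pass 1: each global rule annotates every node its selector matches.
    for selector, agent in rules:
        for node in root.findall(selector):
            agents[node.get("id")] = agent
    # Pass 2: a local annotation (e.g. after re-postediting) overrides the rule.
    for node in root.iter():
        local = node.get("rev-agent")
        if local is not None:
            agents[node.get("id")] = local
    return agents


result = resolve_rev_agents(ET.fromstring(DOC), GLOBAL_RULES)
for element_id, agent in sorted(result.items()):
    print(element_id, "->", agent)
```

Two selectors cover every "legal" and "technical" node, and only the re-postedited node needs its own annotation. This only works because the document structure is stable by the time the provenance is recorded.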
Annotations for those three (disambiguation, text analysis and mtconfidence) will always be applied to specific terms or segments, and may involve values generated with some context awareness, so the same words/phrases won't have the same values. They will also often not have an existing selectable element enclosing them, making global selectors less useful (apart from perhaps the XLIFF case).

I find the issue of using global rules for associating ITS semantics with existing mark-up a bit more difficult to ponder, as my knowledge of that existing mark-up is perhaps incomplete. However, I'd observe that these provenance-related data categories are applied in processes within LSPs or by clients expecting results from LSPs. So, with the exception of XLIFF cases, rather than adding the complexity of global rules, we could take a more assertive stance that implementations should adopt ITS, rather than support co-existence with other (as yet unknown) mark-up. Though I don't necessarily advocate this, I'd point out that it might have more chance of success than for the 'internationalization' data categories, where we have to deal with variety and volatility in mark-up caused by people who won't be easily persuaded by pleas for standardization to support downstream localization processes.

The result of this line of reasoning would be to actually _remove_ the pointer attributes from global rules _and_ local selectors for: QualityIssue, Quality Precis, transRevisionAgentProvenance, transAgentProvenance and standoff provenance, as well as for disambiguation, text analysis and mtconfidence - even while we otherwise keep global rules for the first group to support convenient node selection.

Apologies for the long post,
Dave

On 04/10/2012 12:42, Felix Sasaki wrote:
> Hi Yves, all again ...
>
> I thought about this again, esp. your argument "The capability to map
> existing markup in other vocabulary that have the same functionality
> as the ITS data categories".
>
> For ITS 1.0, we specified best practices for such mappings, e.g.
> XHTML
> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/#its-plus-xhtml10
> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/EX-relating-its-plus-xhtml-1.xml
>
> However, we only used data categories with small & fixed sets of
> values: translate, term, dir, withintext.
> Now, for the data categories we are discussing here, mostly there are
> open value sets. Only QualityIssue has "qualityIssueType"
> and Disambiguation "disambigType", but these are not really useful
> without other, "open" value fields / attributes.
>
> So it seems that mapping to existing vocabularies with global rules for
> QualityIssue, Quality Precis, Disambiguation, mtConfidence, text
> analysis annotation, provenance
> only makes sense with pointer attributes, like here
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-global-2
>
> So there may be the options:
> 1) remove the non pointer attributes, e.g. "locQualityIssueComment",
> and keep the pointer attributes, e.g. "locQualityIssueCommentPointer"
> 2) keep everything as is
> 3) remove global rules completely.
>
> Thoughts? At the end the question is: who would implement 1) or 2), or
> is everybody happy with 3)?
>
> Felix
>
>
> 2012/10/2 Felix Sasaki <fsasaki@w3.org>
>
> Hi Yves, all,
>
> good points. It seems that both aspects may be interrelated. You
> could resolve the "title" attribute issue by first converting HTML
> to XLIFF. Here you make use of the mtConfidence score at a
> "target" element (I think).
> Of course using ITS mtConfidence score at XLIFF "target" doesn't
> work (your second point). But maybe that's no important use case,
> only for locQualityIssueRef?
>
> About "we don't care" ... it seems that the main format for adding
> this metadata is XLIFF. So there is no need to have an XPath based
> mechanism to accommodate many formats, but a mapping between ITS
> metadata to XLIFF. A table could do too ...
>
> Felix
>
>
> 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>
> Hi Felix, all,
>
> That makes sense for the information part: all the data
> categories of the second list basically need to be specified
> locally to be truly useable.
>
> But it would also remove two capabilities:
>
> -The capability to associate those data categories with
> attribute values. We all know it’s not recommended to have
> translatable attributes, but then how do you associate
> mtConfidence with the HTML5 title attribute for example? Maybe
> the answer is “you don’t”. I’m just pointing out the potential issue.
>
> -The capability to map existing markup in other vocabulary
> that have the same functionality as the ITS data categories.
> Granted: it’s unlikely to be a real use case in many cases:
> I’d be surprised to see the same semantics as Disambiguation
> anywhere else. But there is at least one case where a pointer
> would be handy: the pointer for the attribute that points to
> the standoff markup for Localization Quality Issue in XLIFF
> 2.0. We cannot use its:locQualityIssuesRef because <mrk>
> doesn’t allow non-XLIFF attributes. It means an ITS-only-aware
> tool would not be able to associate a localization
> quality issue with the content it pertains to. But maybe we
> don’t care.
>
> -yves
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Tuesday, October 02, 2012 5:57 AM
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* issue-51 too many global rules
>
> Hi all,
>
> as an input to issue-51, to the "global rules" part. I went
> through the new data categories. Below are some proposals. In
> the cases where the "main function of global rules, to define
> stable information about a document format" (adapted from
> Yves' mail) is not fulfilled, I propose to drop global rules.
>
> - Domain, LocaleFilter, external resource, target pointer,
> preserve space, allowed characters, storage size, id value:
> keep global rules as is.
>
> - QualityIssue, Quality Precis, Disambiguation, mtConfidence,
> text analysis annotation, provenance: drop global rules. It
> seems that rules here don't fulfill the main function
> mentioned above. That is also related to the aspect of adding
> a closed set of metadata values, like "yes" or "no" for
> translate. That makes sense for a document format, e.g. "all
> code elements are translatable". But it doesn't make sense for
> the six data categories: they don't add a closed set but
> rather open sets of values, e.g. mtConfidence score =0.5.
> These will probably not be specific to a document format.
>
> Thoughts?
>
> Felix
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
Received on Monday, 8 October 2012 13:04:53 UTC