Re: issue-51 too many global rules from Dave Lewis on 2012-10-08 (public-multilingualweb-lt@w3.org from October 2012)

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Mon, 08 Oct 2012 14:08:18 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <5072D042.4070006@cs.tcd.ie>
Hi Felix, Yves, guys,
I think we need to look at this on a case-by-case basis, hence this 
slightly long post.

The data categories that Felix identifies for consideration are all 
essentially provenance-related, i.e. they record the outcomes of some 
process applied to the textual content of the document. So they are most 
likely to be applied with local rules (often requiring a span insertion).

Still, in some cases global rules may be useful for applying values to 
several nodes at once. For instance, a global rule for translation 
revision agent provenance could state that all nodes with class="legal" 
were post-edited by translator A and all those with class="technical" by 
translator B. Exceptions (e.g. due to re-post-editing) can then easily be 
implemented as local selection overrides.
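To make the "global rule plus local override" pattern concrete, here is a minimal Python sketch. It is a simulation, not ITS syntax: the selector/value pairs stand in for global rules, and the attribute name `revAgent` is a hypothetical stand-in for a local provenance annotation, not the actual ITS 2.0 attribute. The only ITS behaviour it assumes is that local annotations take precedence over global rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. "revAgent" is an illustrative stand-in for a local
# provenance annotation; it is NOT an actual ITS 2.0 attribute name.
DOC = """<doc>
  <p class="legal">Terms of use.</p>
  <p class="legal" revAgent="translator C">Liability clause.</p>
  <p class="technical">Install the driver.</p>
</doc>"""

# Each pair simulates a global rule: a selector plus the value it assigns
# to every node it selects. ElementTree supports only a subset of XPath;
# the [@class='...'] predicate is part of that subset.
GLOBAL_RULES = [
    (".//*[@class='legal']", "translator A"),
    (".//*[@class='technical']", "translator B"),
]

def resolve_revision_agents(root: ET.Element) -> dict[str, str]:
    """Map each selected node's text to its effective revision agent."""
    result: dict[str, str] = {}
    for selector, agent in GLOBAL_RULES:
        for node in root.findall(selector):
            # A local annotation on the node overrides the global rule.
            result[node.text] = node.get("revAgent", agent)
    return result
```

Calling `resolve_revision_agents(ET.fromstring(DOC))` assigns "translator A" and "translator B" by class. The re-post-edited legal paragraph keeps its local "translator C". Only that one exception needs its own markup.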

Moreover, this provenance characteristic means they are more likely to be 
annotating a static document, i.e. a document already submitted to the 
localisation workflow (including disambiguation on the source). 
Therefore the convenience of global rules, annotating sets of nodes with 
a single ITS declaration, is less vulnerable here to changes in the 
document structure that would render the selectors inaccurate.

In contrast, the internationalisation/instructional data categories 
(Domain, LocaleFilter, external resource, target pointer, preserve 
space, allowed characters, storage size, id value) are applied in a 
setting where the document may still be undergoing structural and content 
changes, so using global rules to assign values to specific sets of nodes 
is not good practice there.

I can therefore see global rules as useful for convenient node selection 
(and for attribute selection, as Yves points out) for QualityIssue, 
Quality Precis, transRevisionAgentProvenance, transAgentProvenance and 
standoff provenance.

I don't see global rules as useful in this respect for disambiguation, 
text analysis and mtConfidence score. These will always be applied to 
specific terms or segments, may involve values generated with some 
context awareness (so the same words or phrases won't always get the same 
values), and often won't have an existing selectable element enclosing 
them. This makes global selectors less useful (apart from perhaps the 
XLIFF case).

I find the issue of using global rules to associate ITS semantics with 
existing mark-up more difficult to ponder, as my knowledge of that 
existing mark-up is perhaps incomplete. However, I'd observe that these 
provenance-related data categories are applied in processes within LSPs 
or by clients expecting results from LSPs. So, with the exception of the 
XLIFF cases, rather than adding the complexity of global rules, we could 
take a more assertive stance that implementations should adopt ITS, 
rather than support co-existence with other (as yet unknown) mark-up. 
Though I don't necessarily advocate this, I'd point out that it might 
have more chance of success than for the 'internationalization' data 
categories, where we have to deal with variety and volatility in mark-up 
caused by people who won't be easily persuaded by pleas for 
standardization to support downstream localization processes.

The result of this line of reasoning would be to actually _remove_ the 
pointer attributes from global rules _and_ local selectors for 
QualityIssue, Quality Precis, transRevisionAgentProvenance, 
transAgentProvenance and standoff provenance, as well as for 
disambiguation, text analysis and mtConfidence, even while we otherwise 
keep global rules for the first group to support convenient node selection.
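For contrast, a pointer attribute (such as the "locQualityIssueCommentPointer" mentioned in the thread below) carries no value itself. It tells the processor where to find the value in existing, non-ITS markup. Here is a minimal Python sketch of what would be dropped. It assumes a hypothetical vocabulary with a `qa-comment` attribute. It also simplifies the pointer, which in ITS is a relative selector, to a plain attribute name on the selected node.

```python
import xml.etree.ElementTree as ET

# Hypothetical non-ITS markup that already records quality-issue comments
# in its own "qa-comment" attribute.
DOC = """<doc>
  <seg><mark qa-comment="Wrong term">terminate</mark> the contract.</seg>
  <seg><mark qa-comment="Missing space">nextstep</mark></seg>
</doc>"""

def apply_pointer_rule(root: ET.Element, selector: str,
                       pointer_attr: str) -> dict[str, str]:
    """Map each selected node's text to the value found via the pointer.

    Simplification: a real ITS pointer is a relative selector evaluated
    against each selected node; here it is reduced to an attribute name.
    """
    return {node.text: node.get(pointer_attr)
            for node in root.findall(selector)}
```

With `apply_pointer_rule(ET.fromstring(DOC), ".//mark", "qa-comment")`, the rule reuses comments the existing markup already holds. Removing pointers means such markup would have to be rewritten with native ITS attributes.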

Apologies for the long post,
Dave

On 04/10/2012 12:42, Felix Sasaki wrote:
> Hi Yves, all again ...
>
> I thought about this again, esp. your argument "The capability to map 
> existing markup in other vocabulary that have the same functionality 
> as the ITS data categories".
>
> For ITS 1.0, we specified best practices for such mappings, e.g. XHTML
> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/#its-plus-xhtml10
> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/EX-relating-its-plus-xhtml-1.xml
>
> However, we only used data categories with small & fixed sets of 
> values: translate, term, dir, withintext.
> Now, for the data categories we are discussing here, mostly there are 
> open value sets. Only QualityIssue has "qualityIssueType" 
> and Disambiguation "disambigType", but these are not really useful 
> without other, "open" value fields / attributes.
>
> So it seems that mapping to existing vocabularies with global rules for
> QualityIssue, Quality Precis, Disambiguation, mtConfidence, text 
> analysis annotation, provenance
> only makes sense with pointer attributes, like here
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-global-2
>
> So there may be the options:
> 1) remove the non-pointer attributes, e.g. "locQualityIssueComment", 
> and keep the pointer attributes, e.g. "locQualityIssueCommentPointer"
> 2) keep everything as is
> 3) remove global rules completely.
>
> Thoughts? At the end the question is: who would implement 1) or 2), or 
> is everybody happy with 3)?
>
> Felix
>
>
> 2012/10/2 Felix Sasaki <fsasaki@w3.org>
>
>     Hi Yves, all,
>
>     good points. It seems that both aspects may be interrelated. You
>     could resolve the "title" attribute issue by first converting HTML
>     to XLIFF. Here you make use of the mtConfidence score at a
>     "target" element (I think).
>     Of course using ITS mtConfidence score at XLIFF "target" doesn't
>     work (your second point). But maybe that's not an important use case,
>     other than for locQualityIssuesRef?
>
>     About "we don't care" ... it seems that the main format for adding
>     this metadata is XLIFF. So there is no need for an XPath-based
>     mechanism to accommodate many formats, only for a mapping from ITS
>     metadata to XLIFF. A table could do, too ...
>
>     Felix
>
>
>     2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>
>         Hi Felix, all,
>
>         That makes sense for the information part: all the data
>         categories of the second list basically need to be specified
>         locally to be truly usable.
>
>         But it would also remove two capabilities:
>
>         -The capability to associate those data categories with
>         attribute values. We all know it’s not recommended to have
>         translatable attributes, but then how do you associate
>         mtConfidence with the HTML5 title attribute, for example? Maybe
>         the answer is “you don’t”. I’m just pointing out the potential issue.
>
>         -The capability to map existing markup in other vocabularies
>         that have the same functionality as the ITS data categories.
>         Granted, it’s unlikely to be a real use case in many cases:
>         I’d be surprised to see the same semantics as Disambiguation
>         anywhere else. But there is at least one case where a pointer
>         would be handy: the pointer for the attribute that points to
>         the standoff markup for Localization Quality Issue in XLIFF
>         2.0. We cannot use its:locQualityIssuesRef because <mrk>
>         doesn’t allow non-XLIFF attributes. This means an ITS-only-aware
>         tool would not be able to associate a localization
>         quality issue with the content it pertains to. But maybe we
>         don’t care.
>
>         -yves
>
>         *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>         *Sent:* Tuesday, October 02, 2012 5:57 AM
>         *To:* public-multilingualweb-lt@w3.org
>         *Subject:* issue-51 too many global rules
>
>         Hi all,
>
>         as an input to issue-51, for the "global rules" part, I went
>         through the new data categories. Below are some proposals. In
>         the cases where global rules don't serve the "main function of
>         global rules, to define stable information about a document
>         format" (adapted from Yves' mail), I propose to drop them.
>
>         - Domain, LocaleFilter, external resource, target pointer,
>         preserve space, allowed characters, storage size, id value:
>         keep global rules as is.
>
>         - QualityIssue, Quality Precis, Disambiguation, mtConfidence,
>         text analysis annotation, provenance: drop global rules. It
>         seems that rules here don't fulfill the main function
>         mentioned above. That is also related to the aspect of adding
>         a closed set of metadata values, like "yes" or "no" for
>         translate. That makes sense for a document format, e.g. "all
>         code elements are translatable". But it doesn't make sense for
>         the six data categories: they don't add a closed set but
>         rather open sets of values, e.g. mtConfidence score =0.5.
>         These will probably not be specific to a document format.
>
>         Thoughts?
>
>         Felix
>
>         -- 
>         Felix Sasaki
>
>         DFKI / W3C Fellow
Received on Monday, 8 October 2012 13:04:53 UTC