Re: issue-51 too many global rules from Felix Sasaki on 2012-10-08 (public-multilingualweb-lt@w3.org from October 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 8 Oct 2012 22:41:09 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>
Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <CAL58czpZ74EjMZSbi-GE-EhL=q3dZgXUMUF=PNamDnxnGjiArA@mail.gmail.com>
Hi Dave, all,

On Monday, 8 October 2012, Dave Lewis wrote:

>  Hi Felix, Yves, guys,
> I think we need to look at this on a case by case basis - hence this
> slightly long post.
>
> The data categories that Felix identifies for consideration are all
> essentially provenance-related, i.e. they record the outcomes of some
> process on the textual content of the document. So they are most likely to
> be applied with local rules (often requiring a span insertion).
>
> However, in some cases global rules may be useful for applying rules to
> several nodes at once. For instance using translation provenance revision
> agent global rule to say all nodes with class="legal" were postedited by
> translator A and all with class="technical" by translator B. Exceptions
> (e.g. due to re-postediting) can then be easily implemented by local
> selection overrides.
>
> However, this provenance characteristic means they are more likely to be
> annotating a static document, i.e. a document submitted to the localisation
> workflow (including disambiguation on the source). Therefore the ability of
> global rules to provide a convenient way of annotating sets of nodes with a
> single ITS declaration is not so vulnerable to changes in the document
> structure that render the selectors inaccurate.
>
> In contrast the internationalisation/instructional data categories
> (Domain, LocaleFilter, external resource, target pointer, preserve space,
> allowed characters, storage size, id value) are applied in a setting where
> the document may still be under structural and content change - so using
> global rule to assign value to specific sets of nodes is not good practice
> there.
>
> I can see the use of global rules for convenient node selection (and for
> attribute selection as Yves points out) as being useful therefore for
> QualityIssue, Quality Precis, transRevisionAgentProvenance,
> transAgentProvenance and standoff provenance.
>
> I don't see global rules as useful in this respect for disambiguation,
> text analysis and mtconfidence score. These will always be applied to
> specific terms or segments, and may involve values generated with some
> context awareness so the same words/phrases won't have the same values, and
> also will often not have an existing selectable element enclosing it -
> making global selectors less useful (apart from perhaps the XLIFF case).
>
> I find the issue of using global rules for associating ITS semantics with
> existing mark-up a bit more difficult to ponder, as my knowledge of that
> existing mark-up is perhaps incomplete. However, I'd observe that for these
> provenance related data categories, they are applied in processes within
> LSP or clients expecting results from LSPs. So with the exception of XLIFF
> cases, rather than adding complexity of global rules, we could take a more
> assertive stance that implementations should adopt ITS, rather than support
> co-existence with other (as yet unknown) mark-up. Though I don't
> necessarily advocate this, I'd point out that it might have more chance of
> success  than for the 'internationalization' data categories, where we have
> to deal with variety and volatility in mark-up caused by people who won't
> be easily persuaded by pleas for standardization to support downstream
> localization processes.
>
> The result of this line of reasoning would be to actually _remove_ the
> pointer attributes from global rules _and_ local selectors for:
> QualityIssue, Quality Precis, transRevisionAgentProvenance,
> transAgentProvenance and standoff provenance, as well as for
> disambiguation, text analysis and mtconfidence - even while we otherwise
> keep global rules for the first group to support convenient node selection.
>


Just for clarification: what do you mean by "local selectors" in "_remove_
the pointer attributes from global rules _and_ local selectors"?

Also, wrt "support convenient node selection": for e.g. "quality issue",
we currently have this example for a global rule:

<its:locQualityIssueRule selector="//span[@id='q1']"
locQualityIssueType="typographical" locQualityIssueComment="Sentence without
capitalization" locQualityIssueSeverity="50"/>

Is it really convenient to have such a rule? You would need masses of them,
one for each "id" attribute value. Who would ever write such rules?

Regarding your argument for having global rules:
"For instance using translation provenance revision agent global rule to
say all nodes with class="legal" were postedited by translator A and all
with class="technical" by translator B. Exceptions (e.g. due to
re-postediting) can then be easily implemented by local selection
overrides. "
Is it realistic that the chunks that are postedited can easily be identified
by certain class attributes? If yes, then the "chunk" mechanism has a use
case - but you could also have editing support that automatically inserts
the related local markup in the chunk touched.
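
To make the contrast concrete, here is a minimal sketch (not ITS itself and
not taken from the draft) that uses Python's standard library to resolve
class-based "global rules" with local overrides, and to count the selectors
a per-"id" approach would need. The sample document, the rule table, the
translator names and the "data-rev-agent" attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; class values and ids are assumptions for illustration.
DOC = """
<body>
  <span id="q1" class="legal">first clause</span>
  <span id="q2" class="legal">second clause</span>
  <span id="q3" class="technical">torque setting</span>
  <span id="q4" class="legal" data-rev-agent="translator C">re-postedited clause</span>
</body>
"""

# Hypothetical "global rules": one (selector, revision agent) pair per class.
GLOBAL_RULES = [
    (".//span[@class='legal']", "translator A"),
    (".//span[@class='technical']", "translator B"),
]


def resolve_agents(root, rules, local_attr="data-rev-agent"):
    """Map each node id to its revision agent; local markup wins over rules."""
    agents = {}
    # Step 1: every global rule annotates all nodes its selector matches.
    for selector, agent in rules:
        for node in root.findall(selector):
            agents[node.get("id")] = agent
    # Step 2: a local attribute (hypothetical name) overrides the global value.
    for node in root.iter():
        local = node.get(local_attr)
        if local is not None:
            agents[node.get("id")] = local
    return agents


def per_id_selectors(root):
    """One selector per span, as in the //span[@id='q1'] style of rule."""
    return [f"//span[@id='{n.get('id')}']" for n in root.iter("span") if n.get("id")]


root = ET.fromstring(DOC)
agents = resolve_agents(root, GLOBAL_RULES)
id_selectors = per_id_selectors(root)
```

In this toy case two class-based rules plus one local override cover four
spans, whereas the per-"id" style needs four selectors, and that number grows
with every annotated span. Whether real postedited chunks line up with class
attributes is exactly the open question.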

My main point is perhaps this: will we get people to write such rules, apart
from the "per document format" case Yves mentioned? It is nice to have the
mechanism, but writing global rules is a huge effort and requires
considerable knowledge of XPath etc.

Wrt ITS and XLIFF, you wrote "So with the exception of XLIFF cases":
that case meant global rules + pointers, no? Or how would you realize
XLIFF+ITS then?

Best,

Felix



> Apologies for the long post,
> Dave
>
>
>
>
>
>
> On 04/10/2012 12:42, Felix Sasaki wrote:
>
> Hi Yves, all again ...
>
>  I thought about this again, esp. your argument "The capability to map
> existing markup in other vocabulary that have the same functionality as the
> ITS data categories".
>
>  For ITS 1.0, we specified best practices for such mappings, e.g. XHTML
> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/#its-plus-xhtml10
>
> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/EX-relating-its-plus-xhtml-1.xml
>
>  However, we only used data categories with small & fixed sets of values:
> translate, term, dir, withintext.
> Now, for the data categories we are discussing here, mostly there are open
> value sets. Only QualityIssue has "qualityIssueType" and Disambiguation
> "disambigType", but these are not really useful without other, "open" value
> fields / attributes.
>
>  So it seems that mapping to existing vocabularies with global rules for
> QualityIssue, Quality Precis, Disambiguation, mtConfidence, text analysis
> annotation, provenance
> only makes sense with pointer attributes, like here
>
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-global-2
>
>  So there may be the options:
> 1) remove the non-pointer attributes, e.g. "locQualityIssueComment", and
> keep the pointer attributes, e.g. "locQualityIssueCommentPointer"
> 2) keep everything as is
> 3) remove global rules completely.
>
>  Thoughts? At the end the question is: who would implement 1) or 2), or
> is everybody happy with 3)?
>
>  Felix
>
>
> 2012/10/2 Felix Sasaki <fsasaki@w3.org>
>
> Hi Yves, all,
>
>  good points. It seems that both aspects may be interrelated. You could
> resolve the "title" attribute issue by first converting HTML to XLIFF. Here
> you make use of the mtConfidence score at a "target" element (I think).
> Of course using ITS mtConfidence score at XLIFF "target" doesn't work
> (your second point). But maybe that's no important use case, only for
> locQualityIssueRef?
>
>  About "we don't care" ... it seems that the main format for adding this
> metadata is XLIFF. So there is no need to have an XPath based mechanism to
> accommodate many formats, but a mapping between ITS metadata and XLIFF. A
> table could do too ...
>
>  Felix
>
>
> 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>
>  Hi Felix, all,
>
>
>
> That makes sense for the information part: all the data categories of the
> second list basically need to be specified locally to be truly useable.
>
>
>
>
>

-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Monday, 8 October 2012 20:41:34 UTC