Re: issue-51 too many global rules from Felix Sasaki on 2012-10-23 (public-multilingualweb-lt@w3.org from October 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 23 Oct 2012 12:33:43 +0200
To: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <CAL58czpGa7S+R=vGm_6M=T=b+CE=9JCx23HK-c135JpuzHBd6A@mail.gmail.com>
Hi Dave, all,

the "too many global rules" thread has been dormant for two weeks. Since
there was no input on my last mail, I still propose to drop global rules,
as described at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0021.html
Any thoughts?
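
To make the "masses of rules" concern from the quoted mail below concrete,
here is a minimal Python sketch. The sample document, its class and id
values, and the helper name count_matches are made up for illustration, and
ElementTree only supports a subset of XPath (hence ".//" rather than "//").
It only counts how many nodes each kind of selector matches; it is not an
ITS processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names and attribute values are
# invented for this illustration only.
DOC = """<doc>
  <p class="legal"><span id="q1">first issue</span></p>
  <p class="legal"><span id="q2">second issue</span></p>
  <p class="technical"><span id="q3">third issue</span></p>
</doc>"""

root = ET.fromstring(DOC)


def count_matches(selector):
    """Return how many nodes the selector matches in the sample document."""
    return len(root.findall(selector))


# Selecting by id: each selector matches exactly one node, so every
# annotated span needs its own global rule.
id_selectors = [".//span[@id='%s']" % i for i in ("q1", "q2", "q3")]
per_id = [count_matches(s) for s in id_selectors]

# Selecting by class: a single selector covers several nodes at once.
per_class = count_matches(".//p[@class='legal']")

print(per_id, per_class)
```

With id-based selectors the number of rules grows with the number of
annotated spans; only the class-based selector gives the "several nodes at
once" convenience, and only if the chunks really carry such a class.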

Thanks,

Felix

2012/10/8 Felix Sasaki <fsasaki@w3.org>

> Hi Dave, all,
>
> On Monday, 8 October 2012, Dave Lewis wrote:
>
>> Hi Felix, Yves, guys,
>> I think we need to look at this on a case by case basis - hence this
>> slightly long post.
>>
>> The data categories that Felix identifies for consideration are all
>> essentially provenance-related, i.e. they record the outcomes of some
>> process on the textual content of the document. So they are most likely to
>> be applied with local rules (often requiring a span insertion).
>>
>> However, in some cases global rules may be useful for applying rules to
>> several nodes at once. For instance using translation provenance revision
>> agent global rule to say all nodes with class="legal" were postedited by
>> translator A and all with class="technical" by translator B. Exceptions
>> (e.g. due to re-postediting) can then be easily implemented by local
>> selection overrides.
>>
>> However, this provenance characteristic means they are more likely to be
>> annotating a static document, i.e. a document submitted to the localisation
>> workflow (including disambiguation on the source). Therefore the ability of
>> global rules to provide a convenient way of annotating sets of nodes with a
>> single ITS declaration is not so vulnerable to changes in the document
>> structure that render the selectors inaccurate.
>>
>> In contrast the internationalisation/instructional data categories
>> (Domain, LocaleFilter, external resource, target pointer, preserve space,
>> allowed characters, storage size, id value) are applied in a setting where
>> the document may still be under structural and content change - so using
>> global rules to assign values to specific sets of nodes is not good
>> practice there.
>>
>> I can see the use of global rules for convenient node selection (and for
>> attribute selection as Yves points out) as being useful therefore for
>> QualityIssue, Quality Precis, transRevisionAgentProvenance,
>> transAgentProvenance and standoff provenance.
>>
>> I don't see global rules as useful in this respect for disambiguation,
>> text analysis and mtconfidence score. These will always be applied to
>> specific terms or segments, and may involve values generated with some
>> context awareness so the same words/phrases won't have the same values, and
>> also will often not have an existing selectable element enclosing them -
>> making global selectors less useful (apart from perhaps the XLIFF case).
>>
>> I find the issue of using global rule for associating ITS semantics with
>> existing mark-up a bit more difficult to ponder, as my knowledge of that
>> existing mark-up is perhaps incomplete. However, I'd observe that for these
>> provenance related data categories, they are applied in processes within
>> LSP or clients expecting results from LSPs. So with the exception of XLIFF
>> cases, rather than adding complexity of global rules, we could take a more
>> assertive stance that implementations should adopt ITS, rather than support
>> co-existence with other (as yet unknown) mark-up. Though I don't
>> necessarily advocate this, I'd point out that it might have more chance of
>> success than for the 'internationalization' data categories, where we have
>> to deal with variety and volatility in mark-up caused by people who won't
>> be easily persuaded by pleas for standardization to support downstream
>> localization processes.
>>
>> The result of this line of reasoning would be to actually _remove_ the
>> pointer attributes from global rules _and_ local selectors for:
>> QualityIssue, Quality Precis, transRevisionAgentProvenance,
>> transAgentProvenance and standoff provenance, as well as for
>> disambiguation, text analysis and mtconfidence - even while we otherwise
>> keep global rules for the first group to support convenient node selection.
>>
>
>
> Just for clarification: what do you mean by "local selectors" in "_remove_
> the pointer attributes from global rules _and_ local selectors"?
>
> Also, wrt "support convenient node selection": e.g. for "quality
> issue", we currently have this example for a global rule:
>
> <its:locQualityIssueRule selector="//span[@id='q1']"
> locQualityIssueType="typographical" locQualityIssueComment="Sentence without
> capitalization" locQualityIssueSeverity="50"/>
>
> Is it really convenient to have such a rule? You will need masses of them,
> one for each "id" attribute value. Who would ever write such rules?
>
> About your argument for having global rules:
> "For instance using translation provenance revision agent global rule to
> say all nodes with class="legal" were postedited by translator A and all
> with class="technical" by translator B. Exceptions (e.g. due to
> re-postediting) can then be easily implemented by local selection
> overrides. "
> Is it realistic that the chunks that are postedited are easily identified
> by certain class attributes? If yes then the "chunk" mechanism has a use
> case - but you could also have editing support that automatically inserts
> the related local markup in the chunk touched.
>
> My main point is maybe: will we get people to write such rules, except for
> the "per document format" case Yves mentioned? It is nice to have the
> mechanism, but writing global rules is a huge effort and requires
> considerable knowledge of XPath etc.
>
> Wrt ITS and XLIFF, you wrote "So with the exception of XLIFF cases":
> that case meant global rules + pointers, no? Or how would you realize
> XLIFF+ITS then?
>
> Best,
>
> Felix
>
>
>
>> Apologies for the long post,
>> Dave
>>
>> On 04/10/2012 12:42, Felix Sasaki wrote:
>>
>> Hi Yves, all again ...
>>
>>  I thought about this again, esp. your argument "The capability to map
>> existing markup in other vocabulary that have the same functionality as the
>> ITS data categories".
>>
>>  For ITS 1.0, we specified best practices for such mappings, e.g. XHTML
>> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/#its-plus-xhtml10
>>
>> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/EX-relating-its-plus-xhtml-1.xml
>>
>>  However, we only used data categories with small & fixed sets of
>> values: translate, term, dir, withintext.
>> Now, for the data categories we are discussing here, mostly there are
>> open value sets. Only QualityIssue has "qualityIssueType"
>> and Disambiguation "disambigType", but these are not really useful without
>> other, "open" value fields / attributes.
>>
>>  So it seems that mapping to existing vocabularies with global rules for
>> QualityIssue, Quality Precis, Disambiguation, mtConfidence, text analysis
>> annotation, provenance
>> only makes sense with pointer attributes, like here
>>
>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-global-2
>>
>>  So there may be the options:
>> 1) remove the non-pointer attributes, e.g. "locQualityIssueComment", and
>> keep the pointer attributes, e.g. "locQualityIssueCommentPointer"
>> 2) keep everything as is
>> 3) remove global rules completely.
>>
>>  Thoughts? In the end the question is: who would implement 1) or 2), or
>> is everybody happy with 3)?
>>
>>  Felix
>>
>>
>> 2012/10/2 Felix Sasaki <fsasaki@w3.org>
>>
>> Hi Yves, all,
>>
>>  good points. It seems that both aspects may be interrelated. You could
>> resolve the "title" attribute issue by first converting HTML to XLIFF. Here
>> you make use of the mtConfidence score at a "target" element (I think).
>> Of course using ITS mtConfidence score at XLIFF "target" doesn't work
>> (your second point). But maybe that's not an important use case, only for
>> locQualityIssueRef?
>>
>>  About "we don't care" ... it seems that the main format for adding this
>> metadata is XLIFF. So there is no need to have an XPath based mechanism to
>> accommodate many formats, but rather a mapping from ITS metadata to XLIFF. A
>> table could do too ...
>>
>>  Felix
>>
>>
>> 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>>
>>  Hi Felix, all,
>>
>>
>>
>> That makes sense for the information part: all the data categories of the
>> second list basically need to be specified locally to be truly usable.
>>
>>
>>
>>
>>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 23 October 2012 10:34:14 UTC