- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 23 Oct 2012 12:33:43 +0200
- To: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
- Message-ID: <CAL58czpGa7S+R=vGm_6M=T=b+CE=9JCx23HK-c135JpuzHBd6A@mail.gmail.com>
Hi Dave, all,

the "too many global rules" thread got lost two weeks ago. Given that
there was no input on my last mail, I still propose to drop global
rules, as described at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0021.html

Any thoughts?

Thanks,

Felix

2012/10/8 Felix Sasaki <fsasaki@w3.org>

> Hi Dave, all,
>
> On Monday, 8 October 2012, Dave Lewis wrote:
>
>> Hi Felix, Yves, guys,
>> I think we need to look at this on a case-by-case basis - hence this
>> slightly long post.
>>
>> The data categories that Felix identifies for consideration are all
>> essentially provenance-related, i.e. they record the outcomes of some
>> process on the textual content of the document. So they are most
>> likely to be applied with local rules (often requiring a span
>> insertion).
>>
>> However, in some cases global rules may be useful for applying rules
>> to several nodes at once. For instance using translation provenance
>> revision agent global rule to say all nodes with class="legal" were
>> postedited by translator A and all with class="technical" by
>> translator B. Exceptions (e.g. due to re-postediting) can then be
>> easily implemented by local selection overrides.
>>
>> However, this provenance characteristic means they are more likely to
>> be annotating a static document, i.e. a document submitted to the
>> localisation workflow (including disambiguation on the source).
>> Therefore the ability of global rules to provide a convenient way of
>> annotating sets of nodes with a single ITS declaration is not so
>> vulnerable to changes in the document structure that render the
>> selectors inaccurate.
>>
>> In contrast, the internationalisation/instructional data categories
>> (Domain, LocaleFilter, external resource, target pointer, preserve
>> space, allowed characters, storage size, id value) are applied in a
>> setting where the document may still be under structural and content
>> change - so using global rules to assign values to specific sets of
>> nodes is not good practice there.
>>
>> I can see the use of global rules for convenient node selection (and
>> for attribute selection, as Yves points out) as being useful
>> therefore for QualityIssue, Quality Precis,
>> transRevisionAgentProvenance, transAgentProvenance and standoff
>> provenance.
>>
>> I don't see global rules as useful in this respect for
>> disambiguation, text analysis and mtconfidence score. These will
>> always be applied to specific terms or segments, and may involve
>> values generated with some context awareness, so the same
>> words/phrases won't have the same values, and they will also often
>> not have an existing selectable element enclosing them - making
>> global selectors less useful (apart from perhaps the XLIFF case).
>>
>> I find the issue of using global rules for associating ITS semantics
>> with existing mark-up a bit more difficult to ponder, as my knowledge
>> of that existing mark-up is perhaps incomplete. However, I'd observe
>> that these provenance-related data categories are applied in
>> processes within LSPs or by clients expecting results from LSPs. So,
>> with the exception of XLIFF cases, rather than adding the complexity
>> of global rules, we could take a more assertive stance that
>> implementations should adopt ITS, rather than support co-existence
>> with other (as yet unknown) mark-up.
>> Though I don't necessarily advocate this, I'd point out that it
>> might have more chance of success than for the
>> 'internationalization' data categories, where we have to deal with
>> variety and volatility in mark-up caused by people who won't be
>> easily persuaded by pleas for standardization to support downstream
>> localization processes.
>>
>> The result of this line of reasoning would be to actually _remove_
>> the pointer attributes from global rules _and_ local selectors for:
>> QualityIssue, Quality Precis, transRevisionAgentProvenance,
>> transAgentProvenance and standoff provenance, as well as for
>> disambiguation, text analysis and mtconfidence - even while we
>> otherwise keep global rules for the first group to support
>> convenient node selection.
>>
>
> Just for clarification: what do you mean by "local selectors" in
> "_remove_ the pointer attributes from global rules _and_ local
> selectors"?
>
> Also, with respect to "support convenient node selection": for e.g.
> "quality issue", we currently have this example for a global rule:
>
> <its:locQualityIssueRule selector="//span[@id='q1']"
>   locQualityIssueType="typographical"
>   locQualityIssueComment="Sentence without capitalization"
>   locQualityIssueSeverity="50"/>
>
> Is it really convenient to have such a rule? You will need masses of
> them, one for each "id" attribute value. Who would ever write such
> rules?
>
> About your argument for having global rules:
>
> "For instance using translation provenance revision agent global
> rule to say all nodes with class="legal" were postedited by
> translator A and all with class="technical" by translator B.
> Exceptions (e.g. due to re-postediting) can then be easily
> implemented by local selection overrides."
>
> Is it realistic that the chunks that are postedited are easily
> identified by certain class attributes?
> If yes, then the "chunk" mechanism has a use case - but you could
> also have editing support that automatically inserts the related
> local markup in the chunk touched.
>
> My main point is maybe: will we get people to write such rules,
> except for the "per document format" case Yves mentioned? It is nice
> to have the mechanism, but writing global rules is a huge effort and
> needs quite some knowledge about XPath etc.
>
> With respect to ITS and XLIFF, you wrote "So with the exception of
> XLIFF cases": that case meant global rules + pointers, no? Or how
> would you realize XLIFF+ITS then?
>
> Best,
>
> Felix
>
>> Apologies for the long post,
>> Dave
>>
>> On 04/10/2012 12:42, Felix Sasaki wrote:
>>
>> Hi Yves, all again ...
>>
>> I thought about this again, esp. your argument "The capability to
>> map existing markup in other vocabulary that have the same
>> functionality as the ITS data categories".
>>
>> For ITS 1.0, we specified best practices for such mappings, e.g.
>> XHTML
>> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/#its-plus-xhtml10
>>
>> http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/EX-relating-its-plus-xhtml-1.xml
>>
>> However, we only used data categories with small & fixed sets of
>> values: translate, term, dir, withintext.
>> Now, for the data categories we are discussing here, mostly there
>> are open value sets. Only QualityIssue has "qualityIssueType" and
>> Disambiguation "disambigType", but these are not really useful
>> without other, "open" value fields / attributes.
>>
>> So it seems that mapping to existing vocabularies with global rules
>> for QualityIssue, Quality Precis, Disambiguation, mtConfidence,
>> text analysis annotation, provenance only makes sense with pointer
>> attributes, like here
>>
>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-global-2
>>
>> So there may be the options:
>> 1) remove the non-pointer attributes, e.g.
>> "locQualityIssueComment", and keep the pointer attributes, e.g.
>> "locQualityIssueCommentPointer"
>> 2) keep everything as is
>> 3) remove global rules completely.
>>
>> Thoughts? In the end the question is: who would implement 1) or 2),
>> or is everybody happy with 3)?
>>
>> Felix
>>
>> 2012/10/2 Felix Sasaki <fsasaki@w3.org>
>>
>> Hi Yves, all,
>>
>> good points. It seems that both aspects may be interrelated. You
>> could resolve the "title" attribute issue by first converting HTML
>> to XLIFF. Here you make use of the mtConfidence score at a "target"
>> element (I think).
>> Of course using ITS mtConfidence score at XLIFF "target" doesn't
>> work (your second point). But maybe that's not an important use
>> case, only for locQualityIssueRef?
>>
>> About "we don't care" ... it seems that the main format for adding
>> this metadata is XLIFF. So there is no need to have an XPath-based
>> mechanism to accommodate many formats, but rather a mapping from ITS
>> metadata to XLIFF. A table could do too ...
>>
>> Felix
>>
>> 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>>
>> Hi Felix, all,
>>
>> That makes sense for the information part: all the data categories
>> of the second list basically need to be specified locally to be
>> truly useable.
>>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>

--
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 23 October 2012 10:34:14 UTC