ITS f2f Mandelieu 2006 Minutes from Felix Sasaki on 2006-03-01 (public-i18n-its@w3.org from January to March 2006)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 01 Mar 2006 20:24:51 +0900
To: public-i18n-its@w3.org
Message-ID: <44058483.9090500@w3.org>
Hi all,

This is the summary of the ITS f2f, Mandelieu, February / March 2006.
It encompasses our change proposals, and mostly "cleared" minutes (some
topics which I found hard to summarize are here "as is").

Cheers,

Felix

---------------------------------------------
action items
---------------------------------------------

The action items from the f2f and the call yesterday:

- all: discuss change proposals from f2f within the next two weeks. I
would propose that we give ourselves a deadline for this discussion,
e.g. until the ITS call on 15 March. If we don't agree on a proposal, we
should just drop it. On Friday this week I will make a bugzilla entry
for each proposal which needs more discussion.
- action: for editors of the techniques document: give examples of how
to use its:locInfoRef (see below)
- action: decide if we need the distinction between "alert" and
"description" for localization information.
- action: Richard to describe an additional level of conformance for Ruby.
- action: Felix to update bugzilla with open issues, "ITS 1.1 / 2.0"
proposals, the change proposals above
- action: Christian and Felix need to integrate the results of the
conformance discussion into the spec.
- action: All to think about f2f (April & June), see mail from Yves

I have marked all change proposals with "proposal-xx". There are 9
proposals. The discussion from yesterday's call is marked as "Discussion
during the call on Tuesday:". Could you please go through the proposals
by Friday and write a mail with something like
proposal-01: agree
proposal-02: agree
...
or instead of "agree": "needs more discussion", or some comment with
agreement.

Note: most of the proposals concern syntactic simplifications or
clarifications of global rules; they add *no* new functionality. The
main functionality-related proposal is proposal-05.

---------------------------------------------
Simplification and clarification of global markup for selection
---------------------------------------------

************
proposal-01: have only one data category per <documentRule> element
************
We observed that there is no need for selector attributes with data
category specific names in a global position. We propose to define that
each <documentRule> element is used for only *one* data category at a
time. Hence, we can simplify the definition of <documentRule> from

<documentRule> contains data category + various data category specific
selector Attributes

to

<documentRule> contains one data category attribute + *the* its:selector
attribute.

Examples:

1. translatability: <its:documentRule its:selector="//p"
its:translate="yes"/>
2. localization information: <its:documentRule its:selector="//*"
its:dir="ltr"/>
3. terminology: <its:documentRule its:selector="//qterm" its:term="yes"/>
4. directionality: <its:documentRule its:selector="//*" its:dir="rtl"/>
5. ruby: <its:documentRule its:selector="/body/img[1]/@alt"
its:rubyText="Some ruby text"/>
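
A minimal sketch of how a processor might apply such one-category rules,
assuming Python's standard xml.etree.ElementTree. The sample document,
the (selector, category, value) triples and the apply_rules helper are
illustrative stand-ins, not part of the proposal. ElementTree supports
only a subset of XPath, so "//p" is written ".//p" here.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document, used only for illustration.
doc = ET.fromstring("<text><p>Hello</p><note>Internal</note><p>World</p></text>")

# Each triple stands in for one <its:documentRule>: one selector plus
# exactly one data category attribute.
rules = [
    (".//p", "translate", "yes"),
    (".//note", "translate", "no"),
]

def apply_rules(root, rules):
    """Return {element: {category: value}} for every selected element."""
    result = {}
    for selector, category, value in rules:
        for node in root.findall(selector):
            result.setdefault(node, {})[category] = value
    return result

info = apply_rules(doc, rules)
for node, categories in info.items():
    print(node.tag, categories)
```

Because every rule carries a single category, the loop body never has to
guess which selector belongs to which value.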


************
proposal-02: replace <documentRule> with elements that have data
category specific names
************
We propose to use, instead of <its:documentRule>, a set of elements:
*one* element for each data category. Example:

1. translatability: <its:translateRule its:selector="//p"
its:translate="yes"/>
2. localization information: see below.
3. terminology: <its:termRule its:selector="//qterm" its:term="yes"/>
4. directionality: <its:dirRule its:selector="//*" its:dir="rtl"/>
5. ruby: <its:rubyRule its:selector="/body/img[1]/@alt"
its:rubyText="Some ruby text"/>

In this way, it is easier to validate global rules (e.g. make sure that
@its:translate only occurs at the <its:translateRule> element).


************
proposal-03: create a child element <locInfo> for global localization
information
************
For the expression of localization information in global rules, we would
propose an element with a child element:

<its:locInfoRule its:selector="//p">
<its:locInfo>Some localization information</its:locInfo>
</its:locInfoRule>

In this way, we avoid natural language text as attribute content, at
least for global rules (that was the case before with @its:locInfo).

************
proposal-04: have an attribute @its:locInfoRef for localization
information globally / locally
************

In addition to having the localization information in local position in
an attribute (bad for translatability!), we propose to have an attribute
@its:locInfoRef which contains a URI. This allows for very different
usage scenarios: localization information can be in a database, in an
external XML file, on a web site, or in the same document. action: give
examples in the techniques document of how to use this. Example:

<text>
<joke its:locInfoRef="http://www.example.com/klingons#humor">Three men
went to a pub: a Klingon, ...
</joke>
</text>
The URI in @its:locInfoRef is resolved to: "In Star Trek, Klingons are
known for having no sense of humor. (note: Germans might be more
appropriate here)"


************
proposal summary 01-04:
************

The new content model for documentRule, reflecting proposals 01-04, is:

documentRule =
{translate | (locInfo,locInfotype?(maybe optional)) | (term,termRef?) |
dir | ruby}

translate = element translateRule { attribute selector {...}, attribute
translate {"yes"|"no"}}
locInfo = element locInfoRule { attribute selector {...}, attribute
locInfoRef { xsd:anyURI}?, element locInfo { text }}
term = element termRule { attribute selector {...}, attribute term {"yes"} }
termRef = element termRefRule { attribute selector {...}, attribute
termRef {xsd:anyURI} }
dir = element dirRule { attribute selector {...}, attribute dir
{"ltr"|"rtl"|"lro"|"rlo"}}


************
proposal-05: Separating the tasks of globally identifying+adding ITS
information to XML nodes, versus globally identifying+mapping data
categories to XML nodes. Having a set of "map" attributes (e.g.
"@its:translateMap") for the mapping task.
************

Background: Yves at some point requested the @its:locInfoContent
attribute, to be able to refer to existing localization information in a
document, rather than "adding" this information to the document. We
propose to generalize this requirement and distinguish between:

- identifying+adding information to nodes in an XML document (which all
existing global rules except @locInfoContent do)
- identifying+mapping data categories to nodes in an XML document (e.g.
saying "this existing node is mapped to the localization information
data category", or "this node has the 'meaning' of the localization
information data category").

Purpose of mapping: ITS data categories are used to "normalize" a
document, that is to say "this kind of existing markup has the meaning
of this ITS data category". Mapping makes meaning explicit, but does not
add information.

Example of the need to separate mapping and adding information:
<its:documentRule its:dir="ltr" its:dirSelector="//*[@dir='ltr']"/> is
not mapping; it opens the door for errors (via the repetition of "ltr"
in both attribute values).

Instead, we propose separate mapping attributes, one for each part of a
data category:

<its:dirRule its:selector="//*" its:dirMap="@dir"/>

The attribute for mapping contains a relative location path (relative to
the nodes which are selected by the its:selector attribute). It would
not be enough to have only one XPath in the selector attribute, as in
the case below:

<span class="ruby">...
<its:documentRule selector="//span" rubyTextMap="@class='rt'"> (the span
elements are identified by the XPath expression in the selector
attribute. The mapping to its:rubyText is done via the XPath expression
in the rubyTextMap attribute)
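
The two-step evaluation (select nodes, then read a value relative to
each selected node) can be sketched as follows, assuming Python's
standard xml.etree.ElementTree. The map_attribute helper and the sample
document are hypothetical, and only the simple "@name" form of a map
value is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document with host-vocabulary dir attributes.
doc = ET.fromstring(
    '<text><p dir="rtl">one</p><p dir="ltr">two</p><p>three</p></text>'
)

def map_attribute(root, selector, map_attr):
    """Select nodes, then read the mapped attribute relative to each node.

    map_attr is a map value such as "@dir"; only this "@name" form is
    handled in this sketch.
    """
    name = map_attr.lstrip("@")
    mapped = []
    for node in root.findall(selector):
        if name in node.attrib:
            # The value already exists in the document; nothing is added.
            mapped.append((node.text, node.get(name)))
    return mapped

# Stand-in for <its:dirRule its:selector="//*" its:dirMap="@dir"/>
dir_values = map_attribute(doc, ".//*", "@dir")
print(dir_values)
```

The rule itself never names "ltr" or "rtl", so the value cannot get out
of step with the document.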


Benefit: People who already have ITS related markup in their schema
(e.g. "translate" attribute in DITA, ruby in opendoc), can be convinced
to adopt ITS not by changing their schema, but by making the semantics
of their existing markup declarations clear with the separate
documentRule element. Example:

<its:documentRule its:selector="//*"
its:translateMap="@dita:translate"/> (saying "dita:translate" has the
semantics of "its:translate")
<its:documentRule its:selector="//odf:ruby" its:rubyMap="."/> (saying
"the odf ruby element has the semantics of the its:ruby element")

Widespread adoption of ITS becomes easier.

Impact of this change on the working draft: We propose to change the
description of the general mechanisms for global rules (i.e. integrate
the difference "adding information" versus "mapping"), and show with
*non-normative examples* for each data category in the data category
sections, how these mechanisms can be used. Input to the non-normative
examples: see below.

The following is a walk through all data categories, to see how this
proposal works. Note: the markup change proposal to have different
element names for global rules is not implemented below, so you still
have e.g. its:documentRule instead of its:translateRule.

---------------------------------------------
Single data categories: Translatability
---------------------------------------------

- Scenario: there is no translatability information in the document.
Tasks for ITS: identify and add information. Example:
<p>...</p>
<documentRule its:selector="//p" its:translate="yes" />

- Scenario: there is no translatability information in the document, but
there is other markup that identifies the translatable elements. Tasks
for ITS: identify and add information. Example:

<p class="translate">... </p>
<documentRule its:selector="//p[@class='translate']" its:translate="yes" />

- Scenario: there are the same values, but a different name which does
not match ITS. Tasks for ITS: identify, map, add information. Example:

<p translation="yes">...</p>
<p translation="no">...</p>
<documentRule its:selector="//p" its:translateMap="@translation"/>

Discussion during the call on Tuesday:

[[Yves: why do you still have the selector? Why not a translateMap
attribute only?
Richard: translateMap allows you the specification of several attributes
Christian: no more qualified names? Richard: qualified names are still
possible
.. as for mapping:
.. mapping only works if the semantics are really identical
Sebastian: it makes a specific assertion that it is really identical
.. useful e.g. if you just want to use the elements / attributes in your
own namespace
.. as it stands, you can make formally clear that this is identical
.. processing of "mapping" does not mean adding extra nodes
Christian: if my host vocabulary has an attribute 'translation'
.. if we have a discrepancy with values?
Richard: that is the next scenario:]]

- Scenario: there are different attribute names and values, but with
same semantics as ITS. Tasks for ITS: identify, map, add information.
Example:

<p translation="true">...</p>
<p translation="false">...</p>
<documentRule its:selector="//p[@translation='true']" its:translate="yes"/>
<documentRule its:selector="//p[@translation='false']" its:translate="no"/>

if we only want to identify and add information, we would have:

<p translate="true">...</p>
<p translate="false">...</p>
<documentRule its:selector="//p[@translate='true']" its:translate="yes"/>
<documentRule its:selector="//p[@translate='false']" its:translate="no" />

Benefit of adding in this case: an ITS aware editor could use the
information to be able to process the non-ITS markup in the same way as
ITS markup, e.g. highlighting translatable text.
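
A minimal sketch of the one-rule-per-value pattern from this scenario,
assuming Python's standard xml.etree.ElementTree (which needs ".//p"
rather than "//p"). The sample document and the (selector, value) pairs
are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using true/false instead of the ITS yes/no values.
doc = ET.fromstring(
    '<text><p translation="true">a</p><p translation="false">b</p></text>'
)

# One rule per host-vocabulary value, each adding the ITS value it stands for.
rules = [
    (".//p[@translation='true']", "yes"),
    (".//p[@translation='false']", "no"),
]

translate = {}
for selector, its_value in rules:
    for node in doc.findall(selector):
        translate[node.text] = its_value

print(translate)
```

This only works because the host values can be enumerated, one rule per
value.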


- Scenario: there is a different vocabulary, with different values and
different semantics. Tasks for ITS: identify, add information, but no
mapping. Example:

<p translation="true">...</p>
<p translation="false">...</p>
<p translation="maybe">...</p>
<documentRule its:selector="//p[@translation='true']" its:translate="yes" />
<documentRule its:selector="//p[@translation='false']" its:translate="no" />
<documentRule its:selector="//p[@translation='maybe']"
its:translate="yes" /> (has to be decided whether 'maybe' should be
'yes' or 'no')

This scenario does not work if the values cannot be enumerated. Examples:

<p translation="0.2">...</p>
<p translation="0.235">...</p>
<p translation="0.9">...</p>
<p translation="If I have time">...</p>


- Scenario: People want to say "my markup relates to an ITS data
category", but they do not want to use ITS values. Tasks for ITS:
identify, map. Example:

<xyz:p xyz:translate="yes">...</xyz:p>
<xyz:p xyz:translate="no">...</xyz:p>
<documentRule its:selector="//xyz:p" its:translateMap="@xyz:translate"/>
(this means "xyz:translate has the meaning of the translatability data
category; I 'trust' that the values of xyz:translate fit as well")

Benefit: There is no need to "pollute" your namespace with ITS markup.
This usage of mapping just passes the information via the map attributes.

---------------------------------------------
Single data categories: Localization Information
---------------------------------------------

- General examples:
ex. locInfo 1:
YR_QUERY(year, month)
DNote:
Only the words inside the parentheses should be translated.  Leave the
rest in upper case.

ex. locInfo 2:
Shift
DNote:
This refers to Image Shift.  A single word has  been used because of
space restrictions.

ex. locInfo 3:
enabled
This refers to 'stapler options'.

- Scenario: there is no localization information in the document. Tasks
for ITS: identify, add information. Example: a rule which says "identify
the jokes".

<joke>three klingons ....</joke>
<its:documentRule its:selector="//joke"
its:locInfoRef="http://www.myExample.com/klingon#humor"/>

- Scenario: the localization information is available as an attribute
value in the instance. Tasks for ITS: identify, map. Example:
<its:documentRule selector="//joke" its:locInfoMap="@note"/>
<text>
<joke note="In Star Trek, Klingons are known for having no sense of
humor. (note: Germans might be more appropriate here)">Three men went to
a pub: a Klingon
</joke>
</text>

- Scenario: there is no localization information in the document. Tasks
for ITS: add the information in the instance. Examples:
- if there is no localization information in the instance:
<text>
<joke its:locInfo="In Star Trek, Klingons are known for having no sense
of humor. (note: Germans might be more appropriate here)">Three men went
to a pub: a Klingon
</joke>
</text>

or

<text>
<p its:locInfoRef="http://www.example.com/klingon#humor">Three men went
to a pub: a Klingon
</p>
</text>

- Scenario: "same values, but different name in existing vocabulary
which does not match ITS". This scenario from the translatability data
category does not apply, because there is no enumerated list of values
for localization information.

- Scenario: there is an existing locInfoRef attribute. Tasks for ITS:
identify, map. Example:
<xyz:p xyz:dnote="someURI">..
<its:documentRule its:selector="//xyz:p" its:locInfoRefMap="@xyz:dnote"/>

action: Distinction between "alert" and "description": still to be
discussed, if we need a mapping here.


---------------------------------------------
Single data categories: Directionality
---------------------------------------------

- Scenario: no directionality information at all, but we can isolate
elements with directionality information (e.g. an <arabic> element).
Tasks for ITS: identify, add information. Examples:

<arabic>...</arabic>
<its:documentRule its:selector="//arabic" its:dir="rtl"/>

<its:documentRule its:selector="//span[@xml:lang='ar']" its:dir="rtl"/>
<span xml:lang="ar">...

- Scenario: there is already directionality information in the document,
but not in the ITS namespace. Tasks for ITS: identify, add information.
Example:

<bdo dir="rtl"> ...</bdo>
<its:documentRule its:selector="//bdo[@dir='rtl']" its:dir="rlo"/> (case
for XHTML 1 or e.g. old version of xmlspec)

<someElement dir="rtl">...</someElement>
<its:documentRule its:selector="//*[@dir='rtl']" its:dir="rtl"/>
Sebastian: resolve this by order of documentRule elements
.. or say "//*[not(self::bdo)][@dir='rtl']"
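
Sebastian's first suggestion, resolving the overlap by rule order, might
look like this sketch. It assumes Python's standard
xml.etree.ElementTree and, importantly, assumes that a later rule
overrides an earlier one. The minutes do not say which direction the
ordering should work, so that choice is a guess made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both elements carry dir="rtl", but <bdo> means override.
doc = ET.fromstring(
    '<text><bdo dir="rtl">x</bdo><span dir="rtl">y</span></text>'
)

# Generic rule first, more specific <bdo> rule second.
rules = [
    (".//*[@dir='rtl']", "rtl"),
    (".//bdo[@dir='rtl']", "rlo"),
]

direction = {}
for selector, value in rules:
    for node in doc.findall(selector):
        # Assumption: a later rule overrides an earlier one for the same node.
        direction[node.tag] = value

print(direction)
```

The alternative with not(self::bdo) avoids the ordering question
entirely, at the cost of a longer selector.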


- other scenarios: to follow.


---------------------------------------------
Single data categories: Ruby
---------------------------------------------

************
proposal-06: Use the existing conformance levels of W3C Ruby, and have
one additional one.
************

- On conformance:
We agreed to refer to the W3C ruby specification and to cite its
existing levels of conformance (simple and complex ruby markup).

We propose to have another level of conformance (working title
"intermediate ruby markup"), which will be contributed by Richard.
action: Richard to describe an additional level of conformance.

Example from opendoc:

<odf:ruby>
     <odf:rubyBase>W3C</odf:rubyBase>
     <odf:rubyText>World Wide Web Consortium</odf:rubyText>
</odf:ruby>

Example of simple ruby from the W3C ruby spec:

<its:ruby>
     <its:rb>W3C</its:rb>
     <its:rt>World Wide Web Consortium</its:rt>
</its:ruby>

We can use the mapping mechanism to describe that both have the same
meaning:

<its:documentRule its:selector="//odf:ruby" its:rubyMap="."/>
<its:documentRule its:selector="//odf:rubyBase" its:rubyBaseMap="."/>
<its:documentRule its:selector="//odf:rubyText" its:rubyTextMap="."/>

The global rules attributes for ruby stay as they are, that is:
<its:documentRule its:rubyText="World Wide Web Consortium"
    its:selector="/body/img[1]/@alt"/>


Mapping to different realizations of ruby:

<span class="ruby">...
<its:documentRule selector="//span" rubyTextMap="@class='rt'"> (takes
the value of the span, not of the attribute)
<span class="rb">...
<span class="rt">...

<p rubytext="s.t.">...
<its:documentRule selector="//p/@rubytext" rubyTextMap=".">


---------------------------------------------
Terminology data category
---------------------------------------------

TBD. TODO.

-----------------------------------------------------
Visit from Paul Nelson and Markus Scherer (Microsoft)
-----------------------------------------------------


- feedback from Paul Nelson:
- dir is not necessary for existing formats. Sebastian: new format could
just pull in the tag. Paul, Markus: why not use HTML directly? Felix:
we don't try to invent something new, only cite existing practice. Paul:
often people use global styling. Sebastian: take a mixed document with
Arabic and English. It is not explicit that directionality should be
taken into account. Could use "dir" for that purpose, like "xml:id".

Paul: for translation, if you would translate pieces of a string, e.g.
"filename.jpg". You would have a pattern called "filepattern":

<its:documentRule its:selector="//scr/text()[match(.,*.jpg)]"
its:translate="yes"/>

imagefile.jpg . Paul: people have to be prevented from translating ".jpg".

<entry its:translate="yes">filename<its:span
its:translate="no">.jpg</its:span></entry>

Paul: aiming at documents? Felix: textual documents, software related
documents. Paul: example RSS feed needs some regular expression to
figure out what is being translated. Felix: do not tie the spec to one
version of XPath, but say "XPath and its successor".

Markus: if you map the selectors back to css? So that people see how it
works? non normative? Markus: yes, so that people just see how it works.
Sebastian: is it possible to replace xpath with css? Markus: you should
give input to us (CSS working group); it might be difficult to target
every attribute. xpath is good for tools, css is good if you just want
to translate a given page.
So give an example of what a CSS stylesheet would look like that has the
functionality you want to achieve (*not* creating additional
documentRule elements in CSS).

two user scenarios: having ad hoc localizability information for a web
page (with CSS), versus having information for a tool (with XPath).

Felix: what does MS do about this topic?

Paul: you have HTML-like dialogues, where everything is parsed on an ID
basis. Then an external file processes the expression. There's a lot
of software which does it already. Sebastian: So real life is "give
absolute IDs"? Paul: yes, an author does not track that. Markus: so
complex selector mechanisms are not necessary? Paul: from what I see
other vendors doing, yes. Felix: ITS is for the engineers who have
to adopt a great variety of formats with no (ID or other) localizability
information. Paul: yes, for that user scenario ITS seems to be quite
useful. And there is a large open source effort to have a standard
localization process, e.g. in developing countries.


----------------------
Test suite discussion
----------------------

- Sebastian's implementation of the proposal for mapping (see above).

    <documentRules xmlns="http://www.w3.org/2005/11/its">
      <ns its:prefix="t" its:uri="http://www.tei-c.org/ns/1.0"/>
      <documentRule its:translate="yes"
		    its:selector="//t:body/t:p[1]/*"/>

      <documentRule its:translate="no"
		    its:selector="//t:body/t:p"/>
      <documentRule
          its:selector="//t:p[starts-with(@rend,'translate(')]"
          its:translateMap="substring-before(substring-after(@rend,'translate('),')')"/>
    </documentRules>

  </teiHeader>
  <text>
    <body>
      <p rend="normal">Hello <hi>world</hi></p>
      <p rend="special">Goodbye</p>

      <p its:translate="yes">translate me</p>
      <p>Don't translate me</p>
      <p rend="translate(yes)">I want to be translated</p>

Sebastian: we need to specify the precedence between inherited and
mapped information. Example: precedence between its:translate="yes" and
rend="translate(yes)". We need to say that if something is mapped, it
then also has to be processed like an attribute with that semantics.
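
What the translateMap expression in Sebastian's rule computes can be
sketched in plain Python. This assumes the standard
xml.etree.ElementTree, which has no XPath string functions, so the
hypothetical mapped_translate helper re-implements starts-with,
substring-after and substring-before by hand. The sample document is a
cut-down, namespace-free variant of the one above.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    '<p rend="normal">Hello</p>'
    '<p rend="translate(yes)">I want to be translated</p>'
    '<p rend="translate(no)">Leave me alone</p>'
    "</body>"
)

PREFIX = "translate("

def mapped_translate(p):
    """Mimic substring-before(substring-after(@rend,'translate('),')')."""
    rend = p.get("rend", "")
    if not rend.startswith(PREFIX):  # same test as starts-with(@rend, ...)
        return None                  # element is not selected by the rule
    after = rend[len(PREFIX):]
    before, sep, _ = after.partition(")")
    # XPath substring-before yields "" when the delimiter is missing.
    return before if sep else ""

values = [mapped_translate(p) for p in doc.findall(".//p")]
print(values)
```

The extracted value still has to be weighed against any local or
inherited its:translate value, which is the precedence question raised
here.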


************
proposal-07: add to the precedence rules a rule for "Selections
inherited from other local usages of ITS markup"
************

- Felix implementation (without mapping yet):

Question: Where to put the precedence of inherited local information?
The ">" marks the proposed position:

   1. Implicit local selection in instance documents (data category
      attributes on a specific element)
   2. Local selections in instance documents (using a documentRules
      element)
   3. Global selections in an external file (using a documentRules
      element)
   4. Global selections in a schema, expressed with a documentRules
      element
   5. Selections expressed with schemaRule (See also the note in
      Section 5.1.2: Global, Rule-based Selection)
>  6. Selections inherited from other local usages of ITS markup
   7. Selections via defaults for data categories, see Section 6.1:
      Position and Default Selections of Data Categories
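
A minimal sketch of resolving one data category value from such a
precedence list, in plain Python. It assumes that inherited selections
rank just above the defaults, as the marked line suggests. The source
labels and the resolve helper are made up for illustration, and the
ordering itself is exactly what is still under discussion.

```python
# Highest precedence first. The position of "inherited" is an assumption.
PRECEDENCE = [
    "local attribute",
    "local documentRules",
    "external documentRules",
    "schema documentRules",
    "schemaRule",
    "inherited",
    "default",
]

def resolve(candidates):
    """Pick the value from the highest-precedence source that supplied one.

    candidates maps a source label from PRECEDENCE to the value it yields.
    """
    for source in PRECEDENCE:
        if source in candidates:
            return source, candidates[source]
    raise ValueError("no value available, not even a default")

print(resolve({"default": "yes", "inherited": "no"}))
```

Keeping the order in a single list makes it cheap to move "inherited" if
the group settles on a different position.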

Sebastian: implementation is very expensive, since all XPath expressions
have to be generated again and again. Comparing two node sets to see if
there is some overlap is the best way of doing it.

Sebastian: Error in my implementation: if you have a rule saying
something about translatability and directionality, that would produce
2 templates.

Sebastian: Why this output?
Felix: to make basic conformance clear
Sebastian: my implementation could have different modes, one for each
data category. Felix: that is not as expensive as my way, since you
only have to go through the doc one time.

----------------
Test Files
---------------
Sebastian: other data categories in your implementation?
Felix: only dir and translate.
Sebastian: Felix's approach is easier to check.
Felix: it checks whether information comes from local, global or
inheritance.
.. do we need to check where information comes from?
Sebastian: no.
.. XLIFF from Yves is possibly the easiest example.
.. if we publish the implementations, we need to prove that they do the
same thing
.. we just discovered that inheritance is right in each of our
implementations
.. what to do after last call?
Felix: we go to candidate recommendation
.. and then, when we have implementations and tests, we can reach
proposed recommendation

Sebastian: why make mapping explicit?
.. it does not give new power?
Felix: mapping gives you a possibility to make semantics clear, useful
e.g. for editing ruby from different sources
Sebastian: you can specify "meaning" of markup, without the need to
extract it.
.. it is also necessary to distinguish mapping existing "locInfoContent"
(in Yves old terminology) versus adding locInfo to a node.
.. we made a conclusion to allow it everywhere.
Felix: how about child elements versus attributes?
Sebastian: allow to reinforce the structure in all schema languages
.. e.g. the new restriction, that a documentRule can have only one data
category, easier to check with a rule.

---------------------------
conference call
---------------------------
http://www.w3.org/Guide/1998/08/teleconference-calendar.html#s_2031

- action items

christian: to make bug on relation between markup and data category
.. done
felix: bring xml:lang question to the core group
.. made a request, this is done,
felix: close bug ..., write a mail to Francois
.. the same for other bugs
yves: bug 2890:
.. not open, so closed somehow
.. that is the terminology thing
felix: done
yves: action 2808, felix to write s.t. and close the comment
felix: done
yves: tag set editors to integrate discussion results into spec
.. the changes we discussed ...
action: Yves to enter bug on section three
felix: how about giving the last word on wordsmithing to English speakers?
christian: would be great

- next f2f:
Richard: 30 / 31 of May I have a f2f:

Yves: the f2f without last call?
Felix: general way of working is more important
Yves: how about doing this in England?
week of the 5th of June would be fine?
Richard: fine the whole week
Yves: so let's fix this week tentatively

Yves: would it make sense to have a f2f before May?
Richard: what is the purpose of this f2f?
.. it is o.k. to finalize the draft
.. or to discuss the last call

Yves: how about April?
Sebastian: first week of April is holiday
Felix: how about 18, 19, 20
Richard: I may be in Shanghai
Yves: fine for me
Richard: It might take some time until I know
Sebastian: we could have the meeting at his office


************
proposal-08: add xml:lang as a data category, to be able to map various
language attributes to xml:lang
************

Richard: do we need an its:lang attribute to say "this is the same as
xml:lang"?
Sebastian: it is easier to provide "its:langMap"
.. to give the bridge
.. it is like ruby, where we don't give new markup, but map to existing
markup
Richard: e.g. for translation tools, it would be useful
.. this is a very extensible mechanism
Felix: we should do things also thinking of time
Richard: but let's have xml:lang
.. by just referring to the spec.


<its:langRule its:selector="//*" its:langMap="@myLangAttribute"/> says:
"@myLangAttribute" has the meaning of xml:lang.
Sebastian: this asserts that people use the same values as xml:lang.


------------------------
Structure of the working draft
------------------------

Felix: How about just adding a new subsection on "adding (or s.t. else?)
versus mapping"
Richard: basic content of that section would be:
Felix: describing the difference between the two methodologies, and
giving non-normative examples

Richard: for me it is two types say:

************
proposal-09: Give up the schemaRule element
************

Proposal from Felix: the schemaRule element gives no new functionality
compared to global rules and is in some cases not possible to process
(see RELAX NG example below).

<element="p">
<its:documentRule> instead of schemaRule
<its:schemaRule=".."
..</element>

"//p"

p in footnote != p in div
"//footnote/p" "//div/p"


element div = p1?, p2

p1 = element p { attribute id?, text }[translate="yes"]
p2 = element p { text }[translate="no"]
Received on Wednesday, 1 March 2006 11:25:33 UTC