Re: On conformance from Felix Sasaki on 2006-02-17 (public-i18n-its@w3.org from January to March 2006)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 17 Feb 2006 22:01:13 +0900
To: "Lieske, Christian" <christian.lieske@sap.com>
Cc: public-i18n-its@w3.org
Message-ID: <43F5C919.6060001@w3.org>
Hi Christian,

Many thanks for this. I will comment below.

I would like to propose two "products of the ITS tagset":
1) the ITS schema declarations
2) an ITS processor

- audience of 1): schema designers and specification developers (e.g.
within W3C). Purpose: provide a schema module for their "host schema" (a
term we should define). The schema module is a set of elements and
attributes for internationalization and localization purposes. Examples
of host schemas: DocBook, DITA, XML Spec, HTML, SVG, MathML, Schema for
XML Schema / for RELAX NG, ...

- audience of 2): content authors, localization tool developers.
Purpose: to be able to process ITS markup (which is based on the schema
defined as product 1, but is applicable *without* the schema) in specific
positions (schema document, global, XML instance local) to select
information in XML documents.

- conformance to 1): a host schema is conformant if it contains all ITS
declarations (I am not being specific here about the places in the schema).
Current implementations: TEI and XML Spec.

- conformance to 2): an ITS processor is conformant if it processes ITS
markup, that is: the processor must be able to "identify" the nodes in XML
instance documents to which ITS markup relates. Current
implementations: Sebastian, Yves, me.

"identify" can lead to further processes, e.g.
i) visualizing identified nodes (Sebastian's implementation of 2)
ii) extracting text out of identified nodes (Yves' implementation, I guess)
iii) extracting references to identified nodes (my implementation of 2)
...
- a combination of these, e.g. in a tool which highlights
translatable text (i), allows for creating XLIFF (ii), or extracts
references to nodes for further processing (iii), e.g. to be sent to a
web service.
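
To make "identify" a bit more concrete, here is a minimal sketch in Python.
It is not any of the three implementations mentioned above. It assumes
only local markup, a single attribute its:translate with the values
"yes"/"no" that is inherited by child elements, a default of "yes", and
the namespace URI shown in the code; the function names are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the ITS tagset.
ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"


def identify(root, default="yes"):
    """Return (path, element, value) for every element.

    A local its:translate attribute applies to the element itself and is
    inherited by its descendants (an assumption of this sketch).
    """
    results = []

    def walk(elem, inherited, path):
        value = elem.get(TRANSLATE, inherited)
        results.append((path, elem, value))
        for i, child in enumerate(elem, start=1):
            walk(child, value, f"{path}/*[{i}]")

    walk(root, default, "/*[1]")
    return results


def extract_text(identified):
    """Process (ii): text of the identified translatable nodes."""
    return [
        elem.text.strip()
        for _, elem, value in identified
        if value == "yes" and elem.text and elem.text.strip()
    ]


def extract_references(identified):
    """Process (iii): positional references to the translatable nodes."""
    return [path for path, _, value in identified if value == "yes"]


DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <title>Hello</title>
  <code its:translate="no">printf<b>x</b></code>
  <p>World</p>
</doc>"""

identified = identify(ET.fromstring(DOC))
print(extract_text(identified))
print(extract_references(identified))
```

The point of the sketch is that "identify" is the common step, and (ii)
and (iii) are just different things done with its result.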

a remark on "bidirectional text, ruby annotations, language/localization
information, switch for language alternatives, indication of glyph
variants" (all mentioned in our charter): The ITS tagset should not
define specific conformance for these, but should cite their specs (e.g.
the ruby TR for ruby, XHTML 2 for directionality, RFC 3066bis for
language identification, ...). Showing people which technology they
should rely on would fulfill our goal (see charter) of helping people
avoid "reinventing the wheel".

I see three variants of 2):
2 a): an ITS processor is conformant if it processes all selection
mechanisms and precedence definitions, including defaults, for a given
data category (possibly, but not necessarily, all data categories).
2 b): an ITS processor is conformant if it processes one or several types
of selection for one data category (or possibly more).
2 c): an ITS processor is conformant if it processes everything.

Both 2a) and 2b) would map directly to parts of the test suite matrix
Yves has already developed, in a complementary manner. 2c) would be the
complete test suite matrix.
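
As a rough illustration of how the three variants could be read off the
matrix, here is a small Python sketch. The seven rules locations are the
ones from Yves' mail quoted below; the data category labels are
illustrative only (the sixth one is a pure placeholder), and the reading
of 2a) as "one complete row" and 2b) as "at least one cell" is my
assumption, not something we have agreed on.

```python
from itertools import product

# Illustrative labels only; "category-6" is a placeholder.
CATEGORIES = [
    "translatability",
    "localization-information",
    "terminology",
    "ruby",
    "directionality",
    "category-6",
]

# The seven "rules location" cases from the test suite discussion.
LOCATIONS = [
    "xml-dtd",
    "xml-schema",
    "relax-ng",
    "dislocated-external",
    "dislocated-internal",
    "in-situ",
    "combined",
]

# Every (data category, rules location) cell of the matrix.
MATRIX = set(product(CATEGORIES, LOCATIONS))


def variants(passed):
    """Say which conformance variants a set of passed cells satisfies."""
    passed = set(passed) & MATRIX
    full_rows = [
        cat for cat in CATEGORIES
        if all((cat, loc) in passed for loc in LOCATIONS)
    ]
    return {
        "2a": bool(full_rows),    # all locations for at least one category
        "2b": bool(passed),       # at least one cell of the matrix
        "2c": passed == MATRIX,   # the complete matrix
    }


print(len(MATRIX))
print(variants({("translatability", "in-situ")}))
print(variants({("ruby", loc) for loc in LOCATIONS}))
```

Under this reading, 2c) implies 2a), and 2a) implies 2b).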


A general comment:
As for our tag set deliverable, we are, among other things, chartered to
develop a set of elements and attributes that can be used with new
Schemas to support the internationalization and localization of
documents. That is: we are (not only, but also) chartered to produce 1)
above.
One of our audiences mentioned in the charter is "Developers who create
formats based on XML". These are *not* tool developers for *processing*
ITS, which in my view are mentioned separately in the charter:
"localization industry, particularly where automated translation tool
technology is deployed."

If we don't have 1) as a separate product, we don't follow our charter,
and we lose an important audience: specification and schema developers.

The good thing about 1): as mentioned in the charter, 1) is "a set of
'ready-made' elements and attributes, that had been designed using state
of the art knowledge about internationalization and localization needs,
that developers could include in the format they are developing.". In
the next years, I will review a lot of drafts from SVG, MathML, ... I
can easily convince the specification developers to claim conformance
against 1), if they create *normative modularizations*: ITS+MathML,
ITS+SVG, ... . This would be a benefit for implementors of 2), because
they could expect a wide range of schemas to encompass ITS declarations.
Of course, no SVG/MathML/DocBook etc. tool would "understand" the
semantics of the ITS markup, but they would allow the markup to exist.
Processing of the markup is of course done by the ITS processor.

A comment on OpenDocument below.

Lieske, Christian wrote:
> Hi there,
> 
> I found the following on
> http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf
> 
> ---
> Appendix D.Core Features Sets (Non Normative)
> 
> The OpenDocument specification does not specify which elements and
> attributes conforming
> application must, should, or may support. 

from the view of ITS, I would describe "support" as "identifies" (see my
terminology above). In this way, we could go the same way as
OpenDocument, and only have to decide between 2 a) and 2 b). I guess the
general decision is 2a/b versus 2c).
Yves' proposal, "s.t. is conformant if it successfully implements
at least one data category", sounds like 2 b).
Regarding Christian's proposal below to define levels or profiles: my
impression is that this is very difficult, due to the great variety of
audiences for ITS. As for CSS and XSL which you mentioned below, the
audiences are much more homogeneous: the question is "more or less",
whereas in ITS the question is: "this way or that way".

Cheers,

Felix


> The intention behind this is
> to ensure that the
> OpenDocument specification can be used by as many implementations as
> possible, even if
> these applications do not support some or many of the elements and
> attributes defined in this
> specification. Viewer applications for instance may not support all
> editing relates elements and
> attributes (like change tracking), other application may support only
> the content related elements
> and attributes, but none of the style related ones.
> Even typical office applications may only support a subset of the
> elements and attributes defined
> in this specification. They may for instance not support lists within
> text boxes or may not support
> some of the language related element and attributes. 
> ---
> 
> Best regards,
> Christian
> 
> -----Original Message-----
> From: Felix Sasaki [mailto:fsasaki@w3.org] 
> Sent: Mittwoch, 15. Februar 2006 14:06
> To: Lieske, Christian
> Cc: public-i18n-its@w3.org
> Subject: Re: On conformance
> 
> Hi Christian, all,
> 
> Lieske, Christian wrote:
>> Hello everyone,
>>
>> Please find my comments below (starting with "CL>").
>>
>> Best regards,
>> Christian 
>>
>> -----Original Message-----
>> From: public-i18n-its-request@w3.org
>> [mailto:public-i18n-its-request@w3.org] On Behalf Of Yves Savourel
>> Sent: Sonntag, 12. Februar 2006 06:51
>> To: public-i18n-its@w3.org
>> Subject: RE: On conformance
>>
>>
>> Hi Christian, Felix, and all,
>>
>>> So I think you should provide all tests which you 
>>> think which are necessary, not only the ones for 
>>> "terminology". This might be a very complicated task,
>>> *if* you assume a lot of conformance levels, and 
>>> even conformance specific conformance criteria to a 
>>> single data category.
>> Our data categories are quite diverse: Ruby has little to do with
>> translatability for example. This means it probably makes sense for
>> the applications that will implement ITS to provide support for only
>> some of the data categories.
>>
>> CL> Or only provide _limited_ support (cf. the discussion on
>> insitu/dislocated).
> 
> sorry, this has nothing to do with conformance, but the terminology here
> would now be local (in instance documents) versus global (with
> documentRules).
> 
>> For example a translation tool would implement the translatability
> data
>> and localization information categories but completely
>> ignore terminology.
>>
>> CL> I am not sure that all translation tools would do that.
>>
>>  Therefore I think we have to test the 6 data categories separately (I
>> think <its:span> is something different
>> and can be tested along with all the in situ cases).
>>
>> From the "rules location" viewpoint we have: in XML DTD, in XML Schema,
>> in RELAX NG, external dislocated, internal dislocated, and
>> in situ... 6 cases. In addition, I think it's important to also have
>> test cases for each data category where all the different
>> "rules locations" are combined. So 7 cases.
>>
>> This gives us the following matrix:
>> http://www.w3.org/International/its/tests/#Summary
>>
>> Which is ... 42 cases overall (although there may be a few cases fewer, as
>> not all types of rules location apply to all data
>> categories).
>>
>> I think it's important that we provide at least one standalone test
> case
>> for each of these combinations. It is quite a bit of work,
>> but it is probably the only way to ensure ITS is sound. 
>>
>> As far as "processors" *compliance* goes, I think we don't have to define a
>> level for each case. Maybe we can say that an application is
>> ITS compliant when it implements successfully at least one of the data
>> categories(?) and that it should state which one(s) with any
>> compliance claim.
>>
>> CL> I like Yves' approach of distinguishing between test cases and
>> conformance/compliance. From
>> CL> my point of view test cases can help with the following:
>> CL>
>> CL> 1. verify that the framework adequately addresses an issue
>> CL> 2. possibly help with the definition of conformance
>> CL> 3. testing conformance
>> CL>
>> CL> I think that the design of the test suite (that is the collection
> of
>> test cases instrumented with
>> CL> input, output, id etc.) which Yves has drafted is very promising.
>> CL>
>> CL> I am still not sure about the granularity of conformance we should
>> be aiming at. 
> 
> I agree that it would be possible to disconnect the discussion on tests
> from conformance. But IMO it would be good to relate the one to the
> other as closely as possible. Yves' proposal to say "s.t. is conformant if
> it implements successfully at least one data category" is a very direct
> relation between tests and conformance.
> 
> Possible pros and
>> CL> cons for a fine grained granularity could be the following:
>> CL>
>> CL> pro: may yield many conformant implementations since only a
> limited
>> number
>> CL> 	of features would have to be implemented and thus effort for
>> implementation might be low
> 
> with the proposal from Yves to say "s.t. is conformant if it implements
> successfully at least one data category", you would have the same
> effect, without having the need for fine grained conformance levels.
> 
>> CL> cons: may yield confusion amongst tool users/buyers since they
>> cannot easily know that a
>> CL>	conformant tool really fits their i18n/l10n requirements
>> CL>
>> CL> One approach to come up with a more coarse grained granularity of
>> course could
>> CL> start from clustering/partitioning features, and basing conformance
> on
>> clusters. Example:
>> CL>
>> CL> Definition for Cluster A
>> CL>
>> CL>	 - data categories 'ruby' and 'directionality'
>> CL> 	 - only local rules
>> CL>
>> CL>  Conformance Clause
>> CL>    
>> CL>    - An implementation of this standard is profile-1 conformant if
>> it implements all
>> CL>      features defined in Cluster A
> 
> Could you make a suggestion as to which clusters you would propose for ITS?
> 
>> CL>
>> CL> This seems to be an approach taken by other standards (they seem
> to
>> use terms like
>> CL> "level", or "profile"). CSS 1 from my understanding for example
> had
>> two clusters:
>> CL> core features and extended features (see
>> http://www.w3.org/TR/CSS1#css1-conformance).
>> CL> XSL-FO has three (called "basic", "extended" and "complete"; see
>> http://www.w3.org/TR/xsl/slice8.html#conform)
>> CL> It defines for each feature (objects and properties), whether a
>> conformance level
>> CL> requires its implementation or not (see
>> http://www.w3.org/TR/xsl/sliceB.html#FO-summary,
>> CL> http://www.w3.org/TR/xsl/sliceC.html#property-index).
> 
> looking at http://www.w3.org/TR/xsl/slice8.html#conform , I have the
> feeling the XSL conformance is very close to what is currently in the
> ITS draft. In XSL you have three levels basic, extended and complete,
> which subsume each other. In the current ITS draft, you have two levels,
> which subsume each other.
> 
>> CL>
>> CL> Following this line of thinking, we would need to decide on two
>> things with regard to conformance:
>> CL>
>> CL>	1. Do we go for several different types of conformance?
>> CL>   2. How do we possibly partition data categories, support for
>> selection mechanisms etc. to arrive at different types?
> 
> I would propose to go Yves' path and say "s.t. is conformant if it
> implements just one data category". This would be one level of
> conformance, though.
> In addition (or rather 'below' that), I would propose to say s.t. about
> conformance of schemas, because we have to say s.t. to the audience
> "schema developers". In the ITS draft, currently schema conformance is
> mixed with data category conformance. I would propose to separate this.
> 
> - Felix
> 
>> We still have to decide if we want to allow processors that implement
>> only in-situ rules to be compliant or not. We need to decide
>> this soon.
>>
>> For the test cases, based on Felix and Christian's ideas, maybe we
> could
>> have something for each data category that look like this:
>>
>> 1. In schema
>> 	1.1 XML DTD
>> 	1.2 XML Schema
>> 	1.3 RELAX NG
>> 2. Dislocated
>> 	2.1 External to the document
>> 	2.2 Within the document
>> 3. In situ
>> 4. Combination of all cases
>>
>> For each of these lines we would have:
>>
>> - The description of the test. (With a reference to the clause in the
>> specification).
>>
>> At least one test set that would have:
>>
>> - An "Input files" entry with the list of all the input files
> required,
>> for example a source XML document and a document containing
>> dislocated rules.
>>
>> - An "Expected Result" entry with a document hand-made (or at least
>> hand-checked) that describes the expected output.
>>
>> - Zero, one or more result files generated from the various
>> implementations we will have (and hopefully we will have at least one
>> example for each case).
>>
>> See the translatability data category for an example:
>> http://www.w3.org/International/its/tests/#Trans_DislocatedExternal
>> (I'm missing still the clause references)
>>
>> It would probably be good to have several test sets in some cases, for
>> example: with namespaces, without namespaces, etc.
>>
>> In addition to decide if this is a good approach and how it can be
>> improved, we should also maybe make the general layout easier to
>> manipulate, for instance by having the Test Suite document broken down
>> in several files (one per data category) so several people
>> can work on different parts at the same time. Maybe the result
> document
>> should be integrated within the test suite document to make
>> it easier to look at, etc.
>>
>> For the test implementation we should try to make them generic enough
> so
>> they can be used regardless of the input files.
>>
>> ...I am sure you have plenty of ideas.
>>
>> Cheers,
>> -yves
>>
>>
>>
> 
> 
>
Received on Friday, 17 February 2006 13:01:34 UTC