RE: I18N-ISSUE-246: Clarify character encoding behavior when calculating storage size [ITS-20]

It's not a new normative statement if normative language isn't intended. I would avoid putting a normative statement into a "note". Generally, when I'm spec writing, I avoid the Magic Normative Words unless I mean them normatively. So in this case I read the proposed note text as meaning:

> In order to be able to evaluate a Storage Size constraint an application
> has to be able to encode...

Which is an example of "anti-normative" writing:

MAY -> can
SHOULD -> ought
SHOULD NOT -> ought not, avoid
MUST -> has to
MUST NOT -> can't, don't
RECOMMENDED -> really good idea

Addison

> -----Original Message-----
> From: Arle Lommel [mailto:arle.lommel@dfki.de]
> Sent: Tuesday, March 19, 2013 8:22 AM
> To: Stephan Walter
> Cc: Norbert Lindenberg; Felix Sasaki; Yves Savourel; public-multilingualweb-lt-
> comments@w3.org; 'www-international'
> Subject: Re: I18N-ISSUE-246: Clarify character encoding behavior when
> calculating storage size [ITS-20]
> 
> Stephan,
> 
> I think this sounds good. However, as it adds a MUST statement, will it impact us
> because it could be seen as a new normative statement? (I think it is rather a
> clarification of intent, but just want to check on it.)
> 
> -Arle
> 
> On 2013 Mar 19, at 06:41 , Stephan Walter <stephan.walter@cocomore.com>
> wrote:
> 
> > Hello,
> >
> > Coming back to the issue of error handling when storage size is processed.
> Would you agree to adding the following note to the definition of the data
> category as a resolution:
> >
> > NOTE:  In order to be able to evaluate a Storage Size constraint an application
> must be able to encode the content of the selected nodes in the specified
> character encoding. An application that evaluates Storage Size but does not
> support the specified character encoding must report this as an error. If the
> selected nodes contain characters that the specified character encoding cannot
> represent, the processor must also report this as an error. The application
> evaluating the Storage Size constraint is not necessarily the ITS processor itself.
> The constraint may rather be evaluated by applications consuming the ITS
> encoded data in later steps. In such cases the above requirement pertains to
> those ITS consuming applications.
> >
> > Best regards
> > Stephan
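The checks described in the proposed note can be sketched as follows. This is a minimal Python sketch and is not an ITS 2.0 API. It assumes the constraint is given as a maximum byte count plus an encoding name. `check_storage_size` and `StorageSizeError` are hypothetical names, and Python's codec registry stands in for "the encodings this application supports".

```python
class StorageSizeError(Exception):
    """Hypothetical error type for the two error cases named in the note."""


def check_storage_size(content: str, max_bytes: int, encoding: str = "UTF-8") -> bool:
    """Return True if `content` fits in `max_bytes` once encoded in `encoding`.

    Raises StorageSizeError if the encoding is unsupported or cannot
    represent every character of `content`.
    """
    try:
        # Strict error handling (the default): nothing is silently dropped.
        encoded = content.encode(encoding)
    except LookupError:
        # Error case 1: the application does not support the encoding.
        raise StorageSizeError(f"unsupported character encoding: {encoding}") from None
    except UnicodeEncodeError as exc:
        # Error case 2: the content holds characters the encoding cannot represent.
        bad = content[exc.start:exc.end]
        raise StorageSizeError(
            f"{encoding} cannot represent {bad!r} at index {exc.start}"
        ) from None

    # Only once encoding succeeds is the byte count meaningful.
    return len(encoded) <= max_bytes
```

For example, "Grüße" fits in 5 bytes as ISO-8859-1 but needs 7 as UTF-8. "日本" in ISO-8859-1 is reported as an error rather than measured.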
> >
> > -----Original Message-----
> > From: Norbert Lindenberg [mailto:w3@norbertlindenberg.com]
> > Sent: Thursday, 28 February 2013 08:26
> > To: Felix Sasaki
> > Cc: Norbert Lindenberg; Yves Savourel; public-multilingualweb-lt-
> comments@w3.org; 'www-international'
> > Subject: Re: I18N-ISSUE-246: Clarify character encoding behavior when
> > calculating storage size [ITS-20]
> >
> >
> > On Feb 27, 2013, at 14:55 , Felix Sasaki wrote:
> >
> >> Hi Yves, Norbert, all,
> >>
> >> On 27.02.13 13:50, Yves Savourel wrote:
> >>> Hi Norbert,
> >>>
> >>>>> Note also that we have no way to check conformance of the
> >>>>> applications using the ITS data for such mandatory support: ITS
> >>>>> processors just pass the data along, they don't act on them (in
> >>>>> the case of this data category).
> >>>> So who does actually act if a string is too long to fit into the
> >>>> specified storage?
> >>> There is certainly the case of applications that do process ITS markup and
> apply it to the content directly: for example, JavaScript in an HTML5 page. But
> there are also applications that use an ITS processor to feed the content and
> the ITS information to a distinct system where the information is then applied.
> They correspond, for example, to the "Localization Workflow Managers"
> described in the "potential users of ITS"[1].
> >>>
> >>> So I think it's important to make the distinction between the 'ITS processor'
> which acts on the markup, and the 'consumer of ITS information' (for lack of a
> better name) that applies the ITS information. Both can be the same
> application, but they may also be separate ones.
> >>>
> >>> This means a storage-size constraint can be applied completely
> >>> outside the original XML/HTML5 document with tools that have no
> >>> relation to the ITS processor itself, or to XML/HTML5 for that
> >>> matter. Examples of such applications are localization quality
> >>> checking tools (like CheckMate, XBench, QA-Distiller, etc.)
> >>>
> >>> This is why, from my viewpoint, requiring the 'consumer of ITS information'
> to support UTF-8 is not important. And I was looking at the case of consumers
> that don't have a need for UTF-8, and whether we should really foist on them
> such a requirement.
> >>>
> >>> To answer your question "Do you really want to let systems that can
> represent less than 1% of Unicode advertise themselves as ITS 2.0
> conformant?": Why not? If the context where they are utilized is using only 1%
> of Unicode, why should they be forced to support more? I see many customers
> that never work outside of Latin-1.
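Yves's point about Latin-1-only contexts also shows why the encoding has to be pinned down for a storage-size check. The same string occupies a different number of bytes depending on the encoding, and some strings cannot be stored at all. The small Python illustration below uses arbitrary sample strings, and `encoded_length` is a hypothetical helper.

```python
def encoded_length(text: str, encoding: str):
    """Byte length of `text` in `encoding`, or None if it cannot be represented."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# Compare a few sample strings across encodings.
for sample in ["Hello", "Grüße", "日本語"]:
    for enc in ["ISO-8859-1", "UTF-8", "UTF-16LE"]:
        print(f"{sample!r:10} {enc:11} {encoded_length(sample, enc)}")
```

"Grüße" takes 5 bytes in ISO-8859-1 but 7 in UTF-8. "日本語" cannot be represented in ISO-8859-1 at all, and it takes 9 bytes in UTF-8 and 6 in UTF-16LE.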
> >>>
> >>> This said, supporting UTF-8 is very easy nowadays and promoting its
> support is a good thing too. So in the interest of moving forward and of
> promoting better internationalization, I see no problem requiring the consumer
> of storage-size to support UTF-8.
> >>>
> >>> The only thing that bothers me a little is that such conformance, as well as
> the parts about handling errors, apply to the consumer of the ITS information,
> not really the ITS processor, and I'm not sure the scope of our tests can cover
> that.
> >>>
> >>>
> >>> With regards to the error handling:
> >>>
> >>>> It could be as simple as "If an ITS processor doesn't support the
> >>>> specified character encoding, it must report this as an error and
> >>>> terminate processing. If the selected nodes contain characters that
> >>>> the specified character encoding cannot represent, the processor
> >>>> must report this as an error and terminate processing." Or you
> >>>> could try and be nice in the second case and specify a fallback
> >>>> strategy, e.g., by saying that the first replacement character
> >>>> among U+FFFD, U+003F, U+FF1F that can be represented in the specified
> >>>> character encoding must be used instead of any character that can't.
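Norbert's optional "be nice" fallback could be sketched as follows. Each unrepresentable character is replaced with the first of U+FFFD, U+003F, U+FF1F that the target encoding can represent. This illustrates the suggestion as worded here and is not adopted ITS 2.0 behavior. The note proposed later in the thread reports an error instead. Both function names are hypothetical. An unsupported encoding still surfaces as Python's `LookupError`.

```python
# Candidates in the order suggested: U+FFFD REPLACEMENT CHARACTER,
# U+003F QUESTION MARK, U+FF1F FULLWIDTH QUESTION MARK.
REPLACEMENT_CANDIDATES = ("\uFFFD", "\u003F", "\uFF1F")


def pick_replacement(encoding: str) -> str:
    """Return the first candidate that `encoding` can represent."""
    for candidate in REPLACEMENT_CANDIDATES:
        try:
            candidate.encode(encoding)
        except UnicodeEncodeError:
            continue
        return candidate
    raise ValueError(f"no replacement candidate is representable in {encoding}")


def encode_with_fallback(text: str, encoding: str) -> bytes:
    """Encode `text`, substituting each unrepresentable character with the
    first representable replacement candidate."""
    replacement = pick_replacement(encoding)
    while True:
        try:
            # Encoding the whole string in one call keeps stateful encodings
            # (BOMs, ISO-2022 shift sequences) correct.
            return text.encode(encoding)
        except UnicodeEncodeError as exc:
            # exc.start:exc.end is the run of characters the codec rejected;
            # put one replacement per rejected character and retry.
            count = exc.end - exc.start
            text = text[:exc.start] + replacement * count + text[exc.end:]
```

For instance, `encode_with_fallback("日本 ok", "ISO-8859-1")` yields `b"?? ok"`, because ISO-8859-1 has no U+FFFD but does have "?". In UTF-8, U+FFFD itself is always available, so it is chosen.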
> >>> I would favor a more practical behavior:
> >>>
> >>> "If the application applying the information doesn't support the specified
> character encoding, it must report this as an error.
> >>
> >> One important aspect of the above sentence is that, as Yves pointed out, the
> "must" would be a lowercase "must". That is, this will not be a testable
> assertion of the ITS 2.0 specification, even if the spec says "the consumer
> must support UTF-8". In that sense, we might even put that requirement into a
> note, to make clear that from the ITS 2.0 point of view this is rather guidance
> than a normative statement. Would that work for you too, Norbert?
> >
> > While it's much better if assertions can be and are tested, keep in mind that a
> test suite generally can't prove that a system fully conforms to a spec - it can
> only show in some cases that it doesn't. And even if this requirement isn't
> testable by software, there's still the test of looking into the developer's eyes
> and asking "does your system support UTF-8?". Notes are not requirements, so
> turning this into a note would remove the basis for asking the question.
> >
> > Norbert
> >
> >
> >
> 

Received on Tuesday, 19 March 2013 15:49:36 UTC