localization properties (was: RE: Other possible areas to cover under ITS)

[I just renamed this thread to "localization properties".
I started writing the mail a week or so ago, I hope it's
still useful.]

At 05:51 05/03/02, Yves Savourel wrote:
 >
 >> Can we say then that ITS does not try to cover the localisation
 >> properties case (separate file, with information that can be
 >> applied uniformly across the entire XML instance file)? And focus
 >> only on l10n directives (embedded information in XML instance file)?
 >
 >Yes. As Martin mentioned, that is specified in the charter.

I very much agree, except for the word 'case' at the end of Masaki's
first sentence. The WG is definitely not chartered to work on
localization properties. But this should not mean that the WG
should not try to cover, to some extent, the "localization property
*case*". I could for example imagine something like the following
being included in the guidelines we produce: "If you design
(your application-specific) markup, try to do it so that e.g.
different elements are used for translatable and non-translatable
text, if you can make this match reasonably well with your application
needs."

In other words, we should give whatever advice we can to schema
designers so that the resulting schema is well internationalized
and documents can easily be localized. In some cases, that may
be because the application-specific markup covers these needs,
and this application-specific markup can be picked up by
localization properties. In other cases, it will be because
the schema uses the tags and attributes we provide, and these
can be picked up directly by a tool (we can imagine that
the tool will have built-in localization properties for what
we define).


Masaki also characterizes the "localization property case" as
a "separate file, with information that can be applied uniformly
across the entire XML instance file". Localization properties
are definitely separate from the primary document. But I'm not
totally sure about "uniformly across the entire instance".

First, it should usually be a set of instances using the same schema.
If one has to redesign/rewrite localization properties for each instance,
then that might be an indication that our guidelines have failed.
(Of course there may be cases where two or more different
localization property sets are applied to the same schema.
An example I can imagine, based on my limited experience in
the localization industry, would be a document containing both
UI text and explanatory text (not in itself necessarily a good
idea), where most of the text is translated into some major languages,
whereas only the UI part is translated into some other languages.)

The other point is that localization properties may be designed
so that they are more selective (e.g. using something like XPath
or CSS selectors) and apply less uniformly to a document. A simple
example might be that the contents of certain elements are translated
in the body of the document, but not in the header.
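
To make the "more selective" idea concrete, here is a rough sketch of
how such properties might be applied. Everything in it is my own
assumption for illustration: the rule format (a path plus a
translate/don't-translate flag, later rules overriding earlier ones),
the element names, and the sample document. It uses the limited XPath
subset of Python's standard xml.etree.ElementTree and is not a
proposal for an actual properties format.

```python
import xml.etree.ElementTree as ET

# Hypothetical localization properties: (path, translate?) pairs.
# Later rules override earlier ones for the same element.
RULES = [
    (".//title", True),          # titles are translated in general...
    ("./header//title", False),  # ...but not those inside the header
]

def translatable_elements(root, rules):
    """Return the elements selected for translation, in document order."""
    decision = {}
    for path, translate in rules:
        for elem in root.findall(path):
            decision[elem] = translate
    # Elements not matched by any rule default to "do not translate".
    return [e for e in root.iter() if decision.get(e, False)]

DOC = """<doc>
  <header><title>internal-id-42</title></header>
  <body><title>Getting started</title><p>Some text</p></body>
</doc>"""

root = ET.fromstring(DOC)
selected = [e.text for e in translatable_elements(root, RULES)]
print(selected)
```

The same element name (title) ends up translated in the body but not
in the header, which is the kind of non-uniform application I mean.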


 >However, we probably have to think about schemas (like XSD). One can imagine
 >having our tag set used there (for translating the documentation inside the schema
 >for example).

That is a very good use case example, but should not be a special case,
because it's text that may need to be translated like any other text.


 >But a schema seems also somewhat of a logical place to specify
 >"properties" associated with elements and attributes.

That's a very interesting idea. In general, I would say that how to
embed localization properties into a schema should be worked on
by whatever group takes care of localization properties. But it's
definitely an area where we should make sure we are well coordinated.


 >It's actually already the
 >case: For example, Martin mentioned a mechanism to define what characters are
 >allowed in a given element, there are also existing ways to set length
 >limitations, etc. I'm not sure yet how much we should worry about that aspect of
 >"embedded properties".

I don't think I have thought this specific example through.
The following criteria come to mind for me:
1) static vs. dynamic: some requirements or issues can be seen or
    defined in a static way. Length would be such an example. I can
    just say that the resulting field should not be longer than some
    value. A tool can try to pick that up in any way it wants. But
    it can also be easily tested before or after localization, independent
    of the actual localization that went on.
    Other issues are more dynamic, e.g. whether something gets
    translated or not. There are probably ways to check the result
    automatically, but that's getting into more heuristics than
    we may want.
2) schema vs. (document) instance vs. individual document or item,...
3) generic technology vs. specific technology: If something is
    already in XML Schema, or we think that's the place it belongs,
    because it's much more general than internationalization/localization,
    then we should reuse it, or work with the XML Schema WG to get
    it in there.
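
As an illustration of the "static" case in point 1, a length limit can
be checked mechanically on any file, before or after localization,
without knowing anything about how the translation was done. The
sketch below is only that, a sketch: the element names, the limits,
and the idea of keeping them in a simple name-to-length table are all
assumptions of mine, and length is counted in characters for
simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical static constraint: maximum text length per element name.
MAX_LENGTH = {"button": 12, "label": 20}

def length_violations(xml_text, limits):
    """Return (tag, text, length) for every element whose text is too long."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        limit = limits.get(elem.tag)
        text = (elem.text or "").strip()
        if limit is not None and len(text) > limit:
            problems.append((elem.tag, text, len(text)))
    return problems

SOURCE = "<ui><button>Save</button><label>File name</label></ui>"
LOCALIZED = "<ui><button>Speichern unter</button><label>Dateiname</label></ui>"

source_problems = length_violations(SOURCE, MAX_LENGTH)
localized_problems = length_violations(LOCALIZED, MAX_LENGTH)
print(source_problems)
print(localized_problems)
```

The same check runs unchanged on the source and on the localized
file. A "dynamic" property, such as whether something should have
been translated at all, has no equally simple test.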


 >Maybe you could concentrate first on identifying the different issues, regardless
 >whether their solutions will need to be specified inside the document instance
 >or outside as a general property (or both). It's probably reasonable to assume
 >that only few, if any, need to be solved only as a general property.

Yes. And even for those where they are always solved via properties,
we may want to give advice to DTD designers.

BTW, I was planning to use this mail also to say how unhappy I was with
the term "localization properties". But somehow I made the connection
with CSS properties, and the term now makes sense to me. Maybe that
parallel with CSS properties was always obvious for some of you, but
I think it would be good to call it out for those who don't get it
that quickly (like me).


Regards,    Martin. 

Received on Sunday, 6 March 2005 16:39:33 UTC