
Comments on TMX 2 Proposal from the ITS Working Group

From: Yves Savourel <ysavourel@translate.com>
Date: Fri, 1 Jun 2007 14:47:22 -0600
To: <tmx_imp@lists.lisa-open.org>
Cc: <public-i18n-its@w3.org>
Message-ID: <00d301c7a48e$0dc58bf0$9b05a8c0@BREIZH>

Dear OSCAR committee members,

Please find below several comments that the W3C Internationalization Tag Set (ITS) Working Group would like to make regarding the
TMX 2.0 Proposal.

We worked from the document:
TMX 2.0 Specification Draft
OSCAR Working Draft - March 28, 2007
http://www.lisa.org/standards/tmx/tmx2/



=== 1) Integrating ITS in TMX

While we note that foreign attributes can be assigned to all elements of the TMX 2.0 proposal, we feel that it would help
implementers to list the attributes related to internationalization and their possible locations, just as xml:lang and xml:space
are listed. This is especially important since Section 1.1 states:

"Applications that depend on TMX format for exchanging Translation Memory data are not required to understand and support non-TMX
elements or attributes. A TMX application can safely ignore foreign elements or attributes present in a TMX document."

We think the its:dir attribute should be added to the <prop>, <note>, <tuv>, <g>, and <hi> elements.



=== 2) Replacing <prop> element by foreign attribute

In the description of the <prop> element, the 2.0 proposal states that a foreign attribute should be used instead of <prop>.

This may not always be possible: the content of the <prop> element may be text that requires language-related metadata (e.g.
xml:lang or its:dir). Such metadata cannot be applied if the text is moved to an attribute value.

We recommend either keeping <prop> or allowing it to be replaced by either a foreign attribute or a foreign element.
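
As a rough illustration of this point, the following Python sketch (standard library only) parses a hypothetical fragment in which
a <prop> element carries xml:lang and its:dir, next to a foreign attribute holding plain text. The fragment layout, the
"x-reviewer-comment" type value, and the ex:comment attribute with its example.org namespace are assumptions made for this
example and are not taken from the TMX 2.0 draft.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx20"        # namespace URI quoted in the draft
ITS_NS = "http://www.w3.org/2005/11/its"    # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"
EX_NS = "http://example.org/ext"            # hypothetical foreign namespace

# Hypothetical fragment: the same kind of comment stored once as a foreign
# attribute and once as <prop> element content.
FRAGMENT = f"""
<tu xmlns="{TMX_NS}" xmlns:its="{ITS_NS}" xmlns:ex="{EX_NS}"
    ex:comment="plain string with no language or direction of its own">
  <prop type="x-reviewer-comment" xml:lang="ar" its:dir="rtl">تعليق</prop>
</tu>
"""

tu = ET.fromstring(FRAGMENT)
prop = tu.find(f"{{{TMX_NS}}}prop")

# Element content can carry its own language and directionality metadata.
prop_lang = prop.get(f"{{{XML_NS}}}lang")
prop_dir = prop.get(f"{{{ITS_NS}}}dir")

# An attribute value is only a string: there is nowhere to attach
# xml:lang or its:dir to it specifically.
attr_text = tu.get(f"{{{EX_NS}}}comment")

print(prop_lang, prop_dir)
```

The element form keeps the metadata next to the text it describes, which is what a foreign attribute alone cannot offer.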



=== 3) ITS Rules for TMX

We recommend that LISA provide an ITS rules file along with the TMX specification. This rules file would describe the
internationalization-related aspects of a TMX document and allow generic ITS-aware tools to process it. For example, a
spell-checker could use it to verify the text of the different translation units. Such a rules file would indicate:

* Which elements or conditional constructs do not contain text.

* Which elements are "within text" or "nested".

* Whether localization notes are associated with some elements.

* Etc.

Here are some examples of ITS rules pertaining to TMX:

<its:translateRule selector="/tmx:tmx" translate="no"/>

<its:translateRule selector="//tmx:seg" translate="yes"/>

<its:translateRule selector="//tmx:bpt|//tmx:ept|//tmx:ph" translate="no"/>

<its:translateRule selector="//tmx:sub" translate="yes"/>

<its:withinTextRule selector="//tmx:g|//tmx:x|//tmx:bpt|//tmx:ept|//tmx:hi|//tmx:ph" withinText="yes"/>

<its:withinTextRule selector="//tmx:sub" withinText="nested"/>

<its:locNoteRule selector="//tmx:seg" locNotePointer="../tmx:note"/>
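
To give an idea of how a generic tool might evaluate such selectors, here is a minimal Python sketch using only the standard
library. It is not an ITS processor: ElementTree supports only a subset of XPath, so the hypothetical select() helper handles
nothing more than a union of "//prefix:name" branches. The sample document assumes a TMX 1.4b-style layout (body, tu, tuv, seg)
under the namespace URI quoted in the draft, which may not match the final TMX 2.0 structure.

```python
import xml.etree.ElementTree as ET

NS = {"tmx": "http://www.lisa.org/tmx20"}

# Hypothetical sample document, for illustration only.
SAMPLE = """
<tmx xmlns="http://www.lisa.org/tmx20">
  <body>
    <tu>
      <note>Label of the OK button</note>
      <tuv xml:lang="en">
        <seg>Press <hi>OK</hi> to continue<ph/></seg>
      </tuv>
    </tu>
  </body>
</tmx>
"""


def select(root, selector):
    """Return the elements matched by a union of '//prefix:name' branches."""
    matched = []
    for branch in selector.split("|"):
        branch = branch.strip()
        if not branch.startswith("//"):
            raise ValueError(f"unsupported selector branch: {branch!r}")
        # ElementTree rejects absolute paths on an element, so each branch
        # is evaluated as a descendant search relative to the root.
        matched.extend(root.findall("." + branch, NS))
    return matched


def local_name(element):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return element.tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

translatable = [local_name(e) for e in select(root, "//tmx:seg")]
within_text = [
    local_name(e)
    for e in select(root, "//tmx:g|//tmx:x|//tmx:bpt|//tmx:ept|//tmx:hi|//tmx:ph")
]

print(translatable, within_text)
```

A real implementation would use a full XPath 1.0 engine and apply the ITS precedence rules between global rules and local markup.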



=== 4) RFC4646

The reference to RFC 4646 should perhaps be replaced with a reference to BCP 47. This is something to check with the RFC 4646
authors.



=== 5) "3rd party standard"

Section 1.1 reads "...third party standards for date/time and language codes". We are not sure that describing, for example, BCP 47
as a "third party standard" is quite correct.



=== 6) XML-compliant/compliance

There is a section on "conformance" in the XML specification, but not on "compliance". Section 1.1 is titled "XML Compliance".



=== 7) Case sensitive (minor)

Section 1.1 has the sentence "Since XML syntax is case sensitive, any XML application must define casing conventions." 

We do not think the second part of the statement is useful and recommend removing it.



=== 8) Rewording (minor)

"...it may be necessary to extend TMX vocabulary using XML Namespaces."

This could possibly be reworded as:

"Additional extensibility can be implemented by means of the mechanisms defined in XML Namespaces."



=== 9) Namespace URI

The draft reads: "The namespace for TMX 2.0 is defined as 'http://www.lisa.org/tmx20'"

We recommend changing it to:

"The namespace URI for TMX 2.0 is defined as 'http://www.lisa.org/tmx20'"



=== 10) Encoding

Section 1.2 seems to indicate that TMX files cannot be encoded in anything but UTF-16, UTF-8 or ISO-646.

As a TMX file is an XML document, we feel there is no reason to limit the encoding of TMX to these three encodings. We recommend
simply allowing the normal XML encoding mechanism to apply, possibly with a mention that UTF-8 is a good choice in general.
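
As a small illustration, the Python sketch below (standard library only) feeds an ISO-8859-1 encoded document to an ordinary XML
parser, which decodes it from the encoding declaration without any TMX-specific handling. The document layout is a stripped-down,
hypothetical fragment and is only assumed to resemble TMX 2.0.

```python
import xml.etree.ElementTree as ET

NS = {"tmx": "http://www.lisa.org/tmx20"}

# Hypothetical document serialized in ISO-8859-1, an encoding outside the
# three listed in Section 1.2 of the draft.
DOC = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<tmx xmlns="http://www.lisa.org/tmx20"><body><tu><tuv xml:lang="fr">'
    "<seg>Fichier enregistré</seg>"
    "</tuv></tu></body></tmx>"
).encode("iso-8859-1")

# The parser reads the encoding declaration and decodes the bytes itself.
root = ET.fromstring(DOC)
seg_text = root.find(".//tmx:seg", NS).text

print(seg_text)
```

Which encodings beyond UTF-8 and UTF-16 are accepted depends on the parser, which is one reason UTF-8 remains a good general
recommendation.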



=== 11) Extensions validation

Section 1.3 reads: "All foreign elements and attributes added to a TMX file must be defined using an XML Schema. All XML Schemas
declared in a TMX document must be made available to permit validation of the foreign constructs included in the file."

We doubt that this can be achieved in all cases.



=== 12) HREF (minor)

The href attribute definition reads: "The 'href' attribute contains a valid URL that describes the location of a file."

Maybe the same definition as the one in HTML could be used? ("URI for linked resource.")



=== 13) Backward Compatibility

Section 6.1 discusses backward compatibility and states "TMX 2.0 was designed to be compatible with TMX 1.4b".

If a TMX 1.4 document has to be changed before a 2.0 reader can read it, then there is no backward compatibility.



=== 14) Errata and Feedback

It would be helpful if provisions were made for an errata page and if feedback were directed to a publicly archived list
such as tmx_imp@lists.lisa-open.org.



Best regards,
-the ITS WG
Received on Friday, 1 June 2007 20:46:52 GMT
