ITS Storage Restriction Data category mapping in XLIFF 2.1

Hi All,

While working on the text of the XLIFF standard for the ITS Storage Restriction Profile I came across a situation that is likely uncommon but not impossible: ITS allows the encoding and line ending style to vary between restricted spans, or even within nested spans. The XLIFF SLR module can support this without change, but the mechanism is slightly complicated because it relies on internal references. Here is the design I'm currently writing into the standard:

ITS storage restrictions also cover native code, which may or may not have been extracted to the XLIFF document. To avoid placing additional restrictions on how extractors supporting ITS perform general content extraction, I have added a mechanism that lets the extractor include or omit structural native codes as it sees fit, while still fully enabling ITS Storage Restrictions.

==Implementation==
* A new profile for the SLR module with the name "itsm:storage"
* A new empty element <itsm:storageConfig> that MAY be used in the <slr:profiles> element and MAY be used in the <slr:data> element to configure the options of the "itsm:storage" profile. If it is not used, the defaults of UTF-8 + LF are used.
It has three attributes:
     * Required: "storageEncoding" to select the desired encoding, with the same values as allowed by ITS
     * Required: "lineBreakType" to select the desired line break type, with the same values as allowed by ITS
     * Optional: "ref" to reference a subsection of the content that does not use the document level configuration. This attribute is only meaningful when <itsm:storageConfig> is used as a child of <slr:data>. It MUST be ignored if used on a child of <slr:profiles>.
* The existing <slr:normalization> element MAY be used and storage normalization is selected as for standard XLIFF profiles.
* The existing "slr:equivStorage" attribute MAY be used on inline codes to represent their contribution to storage size.
* The existing "slr:storageRestriction" attribute MUST be used to express the maximum size. It MUST contain a single integer representing the maximum number of bytes.
* A new element <itsm:structureSize> that MAY be used multiple times in the <slr:data> element to represent the size contributed by structural parts of the native document that are not present in the XLIFF document. The size is an integer number of bytes, given as the content of the element. If enclosing restricted spans exist with different encodings or line break styles, this element must be repeated for each encoding+lineBreakType combination used.
It has two attributes:
     * Optional: "storageEncoding" to select the encoding yielding the indicated size. If omitted the size applies to the document default encoding.
     * Optional: "lineBreakType" to select the line break type yielding the indicated size. If omitted the size applies to the document default line break type.
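
To illustrate the configuration and its defaults, here is a minimal Python sketch that reads <itsm:storageConfig> from a <slr:profiles> element. The element and attribute names are the ones proposed above. The namespace URIs, the "storageProfile" attribute value, and the function name storage_config are my assumptions for the example, not text from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch.
NS = {
    "slr": "urn:oasis:names:tc:xliff:sizerestriction:2.0",
    "itsm": "urn:oasis:names:tc:xliff:itsm:2.1",
}

SAMPLE = """
<slr:profiles xmlns:slr="urn:oasis:names:tc:xliff:sizerestriction:2.0"
              xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1"
              storageProfile="itsm:storage">
  <itsm:storageConfig storageEncoding="UTF-16LE" lineBreakType="crlf"/>
</slr:profiles>
"""

EMPTY = '<slr:profiles xmlns:slr="urn:oasis:names:tc:xliff:sizerestriction:2.0"/>'


def storage_config(profiles):
    """Return (encoding, line break type) for a <slr:profiles> element.

    Falls back to the proposed defaults (UTF-8 + LF) when no
    <itsm:storageConfig> child is present. A "ref" attribute is not read
    here, since it is to be ignored on a child of <slr:profiles>.
    """
    cfg = profiles.find("itsm:storageConfig", NS)
    if cfg is None:
        return ("UTF-8", "lf")
    return (cfg.get("storageEncoding"), cfg.get("lineBreakType"))


configured = storage_config(ET.fromstring(SAMPLE))
defaulted = storage_config(ET.fromstring(EMPTY))
```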


==Processing==
The storage size is calculated as described in the ITS specification, with the following additions:
  The sum of the sizes of all inline codes. For each inline code the "slr:equivStorage" value is used if it is defined, otherwise the size of the native code stored in <originalData> is used. If neither is present, an inline code is assumed to consume no storage space.
  The sum of the <itsm:structureSize> values (for the encoding+lineBreakType in use for this span) of all structural elements the span covers (<file>, <group>, <unit>).

This gives the same result as in ITS if either "slr:equivStorage" is used or the full native code is placed in <originalData>.
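
The calculation can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the names InlineCode, encoded_size, inline_code_size and span_storage_size are hypothetical, Unicode normalization is applied to the text only and not to native code, and the caller has already picked the <itsm:structureSize> values matching the encoding+lineBreakType of the span.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

LINE_BREAKS = {"lf": "\n", "cr": "\r", "crlf": "\r\n"}


@dataclass
class InlineCode:
    """Hypothetical stand-in for an inline code and its size information."""
    equiv_storage: Optional[int] = None   # value of slr:equivStorage, if any
    original_data: Optional[str] = None   # native code from <originalData>, if any


def encoded_size(text, encoding, line_break, normalization=None):
    """Byte count of text after optional normalization, line break
    conversion and encoding. Raises UnicodeEncodeError if the text
    cannot be represented in the encoding."""
    if normalization:
        text = unicodedata.normalize(normalization, text)
    # Collapse all line breaks to LF first, then expand to the requested style.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n", LINE_BREAKS[line_break])
    return len(text.encode(encoding))


def inline_code_size(code, encoding, line_break):
    """equivStorage wins, then the encoded native code, otherwise zero."""
    if code.equiv_storage is not None:
        return code.equiv_storage
    if code.original_data is not None:
        return encoded_size(code.original_data, encoding, line_break)
    return 0


def span_storage_size(text, codes, structure_sizes,
                      encoding="UTF-8", line_break="lf", normalization=None):
    """Text size plus inline code sizes plus structural sizes, in bytes."""
    return (encoded_size(text, encoding, line_break, normalization)
            + sum(inline_code_size(c, encoding, line_break) for c in codes)
            + sum(structure_sizes))


codes = [InlineCode(equiv_storage=3), InlineCode(original_data="<b>"), InlineCode()]
size = span_storage_size("Gr\u00fc\u00dfe\n", codes, structure_sizes=[10],
                         normalization="NFC")
fits = size <= 32  # compare against the slr:storageRestriction value
```

Here the text takes 8 bytes in UTF-8, the three inline codes contribute 3, 3 and 0 bytes, and the structural part adds 10, so the span needs 24 bytes and fits a restriction of 32.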

A transcoding error from Unicode to a non-Unicode encoding constitutes a validation failure.
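
As a small illustration of that check, a validator could simply attempt the encoding and treat a failure as invalid. The function name transcoding_ok is hypothetical, and the sketch assumes the encoding name is one Python recognizes (an unknown name would raise LookupError instead).

```python
def transcoding_ok(text, encoding):
    """Return False if text cannot be represented in the target encoding."""
    try:
        text.encode(encoding)  # strict error handling is the default
    except UnicodeEncodeError:
        return False
    return True


latin_ok = transcoding_ok("caf\u00e9", "iso-8859-1")
japanese_in_latin = transcoding_ok("\u65e5\u672c\u8a9e", "iso-8859-1")
japanese_in_sjis = transcoding_ok("\u65e5\u672c\u8a9e", "shift_jis")
```

The second call fails validation because the Japanese text has no representation in ISO-8859-1, while the same text passes for Shift_JIS.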

==Non-normative notes/hints==
* A non-normative recommendation to use NFC normalization is included to match the extended implementation hint in ITS.
* Using a non-Unicode encoding will only work in limited cases, as translations regularly need a different legacy code page than the source.
* If nested spans use different encodings, and native codes yield different numbers of bytes when encoded in those encodings, the raw native code SHOULD be placed in <originalData> to give accurate results.
* An extractor MAY omit the structural size and instead reduce the size of a restricted span during extraction, if that is possible.
* If an extractor can place all native code as inline codes and define them in <originalData>, structureSize and equivStorage need not be used.
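
To show why a single fixed byte count cannot be accurate for nested spans with different encodings, here is a tiny Python check using a made-up native code "<br/>" (my example, not taken from any particular format):

```python
native_code = "<br/>"  # hypothetical native code

# Byte count of the same native code under two encodings.
sizes = {enc: len(native_code.encode(enc)) for enc in ("utf-8", "utf-16-le")}
```

The same code occupies 5 bytes in UTF-8 and 10 bytes in UTF-16LE, so only the raw native code lets a processor compute the right size for each span.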

This design should support all possible scenarios while leaving the simple cases reasonably simple.

Regards,
Fredrik Estreen

Received on Tuesday, 21 April 2015 14:43:08 UTC