
RE: Tikal

From: Yves Savourel <ysavourel@enlaso.com>
Date: Wed, 1 May 2013 11:09:01 -0600
To: "'Philip'" <Philip.Oduffy@ul.ie>
CC: <public-multilingualweb-lt@w3.org>
Message-ID: <002301ce468e$940503a0$bc0f0ae0$@com>
Hi Phil,

> We've come across an issue with Tikal. Tikal converts apostrophes 
> into their HTML equivalent (i.e. ' = &#39;) when merging an XLIFF 
> file with its original HTML.

> The apostrophe is preserved during
> extraction and this only occurs during a merge.
> ...
> Is there any way to make Tikal produce the same file if you extract 
> it and then merge it without alterations?

Apostrophes are not "preserved"; they are extracted as literals: the XML/HTML document is parsed, and any apostrophe (escaped or not) is extracted as a literal apostrophe character.
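A minimal illustration of why the original form is lost, using Python's standard `html.unescape` as a stand-in for the filter's parser (this is not Tikal's code): an escaped and an unescaped apostrophe decode to the same literal string, so the extracted text carries no record of which spelling was used.

```python
import html

# Two source spellings of the same text: one with a numeric character
# reference, one with a literal apostrophe.
escaped_source = "it&#39;s"
literal_source = "it's"

# Decoding both (as a parser does during extraction) yields identical text.
extracted_a = html.unescape(escaped_source)
extracted_b = html.unescape(literal_source)

print(extracted_a == extracted_b)  # True
print(extracted_a)                 # it's
```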

During the merge, several things may change because there is no way for the filter to know what the original form of the apostrophe was (e.g. whether it was escaped or not). So we have to pick one form.
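A small sketch of the "pick one form" problem. `extract` and `merge` are hypothetical helpers built on Python's `html` module, and the `escape_apostrophe` flag is an illustrative assumption, not the Okapi filter's actual option: whichever rule the merge applies, only one of the two original spellings comes back unchanged.

```python
import html

def extract(source: str) -> str:
    # Hypothetical extraction step: decode every character reference.
    return html.unescape(source)

def merge(text: str, escape_apostrophe: bool) -> str:
    # Hypothetical merge step: escape &, < and > (quote=False leaves quotes
    # alone), then apply one fixed rule for apostrophes.
    out = html.escape(text, quote=False)
    return out.replace("'", "&#39;") if escape_apostrophe else out

originals = ["it&#39;s", "it's"]
survivors = {}
for escape_apostrophe in (True, False):
    # Keep the originals that survive an extract/merge round trip unchanged.
    survivors[escape_apostrophe] = [
        s for s in originals if merge(extract(s), escape_apostrophe) == s
    ]
    print(f"escape_apostrophe={escape_apostrophe}: {survivors[escape_apostrophe]}")
```

Each setting reproduces exactly one of the two originals, which is the trade-off described above.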

You can tweak this to some degree using some options; see for example http://www.opentag.com/okapi/wiki/index.php?title=XML_Filter#escapeQuotes. But it's not possible to guarantee that the merged document will always be exactly the same as the original one, because several aspects cannot be known by the filter (whether characters were in escaped form or not, whether attribute values were double-quoted or single-quoted, etc.).
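The same limitation applies to attribute quoting. As a sketch using Python's `xml.etree.ElementTree` (a generic parser, not the Okapi XML filter), parsing keeps only the values, so re-serializing has to choose a quote style and an escaping form on its own.

```python
import xml.etree.ElementTree as ET

# Single-quoted attribute and an escaped apostrophe in the source.
original = "<p title='note'>it&#39;s</p>"

# Parsing keeps only the values; quote style and escaping are discarded.
element = ET.fromstring(original)
print(element.get("title"), element.text)  # note it's

# Re-serializing must pick a form: ElementTree writes double-quoted
# attributes and leaves the apostrophe in text as a literal character.
regenerated = ET.tostring(element, encoding="unicode")
print(regenerated)              # <p title="note">it's</p>
print(regenerated == original)  # False
```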

If you want to compare two HTML/XML files with a text-based comparison tool (i.e. one unaware of XML/HTML syntax), one safe solution is to run something like Tidy on both documents before comparing them.
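Tidy is the suggestion here; as a swapped-in alternative for inputs that are well-formed XML (or XHTML), Python's `xml.etree.ElementTree.canonicalize` (Canonical XML 2.0, Python 3.8+) can play a similar normalizing role. It will not parse HTML that is not well-formed, and `normalized_equal` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def normalized_equal(doc_a: str, doc_b: str) -> bool:
    # Canonicalize both documents so that differences in attribute quoting
    # and character escaping disappear before the text comparison.
    return ET.canonicalize(doc_a) == ET.canonicalize(doc_b)

original = "<p title='note'>it&#39;s</p>"
merged = "<p title=\"note\">it's</p>"

print(original == merged)                  # False: raw text differs
print(normalized_equal(original, merged))  # True: same content
```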

Hope this helps,
-yves
Received on Wednesday, 1 May 2013 17:09:33 UTC

This archive was generated by hypermail 2.3.1 : Sunday, 9 June 2013 00:25:11 UTC