[all] XML from drupal

From: Yves Savourel <ysavourel@enlaso.com>
Date: Wed, 24 Oct 2012 09:20:49 -0600
To: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.064416e58c.assp.0644ae3180.001a01cdb1fb$287f2020$797d6060$@com>

Hi Mauricio, all,

Looking at the example here:
http://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration#Step_3:_Postproduction_process

I have some feedback that is probably more for Moritz and his team than for you, Mauricio.

Since, hopefully, we will have an open API to communicate with Drupal at some point and anyone could get these documents, I'd like to offer a few notes from an average LSP viewpoint:


--- a) HTML inside CDATA.

That is an all-too-common pattern that causes many problems for LSPs. Many tools can now use sub-filters to deal with it, but it's far from a nice solution. I do understand this is how it's going to be; I just wanted to point out that it is not LSP-friendly output.
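
To illustrate what a "sub-filter" has to do, here is a minimal Python sketch, not how any particular tool implements it. It assumes a single <item> whose CDATA holds an HTML fragment; the sample string and the names TextCollector and extract_text are made up for the illustration. The XML is parsed first, then the CDATA text is run through a second, HTML parser to get at the translatable text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: one <item> whose CDATA section holds an HTML fragment.
SAMPLE = '<item id="11-body-0-value"><![CDATA[<p>Hello <b>world</b></p>]]></item>'


class TextCollector(HTMLParser):
    """Second-pass parser: collects the text runs found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(xml_string):
    # First pass: the XML parser hands back the CDATA content as plain text.
    item = ET.fromstring(xml_string)
    # Second pass (the "sub-filter"): parse that text again, this time as HTML.
    collector = TextCollector()
    collector.feed(item.text or "")
    collector.close()
    return collector.chunks


print(extract_text(SAMPLE))
```

The point is simply that every consumer needs two parsers chained together, and the inline markup (<b> here) is invisible to the XML layer.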

Moving on :)


--- b) Recursive <item> element:

What worries me more than the CDATA is the <item> element being used recursively:

<item id="11-body">
 <item id="11-body-0">
  <item id="11-body-0-value" its:allowedCharacters="."><![CDATA[blah]]></item>
 </item>
</item>

Quite a few tools will be able to work with "<item>CDATA</item>", but having the <item> element sometimes contain another <item> and sometimes CDATA is not going to be easily dealt with by many tools.

Sure, maybe it's always the third <item> that is to be extracted, or maybe it's always the <item> with an id that ends with "value". But such jerry-rigging really looks bad for something created in a project like Web-LT :)

Having two distinct elements, one for the structure and one for the content, would be a lot cleaner. Or at least have some kind of attribute that allows tools to distinguish between the two types of content.
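
For illustration, this is roughly the workaround a consumer ends up writing today. It is a Python sketch under the assumption that "content" means an <item> with no child <item>; that rule is my guess, not something the format documents, and the function name content_items is invented here. The ITS attribute is left out of the sample so the snippet parses without a namespace declaration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the nested structure (its:allowedCharacters omitted so that
# no namespace declaration is needed to parse the fragment).
SAMPLE = """<item id="11-body">
 <item id="11-body-0">
  <item id="11-body-0-value"><![CDATA[blah]]></item>
 </item>
</item>"""


def content_items(root):
    """Return (id, text) for every <item> that has no child elements.

    Assumption: a leaf <item> carries content and any other <item> is
    purely structural. The format itself does not state this anywhere.
    """
    return [
        (el.get("id"), el.text or "")
        for el in root.iter("item")
        if len(el) == 0
    ]


print(content_items(ET.fromstring(SAMPLE)))
```

With a dedicated content element, or a distinguishing attribute, the heuristic in the list comprehension would become a plain element or attribute lookup.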


--- c) The **es_es** codes

I've noticed those **es_es** markers sprinkled, apparently randomly, throughout the content.
I'm assuming these are just the "translation simulation marks" the comment is talking about. Right?


--- d) The title

The node title has the ITS attribute its:allowedCharacters="[a-zA-Z0-9'&quot; ]". Does that mean we can't use accented characters, dashes, or even signs like +, $, %, etc.? That's a bit strange.
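
A quick way to see the effect, as a Python sketch: it assumes the attribute value (with &quot; unescaped to ") can be handed to Python's re module as-is, which holds for a simple character class like this one but is not guaranteed for every ITS expression. The function name and the sample title are made up.

```python
import re

# The attribute value after XML unescaping (&quot; becomes ").
ALLOWED = "[a-zA-Z0-9'\" ]"


def disallowed_chars(text, allowed=ALLOWED):
    """Return the characters of text that the character class rejects."""
    pattern = re.compile(allowed)
    return [ch for ch in text if not pattern.fullmatch(ch)]


# Hypothetical translated title with an accent, a percent sign and a dash.
print(disallowed_chars("Café 50% off - today"))  # ['é', '%', '-']
```

So a translated title containing any accented letter or common punctuation would violate the constraint as currently written.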


Cheers,
-yves
Received on Wednesday, 24 October 2012 15:21:28 UTC
