RE: [all] XML from drupal

Hi Yves.

 

I'll answer the points where I can clarify something:

 

--- a) HTML inside CDATA.

 

That is an all-too-common problem that causes so many problems for LSPs. Many tools can now use sub-filters to deal with it, but it's far from being a nice solution. I do understand this is how it's going to be. But I just wanted to point out that it is not an LSP-friendly output.

 

[Mauricio] We use CDATA in the preprocessed files that we import into the CAT tool, and we have filters that block/unblock and correctly segment this content.
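
The sub-filter approach discussed above can be sketched as a two-pass extraction: the XML parser hands back the CDATA payload as one plain string, and a second HTML parser is then run over that string. This is only an illustration in Python; the sample document, the TextCollector class and the extract_text function are hypothetical names and do not describe any particular CAT tool or the actual Drupal output.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: one <item> whose CDATA section carries HTML markup.
SAMPLE = (
    '<item id="11-body-0-value">'
    "<![CDATA[<p>Hello <b>world</b></p><p>Second paragraph</p>]]>"
    "</item>"
)


class TextCollector(HTMLParser):
    """Second pass (the "sub-filter"): collects text runs from the HTML payload."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text runs, without surrounding whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(xml_source):
    """First pass parses the XML; the CDATA content arrives as a plain string."""
    item = ET.fromstring(xml_source)
    collector = TextCollector()
    collector.feed(item.text or "")
    collector.close()
    return collector.chunks


print(extract_text(SAMPLE))
```

The first parser never sees the HTML tags as structure, which is why a second filter has to be configured separately for the embedded content.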

 

--- c) The **es_es** codes

 

I've noticed those **es_es** markers sprinkled apparently randomly throughout the content.

I'm assuming these are just the "translation simulation marks" the comment is talking about. Right?

 

[Mauricio] Yes, the idea is to illustrate which content would be translated. Inside <span translate="no">text</span> there are no marks.

 

--- d) The title

 

The node title has the ITS attribute its:allowedCharacters="[a-zA-Z0-9'&quot; ]". Does that mean we can't use accented characters, dashes, or even +, $, % signs, etc.? That's a bit strange.

 

[Mauricio] I’ve updated the example file; it was outdated. Karl and I have been discussing this, and the regular expression that we now use for the title is [^<>].

What we really want to achieve is that no HTML tags are allowed in this node.
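
To make the difference between the two expressions concrete, here is a small Python sketch. It assumes the its:allowedCharacters value is read as a character class that every character of the content must match; the is_allowed helper and the constant names are hypothetical, and Python's re module is used only as a stand-in for whatever regular expression engine an ITS processor would use.

```python
import re

# &quot; in the XML attribute is an escaped double quote, so the class contains ".
OLD_TITLE_CLASS = "[a-zA-Z0-9'\" ]"
NEW_TITLE_CLASS = "[^<>]"


def is_allowed(text, char_class):
    """Return True if every character of `text` matches the character class."""
    return re.fullmatch(f"(?:{char_class})*", text) is not None


for title in ["Hello World 2012", "Martínez: 50% off", "<b>Title</b>"]:
    print(title, is_allowed(title, OLD_TITLE_CLASS), is_allowed(title, NEW_TITLE_CLASS))
```

Under that assumption, the old class rejects accented letters and punctuation such as % or :, while [^<>] accepts anything except the angle brackets needed to write an HTML tag.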

I plan to talk about this issue in Lyon. In the use case high level summary I address it:

http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#More_Information_and_Implementation_Status.2FIssues_6

 

Thanks for your comments.

__________________________________

Mauricio del Olmo Martínez

Dpto. Técnico/I+D+i

Linguaserve Internacionalización de Servicios, S.A.

Tel.: +34 91 761 64 60 ext. 0421
Fax: +34 91 542 89 28 

E-mail: tecnico@linguaserve.com

www.linguaserve.com

 

__________________________________

 

 

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Wednesday, 24 October 2012 17:21
To: public-multilingualweb-lt@w3.org
Subject: [all] XML from drupal

 

Hi Mauricio, all,

 

Looking at the example here:

http://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration#Step_3:_Postproduction_process

 

I have some feedback that is probably more for Moritz and his team than for you, Mauricio.

 

Since, hopefully, we will have an open API to communicate with Drupal at some point and anyone will be able to get these documents, I'd like to offer a few notes from an average LSP viewpoint:

 

 

--- a) HTML inside CDATA.

 

That is an all-too-common problem that causes so many problems for LSPs. Many tools can now use sub-filters to deal with it, but it's far from being a nice solution. I do understand this is how it's going to be. But I just wanted to point out that it is not an LSP-friendly output.

 

Moving on :)

 

 

--- b) Recursive <item> element:

 

What worries me more than the CDATA is the <item> element being used recursively:

 

<item id="11-body">

<item id="11-body-0">

  <item id="11-body-0-value" its:allowedCharacters="."><![CDATA[blah]]></item>

</item>

</item>

 

Quite a few tools will be able to work with "<item>CDATA</item>", but having the <item> element sometimes contain another <item> and sometimes CDATA is not something many tools can easily deal with.

 

Sure, maybe it's always the third <item> that is to be extracted, or maybe it's always the <item> with an id that ends with "value". But such jerry-rigging really looks bad for something created in a project like Web-LT :) 

 

Having two distinct elements, one for the structure and one for the content, would be a lot cleaner. Or at least have some kind of attribute that makes it possible to distinguish between the two types of content.
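
For illustration, this is roughly the kind of guesswork a consumer has to do today, sketched in Python. It assumes that an <item> with no child <item> is a content item and everything else is structure; that rule, the content_items function and the sample string are assumptions made for this sketch, not something the format guarantees.

```python
import xml.etree.ElementTree as ET

# The nested example from above, with the ITS namespace declared so it parses.
SAMPLE = """<item xmlns:its="http://www.w3.org/2005/11/its" id="11-body">
<item id="11-body-0">
  <item id="11-body-0-value" its:allowedCharacters="."><![CDATA[blah]]></item>
</item>
</item>"""


def content_items(root):
    """Yield (id, text) for each <item> that has no direct child <item>."""
    for item in root.iter("item"):
        # Heuristic: a leaf <item> is treated as content, any other as structure.
        if item.find("item") is None:
            yield item.get("id"), item.text or ""


root = ET.fromstring(SAMPLE)
print(list(content_items(root)))
```

With two distinct element names, or an explicit attribute on the content items, the heuristic would turn into a plain selection by name or attribute, which is much easier to express in a filter configuration.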

 

 

--- c) The **es_es** codes

 

I've noticed those **es_es** markers sprinkled apparently randomly throughout the content.

I'm assuming these are just the "translation simulation marks" the comment is talking about. Right?

 

 

--- d) The title

 

The node title has the ITS attribute its:allowedCharacters="[a-zA-Z0-9'&quot; ]". Does that mean we can't use accented characters, dashes, or even +, $, % signs, etc.? That's a bit strange.

 

 

Cheers,

-yves

 

 

 

 

 

 

 

Received on Wednesday, 24 October 2012 15:56:06 UTC