- From: Tim Foster <Tim.Foster@Sun.COM>
- Date: Tue, 22 Feb 2005 15:58:52 +0000
- To: public-i18n-its@w3.org
Hi Folks,

I promised this at the first call, so here goes: my thoughts on the draft document of the requirements for localisable DTD design that Richard and Yves put together (I'm referring to the 7th July Working Draft) at:

http://people.w3.org/rishida/localizable-dtds/

General comments

I think this document is spot-on in most cases. It captures the problems that are frequently encountered by folks trying to translate XML documents, and its adoption would certainly help there.

I'm guessing that the target audience for this document is people either writing XML DTDs, or teams producing content. In particular for the latter group, it's important that any requirements we put on them should be as easy to implement as possible, and shouldn't put too much of a strain on any decent authoring tool. (Oh, just re-read the document - you mention this already, excellent)

I'll go through the sections I have thoughts about, and will refer to their section number and title below:

2.2 Direct identification of content that should not be localised

Could I expand on this and ask for:

 * identification of content that should not be word-counted
 * identification of content that should not be segmented

In our generic XML, HTML and SGML Docbook XLIFF filters, we've needed these extra identifiers in order to spot particular types of text that could appear within translatable sections, but which could trip up either segmentation or wordcounting algorithms. They should be translated, but just with care, or special skills perhaps, eg.

  <para>This is a <filename>com.sun.java.foo</filename> java package.</para>
  <para>
  <programlisting>
  public class Tim {
      public static void main (String[] args){
          System.out.println("Hello World!");
      }
  }
  </programlisting>
  </para>

Here we'd want to wordcount 5 words (7 if we had a means to put a <span> around "Hello World!", but we don't (yet!)) but specifically, protect the contents of the programlisting and the filename from being passed through a simple segmentation algorithm, which (assuming sentence-level segmentation) could create some pretty weird segments otherwise.

2.7 Emphasis & document conventions

I get the intention of this, but I think it's important not to make XML-ITS into a DTP-type application. I don't think we should specify these under the ITS namespace (where do you stop? eg. We provide <importance>, <irony>, <sarcasm>, but not <mildannoyance>, <subtlehumour>, etc.)

Along with the strict tag-set in ITS, are we planning on having a "Suggestions to DTD authors", so that things that don't directly fall inside the tag-set can still be mentioned for consideration?

2.11 Declaring the language of the content

Multi-lingual XML documents are usually a pain in the neck for translators to deal with: typically they'd have to split a document up into mono-lingual bites, translate each section (presumably by several different translators) and then recombine the document. At Sun, where possible, we always try to keep language resources separate, so the user can easily install a new language package at runtime: multilingual resource files make this extremely complex for installer programs...

Now, of course, it's good to have a way of marking up multi-lingual documents in terms of providing some way in which you can display the separate elements. I guess I'm suggesting "if you're going to be publishing XML documents, and the source text is mostly in one language, then please don't combine the translations in with the source document". I've come across XML documents where we have to add elements in order to provide translations, and it's a real pain.
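
Coming back to the 2.2 example for a moment, here is a rough sketch in Python of the kind of wordcount described there. It is an illustration only, under stated assumptions, and not the code of the XLIFF filters mentioned above: the PROTECTED set, the SAMPLE document (with a <doc> wrapper added so it parses as one XML document) and the function names countable_text and word_count are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of elements whose content should be neither
# word-counted nor segmented (assumption made for this sketch).
PROTECTED = {"filename", "programlisting"}

# The two paragraphs from the 2.2 example, wrapped in a <doc> root.
SAMPLE = """<doc>
<para>This is a <filename>com.sun.java.foo</filename> java package.</para>
<para>
<programlisting>
public class Tim {
    public static void main (String[] args){
        System.out.println("Hello World!");
    }
}
</programlisting>
</para>
</doc>"""


def countable_text(elem, protected=PROTECTED):
    """Yield the text chunks that sit outside any protected element."""
    if elem.tag in protected:
        return  # skip the element's own content entirely
    if elem.text:
        yield elem.text
    for child in elem:
        yield from countable_text(child, protected)
        # Text following a child belongs to the parent, so it is
        # counted even when the child itself is protected.
        if child.tail:
            yield child.tail


def word_count(xml_source, protected=PROTECTED):
    """Count whitespace-separated words outside protected elements."""
    root = ET.fromstring(xml_source)
    return sum(len(chunk.split()) for chunk in countable_text(root, protected))
```

With the sample above, word_count(SAMPLE) gives 5, the figure wanted in 2.2. The same tag test could be used to keep protected elements away from a sentence-level segmenter, though that part is not shown here.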

2.12 Describing other cultural aspects of the content

Similar comments to those on 2.11: I'm not against multi-lingual documents, just so long as they're done properly (and when necessary), but in general I'd really ask people whether they need all translations in a single file.

2.13 Citations

Is this outside the scope of XML-ITS? Shouldn't entity declarations in the XML document be used to do this?

2.14 References to UI messages in Documentation

Yep, by all means provide clues that such a string might be a UI message (eg. Docbook's <computeroutput> or <screen> tags do this at the moment, and we use those clues in our TM system to aid segmentation), but I don't know if I'd call out message strings directly: the XML document would be illegible without the message resource file being available. It might be better to just mark up the section, and let the TM system fill in the translation.

2.16 Infinite Naming Scheme

Yay!! (I've seen this problem in the wild too)

2.17 Allowed Characters

How can you enforce this? Isn't it up to the content authoring tool to do this job? Are there hooks that already provide such functionality? (eg. notes to translators imploring them to use only ASCII, or strings of a certain length?)

2.18 Term identification

Yep, this is good: who decides what a Term is though? This sounds like a bit of repetition though - wasn't

2.24 Support for localisable resource data

Do you mean the stuff that the Mozilla folks have done with the way they translate .dtd files for the UI?

Anyway, that's all for now - sorry for going on so long :-)

cheers,
tim

--
Tim Foster - Tools Engineer, Software Globalisation
http://sunweb.ireland/~timf
http://blogs.sun.com/timf
http://www.netsoc.ucd.ie/~timf
Received on Tuesday, 22 February 2005 16:00:28 UTC