[ESW Wiki] Update of "its0503ReqSpan" by TimFoster

The following page has been changed by TimFoster:
http://esw.w3.org/topic/its0503ReqSpan


------------------------------------------------------------------------------
  
  A span-like element is required to allow authors to mark sections of text that may have special properties from a localisation and internationalisation point of view. Span-like elements are required as a general concept for the different requirements for the ITS.
  
- ''[note to ITS folk : I think this is a bit vague, perhaps we could define it more carefully ? I'm trying to suggest that the span element can be used for several purposes, all of which we haven't defined yet (but would be defined by the time we release the spec - I'm not suggesting it should be a free-for-all, but that as a requirement, we do need a way to mark sections of text for different purposes)]''
- 
- '''[MD] I think this is clear enough for the moment. We can refine this later if needed.'''
- 
  
  == Background: ==
  
- This allows localisation tools to determine their behaviour on certain sections of text. This could be for sections of text that need to be translated by a domain-expert (as with source code fragments) or need special terminology in order to be properly translated. In particular, a span-like element can be useful to help translation tools determine where to apply sentence-breaks and also to assist word-counting algorithms. '''[TF] added text''' A span-like element is also extremely useful for marking langauge information in source files which translation tools can also use to determine which translation process to use for each given section of text (eg. a Latin quotation in a section of English text is often intended to be left in Latin for the translated version of the English text.)''' [TF] end added text''' Other uses are foreseen, within the scope of the ITS.
+ This allows localisation tools to determine their behaviour on certain sections of text. This could be for sections of text that need to be translated by a domain expert (as with source code fragments) or that need special terminology in order to be properly translated. In particular, a span-like element can help translation tools determine where to apply sentence breaks, and can assist word-counting algorithms. A span-like element is also extremely useful for marking language information in source files, which translation tools can use to determine which translation process to apply to each section of text (e.g. a Latin quotation in a section of English text is often intended to be left in Latin in the translated version of the English text). Other uses are foreseen, within the scope of the ITS.
- 
- '''[MD] This omits a very important use of the <span> element, and the main reason it was added to HTML originally: language information.
- Language information is important both for internationalization (e.g. different styling according to language) as well as localization (text needs to go to different translator, or not translated, or otherwise treated differently).'''
- 
- '''[TF] Good point, I've added that text above'''
- 
  
  One example would be the following sentence, which contains some source code that we would like to treat specially during translation :
  
+ {{{
- 'The statement in the Java programming language, System.out.println("Hello World!"); prints the text "Hello World!" to standard output.'
+ The statement in the Java programming language, System.out.println("Hello World!"); prints the text "Hello World!" to standard output.
+ }}}
  
  Here, we would like to put a spanning element around the source code fragment to indicate that it is not standard text for translation and should be translated by someone familiar with the Java programming language. Also, translation tools that perform sentence segmentation should treat the exclamation points in the sample text carefully, so that they are not taken as sentence boundaries.
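
As a rough illustration of the segmentation concern, here is a minimal Python sketch. It assumes the tool has already been told, by some span-like markup, which substrings are protected; the function name `segment` and the `protected` parameter are hypothetical and not part of any ITS proposal. It compares a naive splitter with one that ignores sentence terminators inside protected spans.

```python
import re

TEXT = ('The statement in the Java programming language, '
        'System.out.println("Hello World!"); prints the text '
        '"Hello World!" to standard output.')

# Substrings that span-like markup would flag as "do not segment" (assumed input).
PROTECTED = ['System.out.println("Hello World!");', '"Hello World!"']


def segment(text, protected=()):
    """Split after '.', '!' or '?', but never inside a protected span."""
    # Collect the character ranges covered by every protected substring.
    ranges = []
    for span in protected:
        start = text.find(span)
        while start != -1:
            ranges.append((start, start + len(span)))
            start = text.find(span, start + len(span))

    sentences, begin = [], 0
    for match in re.finditer(r"[.!?]", text):
        # Skip terminators that fall inside a protected range.
        if any(lo <= match.start() < hi for lo, hi in ranges):
            continue
        sentence = text[begin:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        begin = match.end()

    rest = text[begin:].strip()
    if rest:
        sentences.append(rest)
    return sentences


naive = segment(TEXT)               # breaks inside the code fragment
careful = segment(TEXT, PROTECTED)  # keeps the sentence in one piece
```

Without the protected spans, the splitter cuts the sentence at the dots in `System.out.println` and at both exclamation points; with them, the sentence stays whole.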
  
- '''[RI] Hmm. This is not such a good example in my mind, since it seems to suggest that it's ok not to put System.out.println("Hello World!"); in an element such as <code>.  On the contrary, I think we should have a guideline and expectation that people will have this marked up so that a span element is not necessary.  Same goes for the output.'''
+ While the <code> tag in XHTML could be used to mark up this text (in an XHTML document), it's often not specific enough for translators : it doesn't tell the translator what sort of source code is contained inside the tag, nor does it mark which portions of the code contents are translatable.
  
- '''[TF] Okay, I probably didn't choose the best example here. What I'd really like to see, is some way of marking purely the translatable text in the sentence, allowing the author to clearly delimit the parts of the code tag that are translatable vs. non-translatable : right now, all <code> says is that there's source code present - it's up to tools to work out (a) what type of code it is, and (b) which parts of the code are translatable. Perhaps something like this would be a better example :'''
+ One usage we could foresee for a span-like element is the following :
  
+ {{{
- 'The statement in the Java programming language <code><its:donttranslate>System.out.println("<its:/donttranslate>Hello World<its:donttranslate>");<its:/donttranslate></code> prints the text "Hello World!" to standard output.'
+ The statement in the Java programming language <code><its:span-donttranslate>System.out.println("</its:span-donttranslate>Hello World<its:span-donttranslate>");</its:span-donttranslate></code> prints the text "Hello World!" to standard output.
+ }}}
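
To make the idea concrete, the following is a minimal Python sketch of how a tool might separate translatable from non-translatable text in markup of this kind. It assumes well-formed XML with closing tags written as `</its:span-donttranslate>`, a wrapping `<p>` element, and a placeholder namespace URI (`http://example.org/its`); none of these are defined by the requirement itself, and `split_segments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the requirement does not define one for "its".
ITS_NS = "http://example.org/its"
DONT_TRANSLATE = f"{{{ITS_NS}}}span-donttranslate"

SAMPLE = (
    f'<p xmlns:its="{ITS_NS}">The statement in the Java programming language '
    '<code><its:span-donttranslate>System.out.println("</its:span-donttranslate>'
    'Hello World'
    '<its:span-donttranslate>");</its:span-donttranslate></code>'
    ' prints the text "Hello World!" to standard output.</p>'
)


def split_segments(element):
    """Return (text, translatable) pairs in document order."""
    segments = []

    def walk(node, translatable):
        if node.text:
            segments.append((node.text, translatable))
        for child in node:
            # Text inside a donttranslate span is protected.
            walk(child, translatable and child.tag != DONT_TRANSLATE)
            # Tail text belongs to the parent, so it keeps the parent's status.
            if child.tail:
                segments.append((child.tail, translatable))

    walk(element, True)
    return segments


segments = split_segments(ET.fromstring(SAMPLE))
translatable = [text for text, ok in segments if ok]
protected = [text for text, ok in segments if not ok]
```

Here `translatable` holds the surrounding prose plus the string `Hello World`, while `protected` holds the two code fragments `System.out.println("` and `");`.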
  
- '''[TF] The point is, I'm suggesting we shift some of the responsibility of identifying translatable vs. non-translatable content off the translation tools author (or at the very least, make recommendations to content authors to separate out the translatable vs. non-translatable portions of text more clearly (eg. leave the entire contents of <code> as non-translatable and use an entity reference or some other means to refer in the translatable text from elsewhere, eg. <code>System.out.println("&java.code.example.text;");</code>) -- but I'm going into implementation details here, and was trying to avoid that for this requirements document :-)'''
+ An alternative to this sort of construction would be to insert translatable text into the source document using some form of text linking, for example :
  
+ {{{
+ <code>System.out.println("&java.code.example.text;");</code>
+ }}}
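
A minimal Python sketch of the linking idea follows. It assumes the translatable strings live in per-language catalogues keyed by entity name; the `CATALOGUE` contents (including the French string), the `resolve` function and the entity-name pattern are all illustrative assumptions, not part of the requirement.

```python
import re

SOURCE = '<code>System.out.println("&java.code.example.text;");</code>'

# Hypothetical catalogues: only these strings are ever sent for translation.
CATALOGUE = {
    "en": {"java.code.example.text": "Hello World!"},
    "fr": {"java.code.example.text": "Bonjour le monde !"},
}

# Matches references such as &java.code.example.text;
ENTITY = re.compile(r"&([A-Za-z_][\w.\-]*);")


def resolve(source, language):
    """Replace known entity references with the text for the given language."""
    table = CATALOGUE[language]
    # Unknown references (e.g. &amp;) are left exactly as they were.
    return ENTITY.sub(lambda m: table.get(m.group(1), m.group(0)), source)


english = resolve(SOURCE, "en")
french = resolve(SOURCE, "fr")
```

With this arrangement the contents of `<code>` never reach the translator; only the catalogue entries do.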
  
+ In these examples, the point has been to shift some of the responsibility for identifying translatable vs. non-translatable content away from the translation tools author, or at the very least, to recommend that content authors separate the translatable and non-translatable portions of text more clearly.
  
- This next section of text shows a filename that should also not be translated :
+ Another example is shown below, where we have a piece of text that contains a filename which should also not be translated :
  
+ {{{
- 'The file /etc/passwd is a local source of information about users'
+ The file /etc/passwd is a local source of information about users'
- accounts.'
+ accounts.
+ }}}
  
- In this case, the filename "/etc/passwd" should not be translated, and we would like to mark that filename with an element to indicate this.
+ In this case, the filename "/etc/passwd" should not be translated, and we would like to add markup that indicates this.
  
+ This requirement is related to some other requirements, namely :
- ''[ Note to ITS folks :
- The current list of possible uses of the span element is :
  
+  * http://esw.w3.org/topic/its0504ReqPurposeSpecMap

+  * http://esw.w3.org/topic/its0504ReqLinkedText

-  * Text that should not be segmented
-  * Text that should not be translated
-  * Text that should not be wordcounted
  
+ For the Purpose Specific Mapping, we need to ensure that any related semantics in the target schema are also sufficient for translation. For example, saying that a <programlisting> element in DocBook is related to a <code> element in XHTML is interesting, but neither element helps the translator determine which contents of code or programlisting are actually translatable.
- Are there any more that people can think of that don't directly fall into other sections, that is, I expect we would have particular requirements for dealing with terms, phrases, etc. elsewhere in the requirements document.
- ]''
  
- '''[MD] I think two more aspects should be mentioned shortly:
+ A span-like element could be used in cases like these, where we want to mark up specific text properties.
  
-  - Any element that in and by itself doesn't carry specific semantics
-    is fine. If the target schema already has such an element, fine.
-  - Say where the element should be allowed in the target schema:
-    Everywhere where natural language/translatable text can appear.
- 
- I'm using the term 'target schema' here to talk about the schema that we are trying to internationalize/localize.'''
- 
- '''[TF] I guess the question is, to what level are the semantics in the target schema sufficient for the purpose of translation ? For HTML, <code> clearly indicates that the contents are source code, but doesn't say which parts of that source code are translatable, nor what programming language it contains (and of course, not every string-literal in a given programming language is translatable either - hence the need for a span-like element I think)'''
- 

Received on Wednesday, 29 June 2005 13:15:32 UTC