RE: [all] query in usage of 'elements within text'

Hi Dave,

 

From my viewpoint the ‘nested’ value of the Within Text data category is to be used in only a few cases. A quote inside a paragraph is not one of them. I see the attraction of it from the SMT viewpoint but, in my experience, breaking content down into too many parts starts to have more negative effects than benefits.

 

I would use ‘nested’ only for constructs that clearly show that two pieces of content are completely separate, like the text of a footnote embedded inside a paragraph in DocBook, or something similar.
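
As a rough illustration of that footnote case (a sketch only: it assumes ITS 1.0 global rules and un-namespaced DocBook 4 element names, and the paragraph text is made up):

```xml
<!-- Rules fragment: the footnote is a separate flow embedded in its paragraph -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:withinTextRule withinText="nested" selector="//footnote"/>
</its:rules>

<!-- Document fragment: removing the footnote leaves the paragraph intact -->
<para>Text of the paragraph<footnote><para>Text of the footnote.</para></footnote>
continues here.</para>
```

Here the footnote can be lifted out and translated on its own without changing the sentence around it.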

 

As a rule of thumb, if the element enclosing the possible sub-flow is one that seems to be sometimes a candidate for ‘nested’ and sometimes a candidate for ‘yes’, I would choose ‘yes’ in all cases.

 

Segmentation could be another hint. In your example, aside from the quotation marks, it’s clear that the citation is part of the same segment. To me, that’s an additional indication that it should be ‘within text’ rather than ‘nested’.
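
For your example, that would look something like the following (again only a sketch, assuming ITS 1.0 global rules and elements in no namespace; the selector is my own choice):

```xml
<!-- Rules fragment: <b> and <i> are inline and stay part of the surrounding segment -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:withinTextRule withinText="yes" selector="//b | //i"/>
</its:rules>

<!-- Document fragment: the whole sentence is one segment -->
<b>"To be or not to be"</b> is the opening phrase of a soliloquy in William Shakespeare's play <i>Hamlet</i>.
```

With this the quotation stays in the same segment as the rest of the sentence, and whether to sub-segment it for SMT is left to the MT integration rather than to Within Text.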

 

Hope this helps

-yves

 

 

From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie] 
Sent: Monday, September 03, 2012 6:26 AM
To: Multilingual Web LT Public List
Subject: [all] query in usage of 'elements within text'

 

Hi,
Leroy and I have been discussing some examples for CMS-MT integration scenarios with Declan and Ankit. One issue that's come up was how to deal with quotations in a segment passed to MT.

For example, take the segment (from Wikipedia):
"To be or not to be" is the opening phrase of a soliloquy <http://en.wikipedia.org/wiki/Soliloquy> in William Shakespeare <http://en.wikipedia.org/wiki/William_Shakespeare>'s play Hamlet <http://en.wikipedia.org/wiki/Hamlet>.

As (simplified) mark-up:
<b>"To be or not to be"</b> is the opening phrase of a soliloquy in William Shakespeare's play <i>Hamlet</i>.

With SMT, to retain the integrity of the quote, it may well be run through the MT engine separately from the rest of the segment (or perhaps even through a different engine trained specifically on Shakespeare bi-text in this example).

I'm not clear in this case how (or even if) 'elements within text' would help, since <b>"To be or not to be"</b> is part of the flow, but it does affect how it would be translated (in that it would be subsegmented for SMT-based translation).

It seems to call for a nested withinText value, e.g.:
<b its:withinText="nested">"To be or not to be"</b> is the opening phrase of a soliloquy in William Shakespeare's play <i>Hamlet</i>.

But this doesn't match the example of nested given, where the sub-element is a footnote that can be completely removed from the parent element. 

Any advice from the ITS1.0 experts on this?

One other point, about the wording of the definition. It starts by saying:
"The Elements Within Text data category reveals _if_ and how an element affects the way text content behaves from a linguistic viewpoint." But if you take the "if" literally as a question, the sense of the value definitions seems inverted to me, i.e. 'yes' means the element _doesn't_ affect the way the text in the element is treated during translation.  

thanks in advance,
Dave


 

Received on Monday, 3 September 2012 12:50:49 UTC