Re: [all] query in usage of 'elements within text'

Thanks Yves, that's very clear.

When I was discussing this with Declan, he was planning to address this 
within their Matrex SMT system by parsing for quotation marks, so this 
doesn't give us a specific problem right now.
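For illustration only, a quotation-mark split of the kind Declan described might look roughly like the Python sketch below. The function name and the regex are hypothetical and not taken from Matrex. It assumes quotes are balanced and not nested, and it simply flags each quoted run as a candidate subsegment for MT.

```python
import re

# Hypothetical pattern: a run of text wrapped in straight (") or curly
# (U+201C ... U+201D) double quotes. Assumes balanced, non-nested quotes.
QUOTE_RE = re.compile(r'["\u201c][^"\u201d]+["\u201d]')


def split_on_quotes(segment):
    """Split a segment into (text, is_quote) pieces, keeping the quote marks."""
    pieces = []
    pos = 0
    for match in QUOTE_RE.finditer(segment):
        if match.start() > pos:
            pieces.append((segment[pos:match.start()], False))
        pieces.append((match.group(0), True))  # candidate subsegment for MT
        pos = match.end()
    if pos < len(segment):
        pieces.append((segment[pos:], False))
    return pieces


for text, is_quote in split_on_quotes(
        '"To be or not to be" is the opening phrase of a soliloquy.'):
    print(is_quote, repr(text))
```

The quoted piece could then be sent to the MT engine on its own and the translations stitched back together in order.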

But I'd be interested in hearing from the MT guys (Declan, Dan, etc.) 
whether this is a general problem for them.

If so, could we address it simply by adding a new value for 
withinText? Would something like the following solve the problem?

"subsegment" indicates that the element's text should be treated as part 
of the flow of the parent element when segmenting for TM systems, but it 
also informs MT systems that the element may be treated as a coherent 
subsegment whose integrity may be taken into account during translation.
For example:
<b its:withinText="subsegment">"To be or not to be"</b> is the opening 
phrase of a soliloquy in William Shakespeare's play <i>Hamlet</i>.
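As a rough sketch of how an MT-side processor might honour such a value, the Python below masks each child marked its:withinText="subsegment" with a placeholder, so the rest of the segment and the subsegment can be translated separately. Note that "subsegment" is only the value proposed here, not part of ITS 1.0; the function name and the `{SUB0}` placeholder syntax are hypothetical. Only direct children are examined, and other inline markup (such as `<i>`) is flattened to plain text.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
WITHIN_TEXT = "{%s}withinText" % ITS_NS  # Clark notation used by ElementTree


def extract_subsegments(xml_fragment):
    """Mask children marked its:withinText="subsegment" (a proposed value).

    Returns the parent text with each subsegment replaced by a placeholder,
    and a dict mapping each placeholder to the subsegment's text.
    """
    root = ET.fromstring(xml_fragment)
    parts = [root.text or ""]
    subsegments = {}
    for index, child in enumerate(root):
        content = "".join(child.itertext())
        if child.get(WITHIN_TEXT) == "subsegment":
            placeholder = "{SUB%d}" % index  # hypothetical placeholder syntax
            subsegments[placeholder] = content
            parts.append(placeholder)
        else:
            parts.append(content)  # other inline markup is flattened
        parts.append(child.tail or "")
    return "".join(parts), subsegments


source = (
    '<p xmlns:its="http://www.w3.org/2005/11/its">'
    '<b its:withinText="subsegment">"To be or not to be"</b> is the opening '
    "phrase of a soliloquy in William Shakespeare's play <i>Hamlet</i>.</p>"
)
masked, subsegments = extract_subsegments(source)
print(masked)
print(subsegments)
```

Using a placeholder rather than cutting the sentence in two keeps the quote's position visible to the engine translating the surrounding text, so it can still reorder around it; each entry in `subsegments` could go to a different engine before being substituted back.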

cheers,
Dave



On 03/09/2012 13:49, Yves Savourel wrote:
>
> Hi Dave,
>
> From my viewpoint, the ‘nested’ value of the Within Text data 
> category is to be used in only a few cases. A quote inside a paragraph 
> is not one of them. I see the attraction of it from the SMT viewpoint, 
> but, in my general experience, breaking content down into too many 
> parts starts to have more negative effects than benefits.
>
> I would use nested only for constructs that clearly show that two 
> pieces of content are completely separate, like the text of a footnote 
> embedded inside a paragraph in DocBook, or something similar.
>
> As a rule of thumb, if the element enclosing the possible sub-flow is 
> one that seems to be sometimes a candidate for ‘nested’ and sometimes a 
> candidate for ‘yes’, I would choose ‘yes’ in all cases.
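
To make the yes/nested distinction concrete, here is a rough Python sketch of how an extractor might build text flows: a child marked ‘yes’ stays in its parent's flow, while a ‘nested’ child such as a DocBook footnote becomes a flow of its own. The function name and the sample markup are hypothetical, and this is a simplification rather than the ITS processing model: ‘no’ and unmarked elements are simply handled like ‘nested’, ignoring the fact that ‘no’ also splits the surrounding text.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
WITHIN_TEXT = "{%s}withinText" % ITS_NS


def extract_flows(xml_fragment):
    """Return the text flows in a fragment; the first is the root's own flow.

    Simplified sketch: children marked its:withinText="yes" stay in the
    parent's flow; every other child ('nested', 'no', or unmarked) gets a
    flow of its own.
    """
    flows = []

    def new_flow(elem):
        pieces = []
        flows.append(pieces)  # register before descending so order is stable
        fill(elem, pieces)

    def fill(elem, pieces):
        pieces.append(elem.text or "")
        for child in elem:
            if child.get(WITHIN_TEXT) == "yes":
                fill(child, pieces)  # inline element: same flow as parent
            else:
                new_flow(child)      # e.g. a nested footnote: its own flow
            pieces.append(child.tail or "")  # parent text resumes after child

    new_flow(ET.fromstring(xml_fragment))
    return ["".join(pieces).strip() for pieces in flows]


docbook_like = (
    '<para xmlns:its="http://www.w3.org/2005/11/its">'
    '<emphasis its:withinText="yes">"To be or not to be"</emphasis> is the '
    'opening phrase of a soliloquy<footnote its:withinText="nested">From Act 3, '
    'Scene 1.</footnote> in Hamlet.</para>'
)
for flow in extract_flows(docbook_like):
    print(flow)
```

With the quote marked ‘yes’, it stays in the same flow (and segment) as the rest of the sentence, while only the footnote is pulled out as a separate flow.
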
>
> Segmentation can be another hint. In your example, quotation marks 
> aside, it’s clear that the citation is part of the same segment. To 
> me, that’s an additional indication that it should be ‘within text’ 
> rather than ‘nested’.
>
> Hope this helps
>
> -yves
>
> *From:* Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Monday, September 03, 2012 6:26 AM
> *To:* Multilingual Web LT Public List
> *Subject:* [all] query in usage of 'elements within text'
>
> Hi,
> Leroy and I have been discussing some examples of CMS-MT integration 
> scenarios with Declan and Ankit. One issue that has come up is how to 
> deal with quotations in a segment passed to MT.
>
> For example, take the segment (from Wikipedia):
> "*To be or not to be*" is the opening phrase of a soliloquy 
> <http://en.wikipedia.org/wiki/Soliloquy> in William Shakespeare 
> <http://en.wikipedia.org/wiki/William_Shakespeare>'s play /Hamlet 
> <http://en.wikipedia.org/wiki/Hamlet>/.
>
> or, as (simplified) markup:
> <b>"To be or not to be"</b> is the opening phrase of a soliloquy in 
> William Shakespeare's play <i>Hamlet</i>.
>
> With SMT, to retain the integrity of the quote, the quote may well be 
> run through the MT engine separately from the rest of the segment (or, 
> in this example, perhaps even through a different engine trained 
> specifically on Shakespeare bi-text).
>
> I'm not clear how (or even if) 'elements within text' would help in 
> this case, since <b>"To be or not to be"</b> is part of the flow, yet 
> it does affect how the segment would be translated (in that it would 
> be subsegmented for SMT-based translation).
>
> It seems to call for a nested withinText value, e.g.:
> <b its:withinText="nested">"To be or not to be"</b> is the opening 
> phrase of a soliloquy in William Shakespeare's play <i>Hamlet</i>.
>
> But this doesn't match the example of nested given, where the 
> sub-element is a footnote that can be completely removed from the 
> parent element.
>
> Any advice from the ITS 1.0 experts on this?
>
> One other point concerns the wording of the definition, which begins:
> "The Elements Within Text data category reveals *_if_* and how an 
> element affects the way text content behaves from a linguistic 
> viewpoint." But if you take the "if" literally as a question, the 
> sense of the value definitions seems inverted to me, i.e. 'yes' means 
> the element _doesn't_ affect the way the text in the element is 
> treated during translation.
>
> thanks in advance,
> Dave
>
>

Received on Monday, 3 September 2012 16:03:29 UTC