Re: Elements within text

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 02 May 2006 14:03:12 +0900
Message-ID: <4456E810.3040105@w3.org>
To: Yves Savourel <yves@opentag.com>
Cc: public-i18n-its@w3.org
Hi Yves,

I agree with and understand everything you said regarding 1 and 2.

As for 3, how about these rules:

a) if an element is selected by a <withinTextRule> with the value "yes":
it is a "within text" element
b) if an element is selected by a <withinTextRule> with the value "no":
it is not a "within text" element
c) if an element is not selected by any <withinTextRule> element, but
matches your rule ("if you encounter an element while you are within a
parent element that has already some text, most of the time it should be
treated as 'within text'."), it is within text.

c) would allow ITS processing without declaring anything.
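A rough sketch of how a processor might apply a) to c), in case it helps the discussion. It is only an illustration under stated assumptions: the dictionary `declared` is a hypothetical stand-in for whatever the <withinTextRule> selectors resolve to, the value "no" is taken to mean "not within text", and "parent already has some text" is read narrowly as non-whitespace text appearing in the parent before the element.

```python
import xml.etree.ElementTree as ET


def has_text_before(parent, elem):
    """True if `parent` holds non-whitespace text ahead of `elem` (assumed reading of rule c)."""
    if parent.text and parent.text.strip():
        return True
    for sibling in parent:
        if sibling is elem:
            return False
        if sibling.tail and sibling.tail.strip():
            return True
    return False


def is_within_text(elem, parent, declared):
    """`declared` maps element names to "yes"/"no" (hypothetical stand-in for the rules)."""
    value = declared.get(elem.tag)
    if value == "yes":   # rule a)
        return True
    if value == "no":    # rule b), assumed to mean "not within text"
        return False
    return has_text_before(parent, elem)  # rule c), nothing declared


doc = ET.fromstring(
    "<doc><p>Some text <b>bold</b> more text</p>"
    "<li><p>one</p><p>two</p></li></doc>"
)
p, li = doc[0], doc[1]
print(is_within_text(p[0], p, {}))            # rule c): <b> follows text in <p>
print(is_within_text(li[0], li, {}))          # rule c): no text in <li> before <p>
print(is_within_text(p[0], p, {"b": "no"}))   # rule b) takes priority over c)
```

With this narrow reading, the first <b> in <p><b>...</b><b>...</b></p> would not be detected by c) alone, which is the case where a declaration is still needed.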

regarding your notation:
<its:withinTextRule defaultValue="no" withinText="yes"
selector="//b|//em" />
we currently say in the draft "ITS information is attached to the nodes
selected by the selector attribute". But what you want is "attaching
information to nodes which are *not* selected". I'm a little bit worried
about what that could mean for precedence: you could come up with combinations like

<its:withinTextRule defaultValue="no" withinText="yes"
selector="//b|//em" />
<its:withinTextRule defaultValue="yes" withinText="yes"
selector="//b|//em" />
<its:withinTextRule defaultValue="no" withinText="no" selector="//b|//em" />
<its:withinTextRule defaultValue="yes" withinText="no"
selector="//b|//em" />

and would need separate precedence rules for selected versus not
selected nodes ...

As for what you want to achieve ("declare that default for non-listed
elements"), I think it could also be done by

<its:withinTextRule withinText="xxx" selector="//*" />

as the first rule. xxx would be the default. This rule would be
overridden by the following rules.
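To make the override idea concrete, here is a minimal sketch, assuming that later rules take precedence over earlier ones. The `matches` helper is hypothetical and only understands unions of `//name` and `//*`; it is not a real XPath evaluator, and the rule list is just an in-memory stand-in for the <withinTextRule> elements.

```python
def matches(selector, tag):
    """Very small selector check: supports only unions of //name and //* (assumption)."""
    names = [part.strip().lstrip("/") for part in selector.split("|")]
    return "*" in names or tag in names


def resolve(tag, rules):
    """Return the withinText value of the last rule matching `tag`, or None if none match."""
    value = None
    for selector, within_text in rules:
        if matches(selector, tag):
            value = within_text  # a later matching rule overrides an earlier one
    return value


# First rule sets the default for every element, later rules override it.
rules = [
    ("//*", "no"),
    ("//b|//em", "yes"),
]
print(resolve("b", rules))
print(resolve("li", rules))
```

Because the default is just the first ordinary rule, no separate precedence handling for "selected" versus "not selected" nodes is needed in this sketch.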



Yves Savourel wrote:
> Answering Christian's and Felix's comments:
> ---1: Wording
>> From Christian:
>> However, I am worried about the wording ...
>> 1. What is a flow? What is a sub-flow?
>> 2. What does "element is part of its parent content" mean? I guess 
>> this is not a statement related to the Infoset ... Rather, I guess 
>> it's a linguistic notion.
>> I am not sure if we could borrow from typography some terminology 
>> related to widows and orphans.
>> <its:orphanRule orphan="yes" selector="//term"/>
> Good points.
> I'm not sure the typography terminology would help much: it's no guarantee more people will know it, and I don't think there is an
> equivalent for the concept of subflow.
> Maybe this could be solved with a better definition sub-section? Another possibility could be to rename the 'subflow' value to
> 'nested' which might be more generally understood? We would have then something like this (in the definition section):
> ----------
> The data category elements within text expresses information about how elements should affect the flow of the content. In this
> context the flow of the content represents how the nodes of the elements should be treated as a single unit for linguistic purposes.
> Sometimes, a flow can be nested within another one.
> The values associated with this data category are:
> - "yes" (the element and its content are part of the flow of its parent element),
> - "nested" (the element is part of the flow of its parent element, its content is an independent flow),
> - and "no" (the element splits the text flow of its parent element and its content is an independent text flow).
> Elements not listed are considered to have the value "no".
> ----------
> ---2: Examples
>> from Felix:
>> here is another case: in the TEI schema, these two variants are 
>> possible:
>> 1) <p> some text .. <li><item>...</item></li> more text ...</p>
>> 2) <p> some text .. </p><li><item>...</item></li> more text ...</p>
>> without thinking about the markup structure, both mean: "there is 
>> a paragraph, a list, and another paragraph". But in 2), the list 
>> is inside the <p> element, so it looks like "there is a paragraph, 
>> it contains some text, a list, and more text".
>> what would be the appropriate usage of withinText for such cases?
> Reading your description I'm guessing you probably mean:
> 1) <p>Some text...</p><li><item>Item text</item></li><p>More text...</p>
> 2) <p>Some text... <li><item>Item text</item></li>More text ...</p>
> With the current definitions, the usage of withinText for these examples would be: nothing to declare. Because <li> would not be
> listed and our current default is withinText='no', the <p> in example 2) would be broken down into three flows:
> - "Some text..."
> - "Item text"
> - "More text..."
> Just like in example 1).
> ---3: Default for non-listed elements
> I'm still worried about how to handle elements not listed in 'withinText'.
> The main reason is that with the current description, as soon as a vocabulary has some elements that should be treated as 'within text' it
> forces us to declare them. And it would be very nice to allow ITS processing without declaring anything when
> possible.
> In practice a large number of documents have no subflow issues, and they could be processed without declarations and few or no
> errors: if you encounter an element while you are within a parent element that has already some text, most of the time it should be
> treated as 'within text'.
> The only reason for declaring 'within text' elements is to recognize the cases like <p><b>...</b><b>...</b></p> that are not
> programmatically distinct from <li><p>...</p><p>...</p></li>. If not for that case, we could get away with declaring only the subflow
> elements.
> The bottom line is that there are many XML documents that have no subflows and could be processed without any 'within text'
> declarations if we were to say that elements not listed are assumed withinText='yes' (instead of withinText='no').
> This leads me to wonder if we could have a way to declare that default for non-listed elements and then make our default
> defaultValue "yes"?
> The notation would be something like <its:withinTextRule defaultValue="no" withinText="yes" selector="//b|//em" />
> Being able to choose the default would also allow easier declarations in some cases: depending on the type of document some may have
> far fewer withinText='no' than withinText='yes'.
> But that means it would make sense to have it only once in a <rules> while there is nothing preventing us from having several
> <withinTextRule>. I guess the last one would win, but it feels very inelegant.
> Another way to achieve this would be to say: "If there is at least one <withinTextRule> element present, the default value for
> non-listed elements is 'no', otherwise it's 'yes'." But that also feels awkward.
> Just thinking aloud...
> Cheers,
> -yves

Received on Tuesday, 2 May 2006 05:03:34 UTC
