RE: Elements within text

Answering Christian's and Felix's comments:

---1: Wording

> From Christian:
> However, I am worried about the wording ...
> 1. What is a flow? What is a sub-flow?
> 2. What does "element is part of its parent content" mean? I guess 
> this is not a statement related to the Infoset ... Rather, I guess 
> its a linguistic notion.
> I am not sure if we could borrow from typography some terminology 
> related to widows and orphans.
> <its:orphanRule orphan="yes" selector="//term"/>

Good points.

I'm not sure the typography terminology would help much: it's no guarantee more people will know it, and I don't think there is an
equivalent for the concept of subflow.

Maybe this could be solved with a better definition sub-section? Another possibility would be to rename the 'subflow' value to
'nested', which might be more generally understood. We would then have something like this (in the definition section):

----------
The data category elements within text expresses information about how elements should affect the flow of the content. In this
context, a flow is a run of content that should be treated as a single unit for linguistic purposes.
Sometimes, a flow can be nested within another one.

The values associated with this data category are:

- "yes" (the element and its content are part of the flow of its parent element),

- "nested" (the element is part of the flow of its parent element, its content is an independent flow),

- and "no" (the element splits the text flow of its parent element and its content is an independent text flow).
Elements not listed are considered to have the value "no".
----------


---2: Examples

> from Felix:
> here is another case: in the TEI schema, these two variants are 
> possible:
> 1) <p> some text .. <li><item>...</item></li> more text ...</p>
> 2) <p> some text .. </p><li><item>...</item></li> more text ...</p>
> without thinking about the markup structure, both means: "there is 
> a paragraph, a list, and another paragraph". But in 2), the list 
> is inside the <p> element, so it looks like "there is a paragraph, 
> it contains some text, a list, and more text".
> what would be the appropriate usage of withinText for such cases?

Reading your description I'm guessing you probably mean:

1) <p>Some text...</p><li><item>Item text</item></li><p>More text...</p>

2) <p>Some text... <li><item>Item text</item></li>More text ...</p>

With the definitions we currently have, the usage of withinText for these examples would be: nothing to declare. Because <li> would
not be listed and our current default is withinText='no', the <p> in example 2) would be broken down into three flows:

- "Some text..."
- "Item text"
- "More text..."

Just like in example 1).
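
To make the splitting concrete, here is a rough Python sketch of how a processor might cut content into flows under the draft definitions above (using 'nested' for the subflow value). The function name extract_flows, the rules dictionary keyed by element name, and the <body> wrapper are all assumptions made for illustration; nothing here is part of ITS. Emitting a nested flow right after the flow that contains it is also just a simplification.

```python
import xml.etree.ElementTree as ET

def extract_flows(elem, rules, default="no"):
    """Return the text flows under elem (hypothetical helper, not an ITS API).

    rules maps an element name to "yes", "nested" or "no"; elements that
    are not listed get the default value.
    """
    flows = []    # completed flows
    current = []  # text pieces of the flow being built
    nested = []   # independent flows found inside the current flow

    def flush():
        # Close the current flow, normalizing whitespace.
        text = " ".join("".join(current).split())
        if text:
            flows.append(text)
        flows.extend(nested)
        current.clear()
        nested.clear()

    def walk(node):
        current.append(node.text or "")
        for child in node:
            value = rules.get(child.tag, default)
            if value == "yes":
                walk(child)  # content joins the parent flow
            elif value == "nested":
                # content is its own flow, the parent flow is not split
                nested.extend(extract_flows(child, rules, default))
            else:
                # "no": the element splits the parent flow
                flush()
                flows.extend(extract_flows(child, rules, default))
            current.append(child.tail or "")

    walk(elem)
    flush()
    return flows

# Examples 1) and 2); <body> is only a wrapper so that 1) is well-formed.
ex1 = ET.fromstring(
    "<body><p>Some text...</p><li><item>Item text</item></li>"
    "<p>More text...</p></body>")
ex2 = ET.fromstring(
    "<p>Some text... <li><item>Item text</item></li>More text...</p>")

print(extract_flows(ex1, {}))  # nothing declared: every element is "no"
print(extract_flows(ex2, {}))
```

With an empty rules dictionary both calls give the same three flows, which is the "nothing to declare" case described above.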


---3: Default for non-listed elements

I'm still worried about how to handle elements not listed in 'withinText'.
The main reason is that with the current description, as soon as a vocabulary has some elements that should be treated as 'within
text', it forces us to declare them. It would be very nice to allow ITS processing without declaring anything whenever
possible.

In practice a large number of documents have no subflow issues, and they could be processed without declarations and with few or no
errors: if you encounter an element while you are within a parent element that already has some text, most of the time it should be
treated as 'within text'.

The only reason for declaring 'within text' elements is to recognize cases like <p><b>...</b><b>...</b></p> that are not
programmatically distinct from <li><p>...</p><p>...</p></li>. If not for that case, we could get away with declaring only the subflow
elements.
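
A small Python sketch of why the two cases cannot be told apart without declarations: if we ignore element names and look only at where text and child elements occur, both fragments have the same shape. The shape function is a hypothetical helper written just for this illustration.

```python
import xml.etree.ElementTree as ET

def shape(elem):
    """Describe an element's structure while ignoring element names.

    Returns (has leading text, [(child shape, has trailing text), ...]).
    """
    has_text = bool((elem.text or "").strip())
    children = [(shape(child), bool((child.tail or "").strip()))
                for child in elem]
    return (has_text, children)

inline = ET.fromstring("<p><b>...</b><b>...</b></p>")    # <b> is within text
block = ET.fromstring("<li><p>...</p><p>...</p></li>")   # <p> is not
mixed = ET.fromstring("<p>Some <b>bold</b> text</p>")    # parent has text

print(shape(inline) == shape(block))  # same structure, different meaning
print(shape(inline) == shape(mixed))  # text around the child gives a hint
```

The third fragment is the easy case mentioned earlier: the parent already has text when the child is encountered, so a processor can guess 'within text' without being told.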

The bottom line is that there are many XML documents that have no subflows and could be processed without any 'within text'
declarations if we were to say that elements not listed are assumed withinText='yes' (instead of withinText='no').

This leads me to wonder if we could have a way to declare that default for non-listed elements and then make our default
defaultValue "yes"?
The notation would be something like <its:withinTextRule defaultValue="no" withinText="yes" selector="//b|//em" />

Being able to choose the default would also allow easier declarations in some cases: depending on the type of document, some may have
far fewer withinText='no' elements than withinText='yes' ones.

But that means it would make sense to have it only once in a <rules>, while there is nothing preventing us from having several
<withinTextRule>. I guess the last one would win, but it feels very inelegant.

Another way to achieve this would be to say: "If there is at least one <withinTextRule> element present, the default value for
non-listed elements is "no", otherwise it's "yes"." But that also feels awkward.
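
To compare the two ideas side by side, here is a rough Python sketch. The defaultValue attribute is only the notation floated above, the namespace URI is a placeholder, and both function names are made up for the illustration; none of this is an agreed design.

```python
import xml.etree.ElementTree as ET

ITS = "urn:example:its"  # placeholder namespace, an assumption for this sketch
RULE = f"{{{ITS}}}withinTextRule"

def default_last_wins(rules, fallback="yes"):
    """Option A: each rule may carry defaultValue; the last one wins."""
    value = fallback
    for rule in rules.iter(RULE):
        value = rule.get("defaultValue", value)
    return value

def default_by_presence(rules):
    """Option B: "no" if any withinTextRule is present, otherwise "yes"."""
    return "no" if rules.find(f".//{RULE}") is not None else "yes"

declared = ET.fromstring(
    '<its:rules xmlns:its="urn:example:its">'
    '<its:withinTextRule defaultValue="no" withinText="yes" selector="//b|//em"/>'
    '<its:withinTextRule defaultValue="yes" withinText="no" selector="//note"/>'
    '</its:rules>')
empty = ET.fromstring('<its:rules xmlns:its="urn:example:its"/>')

print(default_last_wins(declared), default_by_presence(declared))
print(default_last_wins(empty), default_by_presence(empty))
```

With no rules at all both options agree on "yes". With the two conflicting rules, option A silently depends on rule order, which is the inelegance mentioned above, while option B ignores defaultValue altogether.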

Just thinking aloud...

Cheers,
-yves

Received on Tuesday, 2 May 2006 04:35:52 UTC