Re: [css3-gcpm] Comments on Generated Content for Paged Media 2007-05-04 from Ladd Van Tol on 2007-05-11 (www-style@w3.org from May 2007)

From: Ladd Van Tol <ladd@criticalpath.com>
Date: Thu, 10 May 2007 18:30:30 -0700
To: Håkon Wium Lie <howcome@opera.com>
Cc: www-style@w3.org
Message-Id: <0B6ED0D6-67A6-45CF-8EF1-AC9682162A1A@criticalpath.com>
On May 9, 2007, at 5:54 AM, Håkon Wium Lie wrote:

> Also sprach Ladd Van Tol:
>
>> I believe that hyphenation exceptions are currently an external
>> mechanism in TeX, although it's been a long time since I've used TeX.
>>
>> As far as I understand hyphenation rules, there are supposed to be
>> two kinds of exceptions:
>>
>> 1. Positive exceptions that show how to hyphenate particular words,
>> ignoring the more general rules found in the pattern list (aka
>> "dictionary")
>> 2. Negative exceptions that show words that can never be hyphenated,
>> generally due to having more than one proper hyphenation (dependent
>> on grammatical context)
>
> Right. My hypothesis is that both these exceptions can be encoded as
> general rules. In any case, the 'hyphenate-resource' property is
> general enough to host both exception lists, generic rules and
> dictionaries with embedded hyphenation points.

Hmm. I guess it would be interesting for someone to define a format  
that comprehensively covers natural language hyphenation.
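
A minimal sketch of how the two kinds of exceptions could sit in front of a general rule. The names (hyphenate, split_like, naive_rule) and the sample word lists are assumptions for illustration only; they are not taken from TeX or from the GCPM draft, and naive_rule is a stand-in for real pattern-based hyphenation.

```python
def split_like(word, pattern):
    """Cut `word` at the hyphen positions given in `pattern` (e.g. "ta-ble")."""
    parts, pos = [], 0
    for piece in pattern.split("-"):
        parts.append(word[pos:pos + len(piece)])
        pos += len(piece)
    return parts


def naive_rule(word):
    """Placeholder for the general rules: break after every third letter."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


def hyphenate(word, positive, negative, general_rule=naive_rule):
    key = word.lower()
    if key in negative:
        # Negative exception: the word is never broken.
        return [word]
    if key in positive:
        # Positive exception: use the listed break points, ignore the rules.
        return split_like(word, positive[key])
    return general_rule(word)


# Hypothetical exception lists.
positive = {"table": "ta-ble"}
negative = {"present"}

print(hyphenate("Table", positive, negative))    # exception list wins
print(hyphenate("present", positive, negative))  # left unbroken
print(hyphenate("shark", positive, negative))    # falls through to the rule
```

Whether both lists can be folded into the general rules, as suggested above, is left open here; the sketch only shows the lookup order.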

>> So what happens to an ordinary element placed between the index-
>> marker and the index-entry divs?
>
> Let's see. Say, we now have:
>
>   <div id="index">
>     <h2>Index</h2>
>     <div id="index-marker"></div>
>     <p>foo</p>
>     <div id="index-entry"></div>
>   </div>
>
> The rendering of the initial state is:
>
>   Index
>   foo
>
> Now, say the formatter encounters this element:
>
>   <dfn class="entry">shark</dfn>
>
> two GLEs (generated list elements) are inserted, first one of type
> #index-marker and then one of type #index-entry.
>
> After the #index-marker has been inserted, the generated list looks
> like this:
>
>   Index
>   S
>   foo
>
> because only the first-letter (of "shark") is used, and the
> 'text-transform' specifies uppercase. The "S" appears above "foo"
> because that's where the prototype element (#index-marker) is
> found.
>
> The spec should say that the first GLE element is inserted where its
> prototype element is found.
>
> Next, a GLE of type #index-entry is inserted. It also has
> 'prototype-insert-position: sorted'. The question becomes: sorted
> relative to which elements? There are several options:
>
>  a) elements created with the same prototype element
>  b) elements created with any prototype element
>  c) any block-level element
>  d) any element
>
> I think b) is the correct answer. You want to be able to sort
> elements, even if they have different styling.
>
> Let's assume b). Then the #index-entry is entered after the "S" (as
> "shark" comes after "s" in assumed sorting order). So, we get:
>
>   Index
>   S
>   shark
>   foo
>
> This result may be counter-intuitive as the <p> element comes before
> #index-entry in the structure.
>
> The benefit of this approach, however, is that we allow non-prototype
> elements inside and don't need to create dummy elements only
> consisting of prototype elements. For headings (like "Index") this is
> useful -- you will have a <div> around the whole index (including the
> heading), but don't really want a <div> with everything except the
> heading.
>
> Does it make sense?

This seems weird, and I'm not sure how it interacts with nesting of  
mixed prototype and non-prototype elements. I'm thinking there is  
some utility in allowing for non-prototype elements to be copied for  
each insertion.
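
To make the quoted walk-through concrete, here is a rough model of option b) in Python. Everything in it is an assumption made for illustration: the Node class, the three "kind" labels, and the rule that the very first generated element lands right after its prototype while later ones are sorted against all generated elements. It does not handle a repeated marker letter or nested containers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    kind: str  # "static", "prototype", or "generated"


def insert_generated(children, text, prototype_id):
    """Insert a generated list element (GLE) into `children` in place."""
    new = Node(text, "generated")
    generated = [i for i, n in enumerate(children) if n.kind == "generated"]
    if not generated:
        # First GLE: goes where its prototype element is found.
        proto = next(i for i, n in enumerate(children)
                     if n.kind == "prototype" and n.text == prototype_id)
        children.insert(proto + 1, new)
        return
    # Option b): sort relative to elements created from any prototype.
    key = text.casefold()
    for i in generated:
        if children[i].text.casefold() > key:
            children.insert(i, new)
            return
    children.insert(generated[-1] + 1, new)


def add_term(children, term):
    insert_generated(children, term[0].upper(), "#index-marker")
    insert_generated(children, term, "#index-entry")


def render(children):
    # Prototype elements themselves are not rendered.
    return [n.text for n in children if n.kind != "prototype"]


index = [
    Node("Index", "static"),
    Node("#index-marker", "prototype"),
    Node("foo", "static"),
    Node("#index-entry", "prototype"),
]
add_term(index, "shark")
print(render(index))  # ['Index', 'S', 'shark', 'foo']
```

The model reproduces the ordering described above, including "foo" ending up after "shark" even though the <p> precedes #index-entry in the source.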

Consider an arbitrary example where I've decided to output into a  
table:

<style>
#glossary { prototype: container }
#glossary-term { insert-position: sorted }
#glossary-definition { insert-position: current }
dfn { prototype-insert: glossary-term self, glossary-definition attr(title) }
</style>
...
<table id="glossary">
<tr>
	<td>Glossary Entry:</td>
	<td id="glossary-term"></td>
	<td id="glossary-definition"></td>
</tr>
</table>
...
   <p>The <dfn title="Leading paragraph">introduction</dfn> comes first.</p>

In this case, you'd want a behavior where the interior elements of  
the table are copied for each insertion.
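
A sketch of that copy-per-insertion behavior, written as a preprocessing step in Python with the standard xml.etree module. The function name build_glossary, the second sample term, and the decision to drop the id attributes from the copied cells (so the output has no duplicate ids) are my own assumptions, not anything the draft specifies.

```python
import copy
import xml.etree.ElementTree as ET

TABLE = """<table id="glossary">
<tr>
  <td>Glossary Entry:</td>
  <td id="glossary-term"></td>
  <td id="glossary-definition"></td>
</tr>
</table>"""


def build_glossary(table_xml, terms):
    """Return the table with one copy of the template row per (term, definition)."""
    table = ET.fromstring(table_xml)
    template = table.find("tr")
    table.remove(template)  # the template row itself is not output
    for term, definition in sorted(terms, key=lambda t: t[0].casefold()):
        row = copy.deepcopy(template)  # non-prototype cells are copied too
        for cell in row.iter("td"):
            cell_id = cell.attrib.pop("id", None)
            if cell_id == "glossary-term":
                cell.text = term
            elif cell_id == "glossary-definition":
                cell.text = definition
        table.append(row)
    return table


# The first pair comes from the <dfn> above; the second is made up.
terms = [("introduction", "Leading paragraph"),
         ("appendix", "Trailing material")]
glossary = build_glossary(TABLE, terms)
print(ET.tostring(glossary, encoding="unicode"))
```

Each output row keeps the "Glossary Entry:" cell, which is the point: the static cell travels with every inserted term and definition.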

>> One possibility might be to call it "prototype-container" and have a
>> value of "list" to indicate the list-like nature of indexes,
>> glossaries, and tables of contents. This may also provide expansion
>> opportunity if somebody comes up with another way to utilize
>> prototype containers.
>
> Interesting. So, "list" would be the only value in addition to "none".
>
>   prototype: list | none
>
> I like it. I've changed it in my internal version.

Great.

> I'd like to ask your advice on another proposed name change:
> "content" -> "self" or "contents". In the example you quote, one  
> finds:
>
>   prototype-insert: index-marker first-letter, index-entry content
>
> the problem with 'content' is that it's also the name of a property.
> I'd like to avoid that, if possible. I see two possible alternatives:
> "self" and "contents". "self" is shorter and has no singular/plural
> dilemma. "contents" may be more descriptive -- we're copying the
> contents of the element, but not necessarily its style or structure.
>
> Any advice?

I would favor emphasizing that it copies the text of the node.  
"text-content" might be good, and corresponds with the "textContent"  
property in the DOM.
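
For reference, a small Python approximation of what "text content" means here: the concatenated text of the node and its descendants, with markup and attributes dropped. The helper name text_content and the nested <em> in the sample are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def text_content(element):
    """Concatenate all text inside `element`, ignoring tags and attributes."""
    return "".join(element.itertext())


dfn = ET.fromstring(
    '<dfn title="Leading paragraph">intro<em>duc</em>tion</dfn>')
print(text_content(dfn))  # introduction
```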

>> Seems good, although this example does assume that each term is only
>> defined once.
>
> Right, if a term is defined twice, it would end up twice in the
> glossary. This is better than deleting the first entry, no?

I guess it seems a little clunky to build glossaries this way.

On consideration, I'm also wondering about using this for index  
creation. Many real-world indices allow for page ranges (as in  
"49-50",  or "325,401"). This doesn't seem possible in the current  
scheme, although I would be happy to be proven wrong.
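
For illustration, this is the kind of post-processing an index would need once page numbers are resolved: collapsing consecutive pages into ranges. The function name collapse_pages and the output format are assumptions; nothing like this exists in the current draft.

```python
def collapse_pages(pages):
    """Turn page numbers into an index locator such as "49-50, 325, 401"."""
    ordered = sorted(set(pages))
    if not ordered:
        return ""
    runs = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            prev = page  # still inside a consecutive run
            continue
        runs.append((start, prev))
        start = prev = page
    runs.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


print(collapse_pages([49, 50, 325, 401]))  # 49-50, 325, 401
```

A preprocessor that does not resolve page numbers could not run this step itself; it would have to leave the merging to the formatter.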

>> No problem -- it's very interesting, and relates closely to my
>> current work.
>
> If so, would you know someone who could implement it? I think it's
> possible to do it in a preprocessor without doing all the formatting.
> That is, you would not resolve page numbers, but create the necessary
> links so that a true formatter could resolve them later.

I might be interested in doing the generated lists portion as a  
preprocessor.

- Ladd
Received on Friday, 11 May 2007 01:31:53 UTC