Re: HTML5 Paragraphs, Sentences and Phrases

On Mon, Apr 16, 2012 at 8:21 PM, Adam Sobieski <> wrote:
> With regard to sentence segmentation and phrase annotation, there are some
> semi-automated, interactive natural language processing approaches
> and techniques as well as other natural language processing approaches for
> authoring software scenarios which, presently, may require more computation
> than rapid for page loading and initialization.

I doubt the Unicode sentence segmentation algorithm (UAX #29) is that slow.

Do you have examples of NLP-based features that:

    1. People want to implement in browsers today.
    2. Would depend on phrase detection.
    3. Would need to happen *during* page load, rather than shortly
after or on demand.
    4. Would slow page load so much, because of the extra cost of
calculating phrase boundaries, that the delay would discourage adding
the feature.

I think we have a better chance of making client-side sentence/phrase
detection good and fast enough for browsers to use than of getting
enough authoring tools to generate such markup that browsers could
usefully rely on sentence/phrase markup in the web corpus.

> While span-based solutions are functional, for scenarios including EPUB3,
> span for sentences is a popular usage scenario.  Span for phrases could
> become so as well with CSS3 speech and text features including text-wrap.

I'm not sure what you mean by "[w]hile" here. Do you agree that <span>
already addresses these usages or not?

> A summarization of some research is that reading speed, comprehension
> and retention can be enhanced by text formatting including phrase-based.

That's more like a rephrase than an elaboration. Are you just talking
about avoiding line breaks within phrases so that readers see (and
quickly recognize) the phrase as a whole, or are there additional
examples of phrase-based text formatting? Do Unicode spaces and
non-breaking spaces address this? If not, why not?
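For the "avoid line breaks within phrases" case, the non-breaking space approach can be sketched as follows. This assumes the author (or an authoring tool) already supplies a list of phrases; `bind_phrases` is a hypothetical helper. It replaces the ordinary spaces inside each phrase with U+00A0 NO-BREAK SPACE, which the Unicode line breaking algorithm (UAX #14) does not break at, so each phrase wraps as a unit.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def bind_phrases(text: str, phrases: list[str]) -> str:
    """Replace the whitespace inside each listed phrase with NBSP.

    Hypothetical helper: the phrase list is assumed to come from the
    author or an authoring tool, not from any detection step.
    """
    for phrase in phrases:
        words = phrase.split()
        if len(words) < 2:
            continue  # a single word has no internal space to bind
        # Match the whole phrase (word boundaries at both ends), allowing
        # any run of whitespace between its words.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        # Keep the matched text as-is; only its whitespace becomes NBSP.
        text = re.sub(pattern, lambda m: re.sub(r"\s+", NBSP, m.group(0)), text)
    return text


if __name__ == "__main__":
    print(repr(bind_phrases("I live in New York City now.", ["New York City"])))
```

Note that this only controls wrapping; it does not record where a phrase begins or ends for other purposes.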

> With regard to indicating phrase structure in hypertext with markup and
> style, there are at least two scenarios; in one, indicated phrases are
> sparse in hypertext, and, in another, regions of hypertext are more or less
> segmented entirely into phrases.  Authors and authoring software could make
> use of phrase structure for scenarios including keywords, phrasemes,
> collocations and compound terms.

This doesn't sound like it would make for a coherent user experience
on the basis of <phrase> …

Benjamin Hawkes-Lewis

Received on Thursday, 19 April 2012 05:37:41 UTC