RE: HTML5 Paragraphs, Sentences and Phrases from Adam Sobieski on 2012-04-20 (public-html@w3.org from April 2012)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Fri, 20 Apr 2012 00:37:22 +0000
To: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
CC: <public-html@w3.org>
Message-ID: <SNT138-W1892BEC5EB9C56CEF55F99C5220@phx.gbl>
Benjamin Hawkes-Lewis,

The processing of Unicode intersentence spaces would be rapid and interoperable with a ::sentence pseudoselector. An entity name could exist for such a Unicode character, e.g. &ss;:

Here is one sentence.&ss;Here is another.

In April of 2003, a similar discussion occurred, "[XHTML2] Unicode line and paragraph separators": http://lists.w3.org/Archives/Public/www-html/2003Apr/0016.html . In that discussion, sentences were mentioned (http://lists.w3.org/Archives/Public/www-html/2003Apr/0063.html). The discussion included whether content in Unicode paragraph-separated hypertext was inside of a paragraph; an argument from http://lists.w3.org/Archives/Public/www-html/2003Apr/0068.html can be rephrased as:

<p>
Here is one sentence.
&ss;
Here is another.
&ss;
<img src="urn:x-internal:test-image" alt="Am I inside a sentence?" height="10" width="10" />
</p>

In that same conversation, a <sentence> element was specifically mentioned (http://lists.w3.org/Archives/Public/www-html/2003Apr/0080.html) and commented upon (http://lists.w3.org/Archives/Public/www-html/2003Apr/0082.html). It was also noted, in "Sentence element (Was: [XHTML2] Unicode line and paragraph separators)", that SSML includes a sentence element (http://lists.w3.org/Archives/Public/www-html/2003Apr/0130.html, http://www.w3.org/TR/speech-synthesis11/#edef_sentence).

Speech synthesis remains a compelling argument for granular markup and style. New developments in that regard include speech APIs that may accept, beyond strings of text or strings of XML, input parameters that are document object model elements.

Phrase-based text layout is desirable today and would depend on phrase detection, either in the web browsers, authoring software, or both. The two routes may not be mutually exclusive; both manual and automatic modes could exist.

I meant to express that, while <span> suffices for connecting structure to style, including at sentence and phrase granularity, it does not indicate the semantics of sentences and phrases, including to speech synthesis processors. Speech synthesis processors could, however, make use of CSS Speech module styles on granular elements.

That is a good summarization: avoiding line breaks within phrases so that readers see (and quickly recognize) the phrase as a whole. There are then the sparse (e.g. keywords) and dense (e.g. phrasemes, collocations and compound terms) scenarios, and some of the possible usage scenarios may involve CSS Speech and speech synthesis.
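
A minimal sketch of the line-break point, in Python. It assumes the phrases are already known (supplied by hand here; a phrase detector is assumed and not shown) and replaces the ordinary spaces inside each phrase with U+00A0 NO-BREAK SPACE, so that a layout engine would not wrap inside the phrase. The function name bind_phrases and the sample phrases are hypothetical, for illustration only.

```python
from typing import Sequence

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def bind_phrases(text: str, phrases: Sequence[str]) -> str:
    """Replace ordinary spaces inside each listed phrase with no-break spaces.

    Hypothetical helper: the phrase list stands in for the output of a
    phrase detector in a browser or in authoring software.
    """
    # Handle longer phrases first, so a phrase that contains a shorter
    # listed phrase is bound as a whole before the shorter one is tried.
    for phrase in sorted(phrases, key=len, reverse=True):
        text = text.replace(phrase, phrase.replace(" ", NBSP))
    return text


sample = "Speech synthesis remains a compelling argument for granular markup."
bound = bind_phrases(sample, ["Speech synthesis", "granular markup"])
```

This addresses only where lines may break. It says nothing about what the bound words are, which is the semantic gap that sentence and phrase markup would fill for speech synthesis processors.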

 

Kind regards,
 
Adam
> From: bhawkeslewis@googlemail.com
> Date: Thu, 19 Apr 2012 06:36:48 +0100
> To: adamsobieski@hotmail.com
> CC: public-html@w3.org
> Subject: Re: HTML5 Paragraphs, Sentences and Phrases
> 
> On Mon, Apr 16, 2012 at 8:21 PM, Adam Sobieski <adamsobieski@hotmail.com> wrote:
> > With regard to sentence segmentation and phrase annotation, there are some
> > semi-automated, interactive natural language processing approaches
> > and techniques as well as other natural language processing approaches for
> > authoring software scenarios which, presently, may require more computation
> > than rapid for page loading and initialization.
> 
> I doubt the Unicode sentence segmentation algorithm is so slow.
> 
> Do you have examples of NLP-based features that:
> 
>     1. People want to implement in browsers today.
>     2. Would depend on phrase detection.
>     3. Would need to happen *during* page load, rather than shortly
> after or on demand.
>     4. Where the additional cost of calculating phrase boundaries
> would slow page load down so much that the effect on page load would
> inhibit adding the feature.
> 
> I think we've got a better chance of making client sentence/phrase
> detection good and fast enough for clients to use than of getting
> sufficient authoring tools to generate such markup that it's worth
> browsers relying on sentence/phrase markup in the web corpus.
> 
> > While span-based solutions are functional, for scenarios including EPUB3,
> > span for sentences is a popular usage scenario.  Span for phrases could
> > become so as well with CSS3 speech and text features including text-wrap.
> 
> I'm not sure what you mean by "[w]hile" here. Do you agree that <span>
> already addresses these usages or not?
> 
> > A summarization of some research is that reading speed, comprehension
> > and retention can be enhanced by text formatting including phrase-based.
> 
> That's more like a rephrase than an elaboration. Are you just talking
> about avoiding line breaks within phrases so that readers see (and
> quickly recognize) the phrase as a whole, or are there additional
> examples of phrase-based text formatting? Do Unicode spaces and
> non-breaking spaces address this? If not, why not?
> 
> > With regard to indicating phrase structure in hypertext with markup and
> > style, there are at least two scenarios; in one, indicated phrases are
> > sparse in hypertext, and, in another, regions of hypertext are more or less
> > segmented entirely into phrases.  Authors and authoring software could make
> > use of phrase structure for scenarios including keywords, phrasemes,
> > collocations and compound terms.
> 
> This doesn't sound like it would make for a coherent user experience
> on the basis of <phrase> …
> 
> --
> Benjamin Hawkes-Lewis
>
Received on Friday, 20 April 2012 00:37:52 UTC