- From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
- Date: Mon, 9 Apr 2012 14:20:24 +0100
- To: Adam Sobieski <adamsobieski@hotmail.com>
- Cc: public-html@w3.org
On Mon, Apr 9, 2012 at 12:41 PM, Adam Sobieski <adamsobieski@hotmail.com> wrote:
> While HTML5 presently has a document structure granularity of
> paragraphs, for sentences and phrases in hypertext, options include the
> <span> element, e.g. <span class="sentence"> and <span class="phrase">, and
> the use of XML from other XMLNS.

Also microdata, RDFa, and Unicode sentence segmentation:

http://www.unicode.org/reports/tr29/#Sentence_Boundaries

> HTML5 markup elements for sentences and phrases are possible.

Possible, but their necessity is undemonstrated.

> In any eventuality, sentences and phrases are important CSS3 usage scenarios.
> A non-exhaustive list of the benefits of sentences in hypertext include:
>
> 1. Sentence-level granularity can be of use to the styling, layout and
> rendering of hypertext. Topics include layout with regard to columns and
> pages as well as intersentence spacing. Sentence and phrase granularity in
> documents can facilitate readability, reading speed and comprehension
> (http://lists.w3.org/Archives/Public/www-style/2012Apr/0153.html).

The web corpus is not going to get marked up with phrases and
sentences in the absence of NLP advances that would make such markup
mostly redundant.

If you want a way to tweak this spacing from CSS, a ::sentence
pseudo-element (comparable to ::first-letter and ::first-line) that
selected sentences based on the Unicode sentence segmentation
algorithm would work reasonably at web scale, whereas a dedicated
<sentence> semantic would only work in the small subset of documents
that applied it. Authors who want to tweak the spacing in particular
cases can use <span>.

I suggest you propose ::sentence for CSS Selectors Level 4.

> 2. Media overlays in EPUB, based upon SMIL, "text elements' src attributes
> refer to EPUB Content Document elements by their IDs. The granularity level
> of the Media Overlay therefore depends on how the EPUB Content Document is
> marked up. If the finest level of markup is at the paragraph level, then
> that is the finest possible level at which Media Overlay synchronization can
> be authored. Likewise, if sub-paragraph markup is available, such as span
> elements representing phrases or sentences, then finer granularity is
> possible in the Media Overlay. Finer granularity gives Users more precise
> results for synchronized playback when navigating by word or phrase and when
> searching the text, but increases the file size of the Media Overlay
> Documents."
> (http://idpf.org/epub/30/spec/epub30-mediaoverlays.html#sec-media-overlays-granularity)

From that document, it sounds like <span> already works for their use case?

> 3. Natural language processing of hypertext. See also:
> http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation .

Is NLP that needs markup to discern sentence boundaries still NLP?

> 4. Navigational. Sentence elements with IDs can be navigated to and
> specifically referenced. See also:
> http://idpf.org/epub/linking/cfi/epub-cfi.html .

@id already works for this. Why is the "sentence" semantic needed?

> 5. Sentence-level granularity of structure can facilitate new semantics
> including annotational. For example, the epub:type attribute, resembling the
> role attribute, with some uses indicated at
> http://idpf.org/epub/vocab/structure/#h_document-text including
> "concluding-sentence" and "topic-sentence".

We don't need to introduce new semantics to the core vocabulary to
facilitate annotations that can already be made with microdata/RDFa.

> 6. Speech synthesis. SSML includes paragraphs and sentences
> (http://www.w3.org/TR/speech-synthesis11/#S3.1.8.1). Sentence granularity
> can enhance the audio output of synthesis processors processing hypertext.

The Unicode sentence segmentation algorithm sounds good enough for
this. If it's not, improving the NLP algorithms of text-to-speech
agents is going to be more cost-effective than trying to persuade
authors to add sentence markup to the corpus.

> A non-exhaustive list of the benefits of phrases in hypertext include:
>
> 1. Phrase-level granularity can be of use to styling, layout and rendering.
> Topics include text wrapping. Sentence and phrase granularity in documents
> can facilitate readability, reading speed and comprehension
> (http://lists.w3.org/Archives/Public/www-style/2012Apr/0153.html).

Can you summarize from your reading list what these benefits would be
and why they can't be achieved using existing mechanisms like Unicode
non-breaking spaces?

> 2. Media overlays in EPUB [snip]
> 3. Natural language processing of hypertext.
> 4. Phrase-level granularity of structure can facilitate new semantics
> including annotational. For example, the epub:type attribute, resembling the
> role attribute, with some uses indicated
> at http://idpf.org/epub/vocab/structure/#h_document-text including
> "keyword".

Already discussed above.

> 5. Speech synthesis. For example, pauses between words may differ inside and
> between phrase elements.

Do you have an example of this? This behavior sounds like it would be
phrase-specific rather than general to everything authors might mark
up with <phrase>.

How are you defining "phrase" here anyway?

--
Benjamin Hawkes-Lewis
Received on Monday, 9 April 2012 13:21:14 UTC
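
To make the "authors can use <span>" and "@id already works" points in the message concrete, here is a minimal Python sketch that wraps each sentence of a plain-text string in <span class="sentence"> with a generated id. The function names, the id scheme ("s1", "s2", ...) and the boundary rule are assumptions made for illustration. In particular, the regular expression is a rough stand-in and does not implement the Unicode sentence segmentation algorithm (UAX #29); it will mis-split abbreviations such as "e.g.".

```python
from __future__ import annotations

import html
import re

# Naive boundary rule (an assumption for illustration, NOT UAX #29):
# a sentence is a run of non-terminator characters followed by one or
# more of '.', '!' or '?' (or the end of the text), plus trailing spaces.
_SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+|$)\s*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the naive rule above."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in pieces if p]


def wrap_sentences(text: str, id_prefix: str = "s") -> str:
    """Wrap each sentence in <span class="sentence" id="...">.

    The class name mirrors the <span class="sentence"> example quoted in
    the thread; the id scheme is hypothetical.
    """
    spans = [
        f'<span class="sentence" id="{id_prefix}{i}">{html.escape(s)}</span>'
        for i, s in enumerate(split_sentences(text), start=1)
    ]
    return " ".join(spans)


if __name__ == "__main__":
    print(wrap_sentences("Sentences can be styled. They can also be linked to."))
```

Markup produced this way can be targeted from CSS with an ordinary class selector (span.sentence) and referenced by fragment identifier (#s2), which is the sense in which existing mechanisms already cover the styling and navigation cases discussed above.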