- From: Jirka Kosek <jirka@kosek.cz>
- Date: Wed, 05 Dec 2012 20:11:22 +0100
- To: "Thomas A. Fine" <fine@head.cfa.harvard.edu>
- CC: public-html@w3.org
- Message-ID: <50BF9C5A.3020904@kosek.cz>
On 5.12.2012 18:57, Thomas A. Fine wrote:

> HTML needs a tag to indicate sentence structure.

This seems quite a bold statement, given that most authors of Web
content will be too lazy to mark up sentences.

> Like other semantic tags, a sentence tag can be useful in attempts to
> extract meaning from a document, or to convert text to speech with more
> reliable inflection, or to provide more reliable translations, and
> probably for many other reasons.

Yes, for translation it is sometimes important to segment text into
sentences properly. However, as the semantics of HTML elements are known
in advance, there is usually no problem with this. For the rare
ambiguous cases you can use ITS markup (which can be applied to HTML as
well) to set segmentation boundaries
(http://www.w3.org/TR/its20/#elements-within-text).

> While there are suggested algorithms for detecting sentences, none of
> them works completely reliably. An accurate solution defies even the
> most advanced AI approach, and in fact even another human being would
> likely fail to accurately guess what the content creator had in mind in
> all cases.

Well, if automatic spacing algorithms fail (which happens less often
than you describe, at least in my experience), you can always fix a
missing space by manually inserting an en- or em-space character, which
seems a much lower barrier than putting an element around each sentence.

--
------------------------------------------------------------------
Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
Professional XML consulting and training services
DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
------------------------------------------------------------------
Bringing you XML Prague conference    http://xmlprague.cz
------------------------------------------------------------------
Received on Wednesday, 5 December 2012 19:11:52 UTC