- From: Thomas A. Fine <fine@head.cfa.harvard.edu>
- Date: Wed, 05 Dec 2012 16:20:10 -0500
- To: public-html@w3.org
On 12/5/12 2:11 PM, Jirka Kosek wrote:
> On 5.12.2012 18:57, Thomas A. Fine wrote:
>
>> HTML needs a tag to indicate sentence structure.
>
> This seems as a quite bold statement given the fact that most authors of
> Web content will be too lazy to markup sentences.

I'll admit that this is likely true to some degree. Unfortunately it's also a self-fulfilling prophecy: if we don't offer tools for marking up sentences, then we have effectively prevented anyone but the most dedicated and steadfast from attempting some sort of substitute.

At present, the most common recommendation to people interested in sentence formatting is to use the broken solution of adding a non-breaking space. Even if a less broken solution, such as an em or en space entity, were offered, we would still be telling people to do their formatting with stone knives and bearskins, when in all other avenues we strongly encourage people to turn to CSS for their formatting needs. And people interested in semantic tags have nothing at all.

And HTML is by now well past the point where it can refuse to define markup simply because the audience is probably a minority.

>> Like other semantic tags, a sentence tag can be useful in attempts to
>> extract meaning from a document, or to convert text to speech with more
>> reliable inflection, or to provide more reliable translations, and
>> probably for many other reasons.
>
> Yes, for translation it is sometimes important to do segmentation to
> sentences properly. However as semantics of HTML elements is known in
> advance there usually no problem with this. For some rare ambiguous
> cases you can use ITS markup (which can be applied to HTML as well) to
> set segmentation boundaries
> (http://www.w3.org/TR/its20/#elements-within-text).

Or you could just mark all the sentences as the content is created and take care of semantics and formatting in one sensible method, creating content that is easier to parse without having to be an expert in parsing content, and allowing the user to fine-tune formatting with CSS without having to go back and fiddle with every single sentence by hand. At least you could if you had a sentence tag.

> Well if automatic spacing algorithms fail (which is not that often as
> you describe, at least in my experience) you can always fix missing
> space by inserting en- or em-space character manually which seems as
> much less barrier then putting element around each sentence.

Again, this assumes the only interest is in formatting, and it disregards the preferred CSS approach of providing finely controlled formatting.

tom

Received on Wednesday, 5 December 2012 21:20:39 UTC