
Re: The missing Sentence tag

From: Thomas A. Fine <fine@head.cfa.harvard.edu>
Date: Wed, 05 Dec 2012 16:20:10 -0500
Message-ID: <50BFBA8A.704@head.cfa.harvard.edu>
To: public-html@w3.org
On 12/5/12 2:11 PM, Jirka Kosek wrote:
> On 5.12.2012 18:57, Thomas A. Fine wrote:
>
>> HTML needs a tag to indicate sentence structure.
>
> This seems as a quite bold statement given the fact that most authors of
> Web content will be too lazy to markup sentences.

I'll admit that this is likely true to some degree.  Unfortunately it's 
also a self-fulfilling prophecy: if we don't offer tools for marking up 
sentences, then we have effectively prevented anyone but the most 
dedicated and steadfast from attempting some sort of substitute.  At
present, people who are interested in sentence formatting are most
commonly told to use the broken solution of adding a non-breaking
space.  Even if a less broken solution, such as the em or en space
entity, were offered, we would still be telling people to do their
formatting with stone knives and bearskins, when in all other avenues
we strongly encourage people to turn to CSS for their formatting needs.

And people interested in semantic tags have nothing at all.

And HTML is well past the point where it can refuse to define markup
simply because the audience is probably a minority.

>> Like other semantic tags, a sentence tag can be useful in attempts to
>> extract meaning from a document, or to convert text to speech with more
>> reliable inflection, or to provide more reliable translations, and
>> probably for many other reasons.
>
> Yes, for translation it is sometimes important to do segmentation to
> sentences properly. However as semantics of HTML elements is known in
> advance there usually no problem with this. For some rare ambiguous
> cases you can use ITS markup (which can be applied to HTML as well) to
> set segmentation boundaries
> (http://www.w3.org/TR/its20/#elements-within-text).

Or you could just mark all the sentences as the content is created and 
take care of semantics and formatting all in one sensible method, 
creating content that is easier to parse without having to be an expert 
in parsing content, and allowing the user to fine-tune formatting with 
CSS without having to go back and fiddle with every single sentence by
hand.  At least you could if you had a sentence tag.

> Well if automatic spacing algorithms fail (which is not that often as
> you describe, at least in my experience) you can always fix missing
> space by inserting en- or em-space character manually which seems as
> much less barrier then putting element around each sentence.

Again, this assumes the only interest is in formatting, and it
disregards CSS, the preferred approach for finely controlled
formatting.

     tom
Received on Wednesday, 5 December 2012 21:20:39 GMT
