- From: Thomas A. Fine <fine@head.cfa.harvard.edu>
- Date: Wed, 05 Dec 2012 12:57:28 -0500
- To: public-html@w3.org
HTML needs a tag to indicate sentence structure.
So, how do I go about having this tag added? Is there a formal
procedure? Should I submit a bug report? Is there a specific group or
mailing list where I should start? What exactly is the process?
Here's a brief summary of why I think this is needed:
HTML5 has already added a number of other semantic tags that describe
recognizable pieces of documents larger than sentences (e.g. SECTION).
This trend has continued with RDF and Microdata, which show that there
is significant interest in indicating smaller semantic pieces, down to
the sub-sentence level.
For this reason alone it should be obvious that it would be ludicrous
for HTML to offer semantic tags for a vast array of different chunks of
information, and yet ignore the absolutely most common semantic chunk,
the sentence.
Like other semantic tags, a sentence tag can be useful in attempts to
extract meaning from a document, or to convert text to speech with more
reliable inflection, or to provide more reliable translations, and
probably for many other reasons.
In addition to semantic reasons, my primary interest in this issue is in
providing a mechanism for sentence spacing. As HTML could arguably be
the most consumed document type for the printed word today or in the
near future, it's shocking that it can't do the one common formatting
option that typesetters often used for hundreds of years after the
invention of movable type: wider sentence spacing.
It's not my intention to start or facilitate some kind of war about
sentence spacing. Indeed, HTML should absolutely be agnostic on the
issue. Unfortunately, its inability to handle what is historically the
most basic text formatting operation cannot be considered an agnostic
position. I've seen arguments over this issue where people hold up HTML
as evidence that wider sentence spacing is no longer correct. In other
words, there is now a belief that the HTML standard has already taken sides.
Here are a few reasons why people might want to adjust sentence formatting:
* Representation of the look of historical documents.
* As an aid to new readers, or people learning a new language.
* As an aid to people with learning or visual disabilities.
* As an additional means of adding emphasis to text.
* Simply because they prefer it for aesthetic reasons.
While there are suggested algorithms for detecting sentences, none of
them works completely reliably. An accurate solution defies even the
most advanced AI approach, and in fact even another human being would
likely fail to accurately guess what the content creator had in mind in
all cases.
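To make the detection problem concrete, here is a minimal Python sketch
of one naive rule: treat ".", "!" or "?" followed by whitespace and a
capital letter as a sentence boundary. This is not any particular
published algorithm; the function name naive_split and the sample text
are made up purely for illustration.

```python
import re

def naive_split(text):
    """Hypothetical helper: split after '.', '!' or '?' when the next
    non-space character is an uppercase ASCII letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)

# Illustrative sample text containing two abbreviations.
sample = "Dr. Smith arrived at 5 p.m. Tuesday was quiet."

for piece in naive_split(sample):
    print(repr(piece))
# Prints:
# 'Dr.'
# 'Smith arrived at 5 p.m.'
# 'Tuesday was quiet.'
```

The rule breaks after "Dr." even though no sentence ends there. And
whether "5 p.m. Tuesday" contains a boundary at all depends on what the
writer meant, which is exactly the information that explicit markup
would carry and that a guessing algorithm cannot recover.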
If HTML has been given all the modern tools of convenience that we now
have, shouldn't it also include one of the most basic tools that
typesetters have been using for centuries?
tom
Received on Wednesday, 5 December 2012 17:58:01 UTC