
Re: The missing Sentence tag

From: Gavin Carothers <gavin@carothers.name>
Date: Wed, 5 Dec 2012 13:10:54 -0800
Message-ID: <CAPqY83zqDTGWrAycu+dZpc-qqnG-jkZ4Ex9Ko98PDWpbcYW-fg@mail.gmail.com>
To: "Thomas A. Fine" <fine@head.cfa.harvard.edu>
Cc: HTMLwg <public-html@w3.org>
On Wed, Dec 5, 2012 at 9:57 AM, Thomas A. Fine <fine@head.cfa.harvard.edu> wrote:

> HTML needs a tag to indicate sentence structure.
> So, how do I go about having this tag added?  Is there a formal procedure?
>  Should I submit a bug report?  Is there a specific group or mailing list
> where I should start?  What exactly is the process?
> Here's a brief summary of why I think this is needed:
> HTML5 has already added a number of other semantic tags which describe
> recognizable pieces of documents which are larger than sentences (e.g.
> SECTION).  And this trend has continued with RDF and Microdata showing that
> there is a significant interest in indicating smaller semantic pieces down
> to the sub-sentence level.
> For this reason alone it should be obvious that it would be ludicrous for
> HTML to offer semantic tags for a vast array of different chunks of
> information, and yet ignore the absolutely most common semantic chunk, the
> sentence.
> Like other semantic tags, a sentence tag can be useful in attempts to
> extract meaning from a document, or to convert text to speech with more
> reliable inflection, or to provide more reliable translations, and probably
> for many other reasons.
> In addition to semantic reasons, my primary interest in this issue is in
> providing a mechanism for sentence spacing.  As HTML could arguably be the
> most consumed document type for the printed word today or in the near
> future, it's shocking that it can't do the one common formatting option
> that typesetters often used for hundreds of years after the invention of
> movable type: wider sentence spacing.
> It's not my intention to start or facilitate some kind of war about
> sentence spacing.  Indeed, HTML should absolutely be agnostic on the issue.
>  Unfortunately, its inability to handle what is historically the most
> basic text formatting operation cannot be considered an agnostic position.
>  I've seen arguments of this issue where people hold up HTML as evidence
> that wider sentence spacing is no longer correct.  In other words, there is
> now a belief that the HTML standard has already taken sides.
> Here are a few reasons why people might want to adjust sentence formatting:
>   * Representation of the look of historical documents.

TEI likely covers all of this need already. The requirements of historical
documents are broader and more exhaustive than anything HTML needs to deal
with. TEI can be readily transformed into HTML, preserving some of the
semantics in classes and other attributes.
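
As a rough illustration of that kind of transformation, here is a minimal
sketch in Python. It assumes the source marks sentences with the TEI <s>
(s-unit) element and maps each one to an HTML span with class "s". The
function name, the class name, and the sample text are my own choices for
illustration, not an established convention.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tei_paragraph_to_html(tei_xml: str) -> str:
    """Convert one TEI <p> holding <s> units into an HTML <p> of <span class="s">.

    Hypothetical helper for illustration only; a real TEI-to-HTML
    conversion would handle many more elements.
    """
    src = ET.fromstring(tei_xml)
    out = ET.Element("p")
    out.text = src.text
    for child in src:
        if child.tag == f"{{{TEI_NS}}}s":
            # Keep the sentence boundary as a class on a span.
            span = ET.SubElement(out, "span", {"class": "s"})
            span.text = "".join(child.itertext())
            span.tail = child.tail
        else:
            # Anything other than <s> is flattened to plain text in this sketch.
            text = "".join(child.itertext()) + (child.tail or "")
            if len(out):
                out[-1].tail = (out[-1].tail or "") + text
            else:
                out.text = (out.text or "") + text
    return ET.tostring(out, encoding="unicode")


sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<s>Dr. Smith arrived at 5 p.m.</s> <s>He left early.</s>"
    "</p>"
)
html = tei_paragraph_to_html(sample)
print(html)
```

A stylesheet could then target the "s" class if wider sentence spacing is
wanted, without HTML itself needing a sentence element.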

>   * As an aid to new readers, or people learning a new language.

"A new language"... so we don't need a tag for sentences, we'd need a tag
for all grammar structures in every language.

>   * As an aid to people with learning or visual disabilities.

I am skeptical.

>   * As an additional means of adding emphasis to text.

>   * Simply because they prefer it for aesthetic reasons.

Aesthetics is not part of semantic markup.

> While there are suggested algorithms for detecting sentences, none of them
> works completely reliably.  An accurate solution defies even the most
> advanced AI approach, and in fact even another human being would likely
> fail to accurately guess what the content creator had in mind in all cases.

I'd really stick to TEI for all of this; there are already conventions
for converting TEI into HTML.
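
For what it's worth, the detection problem quoted above is easy to
reproduce, which is one reason explicit markup in the source is the safer
route. The sketch below assumes a deliberately naive rule (split after
'.', '!' or '?' when whitespace and a capital letter follow). The rule and
the function name are hypothetical and do not represent any particular
published algorithm.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Split text after '.', '!' or '?' when whitespace and a capital follow."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)


text = "Dr. Smith arrived at 5 p.m. He left early."
pieces = naive_split(text)
print(pieces)
# The author wrote two sentences, but the rule also breaks after "Dr.",
# so three pieces come back.
```

Adding an abbreviation list fixes this input and breaks others, for
example a sentence that really does end in "p.m." before a proper noun.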

> If HTML has been given all the modern tools of convenience that we now
> have, shouldn't it also include one of the most basic tools that
> typesetters have been using for centuries?
>       tom
Received on Wednesday, 5 December 2012 21:11:22 UTC
