- From: Thomas A. Fine <fine@head.cfa.harvard.edu>
- Date: Tue, 24 Apr 2012 12:44:06 -0400 (EDT)
- To: public-html@w3.org
This discussion seems so far to be focused on natural language processing, but for me that is not the primary issue. The primary purpose of HTML is to give content authors control over the look of their documents. Using the Unicode (or any other) sentence-boundary algorithm together with a pseudo-tag is inadequate: since the algorithm won't work in all cases, authors don't actually control what gets formatted. This is the entire reason a sentence tag is needed. If any current or proposed method of parsing sentences matched human interpretation 100% of the time, a sentence tag would not be necessary.

Marking sentences (and phrases) is certainly a tedious task, and not something the majority of content creators are likely to be interested in. However, there is good evidence that such formatting is useful, at least to early readers and to new readers coming from a different language. There would certainly be authors interested in it if it were available.

Let's look at a basic case where an author wants to format sentences with extra space between them. A knowledgeable and patient author can mark sentences with span tags and then use these to (more or less) achieve the desired formatting. Someone with little HTML experience, or someone sitting in front of a web authoring tool, is unlikely to be able to accomplish this at all, or is likely to be steered toward the incorrect solution of using &nbsp;.

Ideally, web authoring tools could aid the user in marking sentences by using sentence detection algorithms, and allow the user to override those cases where the algorithm fails. While this is possible with a span tag, no such tools are ever likely to be developed in the absence of a dedicated sentence tag. So my opinion is that while sentence formatting can be accomplished with a span tag, easy and accessible sentence formatting is unlikely to be available to most content creators without a dedicated sentence tag. For similar reasons, I'd suggest that the sort of tools Mr. Sobieski has discussed are also unlikely to make significant progress.

The same arguments can easily be extended to phrase tags, especially since we can't reasonably suggest any algorithm that might yield phrases. Things do get a bit stickier there, as I don't believe current CSS models are up to the task of correctly formatting phrases differently from each other and from sentences.

tom
Received on Tuesday, 24 April 2012 16:44:37 UTC