- From: Thomas A. Fine <fine@head.cfa.harvard.edu>
- Date: Thu, 12 Apr 2012 16:48:06 -0400 (EDT)
- To: public-html-comments@w3.org
This is in response to Benjamin Hawkes-Lewis' response to Adam Sobieski's proposal for sentence and phrase tags.

Speaking to the "necessity" of these tags: I'm not sure that any tag, or HTML, or the web, or even a good slice of pizza can really be described as necessary, but these tags can definitely be useful, and most likely they can be important. Sentence and phrase markup can be very useful to:

- People relying on audio conversion to access the web.
- People relying on automated translation.
- People who are just learning to read.
- People who are reading an article not in their native language.
- People who are interested in inter-sentence or inter-phrase spacing.
- People with commercial interests, looking to maximize their reach.

Of course, simply adding tags won't help any of these people. The real point is that such tags can facilitate tools that do help them.

The problem with using span tags is that they won't facilitate tool development. In the absence of a real standard, no one is going to develop software that processes sentences by searching for spans that might be labeled "sentence" or "sent" or "stc" or who knows what else. Only with a standard tag can developers use this markup to improve translation, or to emphasize phrasing and sentence structure for improved readability.

Mr. Hawkes-Lewis wrote:

>The web corpus is not going to get marked up with phrases and
>sentences in the absence of NLP advances that would make such markup
>mostly redundant.

Natural Language Processing is riddled with problems, and there is nothing to suggest that this will change in the near future. On the other hand, someone who is authoring content is in the perfect position to identify sentences and phrases accurately. NLP can be an aid to that author, providing hints to help them mark up sentence structure.
But as I said above, no software that uses NLP to help users mark up sentence structure will ever be developed unless dedicated sentence and phrase tags already exist. So in essence you are correct, but only because your argument is a self-fulfilling prophecy.

You also suggest simply using a CSS pseudo-element and relying on the Unicode sentence-breaking conventions. However, looking at these conventions, they are just another attempt at automated processing, and they themselves acknowledge that they will not work in all cases (for example, they cannot tell that the period in "Mr. Jones" does not end a sentence). This is one more argument in favor of giving content providers the ability to mark up sentence structure accurately.

I'll further note that any form of automated NLP is wholly inadequate for users who simply want control over formatting. A mechanism that leaves where and when content is formatted up to some outside algorithm they don't control does not give them any real control over formatting.

If you are saying that you don't think most people will bother, that is probably true. But that doesn't mean there aren't people with a legitimate and important interest.

So, back to the original question: are these tags necessary? I would now say yes. These tags are necessary to the development of software tools that help users mark up sentence structure, and they are necessary to the development of tools that let content providers improve the readability of their web pages for several classes of web users.

tom
Received on Thursday, 12 April 2012 20:48:36 UTC