Re: semantics [was Re: Proposed Design Principles updated] from Dan Brickley on 2007-04-06 (www-archive@w3.org from April 2007)

From: Dan Brickley <danbri@danbri.org>
Date: Fri, 06 Apr 2007 13:20:25 +0100
To: Dan Connolly <connolly@w3.org>
Cc: "Dailey, David P." <david.dailey@sru.edu>, Karl Dubost <karl@w3.org>, www-archive@w3.org
Message-ID: <46163B09.5000708@danbri.org>

Dan Connolly wrote:
> -cc public-html; +cc www-archive
> http://lists.w3.org/Archives/Public/www-archive/
> On Wed, 2007-04-04 at 10:55 -0400, Dailey, David P. wrote:
>> Out of curiosity, does anyone interpret our charge or realm as including 
>> that substrate of markup that might connect utterances made in HTML to their
>>  meanings in some natural-language-inferential-logical sense? It would be
>>  rather fun if it did, but perhaps that will be for our next go-round after
>>  this particular HTML-N* has been properly enumerated.
> 
> I agree that it would be rather fun if it did. It's pretty much my
> research focus.

Hi all

http://www.w3.org/TR/2007/REC-semantic-interpretation-20070405/
Semantic Interpretation for Speech Recognition (SISR) Version 1.0
W3C Recommendation 5 April 2007
...seems somehow relevant here, though I've not reviewed it. I wonder if 
there are many people in common between the new HTML WG and the 
community that produced SISR. It's designed for Speech Recognition.

Excerpting from the Abstract:
[[
The results of semantic interpretation describe the meaning of a natural 
language utterance. The current specification represents this 
information as an ECMAScript object, and defines a mechanism to 
serialize the result into [XML]. The W3C Multimodal Interaction Activity 
[MMI] is defining an XML data format [EMMA] for containing and 
annotating the information in user utterances. It is expected that the 
EMMA language will be able to integrate results generated by Semantic 
Interpretation for Speech Recognition.
]]

There are some examples in
http://www.w3.org/TR/2007/REC-semantic-interpretation-20070405/#SI3.2.4
that show a form of normalisation which might perhaps also be applied to 
HTML, eg. for named entity recognition or mapping to RDFa constructs.

eg.   <rule id="yes">
        <one-of>
          <item>yes</item>
          <item>yeah<tag>yes</tag></item>
          <item><token>you bet</token><tag>yes</tag></item>
          <item xml:lang="fr-CA">oui<tag>yes</tag></item>
        </one-of>
      </rule>
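
To make the normalisation idea concrete, here is a rough Python sketch that
reads the rule above and builds a lookup from surface form to normalised
value. This is not how an SISR processor works; it is only an illustration.
The names RULE_XML, build_normaliser and normalise are made up for this
example, and it assumes that an item with no <tag> simply maps to its own
text.

```python
import xml.etree.ElementTree as ET

# The rule quoted above, embedded as a string (hypothetical name).
RULE_XML = """
<rule id="yes">
  <one-of>
    <item>yes</item>
    <item>yeah<tag>yes</tag></item>
    <item><token>you bet</token><tag>yes</tag></item>
    <item xml:lang="fr-CA">oui<tag>yes</tag></item>
  </one-of>
</rule>
"""


def build_normaliser(rule_xml):
    """Map each alternative's surface text to its <tag> value.

    Assumption for this sketch: an item without a <tag> normalises
    to its own text.
    """
    root = ET.fromstring(rule_xml)
    table = {}
    for item in root.iter("item"):
        tag = item.find("tag")
        token = item.find("token")
        # Surface form is the <token> text if present, else the item's text.
        surface = (token.text if token is not None else item.text or "").strip()
        value = tag.text.strip() if tag is not None else surface
        table[surface.lower()] = value
    return table


def normalise(utterance, table):
    """Return the normalised value, or None if the utterance is unknown."""
    return table.get(utterance.strip().lower())


NORMALISER = build_normaliser(RULE_XML)
```

With this table, normalise("You bet", NORMALISER) and normalise("oui",
NORMALISER) both give "yes", while an unlisted word gives None. The same
kind of table could in principle be applied to text found in HTML, though
that is speculation on top of the spec, not something it describes.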

I sort of expected to see probabilistic info attached to these options, 
but it's not in the examples. The word "probability" only occurs briefly 
in the spec:
[[
Likewise, for every Rule Variable, there is an associated variable 
called score, of type Number, which holds a value that is related to the 
confidence or probability of the corresponding grammar rule or some 
similar measure. Higher score values indicate higher confidence or 
probability over the corresponding grammar rule. Processors that don't 
compute or don't have access to such values must return undefined as the 
score value. Score variables are not part of the Rule Variable and the 
value of the score variables cannot be modified.
]]
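
As a minimal sketch of how a consumer might use such scores, assuming each
candidate interpretation arrives as a (value, score) pair and that an
undefined score is represented as None: the spec text above only says that
higher means more confident, so the selection policy here (prefer scored
candidates, otherwise fall back to the first one) is an assumption of this
example, as are the names Interpretation and best_interpretation.

```python
from typing import NamedTuple, Optional, Sequence


class Interpretation(NamedTuple):
    value: str
    score: Optional[float]  # None stands in for an undefined score


def best_interpretation(
    candidates: Sequence[Interpretation],
) -> Optional[Interpretation]:
    """Pick the candidate with the highest score.

    Candidates whose score is None are only used as a fallback,
    in which case the first candidate is returned (an arbitrary choice).
    """
    scored = [c for c in candidates if c.score is not None]
    if scored:
        return max(scored, key=lambda c: c.score)
    return candidates[0] if candidates else None
```

Because the quoted text says only that the score is "related to" confidence
or probability, this sketch compares scores but does not treat them as
probabilities that sum to one.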

At which point I remember the existence of
http://www.w3.org/2005/Incubator/urw3/charter and pop that on my reading 
list for the weekend...

Thinking out loud,

Dan

Received on Friday, 6 April 2007 12:26:52 UTC