RE: ideas for standard JSON-based semantic representation

Thanks for your comments. Yes, we thought about a JSON serialization when writing the last Working Draft of EMMA, but we never got around to actually proposing what it might look like. This proposal is an attempt to move that idea forward, with the addition of terminology for “intents” and “entities”, because all the toolkits use something like that (they might call “entities” “slots” or “concepts”, but I would argue that all those terms refer to basically the same thing). Because EMMA is explicitly agnostic as to the actual application-specific semantics, this proposal is a bit of an extension to EMMA.
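To illustrate the point about terminology, here is a minimal Python sketch of how results from different toolkits might be folded into one intent/entities shape. The key names ("intent", "entities", "slots", "concepts") and the sample values are assumptions made for illustration only; they are not taken from any particular toolkit or from the proposal itself.

```python
from typing import Any, Dict

# Keys that different toolkits might use for what this proposal calls
# "entities". These names are illustrative assumptions.
ENTITY_ALIASES = ("entities", "slots", "concepts")


def normalize_result(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a toolkit-specific result onto a common intent/entities shape."""
    entities: Any = {}
    for key in ENTITY_ALIASES:
        if key in raw:
            # Take the first alias that is present and stop looking.
            entities = raw[key]
            break
    return {"intent": raw.get("intent"), "entities": entities}


# Two hypothetical toolkit outputs that differ only in naming.
toolkit_a = {"intent": "book_flight", "slots": {"destination": "Boston"}}
toolkit_b = {"intent": "book_flight", "concepts": {"destination": "Boston"}}
same_shape = normalize_result(toolkit_a) == normalize_result(toolkit_b)
```

Both hypothetical results end up in the same shape, which is the kind of vendor independence a common format would give.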

 

From: Dirk Schnelle-Walka <dirk.schnelle@jvoicexml.org> 
Sent: Tuesday, November 27, 2018 7:02 PM
To: Deborah Dahl <dahl@conversational-technologies.com>
Cc: public-voiceinteraction@w3.org
Subject: Re: ideas for standard JSON-based semantic representation

 

Thank you, Deborah. This looks like a great start.

 

Conceptually, I fully agree that the EMMA format is suited to transferring the semantic interpretation from the various NLU toolkits that are available. These toolkits, however, usually rely on the JSON format.

 

So, a mapping would be helpful, and one was already envisioned when the EMMA standard was authored:

 

"Not addressed in this draft, but planned for a later Working Draft of EMMA 2.0, is a JSON serialization of EMMA documents for use in contexts were JSON is better suited than XML for representing user inputs and system outputs."

 

One of the questions that ought to be addressed is indeed how to come up with an easy bridge between these two formats. Technically, this is pretty easy.

 

But how much of this mapping do we really need? Although EMMA documents cannot be fully represented in JSON, as stated above, EMMA is already prepared to carry JSON-formatted semantic interpretations via emma:result-format="application/json".
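As a rough sketch of that second option, the following Python snippet wraps a JSON result inside an EMMA interpretation marked with emma:result-format="application/json" and reads it back out. The namespace URI, the version attribute, the element layout, and the sample payload are assumptions made for illustration; the EMMA draft should be consulted for the normative structure.

```python
import json
import xml.etree.ElementTree as ET

# EMMA namespace URI, assumed here; check it against the EMMA draft.
EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)


def wrap_json_in_emma(nlu_result: dict, interpretation_id: str = "int1") -> str:
    """Return an EMMA document whose interpretation carries JSON as text."""
    root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "2.0"})
    interp = ET.SubElement(
        root,
        f"{{{EMMA_NS}}}interpretation",
        {
            "id": interpretation_id,
            f"{{{EMMA_NS}}}result-format": "application/json",
        },
    )
    # The JSON payload is stored as the text content of the interpretation.
    interp.text = json.dumps(nlu_result)
    return ET.tostring(root, encoding="unicode")


def extract_json_from_emma(emma_xml: str) -> dict:
    """Read the JSON payload back out of the first interpretation."""
    root = ET.fromstring(emma_xml)
    interp = root.find(f"{{{EMMA_NS}}}interpretation")
    if interp is None:
        raise ValueError("no emma:interpretation element found")
    if interp.get(f"{{{EMMA_NS}}}result-format") != "application/json":
        raise ValueError("interpretation is not marked as application/json")
    return json.loads(interp.text or "null")


# Hypothetical toolkit result used only for the round trip below.
payload = {"intent": "book_flight", "entities": {"destination": "Boston"}}
emma_doc = wrap_json_in_emma(payload)
round_tripped = extract_json_from_emma(emma_doc)
```

With this approach the JSON from a toolkit stays untouched, and EMMA only contributes the envelope and its metadata.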

 

Just some first basic ideas from a sleepless night.

 

Dirk

 

 

On 27.11.2018 20:47, Deborah Dahl <dahl@conversational-technologies.com> wrote:

There are currently quite a few cloud-based natural language application development toolkits, all with their own proprietary result formats, even though their functionality doesn’t differ too much. Proprietary formats shouldn’t be necessary. It would be extremely useful to have a standard representation for natural language results for many reasons; for example, to make it easier to switch vendors and to encourage the development of third-party natural language development tools. The EMMA standard (https://w3c.github.io/emma/emma2_0/emma_2_0_editor_draft.html) was developed for representing semantic results and has the ability to represent a rich set of metadata about semantic processing. EMMA would be a good option for use as a standard with current toolkits. However, EMMA is an XML format and all of the current toolkit result formats are based on JSON, which is very popular with developers. I think it should be possible to develop a JSON format that captures the kind of information that’s contained in EMMA. To that end, I put together a writeup with some suggestions for representing natural language results using JSON syntax and added it to the Voice Interaction GitHub repository.

HTML rendered version: https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/emmaJSON.htm 

Repository: https://github.com/w3c/voiceinteraction/tree/master/voice%20interaction%20drafts/emmaJSON.htm

 

Please take a look and send comments to this list, or post them in the group wiki, https://github.com/w3c/voiceinteraction/wiki/Home/_edit 

We have the option to eventually publish some version of this as a Community Group report.

 

 

Received on Wednesday, 28 November 2018 14:40:45 UTC