Re: ideas for standard JSON-based semantic representation

Hi Debbie,
Yes, that is an excellent idea. It is essential to clear up these terminology issues, and it would be useful to extend the terms to include the greater level of detail provided by EMMA.

Regards,

Michael McTear
Emeritus Professor of Knowledge Engineering
Ulster University
https://www.ulster.ac.uk/staff/mf-mctear

http://www.spokenlanguagetechnology.com/


Conversational Interaction Conference, San Jose, March 11-12, 2019
http://www.conversationalinteraction.com/program



From: Deborah Dahl <Dahl@conversational-Technologies.com>
Date: Wednesday, 28 November 2018 at 14:40
To: 'Dirk Schnelle-Walka' <dirk.schnelle@jvoicexml.org>
Cc: "public-voiceinteraction@w3.org" <public-voiceinteraction@w3.org>
Subject: RE: ideas for standard JSON-based semantic representation
Resent-From: <public-voiceinteraction@w3.org>
Resent-Date: Wednesday, 28 November 2018 at 14:40

Thanks for your comments. Yes, we thought about a JSON serialization of EMMA in the last Working Draft (WD) of EMMA, but we never got around to actually proposing what it might look like. This proposal is an attempt to move that idea forward, with the addition of terminology for “intents” and “entities”, because all the toolkits use something like that (they might call “entities” “slots” or “concepts”, but I would argue that all those terms refer to basically the same thing). Because EMMA is explicitly agnostic as to the actual application-specific semantics, this proposal is a bit of an extension to EMMA.
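To illustrate the terminology point, here is a minimal Python sketch that assumes hypothetical toolkit results labelling the same information "slots" or "concepts" instead of "entities". The alias list and the normalize_result helper are illustrative assumptions, not any vendor's actual format and not the format in the proposal.

```python
# Hypothetical key names that different toolkits might use for the same
# notion (an assumption for illustration, not any real vendor's schema).
ENTITY_ALIASES = ("entities", "slots", "concepts")


def normalize_result(vendor_result):
    """Map a toolkit result onto a common {"intent", "entities"} shape."""
    entities = {}
    for key in ENTITY_ALIASES:
        # Merge whichever alias the toolkit used into one "entities" dict.
        entities.update(vendor_result.get(key, {}))
    return {"intent": vendor_result.get("intent"), "entities": entities}


a = normalize_result({"intent": "bookFlight", "slots": {"destination": "Boston"}})
b = normalize_result({"intent": "bookFlight", "concepts": {"destination": "Boston"}})
print(a == b)  # True
```

Once the aliases are mapped to one name, results from different toolkits become directly comparable, which is the practical benefit of agreeing on the terminology.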

From: Dirk Schnelle-Walka <dirk.schnelle@jvoicexml.org>
Sent: Tuesday, November 27, 2018 7:02 PM
To: Deborah Dahl <dahl@conversational-technologies.com>
Cc: public-voiceinteraction@w3.org
Subject: Re: ideas for standard JSON-based semantic representation

Thank you, Deborah. This looks like a great start.

Conceptually, I fully agree that the EMMA format is suited to transferring the semantic interpretation from the various NLU toolkits that are available. In contrast, these toolkits usually rely on JSON.

So, a mapping would be helpful and was already thought of when authoring the EMMA standard:

"Not addressed in this draft, but planned for a later Working Draft of EMMA 2.0, is a JSON serialization of EMMA documents for use in contexts where JSON is better suited than XML for representing user inputs and system outputs."

One of the questions that ought to be addressed is indeed how to build an easy bridge between these two formats. Technically, this is pretty easy.
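A minimal Python sketch of such a bridge, assuming a simple EMMA document with only top-level emma:interpretation elements (no emma:one-of or emma:group nesting). The JSON shape it produces, which keeps the "emma:" prefix on annotation keys and puts the application semantics under a hypothetical "payload" key, is an assumption for illustration and not the layout used in the emmaJSON draft.

```python
import json
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

SAMPLE = """<emma:emma version="2.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:confidence="0.8" emma:tokens="flights to boston">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>"""


def local(name):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.split("}", 1)[-1]


def payload_to_json(elem):
    """Convert application-specific (non-EMMA) XML into nested dicts and strings.

    Repeated sibling tags overwrite each other; a fuller mapping would use lists.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {local(child.tag): payload_to_json(child) for child in children}


def interpretation_to_json(interp):
    """Map one emma:interpretation element to a JSON-ready dict."""
    out = {}
    for name, value in interp.attrib.items():
        # Keep the "emma:" prefix so EMMA annotations stay distinguishable
        # from plain attributes such as id. Values are left as strings.
        prefix = "emma:" if name.startswith("{" + EMMA_NS + "}") else ""
        out[prefix + local(name)] = value
    out["payload"] = {local(child.tag): payload_to_json(child) for child in interp}
    return out


def emma_to_json(xml_text):
    """Convert the top-level emma:interpretation elements of an EMMA document."""
    root = ET.fromstring(xml_text)
    interps = root.findall("{%s}interpretation" % EMMA_NS)
    return {
        "emma:version": root.get("version"),
        "interpretations": [interpretation_to_json(i) for i in interps],
    }


print(json.dumps(emma_to_json(SAMPLE), indent=2))
```

The mechanical part really is small; the open design questions are the ones the sketch sidesteps, such as typing numeric annotations like emma:confidence and handling nested containers.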

But how much of this mapping do we really need? Even though EMMA documents cannot yet be fully represented in JSON, as noted above, EMMA is already prepared to carry JSON-formatted semantic interpretations via emma:result-format="application/json".
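For comparison, a sketch of the alternative described above: the XML envelope is kept and a JSON payload is carried inside it. It assumes that emma:result-format appears on emma:interpretation and that the JSON is the element's text content; the "intent"/"entities" keys in the payload are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
RESULT_FORMAT = "{%s}result-format" % EMMA_NS

DOC = """<emma:emma version="2.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:result-format="application/json"
                       emma:tokens="book a table for two">
    {"intent": "bookTable", "entities": {"partySize": 2}}
  </emma:interpretation>
</emma:emma>"""


def json_payloads(xml_text):
    """Return {interpretation id: parsed JSON} for JSON-typed interpretations."""
    root = ET.fromstring(xml_text)
    payloads = {}
    for interp in root.iter("{%s}interpretation" % EMMA_NS):
        if interp.get(RESULT_FORMAT) == "application/json":
            # json.loads tolerates the whitespace around the payload text.
            payloads[interp.get("id")] = json.loads(interp.text)
    return payloads


print(json_payloads(DOC))
```

Here the EMMA metadata stays in XML and only the application semantics are JSON, so a client still needs an XML parser to reach the payload.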

Just some first basic ideas in a sleepless night.

Dirk


On 27.11.2018 20:47, Deborah Dahl <dahl@conversational-technologies.com> wrote:

There are currently quite a few cloud-based natural language application development toolkits, all with their own proprietary result formats, even though their functionality doesn’t differ too much. Proprietary formats shouldn’t be necessary. It would be extremely useful to have a standard representation for natural language results for many reasons; for example, to make it easier to switch vendors and to encourage the development of third-party natural language development tools. The EMMA standard (https://w3c.github.io/emma/emma2_0/emma_2_0_editor_draft.html) was developed for representing semantic results and has the ability to represent a rich set of metadata about semantic processing. EMMA would be a good option for use as a standard with current toolkits. However, EMMA is an XML format and all of the current toolkit result formats are based on JSON, which is very popular with developers. I think it should be possible to develop a JSON format that captures the kind of information that’s contained in EMMA. To that end, I put together a writeup with some suggestions for representing natural language results using JSON syntax and added it to the Voice Interaction GitHub repository:

HTML rendered version: https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/emmaJSON.htm


Repository: https://github.com/w3c/voiceinteraction/tree/master/voice%20interaction%20drafts/emmaJSON.htm



Please take a look and send comments to this list, or post them in the group wiki, https://github.com/w3c/voiceinteraction/wiki/Home/_edit


We have the option to eventually publish some version of this as a Community Group report.






Received on Wednesday, 28 November 2018 14:57:38 UTC