Re: [aksw-core] Charter for W3C Community Group Natural Language Interfaces for the Web of Data

Dear Petr, 

Thanks for participating. I will add your suggestions to the charter. The pointers to your datasets are also appreciated. We are currently collecting dataset information at https://github.com/aksw/qa-datasets.
We should collect all the issues you mentioned in the first deliverable or a blog post to raise awareness and point out future research directions.

I also think that a common format plus a common implementation (in two or more languages) would help newcomers.

Best regards,
Ricardo

> On 10 Apr 2016, at 04:47, Petr Baudis <pasky@ucw.cz> wrote:
> 
>  Hi!
> 
>  I was actually wondering about what the main purpose and relevance
> of this W3C community would be - but the idea of proposing some common
> reference benchmarks and suites for training+testing machine learned
> information retrieval systems is excellent and makes absolute sense!
> 
>  The most widespread common benchmarks for NL interfaces for the Web
> of Data are probably:
> 
>  * Semantic Parsing datasets like GeoQuery, Free917, WebQuestions,
>    SimpleQuestions and QALD.
> 
>  * Fulltext Question Answering datasets like the TREC9-12 QA dataset.
> 
>  * Domain-specific datasets (which may be also hybrid), e.g. BioASQ.
> 
>  There are various issues that need to be addressed - e.g. uneven
> sampling of user inputs (say numerical questions have bad coverage),
> issues of evaluating output correctness, temporal instability and
> dependencies on continuously evolving corpora.
> 
>  Plus, getting started requires a lot of preprocessing of the inputs
> (ranging from not sharing a common format to different methods for
> entity linking) which could be common but often isn't shared (or is just
> thrown into public repositories with little or no documentation) and
> that makes things hard for newcomers.
> 
>  And I'm sure others who approach this from different viewpoints will
> see different issues to address.  (My background is in machine learning
> and NLP rather than semantic web and ontologies.)
> 
> 
>  In our activities related to YodaQA, we are maintaining and
> evolving several datasets; maybe they could serve as starting points
> for some common benchmarks:
> 
>  * https://github.com/brmson/dataset-factoid-curated for an evolution of
>    the much cleaned-up TREC9-12 dataset; many of these questions cannot
>    be answered by Web of Data (at least for now), though
> 
>  * https://github.com/brmson/dataset-factoid-movies for domain-specific
>    questions on movies (which makes for an attractive and well defined
>    subset for good coverage)
> 
>  * https://github.com/brmson/dataset-factoid-webquestions for a suite
>    of tools and post-processed versions of the popular WebQuestions
>    dataset
> 
> 
>  I think making progress in commonly accepted benchmarks and query
> datasets would be a valuable contribution of this community!  Would
> others in this community be interested in working towards this?
> 
> On Wed, Apr 06, 2016 at 04:38:19PM +0200, Ricardo Usbeck wrote:
>> Dear all, 
>> 
>> thanks for the online as well as offline discussion. So far, we identified some common points for deliverables of this group:
>> 
>> * At least one common format for benchmarks
>> * At least one test suite for extensive benchmarking of components, e.g., like [2]
>> 
>> We also identified discussions pertaining to:
>> * How and how often to communicate
>> 
>> @Edgard: I think we will focus on protocols like [1]
>> 
>> Furthermore, if you want to edit the charter [3], let me know your github user name. 
>> 
>> Best regards
>> Ricardo
>> 
>> [1] Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saeedeh Shekarpour, Didier Cherix and Christoph Lange. Qanary -- An Extensible Vocabulary for Open Question Answering Systems
>> [2] GERBIL -- General Entity Annotation Benchmark Framework by Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis, and Lars Wesemann in the 24th WWW conference
>> [3] https://github.com/Natural-Language-Interfaces-CG/charter
>> 
>> On 16 Mar 2016, at 15:53, Edgard Marx <marx@informatik.uni-leipzig.de> wrote:
>>> 
>>> Hi Ricardo,
>>> 
>>> Thanks for leading the discussion and organization.
>>> 
>>>>> * scope and goals (e.g., an ontology to ease communication of modules across platforms and deployments)
>>> 
>>> First of all, I would like to start a discussion regarding the scope of the working group.
>>> In my opinion, a good start is to define some borders.
>>> 
>>> For instance, will the group work on interfaces as (a) Communication Protocols or (b) User Interfaces?
>>> We can be even more specific, e.g. in case (a) our work would be just to define the message format etc.
>>> 
>>> In my opinion, the scope should be at the functional level of NLP processes, e.g. input/output, without even specifying the format.
>>> Programming languages do this, and it works.
>>> 
>>>>> * communication process (monthly telcos?)
>>> 
>>> I think nowadays there are nice social media tools that help people to follow and participate in discussions, e.g. Facebook, Doodle.
>>> I would organize calls only if it is extremely necessary. However, I am not against having them :-).
>>> 
>>>>> * deliverables? how to coordinate a specification
>>> Yes, this works just fine, tasks/goals/roles.
>>> 
>>> best regards,
>>> Edgard
>>> 
>>> 
>>> On Wed, Mar 16, 2016 at 9:37 AM, Ricardo Usbeck <usbeck@informatik.uni-leipzig.de> wrote:
>>> *** Apologies for cross-posting ***
>>> 
>>> Dear all,
>>> 
>>> we are currently looking for input to our charter for the W3C Community Group Natural Language Interfaces for the Web of Data https://www.w3.org/community/nli/.
>>> 
>>> The current draft can be found here: http://natural-language-interfaces-cg.github.io/charter/charter-nli.md
>>> 
>>> Main issues currently:
>>> * communication process (monthly telcos?)
>>> * scope and goals (e.g., an ontology to ease communication of modules across platforms and deployments)
>>> * deliverables? how to coordinate a specification
>>> And of course, anything else you are interested in to clarify the direction of this CG.
>>> 
>>> 
>>> Feel free to contribute directly to the git repository https://github.com/Natural-Language-Interfaces-CG/charter
>>> 
>>> Best regards,
>>> Ricardo 
>>> 
>>> _______________________________________________
>>> aksw-core mailing list
>>> aksw-core@lists.informatik.uni-leipzig.de
>>> http://lists.informatik.uni-leipzig.de/mailman/listinfo/aksw-core
>>> 
>>> 
>> 
> 
> -- 
>     Petr Baudis
>  If you have good ideas, good data and fast computers,
>  you can do almost anything. -- Geoffrey Hinton

Received on Monday, 11 April 2016 05:44:25 UTC