[Minutes] RAX CG 2016-11-25 from Felix Sasaki on 2016-11-25 (public-rax@w3.org from November 2016)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 25 Nov 2016 14:46:11 +0100
To: public-rax@w3.org
Message-Id: <938B091E-E5D8-454D-BA82-C6B0F2DA40ED@w3.org>
See

https://www.w3.org/2016/11/25-rax-minutes.html

and below as text. We discussed the two use cases from Christoph

https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE 
https://www.w3.org/community/rax/wiki/Draft_Material#AutomationML_industry_automation_models_integration

and issues with converting from XML/HTML to RDF, potentially including back conversion (round tripping). From that we may derive some general patterns worth documenting. We will provide examples of input and output in the GitHub repository; feel free to do the same. The next call will be on 9 December.

Best,

Felix 

   [1]W3C

      [1] http://www.w3.org/

                               - DRAFT -

                                 rax cg

25 Nov 2016

   [2]Agenda

      [2] https://lists.w3.org/Archives/Public/public-rax/2016Nov/0008.html

   See also: [3]IRC log

      [3] http://www.w3.org/2016/11/25-rax-irc

Attendees

   Present
          philr, felix, timea, christoph

   Regrets
          christian, gerard, jose

   Chair
          phil

   Scribe
          fsasaki

Contents

     * [4]Topics
         1. [5]meeting start
         2. [6]bdva summit
         3. [7]AOB
     * [8]Summary of Action Items
     * [9]Summary of Resolutions
     __________________________________________________________

meeting start

   phil: did a review of use cases this morning. not too much
   change, missed one that christoph added.

   [10]https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE

     [10] https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE

   phil: thanks a lot for adding this, christoph - can you give a
   brief description?

   christoph: sure. have not yet managed to share the
   descriptions, I have more material, and will get it done to
   share this
   ... will also add more concrete examples. Application setting
   is: we collect job postings in the form of plain text from the
   web
   ... we do named entity recognition with gate, and we get XML
   output
   ... beginning and end of each token is annotated

   <clange> text text text <start/>recognised entity<end/> text
   text

   christoph: see above XML example. this has to be translated to
   RDF

   <clange> <start id="foo"/>

   <clange> <start href="#foo"/>

   christoph: start and end tags look like the above

   <clange> ids or refs (forgot which direction) are in these
   start/end tags

   christoph: we are using an XSLT-based tool I developed (Krextor)
   to create RDF. it is quite hard

   <clange> krextor

   christoph: with XPath it is hard to select elements between
   start and end tags
   ... that is a bit tricky, you need a good knowledge of XPath,
   the sibling axes etc.
   ... in context of European project, in which another partner is
   doing the extraction
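
   The selection problem described above can be sketched outside
   XPath. The following is a minimal illustration, not the Krextor
   approach: it walks sibling nodes in document order and collects
   the text between a <start/> and the next <end/> milestone. The
   <doc> wrapper, the sample string and the function name
   entity_spans are assumptions made for this sketch; it assumes
   the id sits on <start/> (the call left the direction open), does
   not check the href on <end/>, and only handles milestones that
   are direct children of one element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the snippet pasted in the call;
# the <doc> wrapper and the id/href placement are assumptions.
SAMPLE = (
    '<doc>text text <start id="foo"/>recognised entity'
    '<end href="#foo"/> text text</doc>'
)


def entity_spans(xml_text):
    """Return {id: text} for text lying between sibling <start/> and <end/>."""
    root = ET.fromstring(xml_text)
    spans = {}
    open_id = None  # id of the milestone we are currently inside, if any
    buffer = []
    for child in root:
        if child.tag == "start":
            open_id = child.get("id")
            # Text directly after <start/> is stored as its tail.
            buffer = [child.tail or ""]
        elif child.tag == "end" and open_id is not None:
            spans[open_id] = "".join(buffer)
            open_id = None
        elif open_id is not None:
            # Any other element inside the span: keep its text and its tail.
            buffer.append("".join(child.itertext()))
            buffer.append(child.tail or "")
    return spans
```

   For the sample string, entity_spans(SAMPLE) gives
   {'foo': 'recognised entity'}.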

   phil: is this similar to Martynas case?

   christoph: in terms of XPath complexity, yes
   ... general XML to RDF transformation issue?

   [11]https://github.com/fsasaki/its20-extractor/tree/master/wikipedia-extractor

     [11] https://github.com/fsasaki/its20-extractor/tree/master/wikipedia-extractor

   <philr> felix: I've written various converters

   <philr> ...it is always special case issues

   <philr> ...XML has various ways to include content

   <philr> ...special purpose handling is somewhat unavoidable

   <philr> ...example documents with guidance would be useful

   scribe: may be useful to give guidance on how to handle various
   cases

   christoph: there are patterns, e.g. parent-child relations in
   XML and RDF properties
   ... for this you can provide high-level translation patterns

   <philr> clange: High level translation is possible with simple
   parent-child relationships
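
   One possible reading of the parent-child pattern, as a minimal
   sketch: each child element becomes a property of its parent,
   leaf elements become literals, and elements with children become
   blank nodes. The example.org namespace, the function name
   tree_to_triples and the job-posting sample used to try it out
   are hypothetical, and the output is a list of N-Triples-like
   tuples rather than the output of any tool named in the call.
   Attributes and mixed content are deliberately ignored here.

```python
import xml.etree.ElementTree as ET
from itertools import count

BASE = "http://example.org/"  # hypothetical namespace, not from the minutes


def tree_to_triples(xml_text):
    """Map each parent-child element pair to one (subject, predicate, object)."""
    root = ET.fromstring(xml_text)
    ids = count()
    triples = []

    def literal(text):
        # Minimal escaping of backslashes and quotes for a quoted literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    def visit(elem, subject):
        for child in elem:
            predicate = f"<{BASE}{child.tag}>"
            if len(child) == 0:
                # Leaf element: its text becomes a literal object.
                triples.append((subject, predicate, literal((child.text or "").strip())))
            else:
                # Element with children: mint a blank node and recurse.
                node = f"_:b{next(ids)}"
                triples.append((subject, predicate, node))
                visit(child, node)

    visit(root, f"_:b{next(ids)}")
    return triples
```

   The sketch works only because every element is either a pure
   container or a pure text leaf; that is the "simple" case.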

   <philr> felix: mixture of text and element nodes is challenging
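
   A minimal sketch of why mixed content is harder: the text is
   split across the parent and its children, so a converter has to
   stitch it back together in document order. The version below
   flattens an element to plain text and records character offsets
   for each inline element, in the spirit of offset-based
   annotation; it is an illustration, not the stylesheet linked
   below. The function name flatten_with_offsets is an assumption.

```python
import xml.etree.ElementTree as ET


def flatten_with_offsets(xml_text):
    """Return (plain_text, [(tag, begin, end), ...]) for a mixed-content element."""
    root = ET.fromstring(xml_text)
    pieces = []
    annotations = []
    position = 0  # number of characters emitted so far

    def emit(text):
        nonlocal position
        if text:
            pieces.append(text)
            position += len(text)

    def walk(elem):
        emit(elem.text)  # text before the first child
        for child in elem:
            begin = position
            walk(child)
            annotations.append((child.tag, begin, position))
            emit(child.tail)  # text after the child, which belongs to the parent

    walk(root)
    return "".join(pieces), annotations
```

   Offsets are half-open (begin inclusive, end exclusive), and
   nested inline elements are reported before their enclosing
   element.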

   [12]https://github.com/fsasaki/its20-extractor/blob/master/wikipedia-extractor/its-ta-2-nif-wikipedia.xsl#L43

     [12] https://github.com/fsasaki/its20-extractor/blob/master/wikipedia-extractor/its-ta-2-nif-wikipedia.xsl#L43

   <clange> fsasaki: handling of specific links (specific to wiki
   markup)

   phil: in FREME project we are also doing named entity
   recognition on plain text. our services are capable of
   returning turtle files, but we can cover many formats

   [13]https://api-dev.freme-project.eu/ckeditor-dev/ckeditor/samples/freme.html

     [13] https://api-dev.freme-project.eu/ckeditor-dev/ckeditor/samples/freme.html

   various types of output, inline or external using json-ld

   <scribe> ACTION: felix to provide examples of round tripping as
   done in the freme project [recorded in
   [14]http://www.w3.org/2016/11/25-rax-minutes.html#action01]

     [14] http://www.w3.org/2016/11/25-rax-minutes.html#action01
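
   As a rough illustration of what the back-conversion half of
   round tripping involves (not the FREME implementation, whose
   examples are the subject of the action above): given plain text
   and a list of (tag, begin, end) annotations with half-open
   character offsets, inline markup can be rebuilt as long as the
   spans do not overlap. The function name reinsert_annotations and
   the annotation shape are assumptions for this sketch.

```python
from xml.sax.saxutils import escape


def reinsert_annotations(text, annotations):
    """Rebuild inline markup from text plus non-overlapping (tag, begin, end) spans."""
    out = []
    cursor = 0  # position in text up to which output has been written
    for tag, begin, end in sorted(annotations, key=lambda a: a[1]):
        if begin < cursor:
            raise ValueError("overlapping or nested spans are not handled in this sketch")
        out.append(escape(text[cursor:begin]))
        out.append(f"<{tag}>{escape(text[begin:end])}</{tag}>")
        cursor = end
    out.append(escape(text[cursor:]))
    return "".join(out)
```

   Anything the forward conversion dropped (attributes, comments,
   whitespace normalisation) cannot be restored this way, which is
   why a round trip generally needs extra bookkeeping.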

bdva summit

   <philr> felix: to collect information on what better tooling is
   needed

   <philr> ...best practices and standardization

   <philr> ...1.5 hour session on requirements

   <philr> clange: is there more I can do if I do not attend the
   summit?

   <philr> felix: it would be good if someone from your
   organization could attend

   <philr> ...questionnaire to bdva members but want input from
   companies

   <philr> Is there a fee to join bdva?

   felix: yes, will send info on that

   <clange> fsasaki 14:29: EU is not necessarily interested in new
   standards being developed, but in existing standards to be
   _applied_ in a better way

   thanks, clange

   discussion on automationML use case

   felix will send further information on BDVA

AOB

   next meeting 9th of December

   phil cannot make it, christian to chair

Summary of Action Items

   [NEW] ACTION: felix to provide examples of round tripping as
   done in the freme project [recorded in
   [15]http://www.w3.org/2016/11/25-rax-minutes.html#action01]

     [15] http://www.w3.org/2016/11/25-rax-minutes.html#action01

Summary of Resolutions

   [End of minutes]
     __________________________________________________________


    Minutes formatted by David Booth's [16]scribe.perl version
    1.148 ([17]CVS log)
    $Date: 2016/11/25 13:41:09 $
     __________________________________________________________

     [16] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
     [17] http://dev.w3.org/cvsweb/2002/scribe/

Received on Friday, 25 November 2016 13:46:26 UTC