
[Minutes] RAX CG 2016-11-25

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 25 Nov 2016 14:46:11 +0100
Message-Id: <938B091E-E5D8-454D-BA82-C6B0F2DA40ED@w3.org>
To: public-rax@w3.org


and below as text. We discussed the two use cases from Christoph


and issues with converting from XML/HTML to RDF, potentially including back conversion (round tripping). From that we may derive some general patterns that are worth documenting. We will provide examples of input and output in the GitHub repository; feel free to do the same. The next call will be on 9 December.





                               - DRAFT -

                                 rax cg

25 Nov 2016


   [2]Agenda

      [2] https://lists.w3.org/Archives/Public/public-rax/2016Nov/0008.html

   See also: [3]IRC log

      [3] http://www.w3.org/2016/11/25-rax-irc


   Present
          philr, felix, timea, christoph

   Regrets
          christian, gerard, jose




     * Topics
         1. meeting start
         2. bdva summit
         3. AOB
     * Summary of Action Items
     * Summary of Resolutions

meeting start

   phil: did a review of use cases this morning. not too much
   change, missed one that christoph added.


     [10] https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE

   phil: thanks a lot for adding this, christoph - can you give a
   brief description?

   christoph: sure. have not yet managed to share the
   descriptions, I have more material, and will get it done to
   share this
   ... will also add more concrete examples. Application setting
   is: we collect job postings in the form of plain text from the
   ... we do named entity recognition with gate, and we get XML
   ... the beginning and end of each token are annotated

   <clange> text text text <start/>recognised entity<end/> text

   christoph: see above XML example. this has to be translated to

   <clange> <start id="foo"/>

   <clange> <start href="#foo"/>

   christoph: start and end tags look like the above

   <clange> ids or refs (forgot which direction) are in these
   start/end tags

   christoph: we are using an XSLT-based tool I developed (krextor)
   to create RDF. it is quite hard

   <clange> krextor

   christoph: with XPath it is hard to select elements between
   start and end tags
   ... that is a bit tricky, you need a good knowledge of XPath,
   the sibling axes etc.
   ... this is in the context of a European project, in which
   another partner is doing the extraction
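
   A minimal sketch of the selection problem described above, written
   in Python rather than XPath/XSLT. It assumes that the start marker
   carries the id and the end marker carries the matching href (the
   direction was not settled in the discussion), and that both
   markers are siblings under the same parent. The function name and
   the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the snippet pasted in the call; the id/href
# direction is an assumption.
SAMPLE = ('<doc>text text text <start id="foo"/>recognised entity'
          '<end href="#foo"/> text</doc>')


def spans_between_milestones(root):
    """Return {id: text} for each <start id="x"/> ... <end href="#x"/> pair.

    Only pairs whose markers share the same parent element are handled.
    """
    spans = {}
    for parent in root.iter():
        open_id, parts = None, []
        for child in parent:
            if child.tag == "start":
                # Text directly after the empty marker lives in its tail.
                open_id, parts = child.get("id"), [child.tail or ""]
            elif (child.tag == "end" and open_id is not None
                  and child.get("href") == "#" + open_id):
                spans[open_id] = "".join(parts)
                open_id = None
            elif open_id is not None:
                # An ordinary element inside the span: keep its text
                # and the text that follows it.
                parts.append("".join(child.itertext()))
                parts.append(child.tail or "")
    return spans


print(spans_between_milestones(ET.fromstring(SAMPLE)))
# {'foo': 'recognised entity'}
```

   The loop over siblings plays the role of the sibling axes in
   XPath: everything after the start marker and before the matching
   end marker is collected.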

   phil: is this similar to Martynas case?

   christoph: in terms of XPath complexity, yes
   ... general XML to RDF transformation issue?


     [11] https://github.com/fsasaki/its20-extractor/tree/master/wikipedia-extractor

   <philr> felix: I've written various converters

   <philr> ...it is always special case issues

   <philr> ...XML has various ways to include content

   <philr> ...special purpose handling is somewhat unavoidable

   <philr> ...example documents with guidance would be useful

   scribe: may be useful to give guidance on how to handle various

   christoph: there are patterns, e.g. parent-child relations in
   XML and RDF properties
   ... for this you can provide high-level translation patterns

   <philr> clange: High level translation is possible with simple
   parent-child relationships
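
   A minimal sketch of such a parent-child pattern, assuming that
   each child element name becomes an RDF property, leaf elements
   become literals, and nested elements become new resources. The
   namespace, the IRI scheme and the sample posting are hypothetical
   and not taken from any tool mentioned in the call.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace


def literal(text):
    """Quote a string as a plain N-Triples literal (minimal escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def element_to_triples(elem, subject_iri):
    """Map each child element of elem to one triple about subject_iri."""
    lines = []
    for index, child in enumerate(elem, start=1):
        predicate = "<" + BASE + child.tag + ">"
        if len(child) == 0:
            # Leaf element: its text becomes a literal object.
            value = literal((child.text or "").strip())
            lines.append(f"<{subject_iri}> {predicate} {value} .")
        else:
            # Nested element: mint a child resource and recurse.
            child_iri = f"{subject_iri}/{child.tag}{index}"
            lines.append(f"<{subject_iri}> {predicate} <{child_iri}> .")
            lines.extend(element_to_triples(child, child_iri))
    return lines


doc = ET.fromstring(
    "<posting><title>Data Engineer</title>"
    "<employer><name>ACME</name></employer></posting>"
)
for line in element_to_triples(doc, BASE + "posting/1"):
    print(line)
```

   This covers only element-only content; attributes, namespaces and
   mixed content would each need their own rule.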

   <philr> felix: mixture of text and element nodes is challenging
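
   One way to approach mixed content, sketched under assumptions:
   flatten the element to a single string and record character
   offsets for each inline element, so that annotations can point
   into the text instead of depending on the markup. The function
   name and sample are illustrative; offsets are counted in Python
   string positions.

```python
import xml.etree.ElementTree as ET


def inline_offsets(elem):
    """Flatten mixed content; return (text, [(tag, begin, end), ...])."""
    parts, spans = [], []
    pos = 0

    def walk(node):
        nonlocal pos
        if node.text:
            parts.append(node.text)
            pos += len(node.text)
        for child in node:
            begin = pos
            walk(child)
            # Record the span once the child's whole content is known.
            spans.append((child.tag, begin, pos))
            if child.tail:
                parts.append(child.tail)
                pos += len(child.tail)

    walk(elem)
    return "".join(parts), spans


sample = ET.fromstring("<p>Welcome to <span>Dublin</span> in <b>Ireland</b>!</p>")
text, spans = inline_offsets(sample)
for tag, begin, end in spans:
    print(tag, begin, end, text[begin:end])
```

   Keeping the offsets alongside the original markup is also one
   possible basis for round tripping, since each span can be mapped
   back to the element it came from.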


     [12] https://github.com/fsasaki/its20-extractor/blob/master/wikipedia-extractor/its-ta-2-nif-wikipedia.xsl#L43

   <clange> fsasaki: handling of specific links (specific to wiki

   phil: in FREME project we are also doing named entity
   recognition on plain text. our services are capable of
   returning turtle files, but we can cover many formats


     [13] https://api-dev.freme-project.eu/ckeditor-dev/ckeditor/samples/freme.html

   various types of output, inline or external using json-ld

   <scribe> ACTION: felix to provide examples of round tripping as
   done in the freme project [recorded in

     [14] http://www.w3.org/2016/11/25-rax-minutes.html#action01]

bdva summit

   <philr> felix: to collect information on what better tooling is

   <philr> ...best practices and standardization

   <philr> ...1.5 hour session on requirements

   <philr> clange: is there more I can do if I do not attend the

   <philr> felix: it would be good if someone from your
   organization could attend

   <philr> ...questionnaire to bdva members but want input from

   <philr> Is there a fee to join bdva?

   felix: yes, will send info on that

   <clange> fsasaki 14:29: EU is not necessarily interested in new
   standards being developed, but in existing standards to be
   _applied_ in a better way

   thanks, clange

   discussion on AutomationML use case

   felix will send around further information on BDVA


   next meeting 9th of December

   phil cannot make it, christian to chair

Summary of Action Items

   [NEW] ACTION: felix to provide examples of round tripping as
   done in the freme project [recorded in

     [15] http://www.w3.org/2016/11/25-rax-minutes.html#action01]

Summary of Resolutions

   [End of minutes]

    Minutes formatted by David Booth's [16]scribe.perl version
    1.148 ([17]CVS log)
    $Date: 2016/11/25 13:41:09 $

     [16] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
     [17] http://dev.w3.org/cvsweb/2002/scribe/

Received on Friday, 25 November 2016 13:46:26 UTC
