- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 25 Nov 2016 14:46:11 +0100
- To: public-rax@w3.org
See
https://www.w3.org/2016/11/25-rax-minutes.html
and below as text. We discussed the two use cases from Christoph
https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE
https://www.w3.org/community/rax/wiki/Draft_Material#AutomationML_industry_automation_models_integration
and issues with converting from XML/HTML to RDF (potentially with back conversion, i.e. round-tripping). From that we may derive some general patterns that are worth documenting. We will provide input/output examples on GitHub; feel free to do the same. The next call will be on 9 December.
Best,
Felix
[1]W3C
[1] http://www.w3.org/
- DRAFT -
rax cg
25 Nov 2016
[2]Agenda
[2] https://lists.w3.org/Archives/Public/public-rax/2016Nov/0008.html
See also: [3]IRC log
[3] http://www.w3.org/2016/11/25-rax-irc
Attendees
Present
philr, felix, timea, christoph
Regrets
christian, gerard, jose
Chair
phil
Scribe
fsasaki
Contents
* [4]Topics
1. [5]meeting start
2. [6]bdva summit
3. [7]AOB
* [8]Summary of Action Items
* [9]Summary of Resolutions
__________________________________________________________
meeting start
phil: did a review of use cases this morning. not too much
change, missed one that christoph added.
[10] https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE
phil: thanks a lot for adding this, christoph - can you give a
brief description?
christoph: sure. have not yet managed to share the
descriptions, I have more material, and will get it done to
share this
... will also add more concrete examples. Application setting
is: we collect job postings in the form of plain text from the
web
... we do named entity recognition with GATE, and we get XML
output
... the beginning and end of each token is annotated
<clange> text text text <start/>recognised entity<end/> text
text
christoph: see above XML example. this has to be translated to
RDF
<clange> <start id="foo"/>
<clange> <start href="#foo"/>
christoph: start and end tags look like the above
<clange> ids or refs (forgot which direction) are in these
start/end tags
christoph: we are using an XSLT-based tool I developed (Krextor)
to create RDF. it is quite hard
<clange> krextor
christoph: with XPath it is hard to select elements between
start and end tags
... that is a bit tricky, you need a good knowledge of XPath,
the sibling axes, etc.
... in context of European project, in which another partner is
doing the extraction
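Illustration (not from the call): a minimal Python sketch of the start/end
marker problem described above, using only the standard library instead of
XSLT/XPath. It assumes the markers are flat siblings, that <start> carries
the id and <end> the href (the minutes leave the direction open), and that
the recognised text is emitted as an rdfs:label. The base IRI, the <doc>
wrapper and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input modelled on the example pasted in the log.
SAMPLE = ('<doc>text text text <start id="e1"/>recognised entity'
          '<end href="#e1"/> text text</doc>')

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def milestone_spans(xml_text):
    """Map each marker id to the text between its <start/> and <end/>."""
    root = ET.fromstring(xml_text)
    open_spans = {}   # id -> text pieces collected so far
    closed = {}       # id -> finished text
    for child in root:
        if child.tag == "start":
            open_spans[child.get("id")] = []
        elif child.tag == "end":
            key = (child.get("href") or "").lstrip("#")
            if key in open_spans:
                closed[key] = "".join(open_spans.pop(key))
        else:
            # Ordinary element between markers: keep its text content.
            inner = "".join(child.itertext())
            for parts in open_spans.values():
                parts.append(inner)
        # Text following an empty element lives in its .tail.
        for parts in open_spans.values():
            parts.append(child.tail or "")
    return closed


def to_ntriples(spans, base="http://example.org/doc#"):
    """Serialise the spans as one N-Triples line per entity."""
    lines = []
    for key, text in sorted(spans.items()):
        literal = (text.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        lines.append(f'<{base}{key}> <{RDFS_LABEL}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(milestone_spans(SAMPLE)))
```

Walking the siblings in document order avoids the sibling-axis expressions
mentioned above, at the cost of handling only markers that share one parent.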
phil: is this similar to Martynas' case?
christoph: in terms of XPath complexity, yes
... general XML to RDF transformation issue?
[11] https://github.com/fsasaki/its20-extractor/tree/master/wikipedia-extractor
<philr> felix: I've written various converters
<philr> ...it is always special case issues
<philr> ...XML has various ways to include content
<philr> ...special purpose handling is somewhat unavoidable
<philr> ...example documents with guidance would be useful
scribe: may be useful to give guidance on how to handle various
cases
christoph: there are patterns, e.g. parent child relations in
XML and RDF properties
... for this you can provide high-level translation patterns
<philr> clange: High level translation is possible with simple
parent-child relationships
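Illustration (not from the call): one possible reading of the parent-child
pattern, sketched in Python. It assumes element names map directly to
property names in a made-up vocabulary namespace, that elements with child
elements become blank nodes, and that leaf elements become plain literals.
Attributes, XML namespaces and mixed content are ignored. The sample
document and all names are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical input; not taken from the use case material.
SAMPLE = ("<job><title>Engineer</title>"
          "<employer><name>ACME</name></employer></job>")


def escape(text):
    """Escape a string for use inside a double-quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def element_to_triples(xml_text, ns="http://example.org/vocab#"):
    """Turn parent-child nesting into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    ids = count(1)
    triples = []

    def visit(elem):
        subject = f"_:n{next(ids)}"          # one blank node per parent
        for child in elem:
            predicate = f"<{ns}{child.tag}>"
            if len(child) == 0:
                # Leaf element: its text becomes a literal value.
                value = escape((child.text or "").strip())
                triples.append((subject, predicate, f'"{value}"'))
            else:
                # Nested element: recurse and link to the new node.
                triples.append((subject, predicate, visit(child)))
        return subject

    visit(root)
    return triples


if __name__ == "__main__":
    for s, p, o in element_to_triples(SAMPLE):
        print(s, p, o, ".")
```

The sketch only covers the simple case named above; text mixed with child
elements would need special handling.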
<philr> felix: mixture of text and element nodes is challenging
[12] https://github.com/fsasaki/its20-extractor/blob/master/wikipedia-extractor/its-ta-2-nif-wikipedia.xsl#L43
<clange> fsasaki: handling of specific links (specific to wiki
markup)
phil: in FREME project we are also doing named entity
recognition on plain text. our services are capable of
returning turtle files, but we can cover many formats
[13] https://api-dev.freme-project.eu/ckeditor-dev/ckeditor/samples/freme.html
various types of output, inline or external using json-ld
<scribe> ACTION: felix to provide examples of round tripping as
done in the freme project [recorded in
[14]http://www.w3.org/2016/11/25-rax-minutes.html#action01]
[14] http://www.w3.org/2016/11/25-rax-minutes.html#action01
bdva summit
<philr> felix: to collect information on what better tooling is
needed
<philr> ...best practices and standardization
<philr> ...1.5 hour session on requirements
<philr> clange: is there more I can do if I do not attend the
summit?
<philr> felix: it would be good if someone from your
organization could attend
<philr> ...questionnaire to bdva members but want input from
companies
<philr> Is there a fee to join bdva?
felix: yes, will send info on that
<clange> fsasaki 14:29: EU is not necessarily interested in new
standards being developed, but in existing standards to be
_applied_ in a better way
thanks, clange
discussion on automationML use case
felix will send around further information on BDVA
AOB
next meeting 9th of December
phil cannot make it, christian to chair
Summary of Action Items
[NEW] ACTION: felix to provide examples of round tripping as
done in the freme project [recorded in
[15]http://www.w3.org/2016/11/25-rax-minutes.html#action01]
[15] http://www.w3.org/2016/11/25-rax-minutes.html#action01
Summary of Resolutions
[End of minutes]
__________________________________________________________
Minutes formatted by David Booth's [16]scribe.perl version
1.148 ([17]CVS log)
$Date: 2016/11/25 13:41:09 $
__________________________________________________________
[16] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[17] http://dev.w3.org/cvsweb/2002/scribe/
Received on Friday, 25 November 2016 13:46:26 UTC