Fwd: Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> · Date: Thu, 21 Jun 2012 22:04:00 +0200

Dear Provenance group,
there was a discussion on the Wikidata list, which led to me contacting you:
http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000478.html
http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000566.html
http://lists.wikimedia.org/pipermail/wikidata-l/2012-June/000751.html
...

You are tracking provenance on the resource level. In NIF, fragments of 
resources are used as subjects in RDF.
Hence you could consider NIF URIs for inclusion, if it is not too far a stretch 
and if there is enough time left.
You could read here for a start: 
http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf

All the best,
Sebastian

-------- Original Message --------
Subject: 	Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
Date: 	Thu, 21 Jun 2012 20:34:14 +0100
From: 	Barry Norton <barry.norton@ontotext.com>
To: 	Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
CC: 	Discussion list for the Wikidata project. 
<wikidata-l@lists.wikimedia.org>

As excused I wasn't really following your discussion, but indeed if
you're giving URIs to these fragments...

Barry

On 21/06/2012 20:29, Sebastian Hellmann wrote:
>  Hi Barry,
>
>  On 06/21/2012 08:51 PM, Barry Norton wrote:
>>
>>  Sorry to jump in (without really understanding the context), but you
>>  guys saw this today, right?
>>  http://www.w3.org/TR/2012/WD-prov-aq-20120619/
>  It seems to be very unrelated. That is only resource-level, right?
>  "Fundamentally, provenance information
>  <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
>  is /about/ resource
>  <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>s." So
>  you would need a subject first. How do you say that the fact you just
>  added to WikiData comes from a specific fragment of a resource?
>  i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 the
>  first occurrence of "Semantic Web"
>
>  Do you suggest that NIF URIs might be standardized by inclusion in
>  PROV-AQ? Might work. It could be compatible.
>
>  Sebastian
>
>>
>>  Barry
>>
>>
>>  On 21/06/2012 19:05, Sebastian Hellmann wrote:
>>>  Hello Denny,
>>>  I was traveling for the past few weeks and can finally answer your
>>>  email.
>>>  See my comments inline.
>>>
>>>  On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>>>>  Hello Sebastian,
>>>>
>>>>
>>>>  Just a few questions - as you note, it is easier if we all use the
>>>>  same
>>>>  standards, and so I want to ask about the relation to other related
>>>>  standards:
>>>>  * I understand that you dismiss IETF RFC 5147 because it is not stable
>>>>  enough, right?
>>>  The offset scheme of NIF is built on this RFC.
>>>  So the following would hold:
>>>  @prefix ld:<http://www.w3.org/DesignIssues/LinkedData.html#>  .
>>>  @prefix owl:<http://www.w3.org/2002/07/owl#>  .
>>>  ld:offset_717_729  owl:sameAs ld:char=717,12 .
>>>
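A minimal sketch of the offset recipe used above, assuming zero-based character offsets with an exclusive end. That reading is consistent with offset_717_729 covering the 12 characters of "Semantic Web", but it is an assumption, and the function names are hypothetical rather than part of any NIF library.

```python
# Prefix taken from the Turtle example above.
LD = "http://www.w3.org/DesignIssues/LinkedData.html#"


def offset_fragment(begin: int, end: int) -> str:
    """Build a NIF-style offset fragment such as 'offset_717_729'."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    return f"offset_{begin}_{end}"


def span_length(begin: int, end: int) -> int:
    """Number of characters covered, assuming an exclusive end offset."""
    return end - begin


uri = LD + offset_fragment(717, 729)
print(uri)
print(span_length(717, 729))
```

Under these assumptions the span length is 12, the same number that appears after the comma in the char=717,12 form quoted above.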
>>>
>>>  We might change the syntax and reuse the RFC syntax, but it has
>>>  several issues:
>>>  1.  The optional part is not easy to handle, because you would need
>>>  to add owl:sameAs statements:
>>>
>>>  ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
>>>  ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
>>>  ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
>>>
>>>  So theoretically ok, but annoying to implement and check.
>>>
>>>  2. When implementing web services, NIF allows the client to choose
>>>  the prefix:
>>>  http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president.
>>>
>>>  returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
>>>  So RFC 5147 URIs would look like:
>>>  <http://this.is/a/slash/prefix/char=717,12>
>>>  <http://this.is/a/slash/prefix/char=717,12;UTF-8>
>>>  or
>>>  <http://this.is/a/slash/prefix?char=717,12>
>>>  <http://this.is/a/slash/prefix?char=717,12;UTF-8>
>>>
>>>  3. Characters like = and , prevent the use of prefixes:
>>>  echo "@prefix ld:<http://www.w3.org/DesignIssues/LinkedData.html#>  .
>>>  @prefix owl:<http://www.w3.org/2002/07/owl#>  .
>>>  ld:offset_717_729  owl:sameAs ld:char=717,12 .
>>>  ">  test.ttl ; rapper -i turtle  test.ttl
>>>
>>>  4. Implementation is a little bit more difficult, given that NIF
>>>  fragments can be parsed as simply as:
>>>  $arr = explode("_", "offset_717_729");
>>>  switch ($arr[0]) {
>>>      case 'offset':
>>>          $begin = $arr[1];
>>>          $end = $arr[2];
>>>          break;
>>>      case 'hash':
>>>          $clength = $arr[1];
>>>          $slength = $arr[2];
>>>          $hash = $arr[3];
>>>          // merge the remaining parts with '_'
>>>          $rest = implode("_", array_slice($arr, 4));
>>>          break;
>>>  }
>>>
>>>  5. The RFC assumes a certain MIME type, i.e. plain text. NIF makes a
>>>  broader assumption.
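The parsing step from item 4 can also be sketched in Python. This is an illustration only: the class and function names are hypothetical, and reading the two numeric fields of the hash recipe as "context length" and "string length" is an assumption based on the variable names $clength and $slength. The usage lines reuse the client-chosen prefix and the demo input from item 2, again assuming zero-based offsets with an exclusive end.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OffsetRef:
    begin: int
    end: int


@dataclass
class HashRef:
    context_length: int  # assumed meaning of $clength
    string_length: int   # assumed meaning of $slength
    digest: str
    rest: str


def parse_nif_fragment(fragment: str) -> Union[OffsetRef, HashRef]:
    """Split a NIF fragment on '_' and dispatch on the recipe name."""
    parts = fragment.split("_")
    recipe = parts[0]
    if recipe == "offset" and len(parts) == 3:
        return OffsetRef(int(parts[1]), int(parts[2]))
    if recipe == "hash" and len(parts) >= 5:
        # The trailing string may itself contain '_', so re-join the remainder.
        return HashRef(int(parts[1]), int(parts[2]), parts[3], "_".join(parts[4:]))
    raise ValueError(f"unrecognised NIF fragment: {fragment!r}")


prefix = "http://this.is/a/slash/prefix/"
uri = prefix + "offset_10_15"
text = "President Obama is president."

ref = parse_nif_fragment(uri[len(prefix):])
print(text[ref.begin:ref.end])

hash_ref = parse_nif_fragment(
    "hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web"
)
print(hash_ref.rest)
```

Because the recipe name and all fields are separated by a single character, one split suffices for both recipes. An RFC 5147 style fragment would instead need separate handling of "=", "," and ";".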
>>>>  * what is the relation to the W3C media fragment URIs? Did not find a
>>>>  pointer there.
>>>  They are designed for media such as images, video, not strings.
>>>  Potentially, the same principle can be applied, but it is not yet
>>>  engineered/researched.
>>>>  * any plans of standardizing your approach?
>>>  We will do NIF 2.0 as a community standard and finish it in a
>>>  couple of months. It will be published under open licences, so
>>>  anybody, e.g. W3C or ISO, might pick it up easily. Other than that, there
>>>  are plans by several EU projects (see e.g. here
>>>  http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html)
>>>  and a US project to use it, and there are already several third-party
>>>  implementations. We would rather have it adopted first on
>>>  a large scale and then standardized properly, i.e. by W3C. This worked
>>>  quite well for the FOAF project or for RDB2RDF mappers.
>>>  The chances for fast standardization are not so bad, I would assume.
>>>>  We would strongly prefer to just use a standard instead of advocating
>>>>  contenders for one -- if one exists.
>>>  You might want to look at:
>>>  http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
>>>  and the same highlighting here:
>>>  http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>
>>>
>>>  NIF equivalent (4 triples instead of 14 and only one generated uuid):
>>>  ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a
>>>  str:String ;
>>>      oa:hasBody [
>>>          oa:annotator <mailto:Bob> ;
>>>          cnt:chars "Hey Tim, good idea that Semantic Web!"
>>>      ] .
>>>
>>>  So you might not think in a "contender" way. Approaches are
>>>  complementary. NIF is simpler and the URIs have some features that
>>>  might be wanted (stability, uniqueness, easy to implement).
>>>  This is why I was asking for your *use case* .
>>>
>>>  Note that there are still some problems when annotating DOM with
>>>  URIs, e.g. XPointer is abandoned and was never finished. XPath has
>>>  its limits and is also expensive (i.e. SAX not possible).
>>>  I think there is no proper solution as of now.
>>>  All the best,
>>>  Sebastian
>>>
>>>>  Cheers,
>>>>  Denny
>>>>
>>>>
>>>>
>>>>
>>>>  2012/5/18 Sebastian Hellmann<hellmann@informatik.uni-leipzig.de>
>>>>
>>>>>  Hello again,
>>>>>  maybe the question I asked was lost, as the text was TL;DR
>>>>>
>>>>>  I heard that it is planned to track the provenance of facts, e.g.
>>>>>  "Berlin has 3,337,000 citizens", found
>>>>>  here: http://www.worldatlas.com/citypops.htm
>>>>>  Do you have a place where the use case and the requirements are
>>>>>  documented
>>>>>  for this? Or is it out of scope?
>>>>>  Will it be coarse grained, i.e. website level? Or fine grained,
>>>>>  i.e. text
>>>>>  paragraph level? See e.g. how Berlin is highlighted here:
>>>>>  http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
>>>>>
>>>>>  in this very early prototype.
>>>>>
>>>>>  Could you give me a link where I can read more about any Wikidata
>>>>>  plans
>>>>>  towards this direction?
>>>>>  Sebastian
>>>>>
>>>>>
>>>>>
>>>>>  On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>>>>>
>>>>>>  Dear all,
>>>>>>  (Note: I could not find the document, where your requirements
>>>>>>  regarding
>>>>>>  the tracking of facts on the web are written, so I am giving a
>>>>>>  general
>>>>>>  introduction to NIF. Please send me a link to the document that
>>>>>>  specifies
>>>>>>  your need for tracing facts on the web, thanks)
>>>>>>
>>>>>>  I would like to draw your attention to the URIs used in the NLP
>>>>>>  Interchange Format (NIF).
>>>>>>  NIF-URIs are quite easy to use, understand and implement. NIF has a
>>>>>>  one-triple-per-annotation paradigm.  The latest documentation can
>>>>>>  be found
>>>>>>  here:
>>>>>>  http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>>>>>
>>>>>>
>>>>>>  The basic idea is to use URIs with hash fragment ids to annotate
>>>>>>  or mark
>>>>>>  pages on the web:
>>>>>>  An example is the first occurrence of "Semantic Web" on
>>>>>>  http://www.w3.org/DesignIssues/LinkedData.html
>>>>>>  as highlighted here:
>>>>>>  http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>
>>>>>>
>>>>>>  Here is a NIF example for linking a part of the document to the
>>>>>>  DBpedia
>>>>>>  entry of the Semantic Web:
>>>>>>  <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>>>>>>        a str:StringInContext ;
>>>>>>        sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>>>>>>
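As a rough illustration of how such an annotation could be generated, the sketch below serializes the same two triples from a document URI, an offset pair and an entity URI. The function name is hypothetical, and the prefix declarations for str: and sso: are left out because the message does not state their namespace IRIs.

```python
def nif_entity_link(document_uri: str, begin: int, end: int, entity_uri: str) -> str:
    """Serialize a NIF offset annotation linking a text span to an entity.

    The str: and sso: prefixes are assumed to be declared elsewhere.
    """
    subject = f"<{document_uri}#offset_{begin}_{end}>"
    return (
        f"{subject}\n"
        f"    a str:StringInContext ;\n"
        f"    sso:oen <{entity_uri}> .\n"
    )


snippet = nif_entity_link(
    "http://www.w3.org/DesignIssues/LinkedData.html",
    717,
    729,
    "http://dbpedia.org/resource/Semantic_Web",
)
print(snippet)
```

Writing the subject as a full IRI in angle brackets avoids relying on a prefix for the document, which matters when the client chooses the prefix.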
>>>>>>
>>>>>>  We are currently preparing a new draft for the spec 2.0. The old
>>>>>>  one can
>>>>>>  be found here:
>>>>>>  http://nlp2rdf.org/nif-1-0/
>>>>>>
>>>>>>  There are several EU projects that intend to use NIF.
>>>>>>  Furthermore, it is
>>>>>>  easier for everybody, if we standardize a Web annotation format
>>>>>>  together.
>>>>>>  Please give feedback on your use cases.
>>>>>>  All the best,
>>>>>>  Sebastian
>>>>>>
>>>>>>
>>>>>  -- 
>>>>>  Dipl. Inf. Sebastian Hellmann
>>>>>  Department of Computer Science, University of Leipzig
>>>>>  Projects:http://nlp2rdf.org ,http://dbpedia.org
>>>>>  Homepage:http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>>>
>>>>>  Research Group:http://aksw.org
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>>>  Wikidata-l mailing list
>>>>>  Wikidata-l@lists.wikimedia.org
>>>>>  https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>