Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> · Date: Fri, 22 Jun 2012 09:02:01 +0200

Hi Timothy,

On 06/21/2012 11:09 PM, Timothy Lebo wrote:
>
>> In NIF, fragments of resources are used as subjects in RDF.
>> Hence you could consider it for inclusion, if it is not too far a stretch and if there is enough time left.
>
> What specifically are you proposing the PROV-WG include?
Well, suppose you have a (web) document and you want to express that a
certain part of it was written by you.
E.g. I have written (with some exceptions) the beginning of
http://wole2012.eurecom.fr/call-papers
from "This workshop envisions the Semantic..." until "Natural Language
Processing and Semantic Web. "

How do you express this with the current work of your group? NIF-URIs 
could fill this spot very well.
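
A minimal sketch of what such a statement could look like, assuming the NIF offset recipe for the fragment and prov:wasAttributedTo from the PROV-O drafts as the linking property. The offsets 0 and 100, the use of my homepage as an agent URI, and the helper names are made up for illustration; the real offsets of the passage were not computed:

```python
PROV = "http://www.w3.org/ns/prov#"


def nif_offset_uri(document_url: str, begin: int, end: int) -> str:
    """Build a NIF-style offset URI for the characters [begin, end) of a document."""
    if begin < 0 or end < begin:
        raise ValueError("need 0 <= begin <= end")
    return f"{document_url}#offset_{begin}_{end}"


def attribution_triple(fragment_uri: str, agent_uri: str) -> str:
    """Return one N-Triples line attributing the fragment to an agent."""
    return f"<{fragment_uri}> <{PROV}wasAttributedTo> <{agent_uri}> ."


# Hypothetical offsets and agent URI, for illustration only.
fragment = nif_offset_uri("http://wole2012.eurecom.fr/call-papers", 0, 100)
print(attribution_triple(
    fragment, "http://bis.informatik.uni-leipzig.de/SebastianHellmann"))
```

The point is only that the fragment URI can stand in the subject position, so a single triple is enough.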
All the best,
Sebastian

>
> Thanks for pointing out the NIF work, it will be great to reuse existing models for the strings in documents.
>
> Regards,
> Tim Lebo
>
>
>> You could read here for a start: http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
>> or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>> All the best,
>> Sebastian
>>
>> -------- Original Message --------
>> Subject:	Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
>> Date:	Thu, 21 Jun 2012 20:34:14 +0100
>> From:	Barry Norton<barry.norton@ontotext.com>
>> To:	Sebastian Hellmann<hellmann@informatik.uni-leipzig.de>
>> CC:	Discussion list for the Wikidata project.<wikidata-l@lists.wikimedia.org>
>>
>> As excused I wasn't really following your discussion, but indeed if
>> you're giving URIs to these fragments...
>>
>> Barry
>>
>>
>> On 21/06/2012 20:29, Sebastian Hellmann wrote:
>>> Hi Barry,
>>>
>>> On 06/21/2012 08:51 PM, Barry Norton wrote:
>>>> Sorry to jump in (without really understanding the context), but you
>>>> guys saw this today, right?
>>>>
>>>> http://www.w3.org/TR/2012/WD-prov-aq-20120619/
>>>
>>> It seems to be very unrelated. That is only resource-level, right?
>>> "Fundamentally, provenance information
>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
>>> is /about/ resources
>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>." So
>>> you would need a subject first. How do you say that the fact you just
>>> added to WikiData comes from a specific fragment of a resource?
>>> i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 , the
>>> first occurrence of "Semantic Web"
>>>
>>> Do you suggest, that NIF URIs might be standardized by inclusion in
>>> the PROV-AQ ? Might work. It could be compatible.
>>>
>>> Sebastian
>>>
>>>> Barry
>>>>
>>>>
>>>> On 21/06/2012 19:05, Sebastian Hellmann wrote:
>>>>> Hello Denny,
>>>>> I was traveling for the past few weeks and can finally answer your
>>>>> email.
>>>>> See my comments inline.
>>>>>
>>>>> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>>>>>> Hello Sebastian,
>>>>>>
>>>>>>
>>>>>> Just a few questions - as you note, it is easier if we all use the
>>>>>> same
>>>>>> standards, and so I want to ask about the relation to other related
>>>>>> standards:
>>>>>> * I understand that you dismiss IETF RFC 5147 because it is not stable
>>>>>> enough, right?
>>>>> The offset scheme of NIF is built on this RFC.
>>>>> So the following would hold:
>>>>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>> ld:offset_717_729  owl:sameAs ld:char=717,12 .
>>>>>
>>>>>
>>>>> We might change the syntax and reuse the RFC syntax, but it has
>>>>> several issues:
>>>>> 1.  The optional part is not easy to handle, because you would need
>>>>> to add owl:sameAs statements:
>>>>>
>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
>>>>> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
>>>>>
>>>>> So theoretically ok, but annoying to implement and check.
>>>>>
>>>>> 2. When implementing web services, NIF allows the client to choose
>>>>> the prefix:
>>>>>
>>>>> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
>>>>>
>>>>> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
>>>>>
>>>>> So RFC 5147 would look like:
>>>>>
>>>>> <http://this.is/a/slash/prefix/char=717,12>
>>>>> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
>>>>>
>>>>> or
>>>>>
>>>>> <http://this.is/a/slash/prefix?char=717,12>
>>>>> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
>>>>>
>>>>> 3. Characters like "=" and "," prevent the use of prefixes:
>>>>> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>> ld:offset_717_729  owl:sameAs ld:char=717,12 .
>>>>> " > test.ttl ; rapper -i turtle test.ttl
>>>>>
>>>>> 4. Implementation is a little bit more difficult, given that NIF
>>>>> URIs can be parsed like this:
>>>>> $arr = explode("_", "offset_717_729");
>>>>> switch ($arr[0]) {
>>>>>      case 'offset':
>>>>>          $begin = $arr[1];
>>>>>          $end = $arr[2];
>>>>>          break;
>>>>>      case 'hash':
>>>>>          $clength = $arr[1];
>>>>>          $slength = $arr[2];
>>>>>          $hash = $arr[3];
>>>>>          // merge the remaining parts with '_'
>>>>>          $rest = implode('_', array_slice($arr, 4));
>>>>>          break;
>>>>> }
>>>>>
>>>>> 5. The RFC assumes a certain MIME type, i.e. plain text. NIF makes a
>>>>> broader assumption.
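
The parsing step from point 4 can be sketched in Python as well. This is only an illustration under the assumption that the fragment layouts are offset_<begin>_<end> and hash_<context length>_<string length>_<hash>_<rest>, as in the PHP snippet above; the function name and the dictionary keys are invented for this sketch:

```python
from typing import Dict, Union


def parse_nif_fragment(fragment: str) -> Dict[str, Union[str, int]]:
    """Split a NIF fragment id such as 'offset_717_729' into its parts."""
    recipe, _, remainder = fragment.partition("_")
    if recipe == "offset":
        # Exactly two numbers are expected; anything else raises ValueError.
        begin, end = remainder.split("_")
        return {"recipe": recipe, "begin": int(begin), "end": int(end)}
    if recipe == "hash":
        # The trailing text may itself contain '_', so split at most 3 times.
        parts = remainder.split("_", 3)
        if len(parts) < 4:
            raise ValueError(f"incomplete hash fragment: {fragment!r}")
        clength, slength, digest, rest = parts
        return {
            "recipe": recipe,
            "context_length": int(clength),
            "string_length": int(slength),
            "hash": digest,
            "rest": rest,
        }
    raise ValueError(f"unknown NIF recipe: {recipe!r}")


print(parse_nif_fragment("offset_717_729"))
print(parse_nif_fragment(
    "hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web"))
```

Limiting the number of splits keeps underscores in the trailing string intact, which is what the "merge remaining" step in the PHP version is for.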
>>>>>> * what is the relation to the W3C media fragment URIs? Did not find a
>>>>>> pointer there.
>>>>> They are designed for media such as images, video, not strings.
>>>>> Potentially, the same principle can be applied, but it is not yet
>>>>> engineered/researched.
>>>>>> * any plans of standardizing your approach?
>>>>> We will do NIF 2.0 as a community standard and finish it in a
>>>>> couple of months. It will be published under open licences, so
>>>>> anybody, e.g. W3C or ISO, might pick it up easily. Other than that,
>>>>> there are plans by several EU projects (see e.g.
>>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html )
>>>>> and a US project to use it, and there are already several third-party
>>>>> implementations. We would rather have it adopted first on a large
>>>>> scale and then standardized properly, i.e. by W3C. This worked
>>>>> quite well for the FOAF project and for RDB2RDF mappers.
>>>>> Chances for fast standardization are not so bad, I would assume.
>>>>>> We would strongly prefer to just use a standard instead of advocating
>>>>>> contenders for one -- if one exists.
>>>>> You might want to look at:
>>>>>
>>>>> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
>>>>>
>>>>> and the same highlighting here:
>>>>>
>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>
>>>>> NIF equivalent (4 triples instead of 14 and only one generated UUID):
>>>>> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a
>>>>> str:String ;
>>>>>      oa:hasBody [
>>>>>          oa:annotator <mailto:Bob> ;
>>>>>          cnt:chars "Hey Tim, good idea that Semantic Web!"
>>>>>      ] .
>>>>>
>>>>> So you might not think in a "contender" way. Approaches are
>>>>> complementary. NIF is simpler and the URIs have some features that
>>>>> might be wanted (stability, uniqueness, easy to implement).
>>>>> This is why I was asking for your *use case* .
>>>>>
>>>>> Note that there are still some problems when annotating the DOM with
>>>>> URIs, e.g. XPointer is abandoned and was never finished. XPath has
>>>>> its limits and is also expensive (i.e. SAX is not possible).
>>>>> I think there is no proper solution as of now.
>>>>> All the best,
>>>>> Sebastian
>>>>>
>>>>>> Cheers,
>>>>>> Denny
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2012/5/18 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>>>>
>>>>>>> Hello again,
>>>>>>> maybe the question I asked was lost, as the text was TL;DR.
>>>>>>>
>>>>>>> I heard that it is planned to track the provenance of facts, e.g.
>>>>>>> Berlin has 3,337,000 citizens, found here:
>>>>>>> http://www.worldatlas.com/citypops.htm
>>>>>>>
>>>>>>> Do you have a place where the use case and the requirements are
>>>>>>> documented for this? Or is it out of scope?
>>>>>>> Will it be coarse-grained, i.e. website level? Or fine-grained,
>>>>>>> i.e. text paragraph level? See e.g. how Berlin is highlighted here:
>>>>>>>
>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
>>>>>>>
>>>>>>> in this very early prototype.
>>>>>>>
>>>>>>> Could you give me a link where I can read more about any Wikidata
>>>>>>> plans in this direction?
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>>>>>>>
>>>>>>>> Dear all,
>>>>>>>> (Note: I could not find the document where your requirements
>>>>>>>> regarding the tracking of facts on the web are written, so I am
>>>>>>>> giving a general introduction to NIF. Please send me a link to the
>>>>>>>> document that specifies your need for tracing facts on the web,
>>>>>>>> thanks.)
>>>>>>>>
>>>>>>>> I would like to draw your attention to the URIs used in the NLP
>>>>>>>> Interchange Format (NIF).
>>>>>>>> NIF-URIs are quite easy to use, understand and implement. NIF has a
>>>>>>>> one-triple-per-annotation paradigm. The latest documentation can
>>>>>>>> be found here:
>>>>>>>>
>>>>>>>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>>>>>>>
>>>>>>>> The basic idea is to use URIs with hash fragment ids to annotate
>>>>>>>> or mark pages on the web.
>>>>>>>> An example is the first occurrence of "Semantic Web" on
>>>>>>>>
>>>>>>>> http://www.w3.org/DesignIssues/LinkedData.html
>>>>>>>>
>>>>>>>> as highlighted here:
>>>>>>>>
>>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>>>
>>>>>>>> Here is a NIF example for linking a part of the document to the
>>>>>>>> DBpedia entry of the Semantic Web:
>>>>>>>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>>>>>>>>        a str:StringInContext ;
>>>>>>>>        sso:oen <http://dbpedia.org/resource/Semantic_Web> .
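
How such an offset URI could be derived for the first occurrence of a string can be sketched as follows. This assumes the page text is already available as a plain string and that offsets count characters with an exclusive end (729 - 717 = 12 = the length of "Semantic Web"); the function name, the example.org URL and the sample sentence are invented for illustration:

```python
def first_occurrence_uri(document_url: str, text: str, needle: str) -> str:
    """Return a NIF-style offset URI for the first occurrence of needle in text."""
    begin = text.find(needle)
    if begin < 0:
        raise ValueError(f"{needle!r} does not occur in the text")
    end = begin + len(needle)  # exclusive end offset
    return f"{document_url}#offset_{begin}_{end}"


# Made-up sample text; real offsets depend on how the page text is extracted.
sample = "Linked Data is about the Semantic Web done right."
print(first_occurrence_uri("http://example.org/doc", sample, "Semantic Web"))
```

The resulting URI can then be used directly as the subject of an annotation triple like the one above.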
>>>>>>>>
>>>>>>>>
>>>>>>>> We are currently preparing a new draft for the spec 2.0. The old
>>>>>>>> one can be found here:
>>>>>>>>
>>>>>>>> http://nlp2rdf.org/nif-1-0/
>>>>>>>>
>>>>>>>> There are several EU projects that intend to use NIF.
>>>>>>>> Furthermore, it is easier for everybody if we standardize a Web
>>>>>>>> annotation format together.
>>>>>>>> Please give feedback on your use cases.
>>>>>>>> All the best,
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>>
>>>>>>> -- 
>>>>>>> Dipl. Inf. Sebastian Hellmann
>>>>>>> Department of Computer Science, University of Leipzig
>>>>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>>>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>>>>> Research Group: http://aksw.org
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Wikidata-l mailing list
>>>>>>> Wikidata-l@lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
>

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org