- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Fri, 22 Jun 2012 09:02:01 +0200
- To: Timothy Lebo <lebot@rpi.edu>
- CC: public-rdf-prov@w3.org
Hi Timothy,

On 06/21/2012 11:09 PM, Timothy Lebo wrote:
>
>> in NIF Fragments of resources are used as subject in RDF.
>> Hence you could consider for inclusion, if it is not a too far stretch, and if there is enough time left.
>
> What specifically are you proposing the PROV-WG include?

Well, say you have a (web) document and you want to express that a certain part of it was written by you. For example, I wrote (with some exceptions) the beginning of http://wole2012.eurecom.fr/call-papers, from "This workshop envisions the Semantic..." to "Natural Language Processing and Semantic Web."
How do you express this with the current work of your group? NIF-URIs could fill this spot very well.

All the best,
Sebastian

>
> Thanks for pointing out the NIF work, it will be great to reuse existing models for the strings in documents.
>
> Regards,
> Tim Lebo
>
>
>> You could read here for a start: http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
>> or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>> All the best,
>> Sebastian
>>
>> -------- Original Message --------
>> Subject: Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
>> Date: Thu, 21 Jun 2012 20:34:14 +0100
>> From: Barry Norton <barry.norton@ontotext.com>
>> To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>> CC: Discussion list for the Wikidata project. <wikidata-l@lists.wikimedia.org>
>>
>> As excused I wasn't really following your discussion, but indeed if
>> you're giving URIs to these fragments...
>>
>> Barry
>>
>>
>> On 21/06/2012 20:29, Sebastian Hellmann wrote:
>>> Hi Barry,
>>>
>>> On 06/21/2012 08:51 PM, Barry Norton wrote:
>>>> Sorry to jump in (without really understanding the context), but you
>>>> guys saw this today, right?
>>>> http://www.w3.org/TR/2012/WD-prov-aq-20120619/
>>> It seems to be very unrelated. That is only resource-level, right?
>>> "Fundamentally, provenance information
>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
>>> is /about/ resources
>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>." So
>>> you would need a subject first. How do you say that the fact you just
>>> added to Wikidata comes from a specific fragment of a resource,
>>> i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729, the
>>> first occurrence of "Semantic Web"?
>>>
>>> Do you suggest that NIF URIs might be standardized by inclusion in
>>> PROV-AQ? Might work. It could be compatible.
>>>
>>> Sebastian
>>>
>>>> Barry
>>>>
>>>>
>>>> On 21/06/2012 19:05, Sebastian Hellmann wrote:
>>>>> Hello Denny,
>>>>> I was traveling for the past few weeks and can finally answer your
>>>>> email.
>>>>> See my comments inline.
>>>>>
>>>>> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>>>>>> Hello Sebastian,
>>>>>>
>>>>>>
>>>>>> Just a few questions - as you note, it is easier if we all use the
>>>>>> same standards, and so I want to ask about the relation to other
>>>>>> related standards:
>>>>>> * I understand that you dismiss IETF RFC 5147 because it is not stable
>>>>>> enough, right?
>>>>> The offset scheme of NIF is built on this RFC.
>>>>> So the following would hold:
>>>>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>>>>>
>>>>>
>>>>> We might change the syntax and reuse the RFC syntax, but it has
>>>>> several issues:
>>>>> 1. The optional part is not easy to handle, because you would need
>>>>> to add owl:sameAs statements:
>>>>>
>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
>>>>> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
>>>>>
>>>>> So theoretically OK, but annoying to implement and check.
>>>>>
>>>>> 2. When implementing web services, NIF allows the client to choose
>>>>> the prefix:
>>>>> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
>>>>> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>.
>>>>> So RFC 5147 would look like:
>>>>> <http://this.is/a/slash/prefix/char=717,12>
>>>>> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
>>>>> or
>>>>> <http://this.is/a/slash/prefix?char=717,12>
>>>>> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
>>>>>
>>>>> 3. Characters like "=" and "," prevent the use of prefixes:
>>>>> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>>>>> " > test.ttl ; rapper -i turtle test.ttl
>>>>>
>>>>> 4. Implementation is a little more difficult, compared to splitting
>>>>> NIF ids on "_":
>>>>> $arr = explode("_", "offset_717_729");
>>>>> switch ($arr[0]) {
>>>>>     case 'offset':
>>>>>         $begin = $arr[1];
>>>>>         $end = $arr[2];
>>>>>         break;
>>>>>     case 'hash':
>>>>>         $clength = $arr[1];
>>>>>         $slength = $arr[2];
>>>>>         $hash = $arr[3];
>>>>>         // merge the remaining parts with '_'
>>>>>         $rest = implode("_", array_slice($arr, 4));
>>>>>         break;
>>>>> }
>>>>>
>>>>> 5. The RFC assumes a certain MIME type, i.e. plain text. NIF makes
>>>>> broader assumptions.
>>>>>> * what is the relation to the W3C media fragment URIs? Did not find a
>>>>>> pointer there.
>>>>> They are designed for media such as images and video, not strings.
>>>>> Potentially, the same principle can be applied, but it is not yet
>>>>> engineered/researched.
>>>>>> * any plans of standardizing your approach?
>>>>> We will do NIF 2.0 as a community standard and finish it in a
>>>>> couple of months. It will be published under open licences, so
>>>>> anybody, e.g. W3C or ISO, might pick it up easily.
>>>>> Other than that, there are plans by several EU projects (see e.g.
>>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html)
>>>>> and a US project to use it, and there are already several third-party
>>>>> implementations. We would rather have it adopted on a large scale
>>>>> first and then properly standardized, e.g. by W3C. This worked
>>>>> quite well for the FOAF project or for RDB2RDF mappers.
>>>>> Fast standardization is not so unlikely, I would assume.
>>>>>> We would strongly prefer to just use a standard instead of advocating
>>>>>> contenders for one -- if one exists.
>>>>> You might want to look at:
>>>>> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
>>>>> and the same highlighting here:
>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>
>>>>> NIF equivalent (4 triples instead of 14 and only one generated UUID):
>>>>> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web
>>>>>     a str:String ;
>>>>>     oa:hasBody [
>>>>>         oa:annotator <mailto:Bob> ;
>>>>>         cnt:chars "Hey Tim, good idea that Semantic Web!"
>>>>>     ] .
>>>>>
>>>>> So you might not want to think in a "contender" way. The approaches
>>>>> are complementary. NIF is simpler and the URIs have some features that
>>>>> might be wanted (stability, uniqueness, ease of implementation).
>>>>> This is why I was asking for your *use case*.
>>>>>
>>>>> Note that there are still some problems when annotating the DOM with
>>>>> URIs, e.g. XPointer is abandoned and was never finished. XPath has
>>>>> its limits and is also expensive (i.e. SAX is not possible).
>>>>> I think there is no proper solution as of now.
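The PHP snippet in point 4 of the list above can be mirrored in Python. This is a minimal sketch, not part of NIF itself: the names `parse_nif_fragment`, `OffsetFragment` and `HashFragment` are hypothetical, the field meanings are taken from the PHP variable names (`$begin`, `$end`, `$clength`, `$slength`, `$hash`, `$rest`), and decoding the trailing segment as percent-encoded text (as in `Semantic%20Web`) is an assumption. The hash value is kept as an opaque string, since how it is computed is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Union
from urllib.parse import unquote


@dataclass(frozen=True)
class OffsetFragment:
    begin: int  # $begin in the PHP snippet
    end: int    # $end in the PHP snippet


@dataclass(frozen=True)
class HashFragment:
    context_length: int  # $clength in the PHP snippet
    string_length: int   # $slength in the PHP snippet
    digest: str          # $hash, kept opaque (its computation is not shown here)
    rest: str            # $rest, assumed to be percent-encoded text


NifFragment = Union[OffsetFragment, HashFragment]


def parse_nif_fragment(fragment: str) -> Optional[NifFragment]:
    """Split a NIF fragment id on '_'; return None for unknown recipes.

    Raises ValueError if a known recipe has the wrong number of parts.
    """
    recipe, _, tail = fragment.partition("_")
    if recipe == "offset":
        begin, end = tail.split("_")
        return OffsetFragment(int(begin), int(end))
    if recipe == "hash":
        # maxsplit=3 leaves any '_' inside the last segment untouched, which
        # is what the PHP comment "merge the remaining parts with '_'" asks for.
        clength, slength, digest, rest = tail.split("_", 3)
        return HashFragment(int(clength), int(slength), digest, unquote(rest))
    return None


if __name__ == "__main__":
    print(parse_nif_fragment("offset_717_729"))
    print(parse_nif_fragment(
        "hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web"))
```

Using `str.split` with a `maxsplit` argument avoids the explicit re-joining step that the PHP version needs.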
>>>>> All the best,
>>>>> Sebastian
>>>>>
>>>>>> Cheers,
>>>>>> Denny
>>>>>>
>>>>>>
>>>>>> 2012/5/18 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>>>>> Hello again,
>>>>>>> maybe the question I asked was lost, as the text was TL;DR.
>>>>>>>
>>>>>>> I heard that it is planned to track the provenance of facts, e.g.
>>>>>>> that Berlin has 3,337,000 citizens, as found here:
>>>>>>> http://www.worldatlas.com/citypops.htm
>>>>>>> Do you have a place where the use case and the requirements for
>>>>>>> this are documented? Or is it out of scope?
>>>>>>> Will it be coarse-grained, i.e. website level? Or fine-grained,
>>>>>>> i.e. text paragraph level? See e.g. how Berlin is highlighted here:
>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
>>>>>>> in this very early prototype.
>>>>>>>
>>>>>>> Could you give me a link where I can read more about any Wikidata
>>>>>>> plans in this direction?
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>>>>>>>
>>>>>>>> Dear all,
>>>>>>>> (Note: I could not find the document where your requirements
>>>>>>>> regarding the tracking of facts on the web are written, so I am
>>>>>>>> giving a general introduction to NIF.
>>>>>>>> Please send me a link to the document that specifies your need
>>>>>>>> for tracing facts on the web, thanks.)
>>>>>>>>
>>>>>>>> I would like to draw your attention to the URIs used in the NLP
>>>>>>>> Interchange Format (NIF).
>>>>>>>> NIF-URIs are quite easy to use, understand and implement. NIF has a
>>>>>>>> one-triple-per-annotation paradigm. The latest documentation can
>>>>>>>> be found here:
>>>>>>>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>>>>>>>
>>>>>>>> The basic idea is to use URIs with hash fragment ids to annotate
>>>>>>>> or mark pages on the web.
>>>>>>>> An example is the first occurrence of "Semantic Web" on
>>>>>>>> http://www.w3.org/DesignIssues/LinkedData.html
>>>>>>>> as highlighted here:
>>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>>>
>>>>>>>> Here is a NIF example for linking a part of the document to the
>>>>>>>> DBpedia entry of the Semantic Web:
>>>>>>>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>>>>>>>>     a str:StringInContext ;
>>>>>>>>     sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>>>>>>>>
>>>>>>>>
>>>>>>>> We are currently preparing a new draft for the spec 2.0.
>>>>>>>> The old one can be found here:
>>>>>>>> http://nlp2rdf.org/nif-1-0/
>>>>>>>> There are several EU projects that intend to use NIF.
>>>>>>>> Furthermore, it is easier for everybody if we standardize a Web
>>>>>>>> annotation format together.
>>>>>>>> Please give feedback on your use cases.
>>>>>>>> All the best,
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Dipl. Inf. Sebastian Hellmann
>>>>>>> Department of Computer Science, University of Leipzig
>>>>>>> Projects: http://nlp2rdf.org, http://dbpedia.org
>>>>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>>>>> Research Group: http://aksw.org
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Wikidata-l mailing list
>>>>>>> Wikidata-l@lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>>>>>>

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org, http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
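The equivalence between a NIF offset id and an RFC 5147 character range, discussed in this thread for `ld:offset_717_729`, can be sketched as a pair of mapping functions plus a substring selector. Note that RFC 5147 defines `char=<position>,<position>` as a range between two character positions, so under that reading the counterpart of `offset_717_729` is `char=717,729`; the thread writes `char=717,12`, which reads the second number as a length. The sketch assumes that NIF offsets, like RFC 5147 positions, count the gaps between characters starting at 0 (so the end offset is exclusive, consistent with "Semantic Web" being 12 characters and 729 - 717 = 12) and that both count Unicode code points. The function names are hypothetical, and RFC 5147 integrity checks (`;length=...`, `;md5=...`) as well as single-position or open-ended ranges are deliberately not handled.

```python
import re
from typing import Tuple

# Hypothetical helpers (not part of NIF or RFC 5147): map the NIF
# "offset_<begin>_<end>" recipe to the RFC 5147 "char=<from>,<to>" range form.
_NIF_OFFSET = re.compile(r"offset_(\d+)_(\d+)")
_RFC5147_CHAR_RANGE = re.compile(r"char=(\d+),(\d+)")


def _parse_offset(fragment: str) -> Tuple[int, int]:
    """Return (begin, end) for an offset_<begin>_<end> id, or raise ValueError."""
    m = _NIF_OFFSET.fullmatch(fragment)
    if m is None:
        raise ValueError(f"not a NIF offset fragment: {fragment!r}")
    begin, end = int(m.group(1)), int(m.group(2))
    if begin > end:
        raise ValueError(f"begin lies after end: {fragment!r}")
    return begin, end


def offset_to_rfc5147(fragment: str) -> str:
    """offset_717_729 -> char=717,729 (both numbers are positions, not a length)."""
    begin, end = _parse_offset(fragment)
    return f"char={begin},{end}"


def rfc5147_to_offset(fragment: str) -> str:
    """Inverse mapping; integrity checks such as ';length=...' are rejected."""
    m = _RFC5147_CHAR_RANGE.fullmatch(fragment)
    if m is None:
        raise ValueError(f"unsupported RFC 5147 fragment: {fragment!r}")
    offset = f"offset_{int(m.group(1))}_{int(m.group(2))}"
    _parse_offset(offset)  # reuse the begin <= end check
    return offset


def select(text: str, fragment: str) -> str:
    """Return the characters of `text` between the two positions of an offset id."""
    begin, end = _parse_offset(fragment)
    if end > len(text):
        raise ValueError(f"{fragment!r} lies outside a text of length {len(text)}")
    # Positions sit between characters, so a Python slice [begin:end] matches.
    return text[begin:end]


if __name__ == "__main__":
    print(offset_to_rfc5147("offset_717_729"))
    print(select("The Semantic Web vision", "offset_4_16"))
```

Accepting only the plain range form keeps the mapping one-to-one; the optional integrity-check suffixes are exactly the part that point 1 of the RFC 5147 discussion describes as awkward to reconcile with owl:sameAs.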
Received on Friday, 22 June 2012 07:02:37 UTC