- From: Timothy Lebo <lebot@rpi.edu>
- Date: Mon, 25 Jun 2012 16:52:04 -0400
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: public-rdf-prov@w3.org
Sebastian,

On Jun 22, 2012, at 3:02 AM, Sebastian Hellmann wrote:

> Hi Timothy,
>
> On 06/21/2012 11:09 PM, Timothy Lebo wrote:
>>
>>> in NIF Fragments of resources are used as subject in RDF.
>>> Hence you could consider for inclusion, if it is not a too far stretch, and if there is enough time left.
>>
>> What specifically are you proposing the PROV-WG include?
> Well, if you have a (web) document and you want to express, that a certain part was written by you.
> e.g. I have written (with some exceptions) the beginning of http://wole2012.eurecom.fr/call-papers
> From "This workshop envisions the Semantic..." until "Natural Language Processing and Semantic Web. "
>
> How do you express this with the current work of your group?

If NIF-URIs provide you a way to identify that snippet of the document, then PROV and PROV-O can be used to describe its provenance.

Your writing can be described as the following. Depending on what other things you'd like to say, we can add more PROV assertions.

@prefix prov: <http://www.w3.org/ns/prov#> .

<your-nif-uri-for-that-portion-of-the-document>
   prov:wasAttributedTo <http://data.semanticweb.org/person/sebastian-hellmann>;
.

<http://data.semanticweb.org/person/sebastian-hellmann>
   a prov:Agent, prov:Person .

> NIF-URIs could fill this spot very well.

I agree. Since we haven't spent any effort for conventions on how to identify portions of resource representations, NIF and PROV complement each other nicely.

Regards,
Tim

> All the best,
> Sebastian
>
>>
>> Thanks for pointing out the NIF work, it will be great to reuse existing models for the strings in documents.
>>
>> Regards,
>> Tim Lebo
>>
>>
>>> You could read here for a start: http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
>>> or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>> All the best,
>>> Sebastian
>>>
>>> -------- Original Message --------
>>> Subject: Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
>>> Date: Thu, 21 Jun 2012 20:34:14 +0100
>>> From: Barry Norton <barry.norton@ontotext.com>
>>> To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>> CC: Discussion list for the Wikidata project. <wikidata-l@lists.wikimedia.org>
>>>
>>> As excused I wasn't really following your discussion, but indeed if
>>> you're giving URIs to these fragments...
>>>
>>> Barry
>>>
>>>
>>> On 21/06/2012 20:29, Sebastian Hellmann wrote:
>>>> Hi Barry,
>>>>
>>>> On 06/21/2012 08:51 PM, Barry Norton wrote:
>>>>> Sorry to jump in (without really understanding the context), but you
>>>>> guys saw this today, right?
>>>>> http://www.w3.org/TR/2012/WD-prov-aq-20120619/
>>>> It seems to be very unrelated. That is only resource-level, right?
>>>> "Fundamentally, provenance information
>>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
>>>> is /about/ resource
>>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>s." So
>>>> you would need a subject first. How do you say that the fact you just
>>>> added to WikiData comes from a specific fragment of a resource?
>>>> i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 the
>>>> first occurrence of "Semantic Web"
>>>>
>>>> Do you suggest, that NIF URIs might be standardized by inclusion in
>>>> the PROV-AQ ? Might work. It could be compatible.
>>>>
>>>> Sebastian
>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>> On 21/06/2012 19:05, Sebastian Hellmann wrote:
>>>>>> Hello Denny,
>>>>>> I was traveling for the past few weeks and can finally answer your
>>>>>> email.
>>>>>> See my comments inline.
>>>>>>
>>>>>> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>>>>>>> Hello Sebastian,
>>>>>>>
>>>>>>>
>>>>>>> Just a few questions - as you note, it is easier if we all use the
>>>>>>> same standards, and so I want to ask about the relation to other
>>>>>>> related standards:
>>>>>>> * I understand that you dismiss IETF RFC 5147 because it is not stable
>>>>>>> enough, right?
>>>>>> The offset scheme of NIF is built on this RFC.
>>>>>> So the following would hold:
>>>>>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>>>>>>
>>>>>>
>>>>>> We might change the syntax and reuse the RFC syntax, but it has
>>>>>> several issues:
>>>>>> 1. The optional part is not easy to handle, because you would need
>>>>>> to add owl:sameAs statements:
>>>>>>
>>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
>>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
>>>>>> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
>>>>>>
>>>>>> So theoretically ok, but annoying to implement and check.
>>>>>>
>>>>>> 2. When implementing web services, NIF allows the client to choose
>>>>>> the prefix:
>>>>>> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
>>>>>> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
>>>>>> So RFC 5147 would look like:
>>>>>> <http://this.is/a/slash/prefix/char=717,12>
>>>>>> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
>>>>>> or
>>>>>> <http://this.is/a/slash/prefix?char=717,12>
>>>>>> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
>>>>>> 3. Characters like = and , prevent the use of prefixes:
>>>>>> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>>>>>> "> test.ttl ; rapper -i turtle test.ttl
>>>>>>
>>>>>> 4. Implementation is a little bit more difficult, given that the NIF
>>>>>> recipes can be parsed like this:
>>>>>> // split the fragment id on '_'; the first part names the recipe
>>>>>> $arr = explode("_", "offset_717_729");
>>>>>> switch ($arr[0]) {
>>>>>>     case 'offset':
>>>>>>         $begin = $arr[1];
>>>>>>         $end = $arr[2];
>>>>>>         break;
>>>>>>     case 'hash':
>>>>>>         $clength = $arr[1];
>>>>>>         $slength = $arr[2];
>>>>>>         $hash = $arr[3];
>>>>>>         // merge remaining parts with '_'
>>>>>>         $rest = implode("_", array_slice($arr, 4));
>>>>>>         break;
>>>>>> }
>>>>>>
>>>>>> 5. RFC assumes a certain mime type, i.e. plain text. NIF does have a
>>>>>> broader assumption.
>>>>>>> * what is the relation to the W3C media fragment URIs? Did not find a
>>>>>>> pointer there.
>>>>>> They are designed for media such as images, video, not strings.
>>>>>> Potentially, the same principle can be applied, but it is not yet
>>>>>> engineered/researched.
>>>>>>> * any plans of standardizing your approach?
>>>>>> We will do NIF 2.0 as a community standard and finish it in a
>>>>>> couple of months. It will be published under open licences, so
>>>>>> anybody, W3C or ISO, might pick it up easily. Other than that there
>>>>>> are plans by several EU projects (see e.g. here
>>>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html )
>>>>>> and a US project to use it and there are several third party
>>>>>> implementations, already. We would rather have it adopted first on
>>>>>> a large scale and then standardized, properly, i.e. W3C. This worked
>>>>>> quite well for the FOAF project or for RDB2RDF Mappers.
>>>>>> Chances for fast standardization are not so unlikely, I would assume.
>>>>>>> We would strongly prefer to just use a standard instead of advocating
>>>>>>> contenders for one -- if one exists.
>>>>>> You might want to look at:
>>>>>> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
>>>>>> and the same highlighting here:
>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>
>>>>>> NIF equivalent (4 triples instead of 14 and only one generated uuid):
>>>>>> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a str:String ;
>>>>>>     oa:hasBody [
>>>>>>         oa:annotator <mailto:Bob> ;
>>>>>>         cnt:chars "Hey Tim, good idea that Semantic Web!"
>>>>>>     ] .
>>>>>>
>>>>>> So you might not think in a "contender" way. Approaches are
>>>>>> complementary. NIF is simpler and the URIs have some features that
>>>>>> might be wanted (stability, uniqueness, easy to implement).
>>>>>> This is why I was asking for your *use case* .
>>>>>>
>>>>>> Note that there are still some problems when annotating DOM with
>>>>>> URIs, e.g. XPointer is abandoned and was never finished. XPath has
>>>>>> its limits and is also expensive (i.e. SAX not possible).
>>>>>> I think there is no proper solution as of now.
>>>>>> All the best,
>>>>>> Sebastian
>>>>>>
>>>>>>> Cheers,
>>>>>>> Denny
>>>>>>>
>>>>>>>
>>>>>>> 2012/5/18 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>>>>>> Hello again,
>>>>>>>> maybe the question I asked was lost, as the text was TL;DR
>>>>>>>>
>>>>>>>> I heard that it is planned to track provenance of facts. e.g.
>>>>>>>> Berlin has 3,337,000 citizens found
>>>>>>>> here: http://www.worldatlas.com/citypops.htm
>>>>>>>> Do you have a place where the use case and the requirements are
>>>>>>>> documented for this? Or is it out of scope?
>>>>>>>> Will it be coarse grained, i.e. website level ? Or fine grained,
>>>>>>>> i.e. text paragraph level? See e.g. how Berlin is highlighted here:
>>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
>>>>>>>> in this very early prototype.
>>>>>>>>
>>>>>>>> Could you give me a link where I can read more about any Wikidata
>>>>>>>> plans towards this direction?
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>> (Note: I could not find the document, where your requirements
>>>>>>>>> regarding the tracking of facts on the web are written, so I am
>>>>>>>>> giving a general introduction to NIF. Please send me a link to the
>>>>>>>>> document that specifies your need for tracing facts on the web,
>>>>>>>>> thanks)
>>>>>>>>>
>>>>>>>>> I would like to point your attention to the URIs used in the NLP
>>>>>>>>> Interchange Format (NIF).
>>>>>>>>> NIF-URIs are quite easy to use, understand and implement. NIF has a
>>>>>>>>> one-triple-per-annotation paradigm.
>>>>>>>>> The latest documentation can be found here:
>>>>>>>>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>>>>>>>>
>>>>>>>>> The basic idea is to use URIs with hash fragment ids to annotate
>>>>>>>>> or mark pages on the web:
>>>>>>>>> An example is the first occurrence of "Semantic Web" on
>>>>>>>>> http://www.w3.org/DesignIssues/LinkedData.html
>>>>>>>>> as highlighted here:
>>>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>>>>
>>>>>>>>> Here is a NIF example for linking a part of the document to the
>>>>>>>>> DBpedia entry of the Semantic Web:
>>>>>>>>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>>>>>>>>>     a str:StringInContext ;
>>>>>>>>>     sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We are currently preparing a new draft for the spec 2.0. The old
>>>>>>>>> one can be found here:
>>>>>>>>> http://nlp2rdf.org/nif-1-0/
>>>>>>>>> There are several EU projects that intend to use NIF.
>>>>>>>>> Furthermore, it is easier for everybody, if we standardize a Web
>>>>>>>>> annotation format together.
>>>>>>>>> Please give feedback of your use cases.
>>>>>>>>> All the best,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Dipl. Inf. Sebastian Hellmann
>>>>>>>> Department of Computer Science, University of Leipzig
>>>>>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>>>>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>>>>>> Research Group: http://aksw.org
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Wikidata-l mailing list
>>>>>>>> Wikidata-l@lists.wikimedia.org
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
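
The two ideas in this thread, parsing a NIF fragment id (item 4 in the quoted message) and attributing the identified snippet with PROV, can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the offsets are assumed to be zero-based with an exclusive end (which matches offset_717_729 covering the 12 characters of "Semantic Web"), and the Turtle is built by plain string formatting rather than with an RDF library.

```python
from urllib.parse import unquote


def parse_nif_fragment(fragment):
    """Split a NIF fragment id on '_' (hypothetical helper mirroring the quoted PHP)."""
    parts = fragment.split("_")
    recipe = parts[0]
    if recipe == "offset":
        # offset_<begin>_<end>
        return {"recipe": "offset", "begin": int(parts[1]), "end": int(parts[2])}
    if recipe == "hash":
        # hash_<context length>_<string length>_<hash>_<rest>; the rest may contain '_'
        return {
            "recipe": "hash",
            "context_length": int(parts[1]),
            "string_length": int(parts[2]),
            "hash": parts[3],
            "rest": unquote("_".join(parts[4:])),
        }
    raise ValueError("unknown NIF recipe: " + recipe)


def extract(text, parsed):
    """Return the substring for an 'offset' fragment.

    Assumption: offsets are zero-based and the end offset is exclusive.
    """
    if parsed["recipe"] != "offset":
        raise ValueError("only the offset recipe carries explicit positions")
    return text[parsed["begin"]:parsed["end"]]


def attribution_turtle(nif_uri, agent_uri):
    """Build the PROV attribution from the reply above as a Turtle string."""
    return (
        "@prefix prov: <http://www.w3.org/ns/prov#> .\n\n"
        f"<{nif_uri}> prov:wasAttributedTo <{agent_uri}> .\n"
        f"<{agent_uri}> a prov:Agent, prov:Person .\n"
    )


if __name__ == "__main__":
    uri = "http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729"
    print(parse_nif_fragment(uri.split("#", 1)[1]))
    print(attribution_turtle(
        uri, "http://data.semanticweb.org/person/sebastian-hellmann"))
```

Keeping the fragment parser separate from the PROV output reflects the point made in the reply: NIF only has to supply a URI for the snippet, and PROV statements can then be attached to that URI like to any other resource.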
Received on Monday, 25 June 2012 20:52:44 UTC