- From: Timothy Lebo <lebot@rpi.edu>
- Date: Thu, 21 Jun 2012 14:09:33 -0700
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: public-rdf-prov@w3.org
Sebastian,

On Jun 21, 2012, at 1:04 PM, Sebastian Hellmann wrote:

> Dear Provenance group,
> there was a discussion at WikiData, which led to contacting you:
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000478.html
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000566.html
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-June/000751.html
> ...
>
> You are tracking provenance on the resource level.

Are you suggesting that text snippets within a document (resource representation, really) cannot be resources themselves?

PROV provides prov:Entity, and you can choose anything that you wish to be a prov:Entity (for cases when you want to describe its provenance).

So, we could tweak your example:

<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
   a str:StringInContext, prov:Entity;
   prov:value "Semantic Web";
   prov:wasQuotedFrom <http://www.w3.org/DesignIssues/LinkedData.html>;
.

If you're concerned about time, you can get more specific by saying:

<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
   a str:StringInContext, prov:Entity;
   prov:wasQuotedFrom :the-page-today;
.

:the-page-today
   a prov:Entity;
   prov:specializationOf <http://www.w3.org/DesignIssues/LinkedData.html>;
   prov:generatedAtTime "2009-06-18T18:24:33"^^xsd:dateTime;
.

> in NIF Fragments of resources are used as subject in RDF.
> Hence you could consider for inclusion, if it is not a too far stretch, and if there is enough time left.

What specifically are you proposing the PROV-WG include?

Thanks for pointing out the NIF work, it will be great to reuse existing models for the strings in documents.
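As a hedged illustration of treating a text snippet as a prov:Entity, the following Python sketch computes an offset_BEGIN_END fragment for the first occurrence of a string and prints the first Turtle pattern above. It assumes zero-based character offsets with an exclusive end (consistent with offset_717_729 spanning the 12 characters of "Semantic Web"). The function names and the example.org page are hypothetical, and str:StringInContext is left out because the NIF namespace URI is not given in this thread.

```python
def nif_offset_uri(doc_uri: str, text: str, needle: str) -> str:
    """Build an offset_BEGIN_END fragment URI for the first occurrence of needle.

    Assumption: zero-based character offsets, end position exclusive.
    """
    begin = text.find(needle)
    if begin < 0:
        raise ValueError(f"{needle!r} does not occur in the text")
    end = begin + len(needle)
    base = doc_uri.split("#", 1)[0]  # drop any existing fragment
    return f"{base}#offset_{begin}_{end}"


def turtle_escape(value: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle "..." literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def quoted_snippet_turtle(doc_uri: str, text: str, needle: str) -> str:
    """Describe a snippet as a prov:Entity quoted from its source document."""
    base = doc_uri.split("#", 1)[0]
    snippet = nif_offset_uri(base, text, needle)
    return (
        "@prefix prov: <http://www.w3.org/ns/prov#> .\n\n"
        f"<{snippet}>\n"
        "   a prov:Entity;\n"
        f'   prov:value "{turtle_escape(needle)}";\n'
        f"   prov:wasQuotedFrom <{base}>;\n"
        ".\n"
    )


if __name__ == "__main__":
    # Hypothetical page text; "Semantic Web" starts at character 16.
    page = "An intro to the Semantic Web and more."
    print(quoted_snippet_turtle("http://example.org/page.html", page, "Semantic Web"))
```

Running it prints a description of <http://example.org/page.html#offset_16_28> that is quoted from <http://example.org/page.html>. The :the-page-today specialization would be added the same way when the time of retrieval matters.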
Regards,
Tim Lebo

> You could read here for a start: http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
> or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>
> All the best,
> Sebastian
>
> -------- Original Message --------
> Subject: Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
> Date: Thu, 21 Jun 2012 20:34:14 +0100
> From: Barry Norton <barry.norton@ontotext.com>
> To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
> CC: Discussion list for the Wikidata project. <wikidata-l@lists.wikimedia.org>
>
> As excused I wasn't really following your discussion, but indeed if
> you're giving URIs to these fragments...
>
> Barry
>
>
> On 21/06/2012 20:29, Sebastian Hellmann wrote:
> > Hi Barry,
> >
> > On 06/21/2012 08:51 PM, Barry Norton wrote:
> >>
> >> Sorry to jump in (without really understanding the context), but you
> >> guys saw this today, right?
> >> http://www.w3.org/TR/2012/WD-prov-aq-20120619/
> >
> > It seems to be very unrelated. That is only resource-level, right?
> > "Fundamentally, provenance information
> > <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
> > is /about/ resources
> > <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>." So
> > you would need a subject first. How do you say that the fact you just
> > added to WikiData comes from a specific fragment of a resource?
> > i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729, the
> > first occurrence of "Semantic Web"
> >
> > Do you suggest that NIF URIs might be standardized by inclusion in
> > the PROV-AQ? Might work. It could be compatible.
> >
> > Sebastian
> >
> >>
> >> Barry
> >>
> >>
> >> On 21/06/2012 19:05, Sebastian Hellmann wrote:
> >>> Hello Denny,
> >>> I was traveling for the past few weeks and can finally answer your
> >>> email.
> >>> See my comments inline.
> >>>
> >>> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
> >>>> Hello Sebastian,
> >>>>
> >>>>
> >>>> Just a few questions - as you note, it is easier if we all use the
> >>>> same standards, and so I want to ask about the relation to other
> >>>> related standards:
> >>>> * I understand that you dismiss IETF RFC 5147 because it is not stable
> >>>> enough, right?
> >>> The offset scheme of NIF is built on this RFC.
> >>> So the following would hold:
> >>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> >>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
> >>>
> >>>
> >>> We might change the syntax and reuse the RFC syntax, but it has
> >>> several issues:
> >>> 1. The optional part is not easy to handle, because you would need
> >>> to add owl:sameAs statements:
> >>>
> >>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
> >>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
> >>> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
> >>>
> >>> So theoretically ok, but annoying to implement and check.
> >>>
> >>> 2. When implementing web services, NIF allows the client to choose
> >>> the prefix:
> >>> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
> >>>
> >>> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
> >>> So RFC 5147 would look like:
> >>> <http://this.is/a/slash/prefix/char=717,12>
> >>> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
> >>> or
> >>> <http://this.is/a/slash/prefix?char=717,12>
> >>> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
> >>>
> >>> 3. Characters like = and , prevent the use of prefixes:
> >>> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> >>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
> >>> " > test.ttl ; rapper -i turtle test.ttl
> >>>
> >>> 4. Implementation is a little bit more difficult, given that:
> >>> $arr = explode("_", "offset_717_729");
> >>> switch ($arr[0]) {
> >>>     case 'offset':
> >>>         $begin = $arr[1];
> >>>         $end = $arr[2];
> >>>         break;
> >>>     case 'hash':
> >>>         $clength = $arr[1];
> >>>         $slength = $arr[2];
> >>>         $hash = $arr[3];
> >>>         // merge the remaining parts with '_'
> >>>         $rest = implode('_', array_slice($arr, 4));
> >>>         break;
> >>> }
> >>>
> >>> 5. RFC assumes a certain mime type, i.e. plain text. NIF does have a
> >>> broader assumption.
> >>>> * what is the relation to the W3C media fragment URIs? Did not find a
> >>>> pointer there.
> >>> They are designed for media such as images, video, not strings.
> >>> Potentially, the same principle can be applied, but it is not yet
> >>> engineered/researched.
> >>>> * any plans of standardizing your approach?
> >>> We will do NIF 2.0 as a community standard and finish it in a
> >>> couple of months. It will be published under open licences, so
> >>> anybody (W3C or ISO) might pick it up easily. Other than that there
> >>> are plans by several EU projects (see e.g. here
> >>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html )
> >>> and a US project to use it and there are several third party
> >>> implementations already. We would rather have it adopted first on
> >>> a large scale and then standardized properly, i.e. by W3C. This worked
> >>> quite well for the FOAF project or for RDB2RDF Mappers.
> >>> Chances for fast standardization are not so unlikely, I would assume.
> >>>> We would strongly prefer to just use a standard instead of advocating
> >>>> contenders for one -- if one exists.
> >>> You might want to look at:
> >>> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
> >>> and the same highlighting here:
> >>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
> >>>
> >>>
> >>> NIF equivalent (4 triples instead of 14 and only one generated uuid):
> >>> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a
> >>>     str:String ;
> >>>     oa:hasBody [
> >>>         oa:annotator <mailto:Bob> ;
> >>>         cnt:chars "Hey Tim, good idea that Semantic Web!"
> >>>     ] .
> >>>
> >>> So you might not think in a "contender" way. Approaches are
> >>> complementary. NIF is simpler and the URIs have some features that
> >>> might be wanted (stability, uniqueness, easy to implement).
> >>> This is why I was asking for your *use case*.
> >>>
> >>> Note that there are still some problems when annotating DOM with
> >>> URIs, e.g. XPointer is abandoned and was never finished. XPath has
> >>> its limits and is also expensive (i.e. SAX not possible).
> >>> I think there is no proper solution as of now.
> >>> All the best,
> >>> Sebastian
> >>>
> >>>> Cheers,
> >>>> Denny
> >>>>
> >>>>
> >>>> 2012/5/18 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
> >>>>
> >>>>> Hello again,
> >>>>> maybe the question I asked was lost, as the text was TL;DR.
> >>>>>
> >>>>> I heard that it is planned to track provenance of facts, e.g.
> >>>>> Berlin has 3,337,000 citizens found
> >>>>> here: http://www.worldatlas.com/citypops.htm
> >>>>> Do you have a place where the use case and the requirements are
> >>>>> documented for this? Or is it out of scope?
> >>>>> Will it be coarse grained, i.e. website level? Or fine grained,
> >>>>> i.e. text paragraph level? See e.g. how Berlin is highlighted here:
> >>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
> >>>>>
> >>>>> in this very early prototype.
> >>>>>
> >>>>> Could you give me a link where I can read more about any Wikidata
> >>>>> plans towards this direction?
> >>>>> Sebastian
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>> (Note: I could not find the document where your requirements
> >>>>>> regarding the tracking of facts on the web are written, so I am
> >>>>>> giving a general introduction to NIF. Please send me a link to the
> >>>>>> document that specifies your need for tracing facts on the web, thanks)
> >>>>>>
> >>>>>> I would like to point your attention to the URIs used in the NLP
> >>>>>> Interchange Format (NIF).
> >>>>>> NIF-URIs are quite easy to use, understand and implement. NIF has a
> >>>>>> one-triple-per-annotation paradigm.
> >>>>>> The latest documentation can be found here:
> >>>>>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
> >>>>>>
> >>>>>>
> >>>>>> The basic idea is to use URIs with hash fragment ids to annotate
> >>>>>> or mark pages on the web:
> >>>>>> An example is the first occurrence of "Semantic Web" on
> >>>>>> http://www.w3.org/DesignIssues/LinkedData.html
> >>>>>> as highlighted here:
> >>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
> >>>>>>
> >>>>>>
> >>>>>> Here is a NIF example for linking a part of the document to the
> >>>>>> DBpedia entry of the Semantic Web:
> >>>>>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
> >>>>>>     a str:StringInContext ;
> >>>>>>     sso:oen <http://dbpedia.org/resource/Semantic_Web> .
> >>>>>>
> >>>>>>
> >>>>>> We are currently preparing a new draft for the spec 2.0. The old
> >>>>>> one can be found here:
> >>>>>> http://nlp2rdf.org/nif-1-0/
> >>>>>>
> >>>>>> There are several EU projects that intend to use NIF. Furthermore,
> >>>>>> it is easier for everybody if we standardize a Web annotation format
> >>>>>> together.
> >>>>>> Please give feedback on your use cases.
> >>>>>> All the best,
> >>>>>> Sebastian
> >>>>>>
> >>>>>>
> >>>>> --
> >>>>> Dipl. Inf. Sebastian Hellmann
> >>>>> Department of Computer Science, University of Leipzig
> >>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
> >>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> >>>>>
> >>>>> Research Group: http://aksw.org
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Wikidata-l mailing list
> >>>>> Wikidata-l@lists.wikimedia.org
> >>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
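The two NIF fragment-id forms in the quoted thread (offset_717_729 and hash_10_12_<md5>_Semantic%20Web) can be taken apart by splitting on "_", as the PHP snippet does. The Python sketch below mirrors that switch and adds a conversion to an RFC 5147 character range. RFC 5147 writes a range as two character positions (char=START,END), so under that reading offset_717_729 corresponds to char=717,729; the thread instead writes the second number as a length (char=717,12). Reading the first two hash numbers as context and string length follows the PHP variable names $clength and $slength and is an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class OffsetFragment:
    begin: int  # character position where the string starts
    end: int    # character position just after the string


@dataclass
class HashFragment:
    context_length: int  # assumed meaning of $clength
    string_length: int   # assumed meaning of $slength
    md5: str
    text: str  # URL-decoded tail of the id, e.g. "Semantic Web"


def parse_nif_fragment(fragment: str):
    """Split a NIF fragment id on '_' following the offset/hash cases in the thread."""
    parts = fragment.split("_")
    if parts[0] == "offset" and len(parts) == 3:
        return OffsetFragment(int(parts[1]), int(parts[2]))
    if parts[0] == "hash" and len(parts) >= 5:
        # The readable tail may itself contain '_', so re-join what is left.
        tail = "_".join(parts[4:])
        return HashFragment(int(parts[1]), int(parts[2]), parts[3], unquote(tail))
    raise ValueError(f"unsupported NIF fragment id: {fragment!r}")


def to_rfc5147(frag: OffsetFragment) -> str:
    """RFC 5147 writes a character range as char=START,END (two positions)."""
    return f"char={frag.begin},{frag.end}"


if __name__ == "__main__":
    print(to_rfc5147(parse_nif_fragment("offset_717_729")))  # char=717,729
```

Re-joining the tail with "_" plays the role of the `$rest` line in the PHP version, so strings that contain underscores survive the split.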
Received on Thursday, 21 June 2012 21:10:09 UTC