- From: Timothy Lebo <lebot@rpi.edu>
- Date: Thu, 21 Jun 2012 14:09:33 -0700
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: public-rdf-prov@w3.org
Sebastian,
On Jun 21, 2012, at 1:04 PM, Sebastian Hellmann wrote:
> Dear Provenance group,
> there was a discussion at WikiData, which led to contacting you:
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000478.html
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000566.html
> http://lists.wikimedia.org/pipermail/wikidata-l/2012-June/000751.html
> ...
>
> You are tracking provenance on the resource level.
Are you suggesting that text snippets within a document (resource representation, really) cannot be resources themselves?
PROV provides prov:Entity, and you can choose anything that you wish to be a prov:Entity (for cases when you want to describe its provenance).
So, we could tweak your example:
<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
a str:StringInContext, prov:Entity;
prov:value "Semantic Web";
prov:wasQuotedFrom <http://www.w3.org/DesignIssues/LinkedData.html>;
.
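As a rough illustration of how such an offset-based fragment could be resolved back to its string, here is a minimal Python sketch. The function names `parse_offset_fragment` and `quoted_string` are hypothetical, the stand-in text is made up, and the sketch assumes the `offset_<begin>_<end>` form with zero-based, end-exclusive character offsets, which this thread does not spell out.

```python
from urllib.parse import urldefrag


def parse_offset_fragment(uri):
    """Split an offset-style URI into (document URI, begin, end)."""
    document, fragment = urldefrag(uri)
    scheme, begin, end = fragment.split("_")
    if scheme != "offset":
        raise ValueError(f"not an offset fragment: {fragment!r}")
    return document, int(begin), int(end)


def quoted_string(text, begin, end):
    """Return the substring addressed by the offsets (end-exclusive, assumed)."""
    return text[begin:end]


uri = "http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729"
document, begin, end = parse_offset_fragment(uri)

# Stand-in for the plain text of the page: 717 filler characters, then the quote.
text = " " * 717 + "Semantic Web" + " and more text"
value = quoted_string(text, begin, end)
```

Under these assumptions, `value` is what would go into prov:value, and `document` is the resource the string was quoted from.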
If you're concerned about time, you can get more specific by saying:
<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
a str:StringInContext, prov:Entity;
prov:wasQuotedFrom :the-page-today;
.
:the-page-today
a prov:Entity;
prov:specializationOf <http://www.w3.org/DesignIssues/LinkedData.html>;
prov:generatedAtTime "2009-06-18T18:24:33"^^xsd:dateTime;
.
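The same pattern could be generated programmatically. The following Python sketch only assembles Turtle text with string formatting; the function name `snapshot_quote_turtle` and the `http://example.org/snapshots#` namespace bound to the empty prefix are assumptions for illustration, and the str: type is left out because its namespace is not given here.

```python
from datetime import datetime

# prov: and xsd: are the standard namespaces; the empty prefix is a made-up example.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix : <http://example.org/snapshots#> .\n"
)


def snapshot_quote_turtle(fragment_uri, page_uri, snapshot, generated_at):
    """Describe a quoted fragment and the time-specific page it was quoted from."""
    stamp = generated_at.isoformat(timespec="seconds")
    return (
        f"{PREFIXES}\n"
        f"<{fragment_uri}>\n"
        f"    a prov:Entity ;\n"
        f"    prov:wasQuotedFrom :{snapshot} .\n\n"
        f":{snapshot}\n"
        f"    a prov:Entity ;\n"
        f"    prov:specializationOf <{page_uri}> ;\n"
        f'    prov:generatedAtTime "{stamp}"^^xsd:dateTime .\n'
    )


turtle = snapshot_quote_turtle(
    "http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729",
    "http://www.w3.org/DesignIssues/LinkedData.html",
    "the-page-today",
    datetime(2009, 6, 18, 18, 24, 33),
)
print(turtle)
```

Keeping the snapshot as its own prov:Entity means several quoted fragments can point at the same dated version of the page.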
> In NIF, fragments of resources are used as subjects in RDF.
> Hence you could consider it for inclusion, if it is not too far a stretch and if there is enough time left.
What specifically are you proposing the PROV-WG include?
Thanks for pointing out the NIF work; it will be great to reuse existing models for the strings in documents.
Regards,
Tim Lebo
> You could read here for a start: http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
> or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>
> All the best,
> Sebastian
>
> -------- Original Message --------
> Subject: Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
> Date: Thu, 21 Jun 2012 20:34:14 +0100
> From: Barry Norton <barry.norton@ontotext.com>
> To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
> CC: Discussion list for the Wikidata project. <wikidata-l@lists.wikimedia.org>
>
> As excused I wasn't really following your discussion, but indeed if
> you're giving URIs to these fragments...
>
> Barry
>
>
> On 21/06/2012 20:29, Sebastian Hellmann wrote:
> > Hi Barry,
> >
> > On 06/21/2012 08:51 PM, Barry Norton wrote:
> >>
> >> Sorry to jump in (without really understanding the context), but you
> >> guys saw this today, right?
> >> http://www.w3.org/TR/2012/WD-prov-aq-20120619/
> > It seems to be very unrelated. That is only resource-level, right?
> > "Fundamentally, provenance information
> > <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
> > is /about/ resources
> > <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>." So
> > you would need a subject first. How do you say that the fact you just
> > added to WikiData comes from a specific fragment of a resource,
> > i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 ,
> > the first occurrence of "Semantic Web"?
> >
> > Do you suggest, that NIF URIs might be standardized by inclusion in
> > the PROV-AQ ? Might work. It could be compatible.
> >
> > Sebastian
> >
> >>
> >> Barry
> >>
> >>
> >> On 21/06/2012 19:05, Sebastian Hellmann wrote:
> >>> Hello Denny,
> >>> I was traveling for the past few weeks and can finally answer your
> >>> email.
> >>> See my comments inline.
> >>>
> >>> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
> >>>> Hello Sebastian,
> >>>>
> >>>>
> >>>> Just a few questions - as you note, it is easier if we all use the
> >>>> same
> >>>> standards, and so I want to ask about the relation to other related
> >>>> standards:
> >>>> * I understand that you dismiss IETF RFC 5147 because it is not stable
> >>>> enough, right?
> >>> The offset scheme of NIF is built on this RFC.
> >>> So the following would hold:
> >>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> >>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
> >>>
> >>>
> >>> We might change the syntax and reuse the RFC syntax, but it has
> >>> several issues:
> >>> 1. The optional part is not easy to handle, because you would need
> >>> to add owl:sameAs statements:
> >>>
> >>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
> >>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
> >>> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
> >>>
> >>> So theoretically ok, but annoying to implement and check.
> >>>
> >>> 2. When implementing web services, NIF allows the client to choose
> >>> the prefix:
> >>>
> >>> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
> >>>
> >>> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
> >>> So RFC 5147 would look like:
> >>>
> >>> <http://this.is/a/slash/prefix/char=717,12>
> >>> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
> >>> or
> >>>
> >>> <http://this.is/a/slash/prefix?char=717,12>
> >>> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
> >>>
> >>> 3. Characters like = and , prevent the use of prefixes:
> >>> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
> >>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
> >>> " > test.ttl ; rapper -i turtle test.ttl
> >>>
> >>> 4. implementation is a little bit more difficult, given that :
> >>> $arr = explode("_", "offset_717_729");
> >>> switch ($arr[0]) {
> >>> case 'offset':
> >>> $begin = $arr[1];
> >>> $end = $arr[2];
> >>> break;
> >>> case 'hash':
> >>> $clength = $arr[1];
> >>> $slength = $arr[2];
> >>> $hash = $arr[3];
> >>> // merge the remaining parts with '_'
> >>> $rest = implode('_', array_slice($arr, 4));
> >>> break;
> >>> }
> >>>
> >>> 5. RFC assumes a certain mime type, i.e. plain text. NIF does have a
> >>> broader assumption.
> >>>> * what is the relation to the W3C media fragment URIs? Did not find a
> >>>> pointer there.
> >>> They are designed for media such as images, video, not strings.
> >>> Potentially, the same principle can be applied, but it is not yet
> >>> engineered/researched.
> >>>> * any plans of standardizing your approach?
> >>> We will do NIF 2.0 as a community standard and finish it in a
> >>> couple of months. It will be published under open licences, so
> >>> anybody W3C or ISO might pick it up, easily. Other than that there
> >>> are plans by several EU projects (see e.g. here
> >>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html )
> >>> and a US project to use it and there are several third party
> >>> implementations, already. We would rather have it adopted first on
> >>> a large scale and then standardized, properly, i.e. W3C. This worked
> >>> quite well for the FOAF project or for RDB2RDF Mappers.
> >>> Chances for fast standardization are not so unlikely, I would assume.
> >>>> We would strongly prefer to just use a standard instead of advocating
> >>>> contenders for one -- if one exists.
> >>> You might want to look at:
> >>> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
> >>> and the same highlighting here:
> >>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
> >>>
> >>>
> >>> NIF equivalent (4 triples instead of 14 and only one generated uuid):
> >>> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a
> >>> str:String ;
> >>> oa:hasBody [
> >>> oa:annotator <mailto:Bob> ;
> >>> cnt:chars "Hey Tim, good idea that Semantic Web!"
> >>> ] .
> >>>
> >>> So you might not think in a "contender" way. Approaches are
> >>> complementary. NIF is simpler and the URIs have some features that
> >>> might be wanted (stability, uniqueness, easy to implement).
> >>> This is why I was asking for your *use case* .
> >>>
> >>> Note that: there are still some problems, when annotating DOM with
> >>> URIs, e.g. xPointer is abandoned and was never finished. Xpath has
> >>> its limits and is also expensive (i.e. SAX not possible).
> >>> I think there is no proper solution as of now.
> >>> All the best,
> >>> Sebastian
> >>>
> >>>> Cheers,
> >>>> Denny
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> 2012/5/18 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
> >>>>
> >>>>> Hello again,
> >>>>> maybe the question, I asked was lost, as the text was TL;DR
> >>>>>
> >>>>> I heard that it is planned to track provenance of facts, e.g.
> >>>>> Berlin has 3,337,000 citizens, found here:
> >>>>> http://www.worldatlas.com/citypops.htm
> >>>>> Do you have a place where the use case and the requirements are
> >>>>> documented
> >>>>> for this? Or is it out of scope?
> >>>>> Will it be coarse-grained, i.e. website level? Or fine-grained,
> >>>>> i.e. text paragraph level? See e.g. how Berlin is highlighted here:
> >>>>>
> >>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
> >>>>>
> >>>>> in this very early prototype.
> >>>>>
> >>>>> Could you give me a link where I can read more about any Wikidata
> >>>>> plans in this direction?
> >>>>> Sebastian
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>> (Note: I could not find the document, where your requirements
> >>>>>> regarding
> >>>>>> the tracking of facts on the web are written, so I am giving a
> >>>>>> general
> >>>>>> introduction to NIF. Please send me a link to the document that
> >>>>>> specifies
> >>>>>> your need for tracing facts on the web, thanks)
> >>>>>>
> >>>>>> I would like to point your attention to the URIs used in the NLP
> >>>>>> Interchange Format (NIF).
> >>>>>> NIF-URIs are quite easy to use, understand and implement. NIF has a
> >>>>>> one-triple-per-annotation paradigm. The latest documentation can
> >>>>>> be found
> >>>>>> here:
> >>>>>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
> >>>>>>
> >>>>>>
> >>>>>> The basic idea is to use URIs with hash fragment ids to annotate
> >>>>>> or mark
> >>>>>> pages on the web:
> >>>>>> An example is the first occurrence of "Semantic Web" on
> >>>>>> http://www.w3.org/DesignIssues/LinkedData.html
> >>>>>> as highlighted here:
> >>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
> >>>>>>
> >>>>>>
> >>>>>> Here is a NIF example for linking a part of the document to the
> >>>>>> DBpedia
> >>>>>> entry of the Semantic Web:
> >>>>>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
> >>>>>> a str:StringInContext ;
> >>>>>> sso:oen <http://dbpedia.org/resource/Semantic_Web> .
> >>>>>>
> >>>>>>
> >>>>>> We are currently preparing a new draft for the spec 2.0. The old
> >>>>>> one can
> >>>>>> be found here:
> >>>>>> http://nlp2rdf.org/nif-1-0/
> >>>>>>
> >>>>>> There are several EU projects that intend to use NIF.
> >>>>>> Furthermore, it is
> >>>>>> easier for everybody, if we standardize a Web annotation format
> >>>>>> together.
> >>>>>> Please give feedback of your use cases.
> >>>>>> All the best,
> >>>>>> Sebastian
> >>>>>>
> >>>>>>
> >>>>> --
> >>>>> Dipl. Inf. Sebastian Hellmann
> >>>>> Department of Computer Science, University of Leipzig
> >>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
> >>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> >>>>> Research Group: http://aksw.org
> >>>>>
> >>>>> _______________________________________________
> >>>>> Wikidata-l mailing list
> >>>>> Wikidata-l@lists.wikimedia.org
> >>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
Received on Thursday, 21 June 2012 21:10:09 UTC