- From: Timothy Lebo <lebot@rpi.edu>
- Date: Mon, 25 Jun 2012 16:52:04 -0400
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: public-rdf-prov@w3.org
Sebastian,
On Jun 22, 2012, at 3:02 AM, Sebastian Hellmann wrote:
> Hi Timothy,
>
> On 06/21/2012 11:09 PM, Timothy Lebo wrote:
>>
>>> In NIF, fragments of resources are used as subjects in RDF.
>>> Hence you could consider them for inclusion, if it is not too far a stretch and if there is enough time left.
>>
>> What specifically are you proposing the PROV-WG include?
> Well, suppose you have a (web) document and you want to express that a certain part was written by you.
> e.g. I have written (with some exceptions) the beginning of http://wole2012.eurecom.fr/call-papers
> From "This workshop envisions the Semantic..." until "Natural Language Processing and Semantic Web. "
>
> How do you express this with the current work of your group?
If NIF-URIs provide you a way to identify that snippet of the document, then PROV and PROV-O can be used to describe its provenance.
Your authorship can be described as follows. Depending on what else you'd like to say, we can add more PROV assertions.
@prefix prov: <http://www.w3.org/ns/prov#> .

<your-nif-uri-for-that-portion-of-the-document>
   prov:wasAttributedTo <http://data.semanticweb.org/person/sebastian-hellmann>;
.

<http://data.semanticweb.org/person/sebastian-hellmann> a prov:Agent, prov:Person .
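For illustration only, here is a minimal Python sketch of how such a statement could be generated. It assumes the NIF "offset" recipe (`#offset_<begin>_<end>`) is used to identify the snippet; the function names and the offsets 0 and 100 are placeholders of mine, not the real span of the paragraph and not part of PROV or NIF.

```python
PROV_NS = "http://www.w3.org/ns/prov#"


def nif_offset_uri(document_uri: str, begin: int, end: int) -> str:
    """Build a NIF offset-recipe URI such as <doc>#offset_717_729."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    base = document_uri.split("#", 1)[0]  # drop any existing fragment
    return f"{base}#offset_{begin}_{end}"


def attribution_turtle(fragment_uri: str, agent_uri: str) -> str:
    """Serialize the prov:wasAttributedTo statement as Turtle text."""
    return (
        f"@prefix prov: <{PROV_NS}> .\n\n"
        f"<{fragment_uri}>\n"
        f"   prov:wasAttributedTo <{agent_uri}> .\n\n"
        f"<{agent_uri}> a prov:Agent, prov:Person .\n"
    )


if __name__ == "__main__":
    # Placeholder offsets (assumption): replace with the real character span.
    uri = nif_offset_uri("http://wole2012.eurecom.fr/call-papers", 0, 100)
    print(attribution_turtle(
        uri, "http://data.semanticweb.org/person/sebastian-hellmann"))
```

The sketch builds the Turtle by plain string formatting to stay dependency-free; a real tool would more likely use an RDF library and validate the URIs.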
> NIF-URIs could fill this spot very well.
I agree. Since we haven't spent any effort on conventions for identifying portions of resource representations, NIF and PROV complement each other nicely.
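To make the division of labour concrete, here is a hedged Python sketch of the NIF side: resolving an offset-recipe URI back to the string it identifies. I am assuming zero-based, end-exclusive character offsets, and `resolve_offset_fragment` is a hypothetical helper name, not part of any NIF library.

```python
import re

# Matches the fragment part of a NIF offset-recipe URI, e.g. "offset_717_729".
_OFFSET = re.compile(r"^offset_(\d+)_(\d+)$")


def resolve_offset_fragment(uri: str, text: str) -> str:
    """Return the substring of `text` that the offset URI points at.

    Assumption: offsets count characters, start at 0, and the end
    offset is exclusive, so end - begin is the length of the string.
    """
    _, _, fragment = uri.partition("#")
    match = _OFFSET.match(fragment)
    if match is None:
        raise ValueError(f"not an offset-recipe fragment: {fragment!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if not begin <= end <= len(text):
        raise ValueError("offsets fall outside the document text")
    return text[begin:end]


if __name__ == "__main__":
    document = "The Semantic Web is a web of data."
    print(resolve_offset_fragment("http://example.org/doc#offset_4_16", document))
```

PROV then only needs the resulting URI as the subject of its statements; it does not have to know how the fragment was computed.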
Regards,
Tim
> All the best,
> Sebastian
>
>>
>> Thanks for pointing out the NIF work, it will be great to reuse existing models for the strings in documents.
>>
>> Regards,
>> Tim Lebo
>>
>>
>>> You could read here for a start: http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000475.html
>>> or here http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>> All the best,
>>> Sebastian
>>>
>>> -------- Original Message --------
>>> Subject: Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs
>>> Date: Thu, 21 Jun 2012 20:34:14 +0100
>>> From: Barry Norton<barry.norton@ontotext.com>
>>> To: Sebastian Hellmann<hellmann@informatik.uni-leipzig.de>
>>> CC: Discussion list for the Wikidata project.<wikidata-l@lists.wikimedia.org>
>>>
>>> As excused I wasn't really following your discussion, but indeed if
>>> you're giving URIs to these fragments...
>>>
>>> Barry
>>>
>>>
>>> On 21/06/2012 20:29, Sebastian Hellmann wrote:
>>>> Hi Barry,
>>>>
>>>> On 06/21/2012 08:51 PM, Barry Norton wrote:
>>>>> Sorry to jump in (without really understanding the context), but you
>>>>> guys saw this today, right?
>>>>>
>>>>> http://www.w3.org/TR/2012/WD-prov-aq-20120619/
>>>>
>>>> It seems to be very unrelated. That is only resource-level, right?
>>>> "Fundamentally, provenance information
>>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-provenance-information>
>>>> is /about/ resources
>>>> <http://www.w3.org/TR/2012/WD-prov-aq-20120619/#dfn-resource>." So
>>>> you would need a subject first. How do you say that the fact you just
>>>> added to WikiData comes from a specific fragment of a resource?
>>>> i.e. http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 the
>>>> first occurrence of "Semantic Web"
>>>>
>>>> Do you suggest, that NIF URIs might be standardized by inclusion in
>>>> the PROV-AQ ? Might work. It could be compatible.
>>>>
>>>> Sebastian
>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>> On 21/06/2012 19:05, Sebastian Hellmann wrote:
>>>>>> Hello Denny,
>>>>>> I was traveling for the past few weeks and can finally answer your
>>>>>> email.
>>>>>> See my comments inline.
>>>>>>
>>>>>> On 05/29/2012 05:25 PM, Denny Vrandečić wrote:
>>>>>>> Hello Sebastian,
>>>>>>>
>>>>>>>
>>>>>>> Just a few questions - as you note, it is easier if we all use the
>>>>>>> same
>>>>>>> standards, and so I want to ask about the relation to other related
>>>>>>> standards:
>>>>>>> * I understand that you dismiss IETF RFC 5147 because it is not stable
>>>>>>> enough, right?
>>>>>> The offset scheme of NIF is built on this RFC.
>>>>>> So the following would hold:
>>>>>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>>>>>>
>>>>>>
>>>>>> We might change the syntax and reuse the RFC syntax, but it has
>>>>>> several issues:
>>>>>> 1. The optional part is not easy to handle, because you would need
>>>>>> to add owl:sameAs statements:
>>>>>>
>>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12;length=12 .
>>>>>> ld:char=717,12;length=12,UTF-8 owl:sameAs ld:char=717,12 .
>>>>>> ld:char=717,12;UTF-8 owl:sameAs ld:char=717,12;length=9876 .
>>>>>>
>>>>>> So theoretically ok, but annoying to implement and check.
>>>>>>
>>>>>> 2. When implementing web services, NIF allows the client to choose
>>>>>> the prefix:
>>>>>>
>>>>>> http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&prefix=http%3A%2F%2Fthis.is%2Fa%2Fslash%2Fprefix%2F&urirecipe=offset&input=President+Obama+is+president
>>>>>> returning URIs like <http://this.is/a/slash/prefix/offset_10_15>
>>>>>>
>>>>>> So RFC 5147 would look like:
>>>>>> <http://this.is/a/slash/prefix/char=717,12>
>>>>>> <http://this.is/a/slash/prefix/char=717,12;UTF-8>
>>>>>> or
>>>>>> <http://this.is/a/slash/prefix?char=717,12>
>>>>>> <http://this.is/a/slash/prefix?char=717,12;UTF-8>
>>>>>>
>>>>>> 3. Characters like "=" and "," prevent the use of prefixes:
>>>>>> echo "@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>>>>> ld:offset_717_729 owl:sameAs ld:char=717,12 .
>>>>>> " > test.ttl ; rapper -i turtle test.ttl
>>>>>>
>>>>>> 4. Implementation is a little bit more difficult, given that NIF
>>>>>> fragment ids can be parsed as simply as this:
>>>>>> // explode() replaces the deprecated split()
>>>>>> $arr = explode("_", "offset_717_729");
>>>>>> switch ($arr[0]) {
>>>>>>   case 'offset':
>>>>>>     $begin = $arr[1];
>>>>>>     $end = $arr[2];
>>>>>>     break;
>>>>>>   case 'hash':
>>>>>>     $clength = $arr[1];
>>>>>>     $slength = $arr[2];
>>>>>>     $hash = $arr[3];
>>>>>>     // merge the remaining parts with '_'
>>>>>>     $rest = implode("_", array_slice($arr, 4));
>>>>>>     break;
>>>>>> }
>>>>>>
>>>>>> 5. The RFC assumes a certain MIME type, i.e. plain text. NIF makes a
>>>>>> broader assumption.
>>>>>>> * what is the relation to the W3C media fragment URIs? Did not find a
>>>>>>> pointer there.
>>>>>> They are designed for media such as images, video, not strings.
>>>>>> Potentially, the same principle can be applied, but it is not yet
>>>>>> engineered/researched.
>>>>>>> * any plans of standardizing your approach?
>>>>>> We will do NIF 2.0 as a community standard and finish it in a
>>>>>> couple of months. It will be published under open licences, so
>>>>>> anybody W3C or ISO might pick it up, easily. Other than that there
>>>>>> are plans by several EU projects (see e.g. here
>>>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html )
>>>>>> and a US project to use it and there are several third party
>>>>>> implementations, already. We would rather have it adopted first on
>>>>>> a large scale and then standardized, properly, i.e. W3C. This worked
>>>>>> quite well for the FOAF project or for RDB2RDF Mappers.
>>>>>> The chances of fast standardization are not bad, I would assume.
>>>>>>> We would strongly prefer to just use a standard instead of advocating
>>>>>>> contenders for one -- if one exists.
>>>>>> You might want to look at:
>>>>>> http://www.w3.org/community/openannotation/wiki/TextCommentOnWebPage
>>>>>> and the same highlighting here:
>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>
>>>>>> NIF equivalent (4 triples instead of 14 and only one generated uuid):
>>>>>> ld:hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%20Web a str:String ;
>>>>>>    oa:hasBody [
>>>>>>       oa:annotator <mailto:Bob> ;
>>>>>>       cnt:chars "Hey Tim, good idea that Semantic Web!"
>>>>>>    ] .
>>>>>>
>>>>>> So you might not think in a "contender" way. Approaches are
>>>>>> complementary. NIF is simpler and the URIs have some features that
>>>>>> might be wanted (stability, uniqueness, easy to implement).
>>>>>> This is why I was asking for your *use case* .
>>>>>>
>>>>>> Note that: there are still some problems, when annotating DOM with
>>>>>> URIs, e.g. xPointer is abandoned and was never finished. Xpath has
>>>>>> its limits and is also expensive (i.e. SAX not possible).
>>>>>> I think there is no proper solution as of now.
>>>>>> All the best,
>>>>>> Sebastian
>>>>>>
>>>>>>> Cheers,
>>>>>>> Denny
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2012/5/18 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>>>>>
>>>>>>>> Hello again,
>>>>>>>> maybe the question I asked was lost, as the text was TL;DR.
>>>>>>>>
>>>>>>>> I heard that it is planned to track the provenance of facts, e.g.
>>>>>>>> Berlin has 3,337,000 citizens, found here:
>>>>>>>> http://www.worldatlas.com/citypops.htm
>>>>>>>>
>>>>>>>> Do you have a place where the use case and the requirements are
>>>>>>>> documented
>>>>>>>> for this? Or is it out of scope?
>>>>>>>> Will it be coarse grained, i.e. website level? Or fine grained,
>>>>>>>> i.e. text
>>>>>>>> paragraph level? See e.g. how Berlin is highlighted here:
>>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
>>>>>>>> in this very early prototype.
>>>>>>>>
>>>>>>>> Could you give me a link where I can read more about any Wikidata
>>>>>>>> plans
>>>>>>>> towards this direction?
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:
>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>> (Note: I could not find the document, where your requirements
>>>>>>>>> regarding
>>>>>>>>> the tracking of facts on the web are written, so I am giving a
>>>>>>>>> general
>>>>>>>>> introduction to NIF. Please send me a link to the document that
>>>>>>>>> specifies
>>>>>>>>> your need for tracing facts on the web, thanks)
>>>>>>>>>
>>>>>>>>> I would like to draw your attention to the URIs used in the NLP
>>>>>>>>> Interchange Format (NIF).
>>>>>>>>> NIF-URIs are quite easy to use, understand and implement. NIF has a
>>>>>>>>> one-triple-per-annotation paradigm. The latest documentation can
>>>>>>>>> be found
>>>>>>>>> here:
>>>>>>>>> http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
>>>>>>>>>
>>>>>>>>> The basic idea is to use URIs with hash fragment ids to annotate
>>>>>>>>> or mark
>>>>>>>>> pages on the web:
>>>>>>>>> An example is the first occurrence of "Semantic Web" on
>>>>>>>>> http://www.w3.org/DesignIssues/LinkedData.html
>>>>>>>>> as highlighted here:
>>>>>>>>> http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web
>>>>>>>>>
>>>>>>>>> Here is a NIF example for linking a part of the document to the
>>>>>>>>> DBpedia
>>>>>>>>> entry of the Semantic Web:
>>>>>>>>> <http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
>>>>>>>>>    a str:StringInContext ;
>>>>>>>>>    sso:oen <http://dbpedia.org/resource/Semantic_Web> .
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We are currently preparing a new draft for the spec 2.0. The old
>>>>>>>>> one can
>>>>>>>>> be found here:
>>>>>>>>> http://nlp2rdf.org/nif-1-0/
>>>>>>>>> There are several EU projects that intend to use NIF.
>>>>>>>>> Furthermore, it is
>>>>>>>>> easier for everybody, if we standardize a Web annotation format
>>>>>>>>> together.
>>>>>>>>> Please give feedback of your use cases.
>>>>>>>>> All the best,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Dipl. Inf. Sebastian Hellmann
>>>>>>>> Department of Computer Science, University of Leipzig
>>>>>>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>>>>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>>>>>> Research Group: http://aksw.org
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Wikidata-l mailing list
>>>>>>>> Wikidata-l@lists.wikimedia.org
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
Received on Monday, 25 June 2012 20:52:44 UTC