- From: Daniel Garijo <dgarijo@fi.upm.es>
- Date: Mon, 22 Apr 2013 19:13:14 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: Prateek <prateek@knoesis.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
- Message-ID: <CAExK0DcZj+jFRypHt7evgZ72jk08C2mzPOhopAdsu0yua4_T3g@mail.gmail.com>
Interesting, thanks for clarifying!
Best,
Daniel

2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Daniel,
> well, this matter is quite simple: the .htaccess rules need to be changed,
> at least from:
>
>   RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl [R=303,L]
>
> to:
>
>   RewriteRule ^nif-core/ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl [R=303,L]
>
> The behaviour might not be what you would normally expect, as you would
> get a superset (i.e. all triples, not just those starting with the URI).
> Would this pose a problem? A client would expect to get only 10 triples,
> but would receive the whole ontology.
>
> Would everybody be fine with a result like this?
>
>   curl -IL http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/String
>
>   HTTP/1.1 303 See Other
>   Location: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl
>
> From a practical perspective it wouldn't matter for small ontologies. For
> large files with thousands of terms, using a store (e.g. Virtuoso) and
> returning just the CBD [1] is probably the best practice. I wouldn't
> expect naive, ad-hoc crawler implementations to cache well (i.e. to cache
> the Location of the 303 redirect). Chances are better that caching works
> for '#' URIs.
>
> So as a guideline, I will add that anybody should calculate the worst-case
> download traffic per client. My two files are:
>
>   nif-core.owl -> 15610 bytes
>   nif-core.ttl -> 4050 bytes
>
> with 22 unique subjects:
>
>   rapper -g nif-core.ttl | cut -f1 -d '>' | sort -u | wc -l
>
> so a badly implemented crawler can cause the following traffic:
>
>   22 * 15610 =~ 343.4 kB
>   22 * 4050 =~ 89.1 kB
>
> which is still acceptable. For larger files this becomes infeasible; '/'
> with a triple store and CBD is probably the best option there, but it is
> not as easy to set up (it requires more than an Apache web server).
>
> As I see it, the Gene Ontology [2] is using '#' and is practically
> unpublishable as Linked Data: 39292 unique subjects at over 97 MB per
> download makes roughly 3.8 TB of worst-case traffic.
>
> On my web server I only have Apache and .htaccess, so a triple store is
> not an option. Maybe I should write a script which splits the triples
> into separate files? But on the other hand, my university has unlimited
> traffic ;)
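> A rough sketch of such a script (untested; it assumes rapper and awk are
> available, skips blank-node subjects, and uses the last path segment of
> each subject URI as the file name, so those segments must be unique):
>
>   rapper -i turtle -o ntriples nif-core.ttl | awk '
>     $1 ~ /^<.*>$/ {                          # URI subjects only
>       f = $1
>       sub(/^.*\//, "", f); sub(/>$/, "", f)  # keep the last path segment
>       print >> (f ".nt")                     # one N-Triples file per subject
>       close(f ".nt")
>     }'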
> I will introduce the above rule of thumb in the README when I have time.
>
> All the best,
> Sebastian
>
> [1] http://www.w3.org/Submission/CBD/
> [2] http://archive.geneontology.org/latest-termdb/go_daily-termdb.owl.gz
>
> Am 22.04.2013 14:09, schrieb Daniel Garijo:
>
> Hi Sebastian,
> I'm glad I could help you.
> However, I still don't get why the workflow wouldn't work for ontologies
> with '/'. Are any of the tools not appropriate?
> Thanks for sharing the VAD link. I was not aware of the tool.
> Best,
> Daniel
>
> 2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>
>> Ah, yes, you are right. Thanks for your help.
>> I was confused, because DBpedia also shows the instance data for the
>> classes in the HTML interface (i.e. inbound triples):
>> http://dbpedia.org/ontology/PopulatedPlace
>>
>> But of course, this is just a nice add-on for the HTML view. There is
>> actually no "get all instances of a class" via Linked Data, only via
>> SPARQL.
>>
>> I have also updated
>> https://github.com/NLP2RDF/persistence.uni-leipzig.org#-vs--uris with:
>>
>> There has been an ongoing debate about '#' vs. '/'. We focus on
>> ontologies with '#' here, with URIs like:
>> http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#String
>> Note that ontologies with '/' URIs need to be published differently
>> (description not included here).
>>
>> By the way, DBpedia only uses something that looks like Pubby, i.e. the
>> DBpedia VAD, which is written in VSP [1].
>>
>> Thanks again,
>> Sebastian
>>
>> [1] https://github.com/dbpedia/dbpedia-vad-i18n
>>
>> Am 22.04.2013 12:17, schrieb Daniel Garijo:
>>
>> Hi, I'm not sure I see the issue here.
>>
>> 2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>
>>> Hm, no actually, this issue is quite easy when it comes to large
>>> databases.
>>>
>>>   curl -H "Accept: text/turtle" "http://dbpedia.org/ontology#PopulatedPlace"
>>>
>>> is pretty much the same as:
>>>
>>>   curl -H "Accept: text/turtle" "http://dbpedia.org/ontology"
>>>
>> But here you are not asking for any instance. You are asking for a
>> document where the ontology is defined.
>>
>>> So my questions are:
>>>
>>> 1. What do you think is the expected output of
>>>    http://dbpedia.org/ontology ? 300 million triples as Turtle?
>>>
>> No. You would see the description of the ontology. In DBpedia they
>> haven't done such a redirection because they are exposing both terms
>> and classes with Pubby. But note that when you look up a term, no
>> instances are returned.
>>
>>> 2. How do you query all instances of type db-ont:PopulatedPlace via
>>>    Linked Data?
>>>
>> Via a SPARQL query:
>>
>>   SELECT ?instance WHERE {
>>     ?instance a db-ont:PopulatedPlace .
>>   }
>>
>> If you don't want all the instances, then add a LIMIT. That is why they
>> have a public endpoint, right?
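>> For example, with curl against the public endpoint (a sketch; it
>> assumes http://dbpedia.org/sparql is reachable and expands the db-ont:
>> prefix to http://dbpedia.org/ontology/):
>>
>>   curl -G -H "Accept: application/sparql-results+json" \
>>     --data-urlencode 'query=SELECT ?instance WHERE { ?instance a <http://dbpedia.org/ontology/PopulatedPlace> } LIMIT 10' \
>>     http://dbpedia.org/sparql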
>>
>> Another example: the recent PROV-O ontology (with namespace URI
>> http://www.w3.org/ns/prov# ). If I have an endpoint with many
>> prov:Entity instances published and I want them, I can perform a query
>> like the one above. If I want to see the documentation of the term, I
>> ask for http://www.w3.org/ns/prov#Entity and I am redirected to it.
>> Doing an Accept request for Turtle on an ontology term returns the OWL
>> file of the ontology, not the instances of that term.
>>
>> Best,
>> Daniel
>>
>>> q.e.d. from my point of view, as you wouldn't get around these
>>> practical problems.
>>>
>>> -- Sebastian
>>>
>>> Am 22.04.2013 11:50, schrieb Daniel Garijo:
>>>
>>> Dear Sebastian,
>>> this statement:
>>>
>>> "When you publish ontologies without data, you can use '#'. However,
>>> if you want to query instances via Linked Data in a database, you have
>>> to use '/' as DBpedia does for classes:
>>> http://dbpedia.org/ontology/PopulatedPlace"
>>>
>>> is not correct. You can use '#' to query instances in Linked Data
>>> databases; that is just the URI of the type. In fact, if DBpedia had
>>> chosen "http://dbpedia.org/ontology#PopulatedPlace" instead of its
>>> current URI, it would still be fine. It doesn't affect the query.
>>>
>>> I'm not going to enter the debate of '#' vs. '/', but normally it is a
>>> design decision that has more to do with the size of vocabularies than
>>> with the instances.
>>>
>>> Best,
>>> Daniel
>>>
>>> 2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>
>>>> Dear all,
>>>>
>>>> personally, I have been working on this for quite a while, and for me
>>>> the best and easiest way is as documented here:
>>>> https://github.com/NLP2RDF/persistence.uni-leipzig.org#readme
>>>>
>>>> The rules are simple and effective, and I couldn't imagine anything
>>>> simpler.
>>>>
>>>> Note that I have also secured persistent hosting for the URIs (also
>>>> an important point).
>>>> Feedback welcome, of course.
>>>>
>>>> All the best,
>>>> Sebastian
>>>>
>>>> Ontology:
>>>> http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#
>>>>
>>>> # vs /
>>>>
>>>> When you publish ontologies without data, you can use '#'. However,
>>>> if you want to query instances via Linked Data in a database, you
>>>> have to use '/' as DBpedia does for classes:
>>>> http://dbpedia.org/ontology/PopulatedPlace
>>>>
>>>> Workflow <https://github.com/NLP2RDF/persistence.uni-leipzig.org#workflow>
>>>>
>>>> 1. I edit the ontologies in Turtle syntax with the Geany text editor
>>>>    (or a Turtle editor,
>>>>    http://blog.aksw.org/2013/xturtle-turtle-editing-the-eclipse-way ).
>>>>    This allows me to make developer comments using "#" directly in
>>>>    the source, see e.g. nlp2rdf/ontologies/nif-core.ttl
>>>> 2. When I am finished, I use rapper
>>>>    ( http://librdf.org/raptor/rapper.html ) to convert it to RDF/XML
>>>>    ( nlp2rdf/ontologies/nif-core.owl ); the command is sketched after
>>>>    this list.
>>>> 3. I version the ontologies in a folder with the version number,
>>>>    e.g. version-1.0. If somebody wants to find old ontologies, she
>>>>    can find them in the GitHub repository, which is linked from the
>>>>    ontology. I assume this is not often required, but it is nice to
>>>>    keep old versions. The old versions should be linked to in the
>>>>    comment of the ontology, see the header of nif-core.ttl
>>>> 4. Then I use git push to push the changes to our server.
>>>> 5. (not yet) I use a simple OWL2HTML generator, e.g.
>>>>    https://github.com/specgen/specgen
>>>> 6. Add yourself to http://prefix.cc , see e.g. http://prefix.cc/nif
>>>> 7. The versions are switched and published by these .htaccess rules:
>>>>
>>>>    RewriteRule \.(owl|rdf|html|ttl|nt|txt|md)$ - [L]
>>>>
>>>>    # (in progress) RewriteCond %{HTTP_ACCEPT} text/html
>>>>    # (in progress) RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.html [R=303,L]
>>>>
>>>>    RewriteCond %{HTTP_ACCEPT} application/rdf+xml
>>>>    RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.owl [R=303,L]
>>>>
>>>>    RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl [R=303,L]
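>>>> The conversion in step 2 looks like this (a sketch; it assumes rapper
>>>> from the Raptor toolkit is installed):
>>>>
>>>>   # parse-check the Turtle source, then convert it to RDF/XML
>>>>   rapper -i turtle -c nif-core.ttl
>>>>   rapper -i turtle -o rdfxml nif-core.ttl > nif-core.owl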
>>>>
>>>> Am 19.04.2013 16:05, schrieb Prateek:
>>>>
>>>> Hello all,
>>>>
>>>> I am trying to identify a system that provides versioning and
>>>> revision control capabilities specifically for ontologies. Does
>>>> anyone have experience with, or an idea of, which systems can help,
>>>> or whether systems like SVN or CVS can do the job?
>>>>
>>>> Regards,
>>>> Prateek
>>>>
>>>> --
>>>> - - - - - - - - - - - - - - - - - - -
>>>> Prateek Jain, Ph.D.
>>>> RSM
>>>> IBM T.J. Watson Research Center
>>>> 1101 Kitchawan Road, 37-244
>>>> Yorktown Heights, NY 10598
>>>> LinkedIn: http://www.linkedin.com/in/prateekj
>>>>
>>>> --
>>>> Dipl. Inf. Sebastian Hellmann
>>>> Department of Computer Science, University of Leipzig
>>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>> Research Group: http://aksw.org
Received on Monday, 22 April 2013 17:13:48 UTC