- From: Daniel Garijo <dgarijo@fi.upm.es>
- Date: Mon, 22 Apr 2013 19:13:14 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: Prateek <prateek@knoesis.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
- Message-ID: <CAExK0DcZj+jFRypHt7evgZ72jk08C2mzPOhopAdsu0yua4_T3g@mail.gmail.com>
Interesting, thanks for clarifying!
Best,
Daniel

2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Daniel,
> well, this matter is quite simple: the .htaccess rules need to be changed,
> at least from:
>
>   RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl [R=303,L]
>
> to:
>
>   RewriteRule ^nif-core/ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl [R=303,L]
>
> The behaviour might not be what you would normally expect, as you would
> get a superset (i.e. all triples, not just those starting with the URI).
> Would this pose a problem? A client would expect to get only 10 triples,
> but would receive the whole ontology.
>
> Would everybody be fine with a result like this?
>
>   curl -IL http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/String
>
>   HTTP/1.1 303 See Other
>   Location: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl
>
> From a practical perspective it wouldn't matter for small ontologies. For
> large files with thousands of terms, using a store (e.g. Virtuoso) and
> returning just the CBD [1] is probably the best practice. I wouldn't
> expect naive, ad-hoc crawler implementations to cache well (i.e. to cache
> the Location of the 303 redirect). Chances are better that caching works
> for '#' URIs.
>
> So as a guideline, I will add that anybody should calculate the worst-case
> download traffic per client. My two files are:
>
>   nif-core.owl -> 15610 bytes
>   nif-core.ttl -> 4050 bytes
>
> with 22 unique subjects:
>
>   rapper -g nif-core.ttl | cut -f1 -d '>' | sort -u | wc -l
>
> so a badly implemented crawler can cause the following traffic:
>
>   22 * 15610 =~ 343.4 kB
>   22 * 4050 =~ 89.1 kB
>
> which is still acceptable. For larger files this becomes infeasible; '/'
> with a triple store and CBD is probably the best option there, but it is
> not as easy to set up (it requires more than an Apache web server).
>
> As I see it, the Gene Ontology [2] is using '#' and is practically
> unpublishable as Linked Data: 39292 unique subjects at over 97 MB per
> download makes roughly 3.8 TB of worst-case traffic.
>
> On my web server I only have Apache and .htaccess, so a triple store is
> not an option. Maybe I should write a script which splits the triples
> into separate files? But on the other hand, my university has unlimited
> traffic ;)
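> A rough sketch of such a script (untested; it assumes rapper and awk are
> available, skips blank-node subjects, and uses the last path segment of
> each subject URI as the file name, so those segments must be unique):
>
>   rapper -i turtle -o ntriples nif-core.ttl | awk '
>     $1 ~ /^<.*>$/ {                          # URI subjects only
>       f = $1
>       sub(/^.*\//, "", f); sub(/>$/, "", f)  # keep the last path segment
>       print >> (f ".nt")                     # one N-Triples file per subject
>       close(f ".nt")
>     }'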
> I will introduce the above rule of thumb in the README when I have time.
>
> All the best,
> Sebastian
>
> [1] http://www.w3.org/Submission/CBD/
> [2] http://archive.geneontology.org/latest-termdb/go_daily-termdb.owl.gz
>
> Am 22.04.2013 14:09, schrieb Daniel Garijo:
>
> Hi Sebastian,
> I'm glad I could help you.
> However, I still don't get why the workflow wouldn't work for ontologies
> with '/'. Are any of the tools not appropriate?
> Thanks for sharing the VAD link. I was not aware of the tool.
> Best,
> Daniel
>
> 2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>
>> Ah, yes, you are right. Thanks for your help.
>> I was confused, because DBpedia also shows the instance data for the
>> classes in the HTML interface (i.e. inbound triples):
>> http://dbpedia.org/ontology/PopulatedPlace
>>
>> But of course, this is just a nice add-on for the HTML view. There is
>> actually no "get all instances of a class" via Linked Data, only via
>> SPARQL.
>>
>> I have also updated
>> https://github.com/NLP2RDF/persistence.uni-leipzig.org#-vs--uris with:
>>
>> There has been an ongoing debate about '#' vs. '/'. We focus on
>> ontologies with '#' here, with URIs like:
>> http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#String
>> Note that ontologies with '/' URIs need to be published differently
>> (description not included here).
>>
>> By the way, DBpedia only uses something that looks like Pubby, i.e. the
>> DBpedia VAD, which is written in VSP [1].
>>
>> Thanks again,
>> Sebastian
>>
>> [1] https://github.com/dbpedia/dbpedia-vad-i18n
>>
>> Am 22.04.2013 12:17, schrieb Daniel Garijo:
>>
>> Hi, I'm not sure I see the issue here.
>>
>> 2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>
>>> Hm, no actually, this issue is quite easy when it comes to large
>>> databases.
>>>
>>>   curl -H "Accept: text/turtle" "http://dbpedia.org/ontology#PopulatedPlace"
>>>
>>> is pretty much the same as:
>>>
>>>   curl -H "Accept: text/turtle" "http://dbpedia.org/ontology"
>>>
>> But here you are not asking for any instance. You are asking for a
>> document where the ontology is defined.
>>
>>> So my questions are:
>>>
>>> 1. What do you think is the expected output of
>>>    http://dbpedia.org/ontology ? 300 million triples as Turtle?
>>>
>> No. You would see the description of the ontology. In DBpedia they
>> haven't done such a redirection because they are exposing both terms
>> and classes with Pubby. But note that when you look up a term, no
>> instances are returned.
>>
>>> 2. How do you query all instances of type db-ont:PopulatedPlace via
>>>    Linked Data?
>>>
>> Via a SPARQL query:
>>
>>   SELECT ?instance WHERE {
>>     ?instance a db-ont:PopulatedPlace .
>>   }
>>
>> If you don't want all the instances, then add a LIMIT. That is why they
>> have a public endpoint, right?
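>> For example, with curl against the public endpoint (a sketch; it
>> assumes http://dbpedia.org/sparql is reachable and expands the db-ont:
>> prefix to http://dbpedia.org/ontology/):
>>
>>   curl -G -H "Accept: application/sparql-results+json" \
>>     --data-urlencode 'query=SELECT ?instance WHERE { ?instance a <http://dbpedia.org/ontology/PopulatedPlace> } LIMIT 10' \
>>     http://dbpedia.org/sparql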
>>
>> Another example: the recent PROV-O ontology (with namespace URI
>> http://www.w3.org/ns/prov# ). If I have an endpoint with many
>> prov:Entity instances published and I want them, I can perform a query
>> like the one above. If I want to see the documentation of the term, I
>> ask for http://www.w3.org/ns/prov#Entity and I am redirected to it.
>> Doing an Accept request for Turtle on an ontology term returns the OWL
>> file of the ontology, not the instances of that term.
>>
>> Best,
>> Daniel
>>
>>> q.e.d. from my point of view, as you wouldn't get around these
>>> practical problems.
>>>
>>> -- Sebastian
>>>
>>> Am 22.04.2013 11:50, schrieb Daniel Garijo:
>>>
>>> Dear Sebastian,
>>> this statement:
>>>
>>> "When you publish ontologies without data, you can use '#'. However,
>>> if you want to query instances via Linked Data in a database, you have
>>> to use '/' as DBpedia does for classes:
>>> http://dbpedia.org/ontology/PopulatedPlace"
>>>
>>> is not correct. You can use '#' to query instances in Linked Data
>>> databases; that is just the URI of the type. In fact, if DBpedia had
>>> chosen "http://dbpedia.org/ontology#PopulatedPlace" instead of its
>>> current URI, it would still be fine. It doesn't affect the query.
>>>
>>> I'm not going to enter the debate of '#' vs. '/', but normally it is a
>>> design decision that has more to do with the size of vocabularies than
>>> with the instances.
>>>
>>> Best,
>>> Daniel
>>>
>>> 2013/4/22 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>
>>>> Dear all,
>>>>
>>>> personally, I have been working on this for quite a while, and for me
>>>> the best and easiest way is as documented here:
>>>> https://github.com/NLP2RDF/persistence.uni-leipzig.org#readme
>>>>
>>>> The rules are simple and effective, and I couldn't imagine anything
>>>> simpler.
>>>>
>>>> Note that I have also secured persistent hosting for the URIs (also
>>>> an important point).
>>>> Feedback welcome, of course.
>>>>
>>>> All the best,
>>>> Sebastian
>>>>
>>>> Ontology:
>>>> http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#
>>>>
>>>> # vs /
>>>>
>>>> When you publish ontologies without data, you can use '#'. However,
>>>> if you want to query instances via Linked Data in a database, you
>>>> have to use '/' as DBpedia does for classes:
>>>> http://dbpedia.org/ontology/PopulatedPlace
>>>>
>>>> Workflow <https://github.com/NLP2RDF/persistence.uni-leipzig.org#workflow>
>>>>
>>>> 1. I edit the ontologies in Turtle syntax with the Geany text editor
>>>>    (or a Turtle editor,
>>>>    http://blog.aksw.org/2013/xturtle-turtle-editing-the-eclipse-way ).
>>>>    This allows me to make developer comments using "#" directly in
>>>>    the source, see e.g. nlp2rdf/ontologies/nif-core.ttl
>>>> 2. When I am finished, I use rapper
>>>>    ( http://librdf.org/raptor/rapper.html ) to convert it to RDF/XML
>>>>    ( nlp2rdf/ontologies/nif-core.owl ); the command is sketched after
>>>>    this list.
>>>> 3. I version the ontologies in a folder with the version number,
>>>>    e.g. version-1.0. If somebody wants to find old ontologies, she
>>>>    can find them in the GitHub repository, which is linked from the
>>>>    ontology. I assume this is not often required, but it is nice to
>>>>    keep old versions. The old versions should be linked to in the
>>>>    comment of the ontology, see the header of nif-core.ttl
>>>> 4. Then I use git push to push the changes to our server.
>>>> 5. (not yet) I use a simple OWL2HTML generator, e.g.
>>>>    https://github.com/specgen/specgen
>>>> 6. Add yourself to http://prefix.cc , see e.g. http://prefix.cc/nif
>>>> 7. The versions are switched and published by these .htaccess rules:
>>>>
>>>>    RewriteRule \.(owl|rdf|html|ttl|nt|txt|md)$ - [L]
>>>>
>>>>    # (in progress) RewriteCond %{HTTP_ACCEPT} text/html
>>>>    # (in progress) RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.html [R=303,L]
>>>>
>>>>    RewriteCond %{HTTP_ACCEPT} application/rdf+xml
>>>>    RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.owl [R=303,L]
>>>>
>>>>    RewriteRule ^nif-core$ /nlp2rdf/ontologies/nif-core/version-1.0/nif-core.ttl [R=303,L]
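>>>> The conversion in step 2 looks like this (a sketch; it assumes rapper
>>>> from the Raptor toolkit is installed):
>>>>
>>>>   # parse-check the Turtle source, then convert it to RDF/XML
>>>>   rapper -i turtle -c nif-core.ttl
>>>>   rapper -i turtle -o rdfxml nif-core.ttl > nif-core.owl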
>>>>
>>>> Am 19.04.2013 16:05, schrieb Prateek:
>>>>
>>>> Hello all,
>>>>
>>>> I am trying to identify a system that provides versioning and
>>>> revision control capabilities specifically for ontologies. Does
>>>> anyone have experience with, or an idea of, which systems can help,
>>>> or whether systems like SVN or CVS can do the job?
>>>>
>>>> Regards,
>>>> Prateek
>>>>
>>>> --
>>>> - - - - - - - - - - - - - - - - - - -
>>>> Prateek Jain, Ph.D.
>>>> RSM
>>>> IBM T.J. Watson Research Center
>>>> 1101 Kitchawan Road, 37-244
>>>> Yorktown Heights, NY 10598
>>>> LinkedIn: http://www.linkedin.com/in/prateekj
>>>>
>>>> --
>>>> Dipl. Inf. Sebastian Hellmann
>>>> Department of Computer Science, University of Leipzig
>>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>> Research Group: http://aksw.org
Received on Monday, 22 April 2013 17:13:48 UTC