- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Sat, 9 Jul 2011 23:10:04 +0200
- To: nicolas@abes.fr
- Cc: public-lod@w3.org, Kingsley Idehen <kidehen@openlinksw.com>, "giulio.cesare@gmail.com" <giulio.solaroli@deri.org>
Hi Nicolas,

It's getting into Sindice indeed:

http://sindice.com/search?q=book&nq=&fq=domain%3Awww.sudoc.fr&sortbydate=1&interface=advanced

It is being crawled quite politely (e.g. one request every 5 seconds), and we'll monitor speed and completeness. If you think it's OK for us to crawl faster, please say so via a robots.txt directive or just tell us.

At the same time, I notice something funny in the markup. If you go there with a browser, you get redirected to something that has almost no data. For example, the sitemap contains

http://www.sudoc.fr/000000043

If you go there, you get redirected to

http://www.sudoc.abes.fr/DB=2.1/SRCH?IKT=12&TRM=000000043

and if you put that into the inspector, you get very little data:

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.abes.fr%2FDB%3D2.1%2FSRCH%3FIKT%3D12%26TRM%3D000000043#TRIPLES

Of course, if I use the inspector on http://www.sudoc.fr/000000043 I do get data, but it is mostly schema.org data:

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.fr%2F000000043&content=&contentType=auto#TRIPLES

Yet in Sindice I have lots of RDF data using all sorts of other ontologies:

http://sindice.com/search/page?url=http%3A%2F%2Fwww.sudoc.fr%2F000385123

Is there any way you could try to normalize everything into a single markup type? I think it would be easier to debug and ultimately better for everyone.

Looking forward to supporting this,
Giovanni
Gio

On Fri, Jul 8, 2011 at 1:27 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> On 7/8/11 8:31 AM, Yann NICOLAS wrote:
> > On 08/07/2011 01:42, Kingsley Idehen wrote:
> > > On 7/7/11 10:17 PM, Yann NICOLAS wrote:
> > > > Hello,
> > > >
> > > > Sudoc [1], the French academic union catalogue maintained by ABES [2], has just been released as linked open data.
> > > >
> > > > 10 million bibliographic records are now available as RDF/XML.
> > > >
> > > > Examples for the Sudoc record whose internal id is 132133520:
> > > > - Resource URI: http://www.sudoc.fr/132133520/id
> > > > - Generic document: http://www.sudoc.fr/132133520 (content negotiation is supported)
> > >
> > > Great job!
> > >
> > > Is there an RDF dump anywhere?
> >
> > Sorry, we don't provide any dump, as the 10,000,000 files are generated on the fly from Oracle (stored as XML type plus some additional tables).
> > We provide a complete sitemap at http://www.sudoc.fr/noticesbiblio/sitemap.txt, and we hope that Sindice will crawl all of it.
> > Would that help?
> >
> > Any advice is welcome,
> >
> > Yann
> >
> > --
> > Yann NICOLAS
> > Studies & Projects
> > ABES
>
> Okay, no problem with sitemaps as alternatives to dumps for getting data imported into Linked Data hubs such as our LOD cloud cache, Sindice, etc.
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
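The crawl pattern discussed in this thread uses a plain-text sitemap in place of a dump, asks for RDF/XML through content negotiation on the generic document URI, and makes roughly one request every 5 seconds. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Sindice's crawler and not an ABES API. The assumptions: sitemap.txt lists one URL per line; `application/rdf+xml` is the media type the server negotiates for RDF/XML; the 5-second delay mirrors the rate mentioned above. All function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator


def parse_sitemap_txt(text: str) -> list[str]:
    # Assumption: a plain-text sitemap with one URL per line.
    # Blank lines and surrounding whitespace are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]


def rdf_request(url: str) -> urllib.request.Request:
    # Ask for RDF/XML via content negotiation on the generic document URI
    # (assumed media type: application/rdf+xml).
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(url: str, timeout: float = 30.0) -> bytes:
    # urlopen follows HTTP redirects, so this returns whatever the
    # negotiated (possibly redirected) response body is.
    with urllib.request.urlopen(rdf_request(url), timeout=timeout) as resp:
        return resp.read()


def polite_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_rdf,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, bytes]]:
    # Yield (url, body) pairs, pausing `delay` seconds between requests
    # (about one request every 5 seconds with the default).
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)
        yield url, fetch(url)
```

The `fetch` and `sleep` functions are injected so the pacing logic can be tested without network access. A real run would read http://www.sudoc.fr/noticesbiblio/sitemap.txt, pass the parsed URLs to `polite_crawl`, and honour any crawl-rate directive the site publishes in robots.txt.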
Received on Saturday, 9 July 2011 21:10:52 UTC