Re: ANN: Sudoc bibliographic and authority data

Hi Nicolas,

It's getting into Sindice indeed, quite politely (e.g. one request every
5 seconds). We'll monitor speed and completeness. If you think it's OK for
us to crawl faster, please say so via a robots.txt directive or just tell us.
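
As a rough illustration of the robots.txt route, here is a minimal Python sketch of how a crawler could pick up a crawl rate from the site. It assumes the non-standard Crawl-delay directive is used; the robots.txt content and the "sindice-fetcher" user-agent name are hypothetical, and this is not a description of how Sindice actually schedules requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file on www.sudoc.fr may differ.
EXAMPLE_ROBOTS_TXT = """\
User-agent: sindice
Crawl-delay: 1

User-agent: *
Crawl-delay: 5
"""

# Fallback pace when no directive is present (the 5-second pace from the mail).
DEFAULT_DELAY_SECONDS = 5.0


def crawl_delay_for(robots_txt: str, user_agent: str) -> float:
    """Return the delay in seconds that robots.txt requests for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS


if __name__ == "__main__":
    # "sindice-fetcher" is an assumed user-agent name, for illustration only.
    print(crawl_delay_for(EXAMPLE_ROBOTS_TXT, "sindice-fetcher"))
```

With a file like the one above, a crawler matching the "sindice" entry would be allowed a shorter delay than everyone else, while an absent directive leaves the current pace unchanged.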

http://sindice.com/search?q=book&nq=&fq=domain%3Awww.sudoc.fr&sortbydate=1&interface=advanced

At the same time, I noticed something funny in the markup: if you open a
record with a browser, you get redirected to a page that has almost no data.

for example the sitemap contains

http://www.sudoc.fr/000000043

if you go there you get redirected to

http://www.sudoc.abes.fr/DB=2.1/SRCH?IKT=12&TRM=000000043

which if you put in the inspector

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.abes.fr%2FDB%3D2.1%2FSRCH%3FIKT%3D12%26TRM%3D000000043#TRIPLES

you get very little data

however of course if i use the inspector on
http://www.sudoc.fr/000000043 i get data

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.fr%2F000000043&content=&contentType=auto#TRIPLES

which however is mostly schema.org data!

but in Sindice I have lots of RDF data using all sorts of other ontologies

http://sindice.com/search/page?url=http%3A%2F%2Fwww.sudoc.fr%2F000385123

Is there any way you could normalize everything into a single markup
type? I think it would be easier to debug and ultimately better for
all.
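
One way to reproduce the difference described above is to request the same URI with different Accept headers and compare where each request ends up. This is a minimal Python sketch that assumes the behaviour depends on HTTP content negotiation (the announcement says it is supported); the helper names are made up for illustration and the output depends on the live server.

```python
import urllib.request

RECORD_URL = "http://www.sudoc.fr/000000043"  # example taken from the sitemap


def build_request(url: str, accept: str) -> urllib.request.Request:
    """Build a GET request that asks for one specific media type."""
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_summary(url: str, accept: str) -> tuple[str, str]:
    """Follow redirects and return (final URL, Content-Type of the response)."""
    request = build_request(url, accept)
    with urllib.request.urlopen(request, timeout=30) as response:
        content_type = response.headers.get("Content-Type") or ""
        return response.geturl(), content_type


if __name__ == "__main__":
    # Compare what a browser-like client and an RDF client receive.
    for accept in ("text/html", "application/rdf+xml"):
        final_url, content_type = fetch_summary(RECORD_URL, accept)
        print(accept, "->", final_url, content_type)
```

If the two requests end at different URLs or media types, that would explain why the inspector and a browser see different amounts of data.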

looking forward to support
Giovanni
Gio


On Fri, Jul 8, 2011 at 1:27 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> On 7/8/11 8:31 AM, Yann NICOLAS wrote:
>
> On 08/07/2011 01:42, Kingsley Idehen wrote:
>
> On 7/7/11 10:17 PM, Yann NICOLAS wrote:
>
> Hello,
>
> Sudoc [1], the French academic union catalogue maintained by ABES [2], has
> just been released as linked open data.
>
> 10 million bibliographic records are now available as RDF/XML.
>
> Examples for the Sudoc record whose internal id is 132133520 :
> . Resource URI : http://www.sudoc.fr/132133520/id
> . Generic document : http://www.sudoc.fr/132133520 (content negotiation is
> supported)
>
>
> Great job!
>
> Is there an RDF dump anywhere?
>
>
> Sorry, we don't provide any dump, as the 10 000 000 files are generated on
> the fly from Oracle (stored as XML type + some more tables).
> We provide a complete sitemap at
> http://www.sudoc.fr/noticesbiblio/sitemap.txt , and we hope that Sindice
> will crawl the whole stuff.
> Would it help ?
>
> Any advice welcome,
>
> Yann
>
> --
> Yann NICOLAS
> Etudes & Projets
> ABES
>
> Okay, no problem with sitemaps as dump alternatives for getting data
> imported into Linked Data hubs such as our LOD cloud cache and Sindice.
>
>
> --
>
> Regards,
>
> Kingsley Idehen	
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
>
>
>
>

Received on Saturday, 9 July 2011 21:10:52 UTC