- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 09 Mar 2009 20:15:00 -0400
- To: Daniel Schwabe <dschwabe@inf.puc-rio.br>
- CC: public-lod@w3.org
Daniel Schwabe wrote:
> All,
> the sitemap.xml solution works IF everybody (or most) have the
> robots.txt or the sitemap.xml at the root directory. So, conceptually
> speaking, it should be the way to go.
>
> But a quick test on the LOD cloud returned 404 for many if not most
> sites for both sitemap.xml and robots.txt...
> Curiously, for many of those without a sitemap.xml, the
> <c-name>/sparql URI format to access the SPARQL endpoint DOES work...
>
> So something is still missing. Either each dataspace maintainer that is
> willing to provide the SPARQL endpoint also provides an (even if
> minimal) sitemap.xml or voiD description, or at least follows this
> convention.
> This would greatly enhance the accessibility of the data, and enable
> tools to automatically find them as needed...
>
> Cheers
> D

Daniel,

+1

Clearly we need to document the best practices somewhere :-)

Kingsley

>
>
> Sergio Fernández wrote:
>> On Sat, 2009-03-07 at 00:36 -0300, Daniel Schwabe wrote:
>>
>>> I could query the site for its sitemap extension (would it always be
>>> <home url>/sitemap.xml?
>>>
>>
>> Yes, you can do it in a programmatic way. But that URL (/sitemap.xml),
>> even if it's commonly used, is not mandatory, so you can't use it as a
>> constant. But there is one way, not so direct, but at least one that is
>> standard:
>>
>> 1) From /robots.txt you can take the Sitemap's URL ("Sitemap:" as [1]
>> specifies)
>> 2) According to the extension proposed by DERI [2], you can check if the
>> sitemap points to a SPARQL endpoint by looking for the
>> sc:sparqlEndpointLocation element.
>>
>> Hope that helps.
>>
>> Best,
>>
>> [1] http://www.sitemaps.org/protocol.php
>> [2] http://sw.deri.org/2007/07/sitemapextension/
>>
>>
>
> --
> Daniel Schwabe
> Tel:+55-21-3527 1500 r. 4356
> Fax: +55-21-3527 1530
> http://www.inf.puc-rio.br/~dschwabe
> Dept. de Informatica, PUC-Rio
> R. M. de S. Vicente, 225
> Rio de Janeiro, RJ 22453-900, Brasil
>

--

Regards,

Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
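The two-step discovery Sergio describes (robots.txt "Sitemap:" directive, then the sc:sparqlEndpointLocation element from the DERI extension), with Daniel's observed <host>/sparql pattern as a last resort, could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: all function names are hypothetical, sitemap index files and gzipped sitemaps are not handled, the endpoint element is matched by its local name rather than a fixed namespace URI, and the /sparql fallback is only the informal convention noted in the thread, not part of any specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen


def sitemap_urls_from_robots(robots_txt: str, site_root: str) -> list[str]:
    """Step 1 (hypothetical helper): collect 'Sitemap:' URLs from robots.txt.

    Falls back to <root>/sitemap.xml when none is declared; that location is
    common but not mandatory, so it is only a guess.
    """
    urls = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(urljoin(site_root, value.strip()))
    return urls or [urljoin(site_root, "/sitemap.xml")]


def sparql_endpoints_from_sitemap(sitemap_xml) -> list[str]:
    """Step 2 (hypothetical helper): find sparqlEndpointLocation elements.

    Matches on the local element name, so the sketch does not depend on the
    exact namespace URI a publisher declared for the sc: prefix.
    """
    root = ET.fromstring(sitemap_xml)  # accepts str or bytes
    found = []
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix
        if local == "sparqlEndpointLocation" and elem.text and elem.text.strip():
            found.append(elem.text.strip())
    return found


def discover_sparql_endpoints(site_root: str, timeout: float = 10.0) -> list[str]:
    """Hypothetical driver: robots.txt -> sitemap(s) -> endpoint(s)."""
    try:
        with urlopen(urljoin(site_root, "/robots.txt"), timeout=timeout) as resp:
            robots = resp.read().decode("utf-8", errors="replace")
    except OSError:  # includes HTTP 404s, which Daniel reports are frequent
        robots = ""
    endpoints = []
    for sitemap_url in sitemap_urls_from_robots(robots, site_root):
        try:
            with urlopen(sitemap_url, timeout=timeout) as resp:
                endpoints += sparql_endpoints_from_sitemap(resp.read())
        except (OSError, ET.ParseError):
            continue  # missing or malformed sitemap: try the next one
    # Last resort: the informal <host>/sparql convention (not a standard).
    return endpoints or [urljoin(site_root, "/sparql")]
```

For example, discover_sparql_endpoints("http://example.org/") would try http://example.org/robots.txt, then each declared sitemap, and only return http://example.org/sparql if neither yields an endpoint. Keeping the parsing helpers separate from the network code makes them easy to test against sample robots.txt and sitemap content.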
Received on Tuesday, 10 March 2009 00:15:41 UTC