- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 09 Mar 2009 20:15:00 -0400
- To: Daniel Schwabe <dschwabe@inf.puc-rio.br>
- CC: public-lod@w3.org
Daniel Schwabe wrote:
> All,
> the sitemap.xml solution works IF everybody (or most) have the
> robots.txt or the sitemap.xml at the root directory. So, conceptually
> speaking, it should be the way to go.
>
> But a quick test on the LOD cloud returned 404 for many if not most
> sites for both sitemap.xml and robots.txt...
> Curiously, for many of those without a sitemap.xml, the
> <c-name>/sparql URI format to access the SPARQL endpoint DOES work...
>
> So something is still missing. Either each dataspace maintainer that is
> willing to provide the SPARQL endpoint also provides a (even if
> minimal) sitemap.xml or voiD description, or at least follows this
> convention.
> This would greatly enhance the accessibility of the data, and enable
> tools to automatically find them as needed...
>
> Cheers
> D
Daniel,
+1
Clearly we need to document the best practices somewhere :-)
Kingsley
>
>
> Sergio Fernández wrote:
>> On Sat, 2009-03-07 at 00:36 -0300, Daniel Schwabe wrote:
>>
>>> I could query the site for its sitemap extension (would it always be
>>> <home url>/sitemap.xml?
>>>
>>
>> Yes, you can do it programmatically. But that URL (/sitemap.xml),
>> even though it's commonly used, is not mandatory, so you can't treat it
>> as a constant. There is one way, less direct, but at least one that is
>> standard:
>>
>> 1) From /robots.txt you can take the Sitemap's URL ("Sitemap:" as [1]
>> specifies)
>> 2) According to the extension proposed by DERI [2], you can check whether
>> the sitemap points to a SPARQL endpoint by looking for the
>> sc:sparqlEndpointLocation element.
>>
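The two steps above could be sketched roughly as follows in Python. This is
only an illustration, not a reference implementation: the function names are
made up for this example, the sc:sparqlEndpointLocation element is matched by
its local name only (so the sketch does not depend on the exact namespace URI
of the extension), and error handling for sites that return 404 is left out.

```python
from urllib.parse import urljoin
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def sitemap_urls_from_robots(robots_text):
    """Return the URLs given on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def sparql_endpoints_from_sitemap(sitemap_xml):
    """Return the text of every sparqlEndpointLocation element found."""
    root = ET.fromstring(sitemap_xml)
    endpoints = []
    for element in root.iter():
        # ElementTree tags look like '{namespace}local'; keep the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if local_name == "sparqlEndpointLocation" and text:
            endpoints.append(text)
    return endpoints


def fetch_text(url):
    """Fetch a URL as text; urlopen raises HTTPError on a 404."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def discover_sparql_endpoints(site_url, fetch=fetch_text):
    """Step 1: robots.txt -> sitemap URLs. Step 2: sitemaps -> endpoints."""
    robots_text = fetch(urljoin(site_url, "/robots.txt"))
    found = []
    for sitemap_url in sitemap_urls_from_robots(robots_text):
        found.extend(sparql_endpoints_from_sitemap(fetch(sitemap_url)))
    return found
```

The fetch parameter is there so the lookup can be tried against canned
documents without touching the network; calling
discover_sparql_endpoints("http://example.org/") with the default fetcher
would go out to the (placeholder) site itself.
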
>> Hope that helps.
>>
>> Best,
>>
>> [1] http://www.sitemaps.org/protocol.php
>> [2] http://sw.deri.org/2007/07/sitemapextension/
>>
>>
>
> --
> Daniel Schwabe
> Tel:+55-21-3527 1500 r. 4356
> Fax: +55-21-3527 1530
> http://www.inf.puc-rio.br/~dschwabe Dept. de Informatica, PUC-Rio
> R. M. de S. Vicente, 225
> Rio de Janeiro, RJ 22453-900, Brasil
>
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com
Received on Tuesday, 10 March 2009 00:15:41 UTC