
Re: SPARQL endpoint discovery

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Mon, 4 Apr 2011 10:14:35 +0200
Cc: Giovanni Tummarello <giovanni.tummarello@deri.org>, Francisco Javier López Pellicer <fjlopez@unizar.es>, semantic-web <semantic-web@w3c.org>
Message-Id: <40479806-F48C-42BB-8FE9-B8C67A09DF33@ebusiness-unibw.org>
To: Richard Cyganiak <richard@cyganiak.de>
Hi all:

Richard raises an important point - since Semantic Sitemaps don't validate in Google tools, it is hard to convince site-owners to use them.

However, there is a work-around: You can publish BOTH a regular sitemap and a semantic sitemap for your site and list both in the robots.txt file.

Google should accept the regular one (you could also submit this to them manually) and ignore the semantic sitemap. RDF-aware crawlers would find both and could prefer the semantic sitemap.

The downside of this approach is that you risk increasing the crawling load on your site. But I would assume you can minimize the overlap of URIs between the two - e.g., you do not need to tell Google about your compressed RDF dump files.
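
To illustrate the crawler side of this work-around, here is a minimal Python sketch of how an RDF-aware crawler might read both Sitemap lines from robots.txt and prefer the semantic one. Everything named here is an assumption for illustration: the example.org URLs, the function names, and the sample documents are hypothetical, and the extension namespace URI is written from memory and should be checked against the Semantic Sitemaps specification. The sketch works on in-memory strings and does no network access.

```python
import xml.etree.ElementTree as ET

# Namespace of the Semantic Sitemap extension (assumed; verify against the spec).
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_urls(robots_txt):
    """Return every URL named on a 'Sitemap:' line of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_semantic_sitemap(xml_text):
    """True if any element of the sitemap lives in the extension namespace."""
    root = ET.fromstring(xml_text)
    return any(el.tag.startswith("{" + SC_NS + "}") for el in root.iter())


def preferred_sitemap(bodies):
    """Given {url: xml_text}, pick a semantic sitemap if one exists,
    otherwise fall back to the first sitemap listed."""
    for url, body in bodies.items():
        if is_semantic_sitemap(body):
            return url
    return next(iter(bodies), None)


# Hypothetical robots.txt listing both sitemaps, as suggested above.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
Sitemap: http://example.org/semantic-sitemap.xml
"""

# Hypothetical sitemap bodies; a real crawler would fetch these over HTTP.
REGULAR = (
    '<urlset xmlns="' + SITEMAP_NS + '">'
    "<url><loc>http://example.org/</loc></url>"
    "</urlset>"
)
SEMANTIC = (
    '<urlset xmlns="' + SITEMAP_NS + '" xmlns:sc="' + SC_NS + '">'
    "<sc:dataset>"
    "<sc:dataDumpLocation>http://example.org/dump.rdf.gz</sc:dataDumpLocation>"
    "</sc:dataset>"
    "</urlset>"
)

listed = sitemap_urls(ROBOTS_TXT)
choice = preferred_sitemap(dict(zip(listed, [REGULAR, SEMANTIC])))
print(choice)
```

Note that the dump file location appears only in the semantic sitemap, which keeps the overlap between the two files small.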

Best wishes

Martin

On Apr 4, 2011, at 8:53 AM, Richard Cyganiak wrote:

> Hi Giovanni,
> 
> Semantic Sitemaps seemed like a good idea because they were a very simple extension to standard XML Sitemaps, which are a widely adopted format supported by Google and other major search engines.
> 
> What killed Semantic Sitemaps for me is the fact that adding *any* extension element, even a single line, makes Google reject the Sitemap.
> 
> In practice, XML Sitemaps are not an extensible format.
> 
> On the question of complexity of Sitemaps and VoID: Publishers will get it right if and only if there is a) some serious consumption of the data that publishers actually care about and b) a validator. At the moment neither a) nor b) is given, neither for Semantic Sitemaps nor for VoID.
> 
> Best,
> Richard
> 
> 
> On 3 Apr 2011, at 18:16, Giovanni Tummarello wrote:
> 
>> With the Sitemap extension called Semantic Web Sitemap we did indeed
>> give a very simple alternative.
>> It was also partially adopted
>> 
>> http://www.arnetminer.org/viewpub.do?pid=190125
>> 
>> but what breaks it for that protocol is the part about explaining (to
>> a machine) how to go from a dump to "linked data publishing", which is
>> a very fuzzy concept, as fuzzy as "describe".
>> 
>> The chances of someone getting that file actually right were slim to
>> begin with (we had to correct those who tried several times), and as
>> far as my reports go the chances of getting VoID right
>> (which is in RDF and therefore much less intuitive for human editing than
>> a simple XML format like sitemaps) can't be much better.
>> 
>> I personally think a single line in the sitemap.xml file is really
>> what's needed, so in this respect this part of the extension really does its
>> job. However, until there is someone seriously consuming this there
>> won't be a need to standardize.
>> 
>> Gio
>> 
>> 
>> 
>> 
>> On Sun, Apr 3, 2011 at 11:06 AM, Francisco Javier López Pellicer
>> <fjlopez@unizar.es> wrote:
>>> 
>>>> 
>>>> A related question is SPARQL endpoint fingerprinting... Which
>>>> is not necessarily straightforward as often people put them
>>>> behind HTTP reverse proxies that stomp on identifiable
>>>> headers... In principle it would be interesting to do a
>>>> survey to see the relative prevalence of different SPARQL
>>>> implementations.
>>> 
>>> Agree.
>>> 
>>> SPARQL endpoint discovery and SPARQL endpoint fingerprinting could be two
>>> research lines related to the architecture of the SemWeb:
>>> 
>>> - Indexing SPARQL endpoints (with/without the help of vocabularies such as
>>> VoID) -> A hint for knowing the effective size of the SemWeb initiatives
>>> 
>>> - SPARQL endpoint fingerprint identification -> "Market share" analysis of
>>> SPARQL technology prevalence
>>> 
>>> -- fjlopez
>>> 
>>> 
>> 
> 
> 
Received on Monday, 4 April 2011 08:15:10 GMT
