Re: SPARQL endpoint discovery from Richard Cyganiak on 2011-04-04 (semantic-web@w3.org from April 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 4 Apr 2011 14:41:11 +0530
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Cc: Giovanni Tummarello <giovanni.tummarello@deri.org>, Francisco Javier López Pellicer <fjlopez@unizar.es>, semantic-web <semantic-web@w3c.org>
Message-Id: <9A44DF1A-D17B-43F7-BE85-A396764FE006@cyganiak.de>
On 4 Apr 2011, at 13:58, Martin Hepp wrote:
> I agree. But it is unlikely that Google will accept Semantic Sitemaps, and it will be hard or impossible to convince SEO consultants to give up a Google-valid sitemap in favor of a Semantic Sitemap. So for now, I think this is the best we can get.

Yes, I agree with this assessment.

Richard
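Martin's workaround quoted below (publish both a regular sitemap and a Semantic Sitemap, and list both in robots.txt) can be illustrated from the consumer side. The following is a minimal Python sketch, not part of the original thread: it reads the `Sitemap:` lines with the standard library's `urllib.robotparser` and then looks for SPARQL endpoint locations in each sitemap. The example.org URLs are made up, and the extension namespace and the `sc:sparqlEndpointLocation` element name are given as recalled from the Semantic Sitemaps extension, so check them against the extension's schema before relying on them. HTTP fetching is left out so the functions work on strings.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Assumed namespace of the Semantic Sitemaps extension (verify against its schema).
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return all URLs listed on 'Sitemap:' lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemaps.
    return parser.site_maps() or []


def sparql_endpoints(sitemap_xml: str) -> list[str]:
    """Return sc:sparqlEndpointLocation values found in a sitemap.

    A plain sitemaps.org sitemap has no such elements and yields [].
    """
    root = ET.fromstring(sitemap_xml)
    tag = "{%s}sparqlEndpointLocation" % SC_NS
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


# Made-up example: a robots.txt listing a regular and a semantic sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://example.org/sitemap.xml
Sitemap: http://example.org/semantic-sitemap.xml
"""

# Made-up example of the semantic sitemap's contents.
SEMANTIC_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Example dataset</sc:datasetLabel>
    <sc:sparqlEndpointLocation>http://example.org/sparql</sc:sparqlEndpointLocation>
  </sc:dataset>
</urlset>
"""

if __name__ == "__main__":
    print(sitemap_urls(ROBOTS_TXT))
    print(sparql_endpoints(SEMANTIC_SITEMAP))
```

Run against the regular sitemap, `sparql_endpoints` simply returns an empty list, which fits Martin's point that RDF-aware crawlers would find both files and could prefer the semantic one.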




> 
> Martin
> 
> On Apr 4, 2011, at 10:21 AM, Richard Cyganiak wrote:
> 
>> Hi Martin,
>> 
>> On 4 Apr 2011, at 13:44, Martin Hepp wrote:
>>> Since Semantic Sitemaps don't validate in Google's tools, it is hard to convince site owners to use them.
>>> 
>>> However, there is a work-around: You can publish BOTH a regular sitemap and a semantic sitemap for your site and list both in the robots.txt file.
>>> 
>>> Google should accept the regular one (you could also submit this to them manually) and ignore the semantic sitemap. RDF-aware crawlers would find both and could prefer the semantic sitemap.
>> 
>> Yes, this works AFAIK. But this style of using Semantic Sitemaps loses their main advantage: being a simple extension of an established format that many webmasters already use.
>> 
>> Best,
>> Richard
>> 
>> 
>> 
>> 
>>> 
>>> The downside of this approach is that you risk increasing the crawl load on your site. But I would assume you could minimize the overlap of URIs between the two sitemaps; e.g., you do not need to tell Google about your compressed RDF dump files.
>>> 
>>> Best wishes
>>> 
>>> Martin
>>> 
>>> On Apr 4, 2011, at 8:53 AM, Richard Cyganiak wrote:
>>> 
>>>> Hi Giovanni,
>>>> 
>>>> Semantic Sitemaps seemed like a good idea because they were a very simple extension to standard XML Sitemaps, which are a widely adopted format supported by Google and other major search engines.
>>>> 
>>>> What killed Semantic Sitemaps for me is the fact that adding *any* extension element, even a single line, makes Google reject the Sitemap.
>>>> 
>>>> In practice, XML Sitemaps are not an extensible format.
>>>> 
>>>> On the question of complexity of Sitemaps and VoID: Publishers will get it right if and only if there is a) some serious consumption of the data that publishers actually care about and b) a validator. At the moment neither a) nor b) is given, neither for Semantic Sitemaps nor for VoID.
>>>> 
>>>> Best,
>>>> Richard
>>>> 
>>>> 
>>>> On 3 Apr 2011, at 18:16, Giovanni Tummarello wrote:
>>>> 
>>>>> With the Sitemap extension called Semantic Web Sitemap we did indeed
>>>>> give a very simple alternative.
>>>>> It was also partially adopted:
>>>>> 
>>>>> http://www.arnetminer.org/viewpub.do?pid=190125
>>>>> 
>>>>> But what breaks that protocol is the part about explaining (to
>>>>> a machine) how to go from a dump to "linked data publishing", which is
>>>>> a very fuzzy concept, as fuzzy as "describe".
>>>>> 
>>>>> The chances of someone actually getting that file right were slim to
>>>>> begin with (several times we had to correct those who tried), and as
>>>>> far as my reports go, the chances of getting VoID right
>>>>> (which is RDF and therefore much less intuitive to edit by hand than
>>>>> a simple XML format like sitemaps) can't be much better.
>>>>> 
>>>>> I personally think a single line in the sitemap.xml file is really
>>>>> what's needed, so in that respect this part of the extension really
>>>>> does its job. However, until someone is seriously consuming this,
>>>>> there won't be a need to standardize.
>>>>> 
>>>>> Gio
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Apr 3, 2011 at 11:06 AM, Francisco Javier López Pellicer
>>>>> <fjlopez@unizar.es> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> A related question is SPARQL endpoint fingerprinting, which
>>>>>>> is not necessarily straightforward, as people often put endpoints
>>>>>>> behind HTTP reverse proxies that stomp on identifying
>>>>>>> headers. In principle it would be interesting to do a
>>>>>>> survey to see the relative prevalence of different SPARQL
>>>>>>> implementations.
>>>>>> 
>>>>>> Agree.
>>>>>> 
>>>>>> SPARQL endpoint discovery and SPARQL endpoint fingerprinting could be two
>>>>>> research lines related to the architecture of the SemWeb:
>>>>>> 
>>>>>> - Indexing SPARQL endpoints (with or without the help of vocabularies such as
>>>>>> VoID) -> a hint at the effective size of SemWeb initiatives
>>>>>> 
>>>>>> - SPARQL endpoint fingerprint identification -> "market share" analysis of
>>>>>> the prevalence of SPARQL technologies
>>>>>> 
>>>>>> -- fjlopez
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>
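The fingerprinting and "market share" survey discussed at the end of the thread can be sketched as a classification over HTTP response headers. This is a minimal Python sketch under stated assumptions: the signature table and the header values in the example are made-up placeholders, not verified fingerprints of real SPARQL servers, and only the `Server` header is inspected. An endpoint behind a reverse proxy that rewrites that header lands in the `unknown` bucket, which is exactly the limitation raised in the thread.

```python
from collections import Counter
from typing import Mapping, Optional

# Hypothetical signature table: lower-case substring -> implementation label.
# These are illustrative placeholders, not verified fingerprints.
SIGNATURES = {
    "virtuoso": "Virtuoso",
    "fuseki": "Fuseki",
}


def fingerprint(headers: Mapping[str, str],
                signatures: Mapping[str, str] = SIGNATURES) -> Optional[str]:
    """Guess the SPARQL implementation from the Server header, or return None."""
    # Header names are case-insensitive in HTTP, so compare them lower-cased.
    server = next((v.lower() for k, v in headers.items() if k.lower() == "server"), "")
    for needle, label in signatures.items():
        if needle in server:
            return label
    # No match: an unknown implementation, or a proxy has replaced the header.
    return None


def market_share(observations: list[Mapping[str, str]]) -> Counter:
    """Count implementations across surveyed endpoints; unmatched ones count as 'unknown'."""
    return Counter(fingerprint(h) or "unknown" for h in observations)


if __name__ == "__main__":
    # Made-up response headers from three hypothetical endpoints.
    observed = [
        {"Server": "Virtuoso/6.1 (Linux)"},
        {"server": "nginx"},  # reverse proxy hiding the backend
        {"Server": "Fuseki"},
    ]
    print(market_share(observed))
```

Keeping the signatures in a plain dictionary means a survey could grow the table as real fingerprints are collected; header matching alone cannot see past a proxy, so the `unknown` count is effectively a lower bound on how much the survey misses.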
Received on Monday, 4 April 2011 09:11:55 UTC