Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank. from Melvin Carvalho on 2014-07-18 (public-lod@w3.org from July 2014)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Fri, 18 Jul 2014 17:58:19 +0200
To: Mark Fallu <m.fallu@griffith.edu.au>
Cc: Linked Data community <public-lod@w3.org>
Message-ID: <CAKaEYh+p=xyz34hzZmj38KoDG4oYch3tJfAkS0D9vH=xCEiVsA@mail.gmail.com>
On 18 July 2014 14:05, Mark Fallu <m.fallu@griffith.edu.au> wrote:

> I am attempting to understand how the CoolURI 303 redirect pattern for
> the semantic web (http://www.w3.org/TR/cooluris/) can be implemented
> without negative impact on search engines.
>

Just a quick question:

Is there any reason you want to use 303s?

I personally consider it an anti-pattern.


>
> This pattern appears to allow site content to be indexed, but
> prevents page rank from flowing through internal links due to the use of a
> 303 redirect.
>
> For example in Griffith's Research-Hub:
> http://research-hub.griffith.edu.au
>
> A get request to the URI of Howard Wiseman:
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>
> Will resolve to different urls based on content negotiation.
>
> For RDF:
> wget --header "Accept: application/rdf+xml"
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>
> results in a "303 see other" redirect to the RDF version of the entity:
>
> http://research-hub.griffith.edu.au/rdf/n33a4e2d3057476efaff5ce1884564a8f/n33a4e2d3057476efaff5ce1884564a8f.rdf
>
> For HTML:
> wget --header "Accept: text/html"
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
> results in a "303 see other" redirect to the HTML version of the entity
> (our old friend the "display" version):
>
> http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f
>
> Note: There will never be an HTML page at
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f, just
> an HTTP response.
>
> Links will be presented as the "individual" uri and then redirect to the
> "display" url.
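To make the pattern described above concrete, here is a minimal sketch of the
redirect decision in Python. The function name negotiate_redirect and its crude
Accept-header check are assumptions for illustration only; the path layout
mirrors the URLs quoted above, but this is not Research-Hub's actual code.

```python
# Hypothetical helper: illustrates the 303 content-negotiation pattern only.
BASE = "http://research-hub.griffith.edu.au"


def negotiate_redirect(entity_id, accept_header):
    """Return (status, location) for a GET on /individual/<entity_id>."""
    # Crude negotiation: ignores q-values and only looks for RDF/XML.
    wants_rdf = "application/rdf+xml" in accept_header.lower()
    if wants_rdf:
        location = f"{BASE}/rdf/{entity_id}/{entity_id}.rdf"
    else:
        # Anything else falls back to the HTML "display" page.
        location = f"{BASE}/display/{entity_id}"
    # The "individual" URI never returns a body of its own, only a 303.
    return 303, location
```

A real server would parse the Accept header properly (media ranges and
q-values); the sketch only shows that both representations sit behind the same
303 response.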
>
> All good so far - this is a perfectly functional example of the Cool URI
> specification at work.  Unfortunately it results in a few issues in
> practice.
>
> If the links we present to the outside world for harvesting (e.g. via a
> SPARQL endpoint, OAI-PMH or an OpenSocial widget) are the canonical
> "individual" URIs, clients will be able to get to the "display" url, but the
> google page rank that would normally flow from these external links will not.
>
> The specification of a 303 redirect describes it as:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
>
>> "The response to the request can be found under a different URI and
>> SHOULD be retrieved using a GET method on that resource. This method exists
>> primarily to allow the output of a POST-activated script to redirect the
>> user agent to a selected resource. *The new URI is not a substitute
>> reference for the originally requested resource*. The 303 response MUST
>> NOT be cached, but the response to the second (redirected) request might be
>> cacheable.
>>
>> The different URI SHOULD be given by the Location field in the response.
>> Unless the request method was HEAD, the entity of the response SHOULD
>> contain a short hypertext note with a hyperlink to the new URI(s)."
>
>
> Google correctly implements the specification and does not assign the page
> rank of the "individual" URI to the "display" URL as it is "*not a
> substitute reference for the originally requested resource*".
>
> The same is true of internal links: a home page with high page rank will not
> pass page rank on to "display" urls if the pathway to those urls is via
> "individual" uri links.
>
> I am not sure what the solution is here as it seems the realms of SEO and
> the conventions of the web they are built on are not a good fit for
> semantic web best practice.
>
> The most minimal compromise I can think of is to move away from the use of
> a 303 redirect to a redirect that conserves the flow of google page rank.
>
>    - "302 Found" redirect is the recommended replacement for 303 for
>    clients that do not support HTTP 1.1  and it does allow a certain amount of
>    google page rank to flow.
>    - "301 Moved Permanently" is a poor fit for the Cool URI pattern, but
>    passes on the full page rank of the links.
>    - rewriting all URIs to the URL would also work, but would break the
>    coolURI pattern.
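On the wire, the first two options in the list above differ from a 303 only in
the status line of the redirect response. A small sketch, assuming a
hypothetical helper named redirect_response_lines (not part of any framework),
shows this using the standard library's reason phrases:

```python
from http import HTTPStatus


def redirect_response_lines(status_code, location):
    """Build the status line and Location header for a redirect response."""
    status = HTTPStatus(status_code)
    if not 300 <= status.value < 400:
        raise ValueError("not a redirect status: %d" % status.value)
    return [
        # Reason phrase comes from the standard library, e.g. "See Other".
        f"HTTP/1.1 {status.value} {status.phrase}",
        f"Location: {location}",
    ]


# Same target, three candidate status codes.
target = "http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f"
for code in (301, 302, 303):
    print(redirect_response_lines(code, target)[0])
```

How a search engine treats each status code is its own policy; the sketch only
shows that switching codes is a one-line change on the server side.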
>
> The pragmatist in me feels that if we are going to make a change for the
> purposes of SEO, it might as well be the one with best return, i.e. 301
> redirect.
>
> Note: Indexing is not the problem here, content is indexed.  The issue
> relates to page rank not flowing through a 303 redirect.
>
> I have tested and can confirm that 303 redirects are an issue for a number
> of reasons:
>
>    - page rank does not flow through a 303 redirect
>    - page rank cannot be assigned from a url to a uri with a
>    rel=canonical tag if the URI does a 303 redirect (preventing aggregation
>    of pagerank from external links to URL)
>    - URI and URL are indexed separately
>    - rdfa schema.org representations of URIs do not translate to URL (ie.
>    representation described at URL A, talking about URI B, does not get
>    connected to representation described at URL B)
>    - url parameters are not passed by a 303 redirect.
>    - impact on functionality of google analytics tracking, e.g. traversing
>    the site is seen as a series of direct page visits.
>
> Essentially - as far as search engines are concerned - every URL and URI
> is an island, with no connections between them.  At best a URL can express
> a rel=canonical back to its corresponding URI, but no pagerank will flow
> through links.
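For reference, the rel=canonical hint mentioned above is a single element in
the head of the "display" page. A minimal sketch, assuming a hypothetical
helper named canonical_link_tag, of how such a tag could be generated:

```python
from html import escape


def canonical_link_tag(individual_uri):
    """Return a <link rel="canonical"> element pointing at the given URI."""
    # Escape the URI so characters such as & or " cannot break the attribute.
    href = escape(individual_uri, quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag(
    "http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f"
))
```

This only emits the markup; as the message reports, the hint does not appear to
be honoured when the canonical target itself answers with a 303.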
>
> Any guidance you can provide would be appreciated.
>
> --
>
> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> | Mark Fallu
> | Manager, Research Data (Acting)
> | Office for Research
> | Bray Centre (N54) 0.10E
> | Griffith University, Nathan Campus
> | Queensland 4111 AUSTRALIA
> |
> | E-mail: m.fallu@griffith.edu.au
> | Mobile:  04177 69778
> | Phone:  +61 (07) 373 52069
> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
Received on Friday, 18 July 2014 15:58:48 UTC