- From: Melvin Carvalho <melvincarvalho@gmail.com>
- Date: Fri, 18 Jul 2014 17:58:19 +0200
- To: Mark Fallu <m.fallu@griffith.edu.au>
- Cc: Linked Data community <public-lod@w3.org>
- Message-ID: <CAKaEYh+p=xyz34hzZmj38KoDG4oYch3tJfAkS0D9vH=xCEiVsA@mail.gmail.com>
On 18 July 2014 14:05, Mark Fallu <m.fallu@griffith.edu.au> wrote:

> I am attempting to understand how the CoolURI 303 redirect pattern for
> the semantic web (http://www.w3.org/TR/cooluris/) can be implemented
> without negative impact on search engines.
>

Just a quick question: Is there any reason you want to use 303s? I
personally consider it an anti-pattern.

>
> This pattern appears to allow site content to be indexed, but
> prevents page rank from flowing through internal links due to the use of a
> 303 redirect.
>
> For example in Griffith's Research-Hub:
> http://research-hub.griffith.edu.au
>
> A GET request to the URI of Howard Wiseman:
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>
> will resolve to different URLs based on content negotiation.
>
> For RDF:
> wget --header "Accept: application/rdf+xml"
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>
> results in a "303 See Other" redirect to the RDF version of the entity:
>
> http://research-hub.griffith.edu.au/rdf/n33a4e2d3057476efaff5ce1884564a8f/n33a4e2d3057476efaff5ce1884564a8f.rdf
>
> For HTML:
> wget --header "Accept: text/html"
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
> results in a "303 See Other" redirect to the HTML version of the entity
> (our old friend the "display" version):
>
> http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f
>
> Note: There will never be an HTML page at
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
> just an HTTP response.
>
> Links will be presented as the "individual" URI and then redirect to the
> "display" URL.
>
> All good so far - this is a perfectly functional example of the Cool URI
> specification at work. Unfortunately it results in a few issues in
> practice.
>
> If the links we present to the outside world for harvesting (e.g. via a SPARQL
> endpoint, OAI-PMH or an OpenSocial widget) are the canonical "individual"
> URIs, clients will be able to get to the "display" URL, but the Google page
> rank that would normally flow from these external links will not.
>
> The specification of a 303 redirect describes it as:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
>
>> "The response to the request can be found under a different URI and
>> SHOULD be retrieved using a GET method on that resource. This method exists
>> primarily to allow the output of a POST-activated script to redirect the
>> user agent to a selected resource. *The new URI is not a substitute
>> reference for the originally requested resource*. The 303 response MUST
>> NOT be cached, but the response to the second (redirected) request might be
>> cacheable.
>>
>> The different URI SHOULD be given by the Location field in the response.
>> Unless the request method was HEAD, the entity of the response SHOULD
>> contain a short hypertext note with a hyperlink to the new URI(s)."
>
> Google correctly implements the specification and does not assign the page
> rank of the "individual" URI to the "display" URL, as it is "*not a
> substitute reference for the originally requested resource*".
>
> The same is true of internal links: a high page rank home page will not
> pass page rank on to "display" URLs if the pathway to those URLs is via
> "individual" URI links.
>
> I am not sure what the solution is here, as it seems the realms of SEO and
> the conventions of the web they are built on are not a good fit for
> semantic web best practice.
>
> The most minimal compromise I can think of is to move away from the use of
> a 303 redirect to a redirect that conserves the flow of Google page rank.
>
>    - "302 Found" redirect is the recommended replacement for 303 for
>    clients that do not support HTTP 1.1, and it does allow a certain amount of
>    Google page rank to flow.
>    - "301 Moved Permanently" is a poor fit for the Cool URI pattern, but
>    passes on the full page rank of the links.
>    - rewriting all URIs to the URL would also work, but would break the
>    CoolURI pattern.
>
> The pragmatist in me feels that if we are going to make a change for the
> purposes of SEO, it might as well be the one with the best return, i.e. a 301
> redirect.
>
> Note: Indexing is not the problem here; content is indexed. The issue
> relates to page rank not flowing through a 303 redirect.
>
> I have tested and can confirm that 303 redirects are an issue for a number
> of reasons:
>
>    - page rank does not flow through a 303 redirect
>    - page rank cannot be assigned from a URL to a URI with a
>    rel=canonical tag if the URI does a 303 redirect (preventing aggregation of
>    page rank from external links to the URL)
>    - URI and URL are indexed separately
>    - RDFa schema.org representations of URIs do not translate to the URL (i.e.
>    a representation described at URL A, talking about URI B, does not get
>    connected to the representation described at URL B)
>    - URL parameters are not passed by a 303 redirect
>    - impact on the functionality of Google Analytics tracking, e.g. traversing
>    the site is seen as a series of direct page visits
>
> Essentially - as far as search engines are concerned - every URL and URI
> is an island, with no connections between them. At best a URL can express
> a rel=canonical back to its corresponding URI, but no page rank will flow
> through links.
>
> Any guidance you can provide would be appreciated.
>
> --
>
> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> |  Mark Fallu
> |  Manager, Research Data (Acting)
> |  Office for Research
> |  Bray Centre (N54) 0.10E
> |  Griffith University, Nathan Campus
> |  Queensland 4111 AUSTRALIA
> |
> |  E-mail: m.fallu@griffith.edu.au
> |  Mobile: 04177 69778
> |  Phone: +61 (07) 373 52069
> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
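
For reference, the redirect behaviour described in the quoted message can be sketched roughly as below. This is only an illustration in Python, not the Research-Hub implementation: the function name `negotiate_redirect`, its `status` parameter, and the simplified Accept handling (no q-values or wildcards) are assumptions made for the example. Only the `/individual/`, `/rdf/` and `/display/` path layout is taken from the URLs quoted in the message.

```python
from urllib.parse import urlsplit, urlunsplit

RDF_TYPE = "application/rdf+xml"


def negotiate_redirect(uri, accept, status=303):
    """Return (status, location) for a GET on an "individual" URI.

    Hypothetical helper: `status` defaults to 303 as in the Cool URIs
    pattern, but can be set to 301 or 302 to model the alternatives
    weighed in the quoted message.
    """
    parts = urlsplit(uri)
    prefix, _, local_id = parts.path.rpartition("/")
    if prefix != "/individual" or not local_id:
        raise ValueError("not an 'individual' URI: " + uri)

    # Naive negotiation (assumption): ignore q-values and serve RDF only
    # when it is asked for explicitly; everything else gets the HTML page.
    media_types = [item.split(";")[0].strip().lower()
                   for item in accept.split(",")]
    if RDF_TYPE in media_types:
        path = "/rdf/{0}/{0}.rdf".format(local_id)
    else:
        path = "/display/" + local_id

    # The query string is dropped here, mirroring the report that URL
    # parameters are not passed along by the redirect.
    return status, urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

With the URI from the message, `negotiate_redirect(uri, "text/html")` yields a 303 pointing at the "display" URL, and `negotiate_redirect(uri, "application/rdf+xml")` yields a 303 pointing at the `.rdf` document. Passing `status=301` changes only the status code, which is the whole of the compromise proposed in the quoted message.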
Received on Friday, 18 July 2014 15:58:48 UTC