Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank. from Paul Houle on 2014-07-18 (public-lod@w3.org from July 2014)

From: Paul Houle <ontology2@gmail.com>
Date: Fri, 18 Jul 2014 13:02:57 -0400
To: Mark Fallu <m.fallu@griffith.edu.au>
Cc: Linked Data community <public-lod@w3.org>
Message-ID: <CAE__kdSJoMTAHL5Qyy_b8uJVFXauGu6cx6jjEh7KnfWwiYq6EQ@mail.gmail.com>
Frankly I don't care about PageRank, and these days I don't know if
Google does. Google now gets direct sampling of user behavior
through Chrome and Google Analytics, and this sort of data is
probably much more valuable than the link graph, since it tells them
about things like time-on-page and query chains.

If anything, PageRank, or what people imagine about PageRank, has
been harmful to the web because it's created a situation where people
just don't make links to other web sites anymore. It started with
high-profile sites (e.g. Engadget) that just wanted to be greedy and
not give any PageRank to their competition. Then you saw people using
rel="nofollow" because they thought that this too was a way to
be greedy.

Ten years ago I got a lot of emails from people that amounted to "I
will pay you $X if you make a link on page Y to page Z with anchor
text T".  You'd also find SEO firms that would ask for $X a month to
generate Y links to your site.

Recently Google made some changes and now seems to be punishing sites
that have inappropriate links, so people get emails like "Would you
please remove the link from page X to page Y". The new thing is
that SEO firms want you to pay them $X to remove Y links to your
site.

I think it is all a lot of bull and I make whatever links I like and
figure that Google is going to do whatever it is they are going to do.

On Fri, Jul 18, 2014 at 8:05 AM, Mark Fallu <m.fallu@griffith.edu.au> wrote:
> I am attempting to understand how the CoolURI 303 redirect pattern for
> the semantic web (http://www.w3.org/TR/cooluris/) can be implemented without
> negative impact on search engines.
>
> This pattern appears to allow site content to be indexed, but prevents page
> rank from flowing through internal links due to the use of a 303 redirect.
>
> For example in Griffith's Research-Hub: http://research-hub.griffith.edu.au
>
> A get request to the URI of Howard Wiseman:
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>
> Will resolve to different urls based on content negotiation.
>
> For RDF:
> wget --header "Accept: application/rdf+xml"
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>
> results in a "303 see other" redirect to the RDF version of the entity:
> http://research-hub.griffith.edu.au/rdf/n33a4e2d3057476efaff5ce1884564a8f/n33a4e2d3057476efaff5ce1884564a8f.rdf
>
> For HTML:
> wget --header "Accept: text/html"
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
> results in a "303 see other" redirect to the HTML version of the entity (our
> old friend the "display" version):
> http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f
>
> Note: There will never be an HTML page at
> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
> just an HTTP response
>
> Links will be presented as the "individual" uri and then redirect to the
> "display" url.
>
> All good so far - this is a perfectly functional example of the Cool URI
> specification at work.  Unfortunately it results in a few issues in
> practice.
>
> If the links we present to the outside world for harvesting (e.g. via SPARQL
> endpoint, OAI-PMH or OpenSocial widget) are the canonical "individual"
> URIs, clients will be able to get to the "display" URL, but the Google page
> rank that would normally flow from these external links will not.
>
> The specification of a 303 redirect describes it as:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
>>
>> "The response to the request can be found under a different URI and SHOULD
>> be retrieved using a GET method on that resource. This method exists
>> primarily to allow the output of a POST-activated script to redirect the
>> user agent to a selected resource. The new URI is not a substitute reference
>> for the originally requested resource. The 303 response MUST NOT be cached,
>> but the response to the second (redirected) request might be cacheable.
>>
>> The different URI SHOULD be given by the Location field in the response.
>> Unless the request method was HEAD, the entity of the response SHOULD
>> contain a short hypertext note with a hyperlink to the new URI(s)."
>
>
> Google correctly implements the specification and does not assign the page
> rank of the "individual" URI to the "display" URL as it is "not a substitute
> reference for the originally requested resource".
>
> The same is true of internal links: a home page with high page rank will
> not pass page rank on to "display" URLs if the pathway to those URLs is via
> "individual" URI links.
>
> I am not sure what the solution is here as it seems the realms of SEO and
> the conventions of the web they are built on are not a good fit for semantic
> web best practice.
>
> The most minimal compromise I can think of is to move away from the use of a
> 303 redirect to a redirect that conserves the flow of google page rank.
>
> - A "302 Found" redirect is the recommended replacement for 303 for clients
>   that do not support HTTP 1.1, and it does allow a certain amount of Google
>   page rank to flow.
> - "301 Moved Permanently" is a poor fit for the Cool URI pattern, but passes
>   on the full page rank of the links.
> - Rewriting all URIs to the URL would also work, but would break the Cool URI
>   pattern.
>
> The pragmatist in me feels that if we are going to make a change for the
> purposes of SEO, it might as well be the one with best return, i.e. 301
> redirect.
>
> Note: Indexing is not the problem here, content is indexed.  The issue
> relates to page rank not flowing through a 303 redirect.
>
> I have tested and can confirm that 303 redirects are an issue for a number
> of reasons:
>
> - Page rank does not flow through a 303 redirect.
> - Page rank cannot be assigned from a URL to a URI with a rel=canonical
>   tag if the URI does a 303 redirect (preventing aggregation of page rank
>   from external links to the URL).
> - The URI and URL are indexed separately.
> - RDFa schema.org representations of URIs do not translate to the URL
>   (i.e. a representation described at URL A, talking about URI B, does
>   not get connected to the representation described at URL B).
> - URL parameters are not passed by a 303 redirect.
> - Google Analytics tracking is affected, e.g. traversing the site is
>   seen as a series of direct page visits.
>
> Essentially - as far as search engines are concerned - every URL and URI is
> an island, with no connections between them.  At best a URL can express a
> rel=canonical back to its corresponding URI; no page rank will flow through
> links.
>
>
> Any guidance you can provide would be appreciated.
>
> --
>
> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> | Mark Fallu
> | Manager, Research Data (Acting)
> | Office for Research
> | Bray Centre (N54) 0.10E
> | Griffith University, Nathan Campus
> | Queensland 4111 AUSTRALIA
> |
> | E-mail: m.fallu@griffith.edu.au
> | Mobile:  04177 69778
> | Phone:  +61 (07) 373 52069
> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
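
For anyone following along, the redirect behaviour described in the quoted
message can be sketched roughly as below. This is only an illustration and
not how the Research-Hub is actually implemented: the URL layout is copied
from the examples above, while the function names (preferred_type,
redirect_for) and the simplified Accept-header parsing are my own
assumptions.

```python
# Hypothetical sketch of the Cool URI 303 pattern from the quoted message.
# The URL layout comes from the examples there; everything else (names,
# Accept parsing) is an assumption made for illustration.

BASE = "http://research-hub.griffith.edu.au"
SUPPORTED = ("text/html", "application/rdf+xml")


def preferred_type(accept_header):
    """Pick the supported media type with the highest q-value.

    Simplified: "*/*" is treated as HTML, other wildcards are ignored,
    and HTML is the fallback when nothing matches.
    """
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media == "*/*":
            media = "text/html"
        if media in SUPPORTED and q > best_q:
            best, best_q = media, q
    return best or "text/html"


def redirect_for(path, accept_header):
    """Return (status, Location) for an "individual" URI, else None."""
    prefix = "/individual/"
    if not path.startswith(prefix):
        return None
    local_id = path[len(prefix):]
    if preferred_type(accept_header) == "application/rdf+xml":
        location = f"{BASE}/rdf/{local_id}/{local_id}.rdf"
    else:
        location = f"{BASE}/display/{local_id}"
    # 303 See Other: the target describes the resource but is not a
    # substitute reference for it.
    return 303, location
```

Changing the returned status from 303 to 301 or 302 is the one-line change
discussed in the quoted message; the sketch says nothing about how any
search engine treats either code.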



-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontology2@gmail.com
Received on Friday, 18 July 2014 17:03:24 UTC