- From: Mark Fallu <m.fallu@griffith.edu.au>
- Date: Sat, 19 Jul 2014 07:00:21 +1000
- To: Paul Houle <ontology2@gmail.com>
- Cc: Linked Data community <public-lod@w3.org>
That is a fair point - but I would still suggest that it is important for
search engines to be able to meaningfully interpret:

- internal links
- rdfa representations that span multiple pages.

Cheers,

Mark

Sent from my iPhone

> On 19 Jul 2014, at 3:02 am, Paul Houle <ontology2@gmail.com> wrote:
>
> Frankly I don't care about PageRank, and these days I don't know if
> Google does. These days Google gets direct sampling of user behavior
> through Chrome and Google Analytics, and this sort of data is
> probably much more valuable than the link graph since they know about
> things like time-on-page, query chains, and things like that.
>
> If anything, PageRank, or what people imagine about PageRank has
> been harmful to the web because it's created a situation where people
> just don't make links to other web sites anymore. It started with
> high profile sites (ex. engadget) that just wanted to be greedy and
> not give any PageRank to their competition. Then you saw people using
> the NOFOLLOW attribute because they thought that this too was a way to
> be greedy.
>
> Ten years ago I got a lot of emails from people that amounted to "I
> will pay you $X if you make a link on page Y to page Z with anchor
> text T". You'd also find SEO firms that would ask for $X a month to
> generate Y links to your site.
>
> Recently Google made some changes and they seem to be punishing people
> who have inappropriate links so now people get emails like "Would you
> please remove the link from page X to page Y" and the new thing is
> that SEO firms now want you to pay them $X to remove Y links to your
> site.
>
> I think it is all a lot of bull and I make whatever links I like and
> figure that Google is going to do whatever it is they are going to do.
>
>> On Fri, Jul 18, 2014 at 8:05 AM, Mark Fallu <m.fallu@griffith.edu.au> wrote:
>> I am attempting to understand how the CoolURI 303 redirect pattern for
>> the semantic web (http://www.w3.org/TR/cooluris/) can be implemented without
>> negative impact on search engines.
>>
>> This pattern appears to allow site content to be indexed, but prevents page
>> rank from flowing through internal links due to the use of a 303 redirect.
>>
>> For example in Griffith's Research-Hub: http://research-hub.griffith.edu.au
>>
>> A GET request to the URI of Howard Wiseman:
>> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>>
>> Will resolve to different urls based on content negotiation.
>>
>> For RDF:
>> wget --header "Accept: application/rdf+xml"
>> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>>
>> results in a "303 see other" redirect to the RDF version of the entity:
>> http://research-hub.griffith.edu.au/rdf/n33a4e2d3057476efaff5ce1884564a8f/n33a4e2d3057476efaff5ce1884564a8f.rdf
>>
>> For HTML:
>> wget --header "Accept: text/html"
>> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>> results in a "303 see other" redirect to the HTML version of the entity (our
>> old friend the "display" version):
>> http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f
>>
>> Note: There will never be an HTML page at
>> http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
>> just an HTTP response
>>
>> Links will be presented as the "individual" uri and then redirect to the
>> "display" url.
>>
>> All good so far - this is a perfectly functional example of the Cool URI
>> specification at work. Unfortunately it results in a few issues in
>> practice.
>>
>> If the links we present to the outside world for harvesting e.g. via sparql
>> endpoint, OAI-PMH or open social widget etc. is the canonical "individual"
>> URI, clients will be able to get to the "display" url, but the google page
>> rank that would normally flow from these external links will not.
>>
>> The specification of a 303 redirect describes it as:
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
>>>
>>> "The response to the request can be found under a different URI and SHOULD
>>> be retrieved using a GET method on that resource. This method exists
>>> primarily to allow the output of a POST-activated script to redirect the
>>> user agent to a selected resource. The new URI is not a substitute reference
>>> for the originally requested resource. The 303 response MUST NOT be cached,
>>> but the response to the second (redirected) request might be cacheable.
>>>
>>> The different URI SHOULD be given by the Location field in the response.
>>> Unless the request method was HEAD, the entity of the response SHOULD
>>> contain a short hypertext note with a hyperlink to the new URI(s)."
>>
>>
>> Google correctly implements the specification and does not assign the page
>> rank of the "individual" URI to the "display" URL as it is "not a substitute
>> reference for the originally requested resource".
>>
>> The same is true of internal links: a high page rank home page will not pass
>> page rank on to "display" urls if the pathway to those urls is via
>> "individual" uri links.
>>
>> I am not sure what the solution is here as it seems the realms of SEO and
>> the conventions of the web they are built on are not a good fit for semantic
>> web best practice.
>>
>> The most minimal compromise I can think of is to move away from the use of a
>> 303 redirect to a redirect that conserves the flow of google page rank.
>>
>> - "302 Found" redirect is the recommended replacement for 303 for clients that
>>   do not support HTTP 1.1 and it does allow a certain amount of google page
>>   rank to flow.
>> - "301 Moved Permanently" is a poor fit for the Cool URI pattern, but passes
>>   on the full page rank of the links.
>> - rewriting all URIs to the URL would also work, but would break the coolURI
>>   pattern.
>>
>> The pragmatist in me feels that if we are going to make a change for the
>> purposes of SEO, it might as well be the one with best return, i.e. 301
>> redirect.
>>
>> Note: Indexing is not the problem here, content is indexed. The issue
>> relates to page rank not flowing through a 303 redirect.
>>
>> I have tested and can confirm that 303 redirects are an issue for a number
>> of reasons:
>>
>> - page rank does not flow through a 303 redirect
>> - page rank can not be assigned from a url to a uri with a rel=canonical tag
>>   if URI does a 303 redirect (preventing aggregation of pagerank from external
>>   links to URL)
>> - URI and URL are indexed separately
>> - rdfa schema.org representations of URIs do not translate to URL (i.e.
>>   representation described at URL A, talking about URI B, does not get
>>   connected to representation described at URL B)
>> - url parameters are not passed by a 303 redirect.
>> - impact on functionality of google analytics tracking e.g. traversing the site
>>   is seen as a series of direct page visits.
>>
>> Essentially - as far as search engines are concerned - every URL and URI is
>> an island, with no connections between them. At best a URL can express a
>> rel=canonical back to its corresponding URI; no pagerank will flow through
>> links.
>>
>>
>> Any guidance you can provide would be appreciated.
>>
>> --
>>
>> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> | Mark Fallu
>> | Manager, Research Data (Acting)
>> | Office for Research
>> | Bray Centre (N54) 0.10E
>> | Griffith University, Nathan Campus
>> | Queensland 4111 AUSTRALIA
>> |
>> | E-mail: m.fallu@griffith.edu.au
>> | Mobile: 04177 69778
>> | Phone: +61 (07) 373 52069
>> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>
>
> --
> Paul Houle
> Expert on Freebase, DBpedia, Hadoop and RDF
> (607) 539 6254 paul.houle on Skype ontology2@gmail.com
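
For readers following the thread, the content negotiation described in the
quoted message can be sketched roughly as below. This is only an illustration
under stated assumptions, not the Research-Hub implementation: the function
name negotiate_redirect is hypothetical, the path layout is inferred from the
three example URLs in the message, and Accept parsing is deliberately
simplified (q-values are ignored and HTML is assumed as the default).

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefix taken from the example URLs in the quoted message.
INDIVIDUAL_PREFIX = "/individual/"


def negotiate_redirect(uri, accept):
    """Return (status, location) for a GET on an "individual" URI.

    Hypothetical helper: it mimics the 303 behaviour described in the
    thread and is not code from any real server.
    """
    parts = urlsplit(uri)
    if not parts.path.startswith(INDIVIDUAL_PREFIX):
        raise ValueError("not an individual URI: " + uri)
    local_id = parts.path[len(INDIVIDUAL_PREFIX):]

    # Simplification: take media types in the order listed, drop any
    # parameters such as q-values, and stop at the first known type.
    media_types = [item.split(";")[0].strip().lower()
                   for item in accept.split(",")]

    target = "/display/" + local_id  # assumed default: HTML "display" page
    for media_type in media_types:
        if media_type == "application/rdf+xml":
            target = "/rdf/{0}/{0}.rdf".format(local_id)
            break
        if media_type == "text/html":
            break

    # The query string and fragment are not carried over in this sketch.
    location = urlunsplit((parts.scheme, parts.netloc, target, "", ""))
    return 303, location
```

Calling it with the "individual" URI from the message and an Accept value of
"application/rdf+xml" yields the ".rdf" location, while "text/html" yields the
"display" location, both with status 303 as in the wget examples.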
Received on Friday, 18 July 2014 21:00:50 UTC