Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

Hi Michael,

You asked:

> How can URIs from sparql endpoints or OAI-PMH contribute to page rank?

If party A:
- produces a system that uses 303-based Cool URIs to describe its content
and, in addition to web pages, exposes that content to the world via a
SPARQL endpoint or OAI-PMH,

and party B:
- harvests information via the SPARQL endpoint or OAI-PMH and produces a
public representation of that content that links back to party A,

then, if the link back is the Cool URI that resolves to a page via a 303
redirect and content negotiation, web spiders etc. will not be able to
follow that inbound link through to the page, at least not for ranking
purposes.
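
To make the mechanism concrete, here is a rough Python sketch of what
happens when a client dereferences the harvested link. The /individual/,
/display/ and /data/ path prefixes and the resolve() and follow() helpers
are made up for illustration, and the content negotiation is deliberately
crude; a real Cool URI setup will differ in its details.

```python
from typing import Dict, List, Tuple

# Hypothetical URI layout, used only for illustration.
INDIVIDUAL_PREFIX = "/individual/"  # Cool URI naming the thing itself
DISPLAY_PREFIX = "/display/"        # HTML page about the thing
RDF_PREFIX = "/data/"               # RDF document about the thing

REDIRECT_CODES = (301, 302, 303, 307, 308)


def resolve(path: str, accept: str = "text/html") -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET on `path` under this toy layout."""
    if path.startswith(INDIVIDUAL_PREFIX):
        local_id = path[len(INDIVIDUAL_PREFIX):]
        # Crude content negotiation: RDF clients get data, everyone else
        # gets the human-readable page.
        wants_rdf = "application/rdf+xml" in accept or "text/turtle" in accept
        target = (RDF_PREFIX if wants_rdf else DISPLAY_PREFIX) + local_id
        return 303, {"Location": target, "Vary": "Accept"}
    if path.startswith(DISPLAY_PREFIX) or path.startswith(RDF_PREFIX):
        return 200, {}
    return 404, {}


def follow(path: str, accept: str = "text/html", max_hops: int = 5) -> List[Tuple[str, int]]:
    """Return the (path, status) hops a client sees while dereferencing."""
    hops = []
    for _ in range(max_hops):
        status, headers = resolve(path, accept)
        hops.append((path, status))
        if status not in REDIRECT_CODES:
            break
        path = headers["Location"]
    return hops


if __name__ == "__main__":
    for hop in follow("/individual/n123"):
        print(hop)
```

With the default Accept header, follow("/individual/n123") yields a 303
hop followed by a 200 on /display/n123. The link that party B publishes
points at the first hop, while the page that should benefit from it is
the second.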

This means that some of the advantage of being machine-harvestable is
lost.  Sure, your content is indexed, but the "authority" that comes from
other people and systems citing and reusing your content is greatly
diluted.
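
As a back-of-the-envelope illustration of that dilution, the toy model
below treats each redirect status as passing on a fixed share of the
authority of the inbound links. The shares are placeholders that mirror
the behaviour reported earlier in this thread (301 passes everything, 302
passes some, 303 passes nothing); the 0.5 for 302 in particular is an
arbitrary stand-in, and none of these numbers are measured or documented
search engine values.

```python
from typing import Optional

# Assumed share of link authority passed on per redirect status.
# Placeholders only: they restate the claims made in this thread and
# are not measured or documented values.
ASSUMED_SHARE_PASSED = {301: 1.0, 302: 0.5, 303: 0.0}


def authority_reaching_page(inbound_links: int,
                            redirect_status: Optional[int] = None) -> float:
    """Authority, in units of 'one link', that reaches the display page."""
    if redirect_status is None:
        # Links point straight at the display URL, no redirect involved.
        return float(inbound_links)
    return inbound_links * ASSUMED_SHARE_PASSED[redirect_status]


if __name__ == "__main__":
    for status in (None, 301, 302, 303):
        print(status, authority_reaching_page(10, status))
```

Under these assumptions, ten external citations of the Cool URI are worth
nothing to the page behind a 303, while the same ten links pointing at
the page directly, or arriving through a 301, count in full.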

Cheers,

Mark


On Sat, Jul 19, 2014 at 1:52 AM, Michael Brunnbauer <brunni@netestate.de>
wrote:

>
> Hello Mark,
>
> I cannot remember this important topic coming up earlier - which is a bit
> disturbing.
>
> The problem would be mitigated by people using the URI they see for
> linking.
>
> Why not use the HTML URLs in the HTML pages for internal page rank flow?
>
> How can URIs from sparql endpoints or OAI-PMH contribute to page rank?
>
> A real problem would be RDFa where href also sets the object of a triple.
>
> Regards,
>
> Michael Brunnbauer
>
> On Fri, Jul 18, 2014 at 10:05:17PM +1000, Mark Fallu wrote:
> > If the links we present to the outside world for harvesting, e.g. via
> > SPARQL endpoint, OAI-PMH or OpenSocial widget etc., are the canonical
> > "individual" URIs, clients will be able to get to the "display" URL,
> > but the Google page rank that would normally flow from these external
> > links will not.
>
>
>
> >
> > The specification of a 303 redirect describes it as:
> > http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
> >
> > > "The response to the request can be found under a different URI and
> > > SHOULD be retrieved using a GET method on that resource. This method
> > > exists primarily to allow the output of a POST-activated script to
> > > redirect the user agent to a selected resource. *The new URI is not a
> > > substitute reference for the originally requested resource*. The 303
> > > response MUST NOT be cached, but the response to the second
> > > (redirected) request might be cacheable.
> > >
> > > The different URI SHOULD be given by the Location field in the
> > > response. Unless the request method was HEAD, the entity of the
> > > response SHOULD contain a short hypertext note with a hyperlink to
> > > the new URI(s)."
> >
> >
> > Google correctly implements the specification and does not assign the
> > page rank of the "individual" URI to the "display" URL, as it is "*not
> > a substitute reference for the originally requested resource*".
> >
> > The same is true of internal links: a high page rank home page will
> > not pass page rank on to "display" URLs if the pathway to those URLs is
> > via "individual" URI links.
> >
> > I am not sure what the solution is here as it seems the realms of SEO and
> > the conventions of the web they are built on are not a good fit for
> > semantic web best practice.
> >
> > The most minimal compromise I can think of is to move away from the use
> > of a 303 redirect to a redirect that conserves the flow of Google page
> > rank.
> >
> >    - "302 Found" redirect is the recommended replacement for 303 for
> >    clients that do not support HTTP 1.1, and it does allow a certain
> >    amount of Google page rank to flow.
> >    - "301 Moved Permanently" is a poor fit for the Cool URI pattern, but
> >    passes on the full page rank of the links.
> >    - rewriting all URIs to the URL would also work, but would break the
> >    Cool URI pattern.
> >
> > The pragmatist in me feels that if we are going to make a change for the
> > purposes of SEO, it might as well be the one with best return, i.e. 301
> > redirect.
> >
> > Note: Indexing is not the problem here, content is indexed.  The issue
> > relates to page rank not flowing through a 303 redirect.
> >
> > I have tested and can confirm that 303 redirects are an issue for a
> > number of reasons:
> >
> >    - page rank does not flow through a 303 redirect
> >    - page rank cannot be assigned from a URL to a URI with a
> >    rel=canonical tag if the URI does a 303 redirect (preventing
> >    aggregation of page rank from external links to the URL)
> >    - URI and URL are indexed separately
> >    - RDFa schema.org representations of URIs do not translate to URLs
> >    (i.e. a representation described at URL A, talking about URI B, does
> >    not get connected to the representation described at URL B)
> >    - URL parameters are not passed by a 303 redirect
> >    - impact on the functionality of Google Analytics tracking, e.g.
> >    traversing the site is seen as a series of direct page visits
> >
> > Essentially, as far as search engines are concerned, every URL and URI
> > is an island, with no connections between them.  At best a URL can
> > express a rel=canonical back to its corresponding URI, but no page rank
> > will flow through links.
> >
> > Any guidance you can provide would be appreciated.
> >
> > --
> >
> > o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> > | Mark Fallu
> > | Manager, Research Data (Acting)
> > | Office for Research
> > | Bray Centre (N54) 0.10E
> > | Griffith University, Nathan Campus
> > | Queensland 4111 AUSTRALIA
> > |
> > | E-mail: m.fallu@griffith.edu.au
> > | Mobile:  04177 69778
> > | Phone:  +61 (07) 373 52069
> > o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
> --
> ++  Michael Brunnbauer
> ++  netEstate GmbH
> ++  Geisenhausener Straße 11a
> ++  81379 München
> ++  Tel +49 89 32 19 77 80
> ++  Fax +49 89 32 19 77 89
> ++  E-Mail brunni@netestate.de
> ++  http://www.netestate.de/
> ++
> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> ++  USt-IdNr. DE221033342
> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>




Received on Wednesday, 30 July 2014 05:39:08 UTC