- From: john.walker <john.walker@semaku.com>
- Date: Wed, 23 Jul 2014 16:50:15 +0200 (CEST)
- To: Michael Smethurst <michael.smethurst@bbc.co.uk>
- Cc: "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <1243004525.980714.1406127015892.open-xchange@oxweb03.eigbox.net>
Hi Michael,

Hope the laptop is ok :)

So I can think of your 'slash' NIR URI as something similar to a URN:
http://www.bbc.co.uk/programmes/b006mw1h/thing

It doesn't do much on its own and *just* acts as an identifier. Using HTTP, it can be resolved to a URL via the 303, kind of similar to a URN resolver.

Could you explain what you mean by "conneg penalty"?

I've set up an application working with 303s and, although I don't consider myself mad, it does add an extra request to every click the user makes. Getting the 303 response takes 20-25 ms on average, so it's not a big issue in this case (internal company usage).

Interestingly enough, I just checked a random shortened link off Twitter and it went through no fewer than 5 HTTP 301/302 redirects (500 ms in total) before getting the HTML. Taking that into consideration, a single 303 is not too bad!

Regards,
John Walker

> On July 23, 2014 at 3:55 PM Michael Smethurst <michael.smethurst@bbc.co.uk> wrote:
>
>
> Oops, dropped laptop :-/
>
> Continues....
>
> On 23/07/2014 14:50, "Michael Smethurst" <michael.smethurst@bbc.co.uk> wrote:
>
> > Hi Bill
> >
> > Bit of a difficult question to answer because the reality is probably still quite disjointed. Various parts of bbc.co.uk:
> > - serve linked data
> > - store data as rdf (in a triple store)
> > - consume (to some extent) linked data
> >
> > But nowhere are all those things true in one place.
> > So /programmes publishes linked data but the backend is a relational database, whereas things like sport / olympics are stored as linked data but don't publish linked data.
> >
> > So the 2 parts aren't really coupled
> >
> > I do half remember lots of conversations about hashes v slashes for /programmes and /music, but the sites are designed to be quite granular (one thing per uri; one uri per thing), so we weren't really dealing with lots of things in a document
> >
> > The linked data platform (our triple store) does use # uris like:
>
> http://www.bbc.co.uk/things/794274f1-d7ea-4ad2-9b36-c46ed55da9bd#id
>
> But I'm not best placed to know about the interfaces and queries onto this and why they chose hashes and not slashes. I'll ask around unless those people are already on this list...
>
> Not much help
> Sorry
> michael
>
> >
> > On 23/07/2014 14:19, "Bill Roberts" <bill@swirrl.com> wrote:
> >
> >> Hi Michael
> >>
> >> We've tended to use slash URIs where possible, because we have found it more convenient when doing URI dereferencing from a triple-store-backed site, in which case we essentially do a DESCRIBE on the relevant URI. (So we do 303ing for non-information resources, though in practice in a lot of our applications the great majority of content is statistical data, which we treat as information resources and respond with 200.)
> >>
> >> How do you organise your data and generation of URI dereferencing responses with hash-based URIs?
> >> I can see a variety of ways to do it, but I'd be interested to know what you have found most efficient/convenient at the BBC, essentially dealing with the fact that the server doesn't know about what comes after the #
> >>
> >>
> >> Thanks
> >>
> >> Bill
> >>
> >> On 23 Jul 2014, at 13:52, Michael Smethurst <michael.smethurst@bbc.co.uk> wrote:
> >>
> >>> Hello
> >>>
> >>> (Pretty sure I've made this comment before, so please forgive any signs of premature senility)
> >>>
> >>> I think this may be an unfortunate side effect of the conflation of the 303 ("I can't send that") pattern with the content negotiation ("what flavour would you like") pattern
> >>>
> >>> Lots of linked data applications (like dbpedia) seem to couple the two things together. So you have an "individual" uri which, when you attempt to dereference it, does a 303 *and* conneg in one step to the "display" uri:
> >>> /resource > 303+conneg > /data
> >>> or
> >>> /resource > 303+conneg > /page
> >>>
> >>>
> >>> Many other linked data sites seem to have followed this pattern, but to my eyes it does seem broken
> >>>
> >>> At the BBC we have 3 flavours of uri. I'm not sure if these are the appropriate / best labels, but:
> >>> - the non-information resource uri: the uri that refers to the real-world physical / metaphysical thing
> >>> - the generic information resource uri that identifies the document but not any specific representation of the document
> >>> - the representation uri (the html or json or rdf-xml etc)
> >>>
> >>> We tend to use hashes rather than slashes, like
> >>> http://www.bbc.co.uk/programmes/b006mw1h#programme
> >>>
> >>>
> >>> But pretending we use slashes for a minute...
> >>>
> >>> If you requested:
> >>> http://www.bbc.co.uk/programmes/b006mw1h/thing
> >>>
> >>>
> >>> You'd get a 303 redirect to the generic document / information resource uri:
> >>> http://www.bbc.co.uk/programmes/b006mw1h
> >>>
> >>>
> >>> Which would then conneg to the appropriate representation, which would still be served from:
> >>> http://www.bbc.co.uk/programmes/b006mw1h
> >>>
> >>> With a Content-Location header of
> >>> http://www.bbc.co.uk/programmes/b006mw1h.rdf
> >>>
> >>> For example
> >>>
> >>> Whilst the rdf refers to the non-information resource uri when making assertions about the "thing", this uri is not used elsewhere. All links in the html point to the generic document uri, not to the non-information resource uri
> >>>
> >>> So crawlers like google just follow links from information resource to information resource and never have to encounter 303s
> >>>
> >>> Picking up a conneg penalty for every request isn't without problems (particularly given CDN serving), but picking up a 303 penalty for every request would be madness and not something we'd ever have been able to implement
> >>>
> >>> I do think the dbpedia conflation of 303 with conneg is an unhelpful anti-pattern that people shouldn't be encouraged to follow. The conneg part is just REST; "semantics" add the 303 onto that, but they're not doing the same thing
> >>>
> >>> Separating 303 from conneg still gives you "thing" vs document separation, still maintains cool uris and doesn't kill your servers
> >>>
> >>> And we've never had a problem with seo
> >>>
> >>> Hth
> >>> michael
> >>>
> >>>
> >>> On 18/07/2014 16:52, "Michael Brunnbauer" <brunni@netestate.de> wrote:
> >>>
> >>>>
> >>>> Hello Mark,
> >>>>
> >>>> I cannot remember this important topic coming up earlier, which is a bit disturbing.
> >>>>
> >>>> The problem would be mitigated by people using the URI they see for linking.
> >>>>
> >>>> Why not use the HTML URLs in the HTML pages for internal page rank flow?
> >>>>
> >>>> How can URIs from sparql endpoints or OAI-PMH contribute to page rank?
> >>>>
> >>>> A real problem would be RDFa where href also sets the object of a triple.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Michael Brunnbauer
> >>>>
> >>>> On Fri, Jul 18, 2014 at 10:05:17PM +1000, Mark Fallu wrote:
> >>>>> If the links we present to the outside world for harvesting (e.g. via a sparql endpoint, OAI-PMH or open social widget) are the canonical "individual" URIs, clients will be able to get to the "display" url, but the google page rank that would normally flow from these external links will not.
> >>>>>
> >>>>> The specification of a 303 redirect describes it as:
> >>>>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
> >>>>>
> >>>>>> "The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. *The new URI is not a substitute reference for the originally requested resource*. The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable.
> >>>>>>
> >>>>>> The different URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s)."
> >>>>>
> >>>>> Google correctly implements the specification and does not assign the page rank of the "individual" URI to the "display" URL, as it is "*not a substitute reference for the originally requested resource*".
> >>>>>
> >>>>> The same is true of internal links: a high page rank home page will not pass page rank on to "display" urls if the pathway to those urls is via "individual" uri links.
> >>>>>
> >>>>> I am not sure what the solution is here, as it seems the realms of SEO and the conventions of the web they are built on are not a good fit for semantic web best practice.
> >>>>>
> >>>>> The most minimal compromise I can think of is to move away from the use of a 303 redirect to a redirect that conserves the flow of google page rank.
> >>>>>
> >>>>> - A "302 Found" redirect is the recommended replacement for 303 for clients that do not support HTTP 1.1, and it does allow a certain amount of google page rank to flow.
> >>>>> - "301 Moved Permanently" is a poor fit for the Cool URI pattern, but passes on the full page rank of the links.
> >>>>> - Rewriting all URIs to the URL would also work, but would break the coolURI pattern.
> >>>>>
> >>>>> The pragmatist in me feels that if we are going to make a change for the purposes of SEO, it might as well be the one with the best return, i.e. a 301 redirect.
> >>>>>
> >>>>> Note: Indexing is not the problem here; content is indexed. The issue relates to page rank not flowing through a 303 redirect.
> >>>>>
> >>>>> I have tested and can confirm that 303 redirects are an issue for a number of reasons:
> >>>>>
> >>>>> - page rank does not flow through a 303 redirect
> >>>>> - page rank can not be assigned from a url to a uri with a rel=canonical tag if the URI does a 303 redirect (preventing aggregation of pagerank from external links to the URL)
> >>>>> - URI and URL are indexed separately
> >>>>> - rdfa schema.org representations of URIs do not translate to the URL (i.e. a representation described at URL A, talking about URI B, does not get connected to the representation described at URL B)
> >>>>> - url parameters are not passed by a 303 redirect
> >>>>> - impact on functionality of google analytics tracking, e.g. traversing the site is seen as a series of direct page visits
> >>>>>
> >>>>> Essentially, as far as search engines are concerned, every URL and URI is an island, with no connections between them. At best a URL can express a rel=canonical back to its corresponding URI; no pagerank will flow through links.
> >>>>>
> >>>>> Any guidance you can provide would be appreciated.
> >>>>>
> >>>>> --
> >>>>>
> >>>>> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >>>>> | Mark Fallu
> >>>>> | Manager, Research Data (Acting)
> >>>>> | Office for Research
> >>>>> | Bray Centre (N54) 0.10E
> >>>>> | Griffith University, Nathan Campus
> >>>>> | Queensland 4111 AUSTRALIA
> >>>>> |
> >>>>> | E-mail: m.fallu@griffith.edu.au
> >>>>> | Mobile: 04177 69778
> >>>>> | Phone: +61 (07) 373 52069
> >>>>> o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >>>>
> >>>> --
> >>>> ++ Michael Brunnbauer
> >>>> ++ netEstate GmbH
> >>>> ++ Geisenhausener Straße 11a
> >>>> ++ 81379 München
> >>>> ++ Tel +49 89 32 19 77 80
> >>>> ++ Fax +49 89 32 19 77 89
> >>>> ++ E-Mail brunni@netestate.de
> >>>> ++ http://www.netestate.de/
> >>>> ++
> >>>> ++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
> >>>> ++ USt-IdNr. DE221033342
> >>>> ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> >>>> ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
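The thread contrasts two ways of wiring up dereferencing. Below is a minimal Python sketch, under stated assumptions, of the separation Michael Smethurst describes for bbc.co.uk: only the "thing" URI answers with a 303, and it redirects to the generic document URI; that document URI does content negotiation in place (still a 200 from the same URI) and names the chosen representation in a Content-Location header. It is not the BBC's actual implementation. The `/thing` suffix, the extension table, and the names `handle`, `preferred_type` and `request_target` are hypothetical, and the Accept parsing is deliberately simplified. `request_target` illustrates Bill Roberts's point that with hash URIs the fragment never reaches the server.

```python
from dataclasses import dataclass, field
from urllib.parse import urldefrag

# Hypothetical media-type -> extension table for the "representation" URIs.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "application/json": ".json",
}
DEFAULT_TYPE = "text/html"
THING_SUFFIX = "/thing"  # assumed suffix for slash-style non-information resource URIs


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


def preferred_type(accept: str) -> str:
    """Return the supported media type with the highest q-value.

    Simplified: wildcards such as */* are not matched, so they fall
    through to DEFAULT_TYPE.
    """
    best, best_q = DEFAULT_TYPE, 0.0
    for part in accept.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media in REPRESENTATIONS and q > best_q:
            best, best_q = media, q
    return best


def handle(path: str, accept: str = "*/*") -> Response:
    # 1. Non-information resource ("thing") URI: a plain 303 to the generic
    #    document URI. No content negotiation happens at this step.
    if path.endswith(THING_SUFFIX):
        return Response(303, {"Location": path[: -len(THING_SUFFIX)]})

    # 2. Representation URI: served directly with 200, no redirect, no conneg.
    for media, ext in REPRESENTATIONS.items():
        if path.endswith(ext):
            return Response(200, {"Content-Type": media})

    # 3. Generic document URI: conneg in place (200 from the same URI),
    #    with Content-Location naming the representation that was chosen.
    media = preferred_type(accept)
    return Response(200, {
        "Content-Type": media,
        "Content-Location": path + REPRESENTATIONS[media],
        "Vary": "Accept",
    })


def request_target(uri: str) -> str:
    """The part of a hash URI a client actually requests: the fragment stays client-side."""
    return urldefrag(uri).url
```

With this sketch, `handle("/programmes/b006mw1h/thing")` yields a 303 to `/programmes/b006mw1h`, and `handle("/programmes/b006mw1h", "application/rdf+xml")` yields a 200 with `Content-Location: /programmes/b006mw1h.rdf`. Links between HTML pages would point at the document URI, so a crawler following them never meets step 1. A DBpedia-style server would instead merge steps 1 and 3, choosing the 303 target from the Accept header, which is the coupling Michael argues against.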
Received on Wednesday, 23 July 2014 14:50:38 UTC