RE: Address Bar URI from Michael Smethurst on 2011-10-16 (public-lod@w3.org from October 2011)

From: Michael Smethurst <Michael.Smethurst@bbc.co.uk>
Date: Sun, 16 Oct 2011 13:50:16 +0100
To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
Cc: "Norman Gray" <norman@astro.gla.ac.uk>, "Linking Open Data" <public-lod@w3.org>, "Don Cruickshank" <dgc@ecs.soton.ac.uk>
Message-ID: <7A44633A0AA27A4A98B94B10BDF0AC3554C426@bbcxues27.national.core.bbc.co.uk>
Hi Hugh

Apologies for top post; blame webmail :-/

(Using labels as they appear in my head; feel free to translate to labels as they appear in your head)

If you're publishing linked data using 303s *and* the links in your html are targeting the nir uri (as per dbpedia):

<a class="uri" rel="dbpedia-owl:composer" xmlns:dbpedia-owl="http://dbpedia.org/ontology/" href="http://dbpedia.org/resource/Simon_May"><small>dbpedia</small>:Simon_May</a>

then, yes, you are already exposing 2 sets of uris to the web and to google. A google bot crawling your pages is going to see all your internal links pointing to .../thing/... whilst a user of your site is going to see the result of the 303 (.../page/... or whatever) in their address bar. If they want to blog about your stuff chances are they'll copy and paste into their post the URblah they see in their address bar. So when a google bot crawls their blog it'll see links pointing to .../page/... And Google doesn't consolidate pagerank for the two sets of uris for 303s.

But if you're publishing with 303s and linking internally to nir uris you've already got a problem. I can just about imagine attempting to convince people that it's worthwhile having a .../thing/... and 303ing it to a .../page/... for the .2% of people who care about the distinction / consume rdf. Attempting to convince the BBC platform people that every request a user makes as they browse round the site should be routed through a 303 would probably get my coffee spiked with arsenic. It just isn't going to happen. Ever. If the advice from this community is to target html links to the nir uri I think that's going to cause a lot of problems for a lot of publishers...

For BBC linked data stuff we deal with 3 classes of URblah:

1. representation urls (http://www.bbc.co.uk/programmes/b015v0nh.html, http://www.bbc.co.uk/programmes/b015v0nh.mp, http://www.bbc.co.uk/programmes/b015v0nh.json, http://www.bbc.co.uk/programmes/b015v0nh.rdf etc)

2. information resource uris (http://www.bbc.co.uk/programmes/b015v0nh)

3. nir uris (http://www.bbc.co.uk/programmes/b015v0nh#programme)

The first set are never exposed except as content location headers. And the links inside the html all *target* the second set:

<a href="/programmes/b015j0ng" typeof="po:Episode" about="/programmes/b015j0ng#programme"><span class="title" property="dc:title">Episode 3</span></a>

The second set just conneg (with some added device detection) to the first set with no redirection. These are the only uris ever really exposed. And that's the main benefit of doing any of this. Because we don't expose (in the address bar) the representation urls, users link to and share the ir uris. And the people they share with get back a representation appropriate to their needs (for device, serialisation, accessibility, language...)
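
As a rough sketch of that conneg-without-redirect step (not the actual BBC code: the suffix table and the parse_accept / negotiate helpers are made up for illustration, and device detection is left out), a request for the ir uri gets a 200 with a Content-Location header pointing at the representation url, never a 3xx:

```python
# Hypothetical sketch: conneg on an information resource uri with no redirect.
# REPRESENTATIONS, parse_accept() and negotiate() are illustrative names only.

REPRESENTATIONS = {
    "text/html": ".html",
    "application/json": ".json",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header):
    """Return the acceptable media types, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable", so those types are dropped.
        if media_type and q > 0:
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(ir_path, accept_header):
    """Answer a request for an ir uri with a status code and headers."""
    for media_type in parse_accept(accept_header):
        if media_type == "*/*":
            media_type = "text/html"  # assumed default representation
        suffix = REPRESENTATIONS.get(media_type)
        if suffix:
            return 200, {
                "Content-Type": media_type,
                # The representation url only ever shows up here.
                "Content-Location": ir_path + suffix,
                "Vary": "Accept",
            }
    return 406, {"Vary": "Accept"}


status, headers = negotiate(
    "/programmes/b015v0nh", "application/rdf+xml;q=0.9, text/html;q=0.5"
)
print(status, headers["Content-Location"])
```

The address bar stays on /programmes/b015v0nh whatever comes back, which is the point: the uri people copy and paste carries no representation in it.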

The third set is only there to be talked about in the rdf/rdfa, for people in the world (possibly confined to this list :-) ) who think the distinction between nirs and irs matters. But it's never exposed in the address bar

So our internal (html) links point to the second set. And the address bar points to the second set. So other people link to us using the second set and we don't leak google juice all over our shoes. And the 3rd set is there for those folks who care

Comparing that to dbpedia: all internal links point to the first set, there's then a conneg / 303 dance and the thing that ends up in your address bar is something that's half way between an ir uri and an ir representation url (.../page/... or .../data/...). So definitely exposing 2 sets of URblahs to the web and not exposing the most important one: the information *resource* uri. Which is the most important one because that's the one you want people to be able to share without worrying about whether the representation is appropriate to the needs of the people they're sharing with

IMO anywhere where you end up with /html or .html or .rdf or .mp or .cy or /page or /data exposed in the address bar is broken because the representation returned should be dependent on your accept headers, not the (information) resource you request. Or we're forgetting everything uncle roy ever taught us. Exposing representation in the address bar means your content / stuff can't be shared universally. And seo is just a side effect of being able to share / link universally

So at the risk of being controversial I think the dbpedia publishing pattern is a bit of an anti-pattern and we shouldn't be encouraging other publishers / developers to adopt it

In the myexperiment case I'd just say the same: never expose .html in the address bar and expose the ir uri instead (linking to the nir uri with about)

The whole nir / ir distinction just feels like a further level of abstraction on top of the resource / representation abstraction. And we occasionally throw REST out with the bath water...

Lots of linked data publishers using the 303 pattern seem to do the conneg and 303 as one step so:

nir uri > conneg + 303 > ir *representation* uri

rather than:
nir uri > 303 > ir uri > conneg (without redirect) > ir representation url (+ content location header)
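
To make the difference concrete, here is a toy model of the two flows as lists of hops. The /thing/, /page/ and /data/ paths follow the labels used above; the /doc/ path for the ir uri, the suffix table and the function names are assumptions made up for this sketch, not anyone's real setup:

```python
# Hypothetical sketch: the two 303 patterns modelled as request hops.
# Each hop is (method, path, status); nothing here talks to a real server.

SUFFIX = {"text/html": ".html", "application/rdf+xml": ".rdf"}


def one_step(name, media_type):
    """conneg + 303 together: the redirect target is already a representation."""
    target = "/page/" if media_type == "text/html" else "/data/"
    return [
        ("GET", "/thing/" + name, 303),
        ("GET", target + name, 200),
    ]


def two_step(name, media_type):
    """303 to the ir uri first, then conneg there without another redirect."""
    hops = [
        ("GET", "/thing/" + name, 303),
        ("GET", "/doc/" + name, 200),
    ]
    # The representation url is only reported in a header, never navigated to.
    content_location = "/doc/" + name + SUFFIX[media_type]
    return hops, content_location


def address_bar(hops):
    """The uri a user would copy and paste is the last one requested."""
    return hops[-1][1]


for media_type in SUFFIX:
    hops, content_location = two_step("Simon_May", media_type)
    print(media_type)
    print("  one step:", address_bar(one_step("Simon_May", media_type)))
    print("  two step:", address_bar(hops), "(Content-Location:", content_location + ")")
```

In the one step version the shareable uri changes with the accept header; in the two step version it stays put and only the Content-Location header varies.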

In answer to:
> Actually, users of the old browsers are currently disallowed from copying and pasting the address bar, if what they are after is the NIR or whatever we call it.
that's true. the nir uri is only exposed in the rdf(a). but I'd argue it's a very small minority of users who would understand the distinction / care and the ones that do will all be working with the rdf(a) anyway

And........ possibly more of a blog post than an email. Sorry :-)

Michael


-----Original Message-----
From: Hugh Glaser [mailto:hg@ecs.soton.ac.uk]
Sent: Sat 10/15/2011 2:43 PM
To: Michael Smethurst
Cc: Norman Gray; Linking Open Data; Don Cruickshank
Subject: Re: Address Bar URI
 
Thanks Michael.
Very helpful to bring in the SEO perspective, even on a Friday evening.

On 14 Oct 2011, at 21:28, Michael Smethurst wrote:

> Have to say from a pragmatic point of view that using replaceState to switch between IR and NIR (or whatever we're supposed to call them) URIs feels like bad advice for most developers
> 
> Users in older browsers are going to see (and copy and paste) one set of URIs whilst users of more modern browsers are going to see (and copy and paste) another
Maybe now is not the time to do it - but always being backwards compatible is not great.
Actually, users of the old browsers are currently disallowed from copying and pasting the address bar, if what they are after is the NIR or whatever we call it.
The myexperiment.org site has a real problem with this, and on a real system.
I think currently they have to accept that users do it, and then patch up afterwards (by removing the .html).
> 
> So you end up exposing two sets of URIs to the web and to Google et al. Google only consolidates page rank for inbound links on 301s (and not 302s or 303s) so you'd end up throwing your findability away for an esoteric distinction that no-one quite understands. Or understands but doesn't quite agree with :-)
But I think we are currently exposing two sets of URIs.
If we do the rewrite we will only be exposing one set of URIs to the users.

At first I thought "Oh no", we mustn't compromise SEO, and you describe how rewriting the address bar does.
But now I am afraid I don't understand why it does.
The only change is what the user sees in the Bar - so how would that affect the SEO?
Can you elaborate on how it affects SEO please?
I see that, for example googling '"Hugh Glaser" site:semanticweb.org' gets me
http://data.semanticweb.org/person/hugh-glaser as the top hit, and seems to ignore
http://data.semanticweb.org/person/hugh-glaser/html
> 
> For now cross browser support for pushState and replaceState is pretty shonky [1]. It's useful when product managers demand an "app like experience" because you can do all the shiny ajax stuff without nasty ajax #s and it all looks good on their iDevices. They don't need to know that's not what most people see :-)
> 
> With apologies for bringing up S*E*O on a Friday evening. And that aside it just feels like asking people to add more complexity to sidestep existing complexity that they don't understand / see the need for in the first place...
Remember it was a developer who asked me in the first place, who saw it as an answer to a serious problem he has with the users' interactions.
We should always incline to pushing just a bit more complexity onto the few developers, rather than onto the many, many more users, I think.

Best
Hugh
> 
> 
> [1] http://caniuse.com/#search=replaceState
> 
> 
> -----Original Message-----
> From: public-lod-request@w3.org on behalf of Hugh Glaser
> Sent: Fri 10/14/2011 4:22 PM
> To: Norman Gray
> Cc: Linking Open Data; Don Cruickshank
> Subject: Re: Address Bar URI
> 
> I am really no expert - really, so showing my ignorance here.
> I understand:
> 
> JS:
> window.history.replaceState('Object', 'Title', '/another-new-url');
> will do it happily, but I guess HTML5 is required.
> You can use it to change path and search strings, but not protocol or domain, I understand.
> 
> 
> On 14 Oct 2011, at 15:26, Norman Gray wrote:
> 
> >
> > Hugh, greetings.
> >
> > On 2011 Oct 14, at 13:08, Hugh Glaser wrote:
> >
> >> My colleague, Don Cruickshank asked me if it was good practice to rewrite the URI in the Address Bar to be the NIR, rather than the IR.
> >> I was surprised, but he tells me that it is permitted in HTML5.
> >
> > Can you expand on this a little?
> >
> > Is this some HTML5 cleverness that lets one declare in the HTML what the address bar should display?  Or is it some Javascript kludge^Wgadget that does it, in which case what is the sense in which this is 'permitted' in HTML5 and wasn't before?
> >
> > All the best,
> >
> > Norman
> >
> >
> > --
> > Norman Gray  :  http://nxg.me.uk
> > SUPA School of Physics and Astronomy, University of Glasgow, UK
> >
> 
> --
> Hugh Glaser, 
>               Web and Internet Science
>               Electronics and Computer Science,
>               University of Southampton,
>               Southampton SO17 1BJ
> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> http://www.ecs.soton.ac.uk/~hg/
> 
> 
> 
> 
>  
> 
> http://www.bbc.co.uk
> This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.

-- 
Hugh Glaser,  
              Web and Internet Science
              Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/




Received on Sunday, 16 October 2011 12:51:59 UTC