RE: Address Bar URI from Michael Smethurst on 2011-10-17 (public-lod@w3.org from October 2011)

From: Michael Smethurst <Michael.Smethurst@bbc.co.uk>
Date: Mon, 17 Oct 2011 06:48:40 +0100
To: "Kingsley Idehen" <kidehen@openlinksw.com>, <public-lod@w3.org>
Message-ID: <7A44633A0AA27A4A98B94B10BDF0AC3554C42A@bbcxues27.national.core.bbc.co.uk>
Hi Kingsley

I've heard you make this argument several times in the past. But I don't understand why. How does it benefit publishers to expose the representation address? How does it benefit consumers?

Ignoring linked data for a moment: if I ask for some information (an ir uri) over HTTP, what I get back depends on what I ask for and what I choose to accept (serialisation, language). There are benefits to this:

- the same links work for everyone (or as wide a set of people as possible)
- you don't expose multiple uris to the web
- you don't split your google juice

This is just the way http works and what leads to its most important aspect: universal access to information.
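The mechanism described here, where one ir uri returns a representation that depends on what the client says it will accept, is server-side content negotiation. A minimal sketch, assuming a hypothetical resource with three available serialisations (`AVAILABLE`). The Accept parsing is simplified: it honours q-values and wildcards but ignores other media-type parameters and does not rank by specificity the way the HTTP specification does.

```javascript
// Serialisations this hypothetical resource can be rendered as.
const AVAILABLE = ["text/html", "application/rdf+xml", "application/json"];

// Parse an Accept header into { type, q } entries, highest q first.
// Entries with q=0 (or an unparsable q) are dropped.
function parseAccept(header) {
  return header
    .split(",")
    .map((part) => {
      const [type, ...params] = part.split(";").map((s) => s.trim());
      let q = 1;
      for (const p of params) {
        const [key, value] = p.split("=").map((s) => s.trim());
        if (key === "q") q = Number(value);
      }
      return { type: type.toLowerCase(), q };
    })
    .filter((entry) => entry.type && entry.q > 0)
    .sort((a, b) => b.q - a.q); // stable sort: equal q keeps header order
}

// Return the first available type the client accepts, or null
// (a real server would typically answer 406 Not Acceptable).
function negotiate(acceptHeader, available = AVAILABLE) {
  for (const { type } of parseAccept(acceptHeader || "*/*")) {
    const match = available.find((t) =>
      type === "*/*"
        ? true
        : type.endsWith("/*")
          ? t.startsWith(type.slice(0, -1))
          : t === type
    );
    if (match) return match;
  }
  return null;
}
```

With this in place the same link serves a browser (`text/html`) and an RDF consumer (`application/rdf+xml`) without a second URI ever being exposed.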

(It's also won the argument. The "one web" approach won as soon as the first product manager eagerly clicked a link in Twitter and got bounced to the homepage of their "product" because different serialisations sat on different uris.)

The only thing linked data adds to this is the realisation that people want to talk about things that can't be sent over HTTP. So we end up with nir uris to give us something to talk about (in all senses :-) ) and 303s in case anyone is silly enough to request them.

I agree completely that the address bar should never expose the nir uri, because that doesn't make any sense. Web browsers are information browsers, so they should expose the uri of the information, just not the specific serialisation / language.

My 2p

Michael


-----Original Message-----
From: public-lod-request@w3.org on behalf of Kingsley Idehen
Sent: Sun 10/16/2011 2:41 PM
To: public-lod@w3.org
Subject: Re: Address Bar URI
 
On 10/16/11 8:50 AM, Michael Smethurst wrote:
>
> Hi Hugh
>
> Apologies for top post; blame webmail :-/
>
> (Using labels as they appear in my head; feel free to translate to 
> labels as they appear in your head)
>
> If you're publishing linked data using 303s *and* the links in your 
> html are targeting the nir uri (as per dbpedia):
>
> <a class="uri" rel="dbpedia-owl:composer" 
> xmlns:dbpedia-owl="http://dbpedia.org/ontology/" 
> href="http://dbpedia.org/resource/Simon_May"><small>dbpedia</small>:Simon_May</a>
>
> then, yes, you are already exposing 2 sets of uris to the web and to 
> google. A google bot crawling your pages is going to see all your 
> internal links pointing to .../thing/... whilst a user of your site is 
> going to see the result of the 303 (.../page/... or whatever) in their 
> address bar. If they want to blog about your stuff, chances are they'll 
> copy and paste into their post the URblah they see in their address 
> bar. So when a google bot crawls their blog it'll see links pointing 
> to .../page/... Google doesn't consolidate pagerank across the two sets 
> of uris for 303s.
>
> But if you're publishing with 303s and linking internally to nir uris 
> you've already got a problem. I can just about imagine attempting to 
> convince people that it's worthwhile having a .../thing/... and 303ing 
> it to a .../page/... for the .2% of people who care about the 
> distinction / consume rdf. Attempting to convince the BBC platform 
> people that every request a user makes as they browse round the site 
> should be routed through a 303 would probably get my coffee spiked with 
> arsenic. It just isn't going to happen. Ever. If the advice from this 
> community is to target html links to the nir uri, I think that's going 
> to cause a lot of problems for a lot of publishers...
>
> For BBC linked data stuff we deal with 3 classes of URblah:
>
> 1. representation urls (http://www.bbc.co.uk/programmes/b015v0nh.html, 
> http://www.bbc.co.uk/programmes/b015v0nh.mp, 
> http://www.bbc.co.uk/programmes/b015v0nh.json, 
> http://www.bbc.co.uk/programmes/b015v0nh.rdf etc)
>
> 2. information resource uris (http://www.bbc.co.uk/programmes/b015v0nh)
>
> 3. nir uris (http://www.bbc.co.uk/programmes/b015v0nh#programme)
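A rough sketch of how the three classes above can be told apart mechanically, assuming representation URLs are recognisable by the extensions listed (`.html`, `.mp`, `.json`, `.rdf`). `classifyUri` and `requestUri` are hypothetical helpers, not BBC code. The second helper illustrates why hash nir uris need no 303: the fragment is never sent to the server, so dereferencing one actually requests the ir uri.

```javascript
// Extensions taken from the representation URLs listed above (an assumption
// about how a site marks its representations; real rules will differ).
const REPRESENTATION_EXTENSIONS = [".html", ".mp", ".json", ".rdf"];

function classifyUri(uri) {
  const url = new URL(uri);
  if (url.hash) return "nir"; // e.g. .../b015v0nh#programme
  if (REPRESENTATION_EXTENSIONS.some((ext) => url.pathname.endsWith(ext))) {
    return "representation"; // e.g. .../b015v0nh.rdf
  }
  return "ir"; // e.g. .../b015v0nh
}

// What an HTTP client actually requests: everything except the fragment.
function requestUri(uri) {
  const url = new URL(uri);
  url.hash = ""; // clearing the hash removes the fragment entirely
  return url.href;
}
```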
>
> The first set are never exposed except as content location headers. 
> And the links inside the html all *target* the second set:
>
> <a href="/programmes/b015j0ng" typeof="po:Episode" 
> about="/programmes/b015j0ng#programme"><span class="title" 
> property="dc:title">Episode 3</span></a>
>
> The second set just conneg (with some added device detection) to the 
> first set with no redirection. These are the only uris ever really 
> exposed. And they're the main benefits of doing any of this. Because 
> we don't expose (in the address bar) the representation urls, users 
> link to and share the ir uris. And the people they share with get back 
> a representation appropriate to their needs (for device, 
> serialisation, accessibility, language...)
>
> The third set is only there to be talked about in the rdf/rdfa, for 
> people in the world (possibly confined to this list :-) ) who think 
> the distinction between nirs and irs matters. But it's never exposed 
> in the address bar
>
> So our internal (html) links point to the second set. And the address 
> bar points to the second set. So other people link to us using the 
> second set and we don't leak google juice all over our shoes. And the 
> 3rd set is there for those folks who care
>
> Comparing that to dbpedia: all internal links point to nir uris, 
> there's then a conneg / 303 dance, and the thing that ends up in your 
> address bar is something that's halfway between an ir uri and an ir 
> representation url (.../page/... or .../data/...). So it's definitely 
> exposing 2 sets of URblahs to the web and not exposing the most 
> important one: the information *resource* uri. That's the most 
> important one because it's the one you want people to be able to 
> share without worrying about whether the representation is appropriate 
> to the needs of the people they're sharing with.
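For comparison, the DBpedia-style 303 dance described above, sketched under simplifying assumptions: only the `/resource/`, `/page/` and `/data/` path segments come from the thread, and sending RDF clients to `/data/` whenever the Accept header mentions `application/rdf+xml` is a hypothetical rule, not DBpedia's actual logic. The browser ends up showing the `Location` target rather than the `/resource/` URI the link pointed at.

```javascript
// Hypothetical handler for nir uris of the form /resource/<name>.
function respondToResource(path, acceptHeader) {
  const m = /^\/resource\/(.+)$/.exec(path);
  if (!m) return { status: 404, headers: {} };
  const wantsRdf = (acceptHeader || "").includes("application/rdf+xml");
  const target = (wantsRdf ? "/data/" : "/page/") + m[1];
  return {
    status: 303, // See Other: the resource itself can't be sent, look here
    headers: { Location: target, Vary: "Accept" },
  };
}
```

Every browser visit to a linked nir uri therefore costs an extra round trip, and the URI left in the address bar is the representation-flavoured one.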
>
> IMO anywhere where you end up with /html or .html or .rdf or .mp or .cy 
> or /page or /data exposed in the address bar is broken because the 
> representation returned should be dependent on your accept headers, 
> not the (information) resource you request. Or we're forgetting 
> everything uncle roy ever taught us. Exposing representation in the 
> address bar means your content / stuff can't be shared universally. 
> And seo is just a side effect of being able to share / link universally
>
> So at the risk of being controversial I think the dbpedia publishing 
> pattern is a bit of an anti-pattern and we shouldn't be encouraging 
> other publishers / developers to adopt it
>

In the Web's information space dimension, yes, you could say DBpedia's 
approach is an anti-pattern; that's basically another way of 
articulating what I tried to convey via the 1-3 sequence in one of my 
earlier posts. In the Web's data space dimension, DBpedia's approach is 
natural and obvious. Trouble is, a majority of Web users and Developers 
are only gradually beginning to sense the aforementioned data space 
dimension.

We can make the manifestation of the Web's data space dimension 
unobtrusive if indirection is introduced properly which ultimately means:

1. Addresses (URLs) stay in the Address Bar.
2. Actual Object (Resource) Identifiers (generic de-referencable URI 
based Names) are discovered by introspection (human or machine)
3. Accessing Data Objects (Resources) by Name or Address becomes 
optional and should be driven by UX patterns.
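Point 2 (discovering identifiers by introspection) can be illustrated with the BBC markup quoted earlier in the thread: the address bar keeps the page URL, while the name of the thing is found in an RDFa `about` attribute and resolved against that address. A naive sketch; the regular expression stands in for a real RDFa parser (it would also match attributes such as `data-about`), and `discoverNames` is a hypothetical helper.

```javascript
// Collect about="..." values from an HTML string and resolve each one
// against the address the page was fetched from.
function discoverNames(pageAddress, html) {
  const names = [];
  const aboutAttr = /\babout="([^"]*)"/g;
  let m;
  while ((m = aboutAttr.exec(html)) !== null) {
    // Relative references resolve exactly as a browser would resolve a link.
    names.push(new URL(m[1], pageAddress).href);
  }
  return names;
}
```

For the quoted episode link found on the page http://www.bbc.co.uk/programmes/b015v0nh, this yields the name http://www.bbc.co.uk/programmes/b015j0ng#programme, while the address bar still shows the page address.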

We can never negate Name and Address disambiguation, once we are in the 
Web's data space dimension. The use of indirection to solve problems in 
computer science is as old as the subject matter itself :-)


> [SNIP]
>

> Michael
>

Kingsley
>
>
>
> -----Original Message-----
> From: Hugh Glaser [mailto:hg@ecs.soton.ac.uk]
> Sent: Sat 10/15/2011 2:43 PM
> To: Michael Smethurst
> Cc: Norman Gray; Linking Open Data; Don Cruickshank
> Subject: Re: Address Bar URI
>
> Thanks Michael.
> Very helpful to bring in the SEO perspective, even on a Friday evening.
>
> On 14 Oct 2011, at 21:28, Michael Smethurst wrote:
>
> > Have to say from a pragmatic point of view that using replaceState 
> to switch between IR and NIR (or whatever we're supposed to call them) 
> URIs feels like bad advice for most developers
> >
> > Users in older browsers are going to see (and copy and paste) one 
> set of URIs whilst users of more modern browsers are going to see (and 
> copy and paste) another
> Maybe now is not the time to do it - but always being backwards 
> compatible is not great.
> Actually, users of the old browsers are currently disallowed from 
> copying and pasting the address bar, if what they are after is the NIR 
> or whatever we call it.
> The myexperiment.org site has a real problem with this, and on a real 
> system.
> I think currently they have to accept that users do it, and then patch 
> up afterwards (by removing the .html).
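The "patch up afterwards" step can be sketched as mapping a pasted representation URL back to its ir uri by stripping a known suffix. The suffix list is an assumption drawn from the forms mentioned in this thread (`/html`, `.html`, `.rdf`, `.json`, `.mp`), and `toIrUri` is a hypothetical helper.

```javascript
// Representation suffixes to strip (assumed; adjust per site).
const REPRESENTATION_SUFFIXES = ["/html", ".html", ".rdf", ".json", ".mp"];

function toIrUri(address) {
  const url = new URL(address);
  for (const suffix of REPRESENTATION_SUFFIXES) {
    // Only strip a genuine suffix, never the whole path.
    if (url.pathname.endsWith(suffix) && url.pathname.length > suffix.length) {
      url.pathname = url.pathname.slice(0, -suffix.length);
      break;
    }
  }
  return url.href; // query string and fragment are preserved
}
```

Applied to the address Hugh quotes below, http://data.semanticweb.org/person/hugh-glaser/html becomes http://data.semanticweb.org/person/hugh-glaser.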
> >
> > So you end up exposing two sets of URIs to the web and to Google et 
> al. Google only consolidates page rank for inbound links on 301s (and 
> not 302s or 303s) so you'd end up throwing your findability away for 
> an esoteric distinction that no-one quite understands. Or understands 
> but doesn't quite agree with :-)
> But I think we are currently exposing two sets of URIs.
> If we do the rewrite we will only be exposing one set of URIs to the 
> users.
>
> At first I thought "Oh no", we mustn't compromise SEO, and you 
> describe how rewriting the address bar does.
> But now I am afraid I don't understand why it does.
> The only change is what the user sees in the Bar - so how would that 
> affect the SEO?
> Can you elaborate on how it affects SEO please?
> I see that, for example googling '"Hugh Glaser" site:semanticweb.org' 
> gets me
> http://data.semanticweb.org/person/hugh-glaser as the top hit, and 
> seems to ignore
> http://data.semanticweb.org/person/hugh-glaser/html
> >
> > For now cross browser support for pushState and replaceState is 
> pretty shonky [1]. It's useful when product managers demand an "app 
> like experience" because you can do all the shiny ajax stuff without 
> nasty ajax #s and it all looks good on their iDevices. They don't need 
> to know that's not what most people see :-)
> >
> > With apologies for bringing up S*E*O on a Friday evening. And that 
> aside it just feels like asking people to add more complexity to 
> sidestep existing complexity that they don't understand / see the need 
> for in the first place...
> Remember it was a developer who asked me in the first place, who saw 
> it as an answer to a serious problem he has with the users' interactions.
> We should always incline to pushing just a bit more complexity onto 
> the few developers, rather than onto the many, many more users, I think.
>
> Best
> Hugh
> >
> >
> > [1] http://caniuse.com/#search=replaceState
> >
> >
> > -----Original Message-----
> > From: public-lod-request@w3.org on behalf of Hugh Glaser
> > Sent: Fri 10/14/2011 4:22 PM
> > To: Norman Gray
> > Cc: Linking Open Data; Don Cruickshank
> > Subject: Re: Address Bar URI
> >
> > I am really no expert - really, so showing my ignorance here.
> > I understand:
> >
> > JS:
> > window.history.replaceState('Object', 'Title', '/another-new-url');
> > will do it happily, but I guess HTML5 is required.
> > You can use it to change path and search strings, but not protocol 
> or domain, I understand.
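The restriction mentioned here can be expressed as a check, roughly mirroring the HTML specification's rule for http(s) documents: the new URL may differ in path, query and fragment, but not in scheme, credentials, host or port (browsers throw a `SecurityError` otherwise). A sketch that runs outside a browser, so the document URL is passed in explicitly; `canRewriteTo` is a hypothetical name.

```javascript
// Would history.replaceState(state, title, newUrl) be allowed on a page
// whose current URL is documentUrl? (http/https documents only.)
function canRewriteTo(documentUrl, newUrl) {
  const doc = new URL(documentUrl);
  const target = new URL(newUrl, doc); // relative URLs resolve as in replaceState
  return (
    target.protocol === doc.protocol &&
    target.username === doc.username &&
    target.password === doc.password &&
    target.host === doc.host // host includes the port
  );
}
```

So rewriting `.../thing/html` to a nir uri on the same site is permitted, but switching to another domain or from http to https is not.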
> >
> >
> > On 14 Oct 2011, at 15:26, Norman Gray wrote:
> >
> > >
> > > Hugh, greetings.
> > >
> > > On 2011 Oct 14, at 13:08, Hugh Glaser wrote:
> > >
> > >> My colleague, Don Cruickshank asked me if it was good practice to 
> rewrite the URI in the Address Bar to be the NIR, rather than the IR.
> > >> I was surprised, but he tells me that it is permitted in HTML5.
> > >
> > > Can you expand on this a little?
> > >
> > > Is this some HTML5 cleverness that lets one declare in the HTML 
> what the address bar should display?  Or is it some Javascript 
> kludge^Wgadget that does it, in which case what is the sense in which 
> this is 'permitted' in HTML5 and wasn't before?
> > >
> > > All the best,
> > >
> > > Norman
> > >
> > >
> > > --
> > > Norman Gray  : http://nxg.me.uk
> > > SUPA School of Physics and Astronomy, University of Glasgow, UK
> > >
> >
> > --
> > Hugh Glaser,
> >               Web and Internet Science
> >               Electronics and Computer Science,
> >               University of Southampton,
> >               Southampton SO17 1BJ
> > Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> > Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> > http://www.ecs.soton.ac.uk/~hg/
> >
> >
> >
> >
> >
> >
> > http://www.bbc.co.uk
> > This e-mail (and any attachments) is confidential and may contain 
> personal views which are not the views of the BBC unless specifically 
> stated.
> > If you have received it in error, please delete it from your system.
> > Do not use, copy or disclose the information in any way nor act in 
> reliance on it and notify the sender immediately.
> > Please note that the BBC monitors e-mails sent or received.
> > Further communication will signify your consent to this.
>


-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen






Received on Monday, 17 October 2011 05:50:47 UTC