Re: non opaque primary topics

On 2013-05-09, Henry Story wrote:

> And if I wanted to find out the sense of 
> <urn:oid:1.3.6.1.4.1.12798.1.2049.1.497> I'd have no way of knowing, 
> not without say crawling the whole web and then doing something like 
> statistical analysis on how the term is used in the documents I got 
> access to.

When you get an opaque identifier like that, you got it from somewhere. 
It makes sense for that same source to also tell you where to get the 
information you want, but using a mechanism which doesn't overload a 
single URI for two different purposes. That would solve your problem as 
well as, say, WebID does. I'd be happy with it, and it would also work 
in a situation where even the "primary" definition of some term is 
distributed over many locations.

Or, if you're working in a closed environment, you already have an 
out-of-band means of accessing (or verifying the absence of) that 
information.

Or, maybe you just have to make do with the fact that you don't know. 
That's why SemWeb is built on the open world assumption: you don't 
always know everything and you should be fine with that.

Or, you could actually try to solve the underlying problem. The trouble 
is, if SemWeb is to work like we'd want it to, everybody has to be able 
to say anything about anything anywhere, using a shared vocabulary. That 
means that by its nature, information on e.g. TimBL will come from 
distributed sources. When that is the case, the mapping WebID calls 
sense (meaning) is necessarily one to many, and the administration of 
that mapping has to be distributed as well: we can't have TimBL deciding 
whether his profile of himself includes :seeAlso links to my feed, which 
e.g. calls his statements, or perhaps his self-declared age, into 
question. (The reputation problem is something a friend of mine is 
already gnawing at.) The same goes even more forcefully for things like 
the Moon, which people would be sorely tempted to rename if the URN were 
tied to the URL of even a "primary" profile.

So, if you want something approaching a Net-wide closed world, you do 
have to (conceptually) crawl all of the SemWeb in any case; you do have 
to have an API which gives you everything anybody ever said about an 
abstract identifier denoting TimBL/Moon. It's how you implement that API 
that is the question. Obviously you could do what Google does and just 
brute force the problem by pulling/crawling the actual documents. Or go 
even further and just condition the semantics on everybody 
flooding/pushing everything they have to say to everybody -- what is a 
few terabytes per day between friends? But of course the most reasonable 
thing would be to put in some kind of optimized name resolution service, 
that is to say, really solve this particular instance of the distributed 
URN-to-URL mapping problem for once.
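
As a rough illustration of the kind of API meant above, here is a 
minimal in-memory sketch in Python of a one-to-many mapping from an 
opaque identifier to the places that say something about it. The class 
name Resolver, the methods announce and resolve, and the example.org 
and example.net URLs are all assumptions made up for this sketch; a 
real service would be distributed and would need access control.

```python
from collections import defaultdict


class Resolver:
    """Toy one-to-many index from an opaque identifier to source URLs."""

    def __init__(self):
        self._sources = defaultdict(set)

    def announce(self, urn, url):
        # Anyone may announce that the document at `url` talks about `urn`.
        self._sources[urn].add(url)

    def resolve(self, urn):
        # Return every known source in a stable order. An empty list means
        # "nothing known here", not "nothing exists" (open world).
        return sorted(self._sources.get(urn, ()))


# Hypothetical usage: two independent parties describe the same identifier.
resolver = Resolver()
urn = "urn:oid:1.3.6.1.4.1.12798.1.2049.1.497"
resolver.announce(urn, "http://example.org/profile")
resolver.announce(urn, "http://example.net/feed")
print(resolver.resolve(urn))
```

The point of the sketch is only that the identifier itself never 
doubles as a location: the locations hang off it, and nobody owns the 
list.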

No, that's not easy to do, but it isn't about boiling the oceans either. 
We do have things like DHTs, reputation systems and whatnot to handle 
the query load and even most of the update and access control problems 
inherent in what we're trying to do. It's just that somebody has to 
build the damn thing. I believe chipping away at the problem using 
interim measures such as dual-function URIs just entrenches bad data 
modelling and namespace management habits, and makes it so much more 
difficult to solve the real problem when the time finally arrives to add 
a fully resolving transport to the layer cake.
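
To make the DHT part a little more concrete, the following Python 
sketch shows one common way of spreading identifiers over nodes, 
consistent hashing on a ring. It is only an assumption about how such 
a resolver might shard its index, not a description of any existing 
system; the names Ring, ring_position and node-a/b/c are invented for 
the example, and networking, replication and access control are left 
out entirely.

```python
import bisect
import hashlib


def ring_position(key):
    # Map any string to a point on a 2**160 ring using SHA-1.
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy consistent-hash ring assigning each identifier to one node."""

    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, urn):
        # Pick the first node clockwise from the key, wrapping around.
        positions = [p for p, _ in self._points]
        i = bisect.bisect_right(positions, ring_position(urn))
        return self._points[i % len(self._points)][1]


# Hypothetical node names; the URN is the one from the quoted message.
nodes = ["node-a", "node-b", "node-c"]
urn = "urn:oid:1.3.6.1.4.1.12798.1.2049.1.497"
ring = Ring(nodes)
owner = ring.node_for(urn)
print(owner)
```

With this layout, removing a node other than the owner leaves the 
owner of a given identifier unchanged, which is what keeps update 
traffic local when nodes come and go.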

> That is why building a linked data web with urn's is hopeless.
>
> For more on the sense/reference distinction see the WebID spec:
> https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/identity-respec.html#overview

I'm familiar with WebID. Obviously I believe it's misguided.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Friday, 10 May 2013 18:33:54 UTC