- From: Sampo Syreeni <decoy@iki.fi>
- Date: Fri, 10 May 2013 21:32:18 +0300 (EEST)
- To: Henry Story <henry.story@bblfish.net>
- cc: Tim Berners-Lee <timbl@w3.org>, Melvin Carvalho <melvincarvalho@gmail.com>, Semantic Web <semantic-web@w3.org>
On 2013-05-09, Henry Story wrote:

> And if I wanted to find out the sense of
> <urn:oid:1.3.6.1.4.1.12798.1.2049.1.497> I'd have no way of knowing,
> not without say crawling the whole web and then doing something like
> statistical analysis on how the term is used in the documents I got
> access to.

When you get an opaque identifier like that, you got it from somewhere. It makes sense that that same source also tells you where to get the information you want, but does so using a mechanism which doesn't overload a single URI for two different purposes. This would solve your problem as well as, say, WebID, I'd be happy with it, and it would also work in a situation where even the "primary" definition of some term is distributed over many locations.

Or, if you're working in a closed environment, you already have an out-of-band means of accessing (or verifying the absence of) that information.

Or, maybe you just have to make do with the fact that you don't know. That's why SemWeb is built on the open world assumption: you don't always know everything and you should be fine with that.

Or, you could actually try to solve the underlying problem. The trouble is, if SemWeb is to work like we'd want it to, everybody has to be able to say anything about anything anywhere, using a shared vocabulary. That means that by its nature, information on e.g. TimBL will come from distributed sources. When that is the case, the mapping WebID calls sense (meaning) is necessarily one to many, and the administration of that mapping has to be distributed as well: we can't have TimBL deciding whether his profile of himself includes :seeAlso's to my feed which e.g. calls his statements or perhaps self-declared age into question. (The reputation problem being something a friend of mine is already gnawing at.) The same goes even more forcefully for things like the Moon, which people would have tremendous temptation to rename if the URN was tied to the URL of even a "primary" profile.
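To make the one-to-many point concrete, here is a minimal sketch of such a mapping, not a proposal for the real thing. Everything in it is an assumption for illustration: the class `DescriptionIndex`, its methods `announce` and `resolve`, and the `*.example` publishers and URLs are all hypothetical. It only shows that the identifier carries no location, that anyone may register a document about it, and that the subject does not control the list.

```python
from collections import defaultdict


class DescriptionIndex:
    """Toy one-to-many index: opaque identifier -> documents describing it.

    Hypothetical illustration only; a real service would be distributed
    and would need reputation and access control on top.
    """

    def __init__(self):
        # identifier -> {publisher -> set of document URLs}
        self._index = defaultdict(lambda: defaultdict(set))

    def announce(self, publisher, identifier, url):
        """Any publisher may announce a document about any identifier."""
        self._index[identifier][publisher].add(url)

    def resolve(self, identifier):
        """Return every known (publisher, url) pair, sorted.

        An empty list means "nothing known here", not "nothing exists":
        that is the open world assumption.
        """
        entries = self._index.get(identifier, {})
        return sorted((p, u) for p, urls in entries.items() for u in urls)


index = DescriptionIndex()
urn = "urn:oid:1.3.6.1.4.1.12798.1.2049.1.497"

# The subject publishes a profile; a third party publishes a dissenting feed.
index.announce("subject.example", urn, "https://subject.example/profile")
index.announce("critic.example", urn, "https://critic.example/feed")

for publisher, url in index.resolve(urn):
    print(publisher, url)
```

Note that the lookup key is the bare identifier and that neither publisher can remove the other's entry, which is the whole point of keeping naming and location separate.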
So, if you want something approaching a Net-wide closed world, you do have to (conceptually) crawl all of the SemWeb in any case; you do have to have an API which gives you everything anybody ever said about an abstract identifier denoting TimBL/Moon. It's how you implement that API that is the question.

Obviously you could do what Google does and just brute force the problem by pulling/crawling the actual documents. Or go even further and just condition the semantics on everybody flooding/pushing everything they have to say to everybody -- what is a few terabytes per day between friends?

But of course the most reasonable thing would be to put in some kind of optimized name resolution service, that is to say, really solve this particular instance of the distributed URN-to-URL mapping problem for once. No, that's not easy to do, but it isn't about boiling the oceans either. We do have things like DHTs, reputation systems and whatnot to handle the query load and even most of the update and access control problems inherent in what we're trying to do. It's just that somebody has to build the damn thing.

I believe chipping away at the problem using interim measures such as dual function URIs just entrenches bad data modelling and name space management habits, and makes it so much more difficult to solve the real problem when the time finally arrives to add a fully resolving transport to the layer cake.

> That is why building a linked data web with urn's is hopeless.
>
> For more on the sense/reference distinction see the WebID spec:
> https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/identity-respec.html#overview

I'm familiar with WebID. Obviously I believe it's misguided.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Friday, 10 May 2013 18:33:54 UTC