Re: Should WebIDs denote people or accounts? from Kingsley Idehen on 2014-05-19 (public-webid@w3.org from May 2014)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 19 May 2014 08:04:29 -0400
To: public-webid@w3.org
Message-ID: <5379F34D.6010201@openlinksw.com>
On 5/18/14 8:31 PM, Sandro Hawke wrote:
>> How do you know that two IRIs denote the same thing without an 
>> owl:sameAs relation? Or without participation in an IFP based 
>> relation? How do you arrive at such conclusions?
>>
>> If a WebID doesn't resolve to an Identity Card (or Profile Document) 
>> comprised of owl:sameAs or IFP based relations, how can you claim 
>> coreference? You only know that two or more IRIs denote the same 
>> thing by way of discernible and comprehensible relations.
>
> You're putting the burden of proof in the wrong place.

An Identity Card holds Identity claims.

Verifying the claims in an Identity Card is handled by an ACL or Policy 
system. One that's capable of making sense of the claims and then 
applying them to ACL and Policy tests en route to determining Trust.

A WebID is like your Passport Number.

A WebID-Profile is like your Passport.

The WebID-TLS protocol is a protocol used by the Passport Issuer (this 
entity has a Trust relationship with Immigration Services).

>
> You (and the rest of the WebID community, including me until about
> 5 days ago) model the world in such a way that if your access-control
> reasoner ever got hold of some forbidden knowledge (the perfectly
> correct fact that two of my WebIDs co-refer) it would do the wrong thing.

Please don't speak for us (OpenLink Software) as I know that simply
isn't the case with our ACL engine. You see, you are making the same old
mistakes that tend to permeate these efforts. As I told you, we actually
start our implementations from the point of vulnerability. You get to
understand the point of vulnerability when you understand the concepts
behind a specification.

I spent a lot of time drumming home the fact that we needed a conceptual 
guide for WebID so that we simply wouldn't end up here i.e., you come 
along and assume everyone has implemented exactly the same thing.

If we spent more time performing interoperability tests of
implementations, others would also have come to realize these issues,
and factored them into their work.

As far as I know, we are the only ones performing serious WebID-TLS
based ACL testing against ourselves. Thus, you really need to factor
that into your implementation assumptions re. ACLs, which for all
intents and purposes aren't, as far as I know, generally standardized.

>
> That sounds to me like a fundamentally flawed design for an access 
> control system.

The ACL system is yet another component distinct from WebID,
WebID-Profile Documents, and WebID-TLS. They are not the same thing;
they are loosely coupled. You can have many different authentication
protocols and ACL systems working with WebIDs. In fact, that's how
things will pan out in the future.

> I don't have to show exactly how it's going to get hold of that data. 
> Rather, to show the system is reasonably secure, you have to show it's 
> vanishingly unlikely that the reasoner ever could come across that data.

You don't publish what you don't want to be manhandled. The problem is 
that all the systems today overreach without understanding the 
implications of said actions.

I don't hide my Email Address because:

1. I sign my emails
2. I have sophisticated mail filtering schemes that basically leverage 
the power of RDF.

>
>>>
>>> What you're talking about is whether a machine might be able to 
>>> figure out that truth.
>>
>> No, I am saying that you determine the truth from the relations that 
>> represent the claim.
>>
>>>
>>> If I have two different WebIDs that denote me, and you grant access 
>>> to one of them, it's true a machine might not immediately figure out 
>>> that that other one also denotes me and should be granted equal 
>>> access.  But if it ever did, it would be correct in doing so. 
>>
>> Only if it applied inference and reasoning to specific kinds of 
>> relations. It can't just jump to such conclusions. You don't do that 
>> in the real-world so why does it somehow have to be the case in the
>> digital realm?
>>
>
> It's not out of the question someone might state the same 
> foaf:homepage for both their WebIDs, or any of a variety of other true 
> facts.

Human beings make mistakes. You can't model for eradicating Human 
mistakes. What you can do is make systems that reduce the probability of 
said mistakes. Our systems minimize the amount of personally 
identifiable information that goes into a profile document. We take an 
ultra conservative approach bearing in mind that folks make mistakes 
when they don't fully understand the implications of their actions.

>
> If they did that, and it resulted in an access violation, I'd point 
> the finger of blame at the design of the system (using WebIDs to 
> denote people), not the user who provided that true data.

A WebID-TLS based authentication service should be able to distinguish 
between a homepage and a WebID. If it can't do that, then the 
implementation is at fault, not the WebID, WebID-Profile, WebID-TLS specs.

>
>>> And I'm betting, with machines getting access to more and more data 
>>> all the time, and doing more and more reasoning with it, it would 
>>> figure that out pretty soon.
>>
>> Email Addresses are ample for reconciling coreferences. Thus, if an
>> email address is the object of an appropriate relation, then
>> coreference can be discerned and applied where relevant.
>>>
>>> It sounds like you're proposing building an authorization 
>>> infrastructure that relies on machines not doing exactly what we're 
>>> trying to get them to do everywhere else.  Sounds a bit like trying 
>>> to hold back a river with your hands.
>>
>> Quite the contrary, I am saying there is a method to all of this, in 
>> the context of WebID, WebID-Profile, WebID-TLS, and ACLs etc. These
>> items are loosely coupled and nothing we've discussed so far makes a
>> defensible case for now catapulting a WebID from an HTTP URI that 
>> denotes an Agent to one that denotes an Account. We don't have this 
>> kind of problem at all.
>>
>
> You keep saying that, but you haven't explained how we can be assured 
> that facts stated with regard to one of my WebIDs will never end up 
> correctly -- but harmfully -- applied to one of my other WebIDs.

I have, and I repeat:

1. owl:sameAs claims are signed by way of reified statements that
include relations that incorporate a signature

2. signed claims by way of incorporation of the multiple WebIDs in Cert.
SAN or via inlined claims using data: extension

3. not reasoning on owl:sameAs or IFP relations.

Today, I believe #3 is the norm. We support 1-3 in our products. In 
addition, we can factor the Cert. Issuer and many other factors into our 
ACL processing.
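
As a rough illustration of how options 1-3 could fit together, here is a
minimal Python sketch. It is not the OpenLink engine: the Claim type, the
"signed" flag (assumed to be set by some separate signature verifier),
and the coreferent_ids function are all hypothetical names invented for
this example.

```python
from dataclasses import dataclass

OWL_SAME_AS = "owl:sameAs"

@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    obj: str
    signed: bool = False  # assumption: set by a separate signature verifier

def coreferent_ids(webid, claims, cert_san=(), reason_on_same_as=False):
    """Return the identifiers treated as denoting the same Agent as webid."""
    ids = {webid}
    # Option 2: WebIDs that appear together in the certificate SAN are accepted.
    if webid in cert_san:
        ids.update(cert_san)
    # Option 3 (the default here): no owl:sameAs reasoning at all.
    if not reason_on_same_as:
        return ids
    # Option 1: follow owl:sameAs, but only for claims marked as signed.
    changed = True
    while changed:
        changed = False
        for c in claims:
            if c.predicate != OWL_SAME_AS or not c.signed:
                continue
            if c.subject in ids and c.obj not in ids:
                ids.add(c.obj)
                changed = True
            elif c.obj in ids and c.subject not in ids:
                ids.add(c.subject)
                changed = True
    return ids
```

With reasoning left off, an unsigned or even a signed owl:sameAs claim
changes nothing; turning it on only ever admits the signed ones.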

If we had spent all this time on actual ACL testing, you would
have come to realize how we have factored these issues and more into
our actual implementation of an RDF based ACL engine that's capable of
working with WebID-TLS.


>
>>>
>>>>>
>>>>> To avoid that undesired fate, I think you need WebIDs to denote 
>>>>> personas.
>>>>
>>>> No, a persona is derived from the claims that coalesce around an 
>>>> identifier. A persona is a form of identification. A collection of 
>>>> RDF claims give you a persona.
>>>>
>>>>>    As I mentioned, those personas might be software agents, but 
>>>>> they are clearly not people.
>>>>
>>>> WebIDs denote Agents. An Agent could be a Person, Organization, or 
>>>> Machine (soft or hard). You can make identification oriented claims 
>>>> in a Profile Document using RDF based on a WebID.
>>>>
>>>
>>> The question is, what kind of triples are being written with WebIDs,
>>
>> None.
>>
>> A WebID is an HTTP URI that denotes an Agent.
>>
>> Basically,
>>
>> ## Turtle Start ##
>>
>> <#WebID>
>> a <#HttpURI> ;
>> <#denotes> [ a foaf:Agent ] .
>>
>> <#HttpURI>
>> a <#Identifier> .
>>
>> ## Turtle End ##
>
> Personally I don't find this kind of content useful.

I am just explaining what I understand a WebID to be i.e., an HTTP URI 
that denotes an Agent.

> I prefer to keep Turtle for showing the actual data that would be in a 
> running system.

No, not in this case, hence the example. I am drilling down to the 
foundation of the assertion. If we claim that a WebID denotes an Agent, 
then we can express that fact in Turtle or any other RDF notation.

> Like the triples which use WebIDs to guide your access control system. 
> If I added the foaf:homepage triples I mentioned, and your system did 
> OWL RL (for example) wouldn't it grant access to the wrong WebID (in 
> addition to the right one)?

See my earlier comments about reasoning which has always been controlled 
in our products. An ACL engine can't just infer coreference without any 
kind of configurable inference modality. That's an exploit that 
compromises the system, period.
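
To make "configurable inference modality" concrete for the foaf:homepage
case, here is a minimal Python sketch. It is only an assumption about how
such a switch might look, not a description of any product: the function
names, the "ifp" modality label, and the example.org style WebIDs in the
usage note are hypothetical.

```python
# Inverse functional properties this sketch is willing to reason over.
IFPS = {"foaf:homepage", "foaf:mbox"}

def same_agent(a, b, triples, modality):
    """Decide whether WebIDs a and b are treated as coreferent."""
    if a == b:
        return True
    # Without the "ifp" modality enabled, shared values prove nothing here.
    if "ifp" not in modality:
        return False

    def ifp_values(subject):
        return {(p, o) for (s, p, o) in triples if s == subject and p in IFPS}

    return bool(ifp_values(a) & ifp_values(b))

def is_authorized(webid, allowed, triples, modality=frozenset()):
    """Grant access if webid matches an allowed WebID under the modality."""
    return any(same_agent(webid, w, triples, modality) for w in allowed)
```

Given two WebIDs that state the same foaf:homepage and an ACL that names
only the first, is_authorized denies the second by default and grants it
only when the operator passes modality={"ifp"}.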

BTW -- This issue has been raised and discussed over the years re. WebID 
and WebID-TLS (from the days of FOAF+SSL).

>
>>> and what happens when machines figure out all my WebIDs denote me? 
>> Now, we have a WebID-Profile document which describes what a WebID 
>> denotes. That document is comprised of claims which may or may not
>> indicate co-reference via owl:sameAs and/or IFP based relations 
>> (e.g., foaf:mbox). None of this means a WebID denotes an Account.
>
> I'm not saying it DOES denote an account, just that it SHOULD, in 
> order to get the persona-separation that people demand.

The Persona separation is already in place. You don't seem to want to 
accept that fact. I say that because your claim is only true if we now 
conflate a WebID (Identifier) and the WebID-Profile (document) combined 
with all RDF claims as being processed as gospel. An "Account" is 
another type of thing. A "Persona" is what's discerned from the claims 
in an Identity Card or Profile Document.

I can make an ACL rule in our system that decides you are only trusted
if your homepage is referenced in at least one blog post or a blog post
that is associated with the tag "#WebID" or whatever. The ACL system
processes relations expressed using RDF. It doesn't have any hard-coded
behavior and it has the option to override certain relation semantics.
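
A rule of that kind can be pictured as a plain predicate over triples.
The Python sketch below is illustrative only: the "ex:references" and
"ex:tag" predicates and the function name are made up for the example,
and real rules in an RDF based ACL engine would be expressed as data
rather than code.

```python
def homepage_cited_in_post(webid, triples, tag=None):
    """True if a post references the WebID's homepage (optionally tagged)."""
    homepages = {o for (s, p, o) in triples
                 if s == webid and p == "foaf:homepage"}
    # Posts that cite one of the homepages (hypothetical ex:references).
    citing = {s for (s, p, o) in triples
              if p == "ex:references" and o in homepages}
    if tag is None:
        return bool(citing)
    # Narrow to posts carrying the requested tag (hypothetical ex:tag).
    tagged = {s for (s, p, o) in triples if p == "ex:tag" and o == tag}
    return bool(citing & tagged)
```

Calling it with tag="#WebID" gives the stricter variant of the rule;
leaving tag unset accepts a reference from any post.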

All things being equal, you will see a live online shopping system based 
on WebID, WebID-Profile, WebID-TLS, and our ACL system. I would be happy 
to see you break it.

>
> It seems clear to me that using WebIDs to denote people is an actively 
> dangerous and harmful design.

Using an HTTP URI to denote an Agent is an actively dangerous and 
harmful design?

> Either it should be fixed or WebIDs should be scrapped.    Or, of 
> course, you can show how I'm wrong.

How can I show you that you are wrong when you don't seem to be willing
to make an actual example, i.e., negate an existing WebID-TLS based
system by getting access to a protected resource? Would you be ready to
try something as practical as that, bearing in mind current ACL systems
aren't even supposed to support owl:sameAs and IFP relations based
reasoning by default?


>
>>
>> The fact that tools can figure out that an IFP based relation with a
>> mailto: scheme URI as its object is a way to triangulate coreference
>> still has no bearing on the case for a WebID denoting an Account.
>>
>>> Are you really being careful with every triple you write using 
>>> WebIDs to make sure it will still be exactly what you want to say 
>>> when a reasoner adds more triples just like it using my other WebIDs?
>>
>> Absolutely!!
>>
>> Even when dealing with owl:sameAs, we implement a verifier that won't
>> even invoke any reasoning and inference if those statements aren't
>> signed by the WebID-Profile document author, or if those claims aren't
>> part of the certificate (e.g., multiple WebIDs in SAN or using the Data
>> extension to embed Turtle Notation based RDF claims in the certificate).
>>
>>>
>>> It sounds to me like you are not.   It sounds to me like you're just 
>>> assuming that certain valid inferences will never be made.
>>
>> Of course not, as per comment above.
>>
>
> You're saying the inferences will never be made because the reasoners 
> will never get hold of the data that would support the conclusion that 
> both my WebIDs denote the same person?

I am saying even if it did, subject to modality, it wouldn't necessarily
perform the inference and reasoning in question (i.e., over the claims
expressed in the RDF statements it receives). To us, good design includes
understanding that more often than not, stuff actually goes wrong. I am
enthusiastic about open standards but very pessimistic in regards to
actual design and code implementation.

>   I don't think a system should ever be built on assumptions like that.
> It's not just insecure, but it forces us to carefully limit the flow 
> of information between systems which trust each other and operate on 
> behalf of the same persona.

Again, that isn't what I am saying. I am saying: claims that are 
*logically truthful* aren't *necessarily factual* in the context of an 
ACL system. An ACL system operator should be the final arbiter as to 
what's factual. Thus, owl:sameAs and IFP relations aren't gospel, they 
too are claims which may or may not have sway in regards to actual Trust 
determination.

I believe in "context fluidity" and "context lenses" above all else. As 
far as I know, that's how the real-world tends to work too.

>
>>>
-- 

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Monday, 19 May 2014 12:04:53 UTC