Re: Should WebIDs denote people or accounts?

Long message, to which I don't have time to write a full reply right 
now, so I'll just respond to a few key points.

On 05/19/2014 08:04 AM, Kingsley Idehen wrote:
> On 5/18/14 8:31 PM, Sandro Hawke wrote:
>>> How do you know that two IRIs denote the same thing without an 
>>> owl:sameAs relation? Or without participation in an IFP based 
>>> relation? How do you arrive at such conclusions?
>>>
>>> If a WebID doesn't resolve to an Identity Card (or Profile Document) 
>>> comprised of owl:sameAs or IFP based relations, how can you claim 
>>> coreference? You only know that two or more IRIs denote the same 
>>> thing by way of discernible and comprehensible relations.
>>
>> You're putting the burden of proof in the wrong place.
>
> An Identity Card holds Identity claims.
>
> Verifying the claims in an Identity Card is handled by an ACL or 
> Policy system. One that's capable of making sense of the claims and 
> then applying them to ACL and Policy tests en route to determining Trust.
>
> A WebID is like your Passport Number.
>
> A WebID-Profile is like your Passport.
>
> The WebID-TLS protocol is used by the Passport Issuer (this 
> entity has a Trust relationship with Immigration Services).
>
>>
>> You (and the rest of the WebID community, including me until about 
>> 5 days ago) model the world in such a way that if your access-control 
>> reasoner ever got hold of some forbidden knowledge (the perfectly 
>> correct fact that two of my WebIDs co-refer) it would do the wrong thing.
>
> Please don't speak for us (OpenLink Software) as I know that simply 
> isn't the case with our ACL engine. You see, you are making the same 
> old mistakes that tend to permeate these efforts. As I told you, we 
> actually start our implementations from the point of vulnerability. 
> You get to understand the point of vulnerability when you understand 
> the concepts behind a specification.
>
> I spent a lot of time drumming home the fact that we needed a 
> conceptual guide for WebID so that we simply wouldn't end up here 
> i.e., you come along and assume everyone has implemented exactly the 
> same thing.
>
> If we spent more time performing interoperability tests of 
> implementations, others would also have come to realize these issues, 
> and factor that into their work.
>
> As far as I know, we are the only ones performing serious WebID-TLS 
> based ACL testing against ourselves. Thus, you really need to factor 
> that into your implementation assumptions re. ACLs, which for all 
> intents and purposes aren't, as far as I know, generally standardized etc.
>

I'm sorry for suggesting that OpenLink's software was in any way 
insecure or poorly designed.    Knowing you and your company I'm 
confident that's not the case.   I was being careless in my argument, 
conflating two different system designs (as explained below).   I'll try 
to be much more careful in the future.

>>
>> That sounds to me like a fundamentally flawed design for an access 
>> control system.
>
> The ACL system is yet another component distinct from WebID, 
> WebID-Profile Documents, and WebID-TLS. They are not the same thing; 
> they are loosely coupled. You can have many different authentication 
> protocols and ACL systems working with WebIDs. In fact, that's how 
> things will pan out in the future.
>
>> I don't have to show exactly how it's going to get hold of that data. 
>> Rather, to show the system is reasonably secure, you have to show 
>> it's vanishingly unlikely that the reasoner ever could come across 
>> that data.
>
> You don't publish what you don't want to be manhandled. The problem is 
> that all the systems today overreach without understanding the 
> implications of said actions.
>
> I don't hide my Email Address because:
>
> 1. I sign my emails
> 2. I have sophisticated mail filtering schemes that basically leverage 
> the power of RDF.
>
>>
>>>>
>>>> What you're talking about is whether a machine might be able to 
>>>> figure out that truth.
>>>
>>> No, I am saying that you determine the truth from the relations that 
>>> represent the claim.
>>>
>>>>
>>>> If I have two different WebIDs that denote me, and you grant access 
>>>> to one of them, it's true a machine might not immediately figure 
>>>> out that that other one also denotes me and should be granted equal 
>>>> access.  But if it ever did, it would be correct in doing so. 
>>>
>>> Only if it applied inference and reasoning to specific kinds of 
>>> relations. It can't just jump to such conclusions. You don't do that 
>>> in the real-world, so why does it somehow have to be the case in the 
>>> digital realm?
>>>
>>
>> It's not out of the question someone might state the same 
>> foaf:homepage for both their WebIDs, or any of a variety of other 
>> true facts.
>
> Human beings make mistakes. You can't model for eradicating Human 
> mistakes. What you can do is make systems that reduce the probability 
> of said mistakes. Our systems minimize the amount of personally 
> identifiable information that goes into a profile document. We take an 
> ultra conservative approach bearing in mind that folks make mistakes 
> when they don't fully understand the implications of their actions.
>
>>
>> If they did that, and it resulted in an access violation, I'd point 
>> the finger of blame at the design of the system (using WebIDs to 
>> denote people), not the user who provided that true data.
>
> A WebID-TLS based authentication service should be able to distinguish 
> between a homepage and a WebID. If it can't do that, then the 
> implementation is at fault, not the WebID, WebID-Profile, WebID-TLS 
> specs.
>
>>
>>>> And I'm betting, with machines getting access to more and more data 
>>>> all the time, and doing more and more reasoning with it, it would 
>>>> figure that out pretty soon.
>>>
>>> Email addresses are ample for reconciling coreferences. Thus, if an 
>>> email address is the object of an appropriate relation, then 
>>> coreference can be discerned and applied where relevant etc.
>>>>
>>>> It sounds like you're proposing building an authorization 
>>>> infrastructure that relies on machines not doing exactly what we're 
>>>> trying to get them to do everywhere else.  Sounds a bit like trying 
>>>> to hold back a river with your hands.
>>>
>>> Quite the contrary, I am saying there is a method to all of this, in 
>>> the context of WebID, WebID-Profile, WebID-TLS, and ACLs etc. These 
>>> items are loosely coupled and nothing we've discussed so far makes a 
>>> defensible case for now catapulting a WebID from an HTTP URI that 
>>> denotes an Agent to one that denotes an Account. We don't have this 
>>> kind of problem at all.
>>>
>>
>> You keep saying that, but you haven't explained how we can be assured 
>> that facts stated with regard to one of my WebIDs will never end up 
>> correctly -- but harmfully -- applied to one of my other WebIDs.
>
> I have, and I repeat:
>
> 1. owl:sameAs claims are signed by way of reified statements that 
> include relations that incorporate a signature
>
> 2. signed claims by way of incorporation of the multiple WebIDs in 
> Cert. SAN or via inlined claims using the data: extension
>
> 3. not reasoning on owl:sameAs or IFP relations.
>
> Today, I believe #3 is the norm. We support 1-3 in our products. In 
> addition, we can factor the Cert. Issuer and many other factors into 
> our ACL processing.
>
> If we hadn't spent all this time on actual ACL testing, you would 
> actually come to realize how we have factored these issues and more 
> into our actual implementation of an RDF based ACL engine that's 
> capable of working with WebID-TLS.
>
>
>>
>>>>
>>>>>>
>>>>>> To avoid that undesired fate, I think you need WebIDs to denote 
>>>>>> personas.
>>>>>
>>>>> No, a persona is derived from the claims that coalesce around an 
>>>>> identifier. A persona is a form of identification. A collection of 
>>>>> RDF claims give you a persona.
>>>>>
>>>>>>    As I mentioned, those personas might be software agents, but 
>>>>>> they are clearly not people.
>>>>>
>>>>> WebIDs denote Agents. An Agent could be a Person, Organization, or 
>>>>> Machine (soft or hard). You can make identification oriented 
>>>>> claims in a Profile Document using RDF based on a WebID.
>>>>>
>>>>
>>>> The question is, what kind of triples are being written with WebIDs,
>>>
>>> None.
>>>
>>> A WebID is an HTTP URI that denotes an Agent.
>>>
>>> Basically,
>>>
>>> ## Turtle Start ##
>>>
>>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>>>
>>> <#WebID>
>>>     a <#HttpURI> ;
>>>     <#denotes> [ a foaf:Agent ] .
>>>
>>> <#HttpURI>
>>>     a <#Identifier> .
>>>
>>> ## Turtle End ##
>>
>> Personally I don't find this kind of content useful.
>
> I am just explaining what I understand a WebID to be i.e., an HTTP URI 
> that denotes an Agent.
>
>> I prefer to keep Turtle for showing the actual data that would be in 
>> a running system.
>
> No, not in this case, hence the example. I am drilling down to the 
> foundation of the assertion. If we claim that a WebID denotes an 
> Agent, then we can express that fact in Turtle or any other RDF notation.
>
>> Like the triples which use WebIDs to guide your access control 
>> system. If I added the foaf:homepage triples I mentioned, and your 
>> system did OWL RL (for example) wouldn't it grant access to the wrong 
>> WebID (in addition to the right one)?
>
> See my earlier comments about reasoning which has always been 
> controlled in our products. An ACL engine can't just infer coreference 
> without any kind of configurable inference modality. That's an exploit 
> that compromises the system, period.
>

This is the core of the issue, and it may just be a point on which we 
have to agree to disagree.

I think systems should be designed so that giving them more correct 
information will do no harm other than possibly causing performance problems.

I might be able to make a principled argument for this preference of 
mine, but it's kind of a separate issue.
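
To make the worry concrete, here is a minimal sketch of the foaf:homepage 
scenario I mentioned earlier. The WebIDs and the protected resource are 
made up, and the access rule just uses the W3C Web Access Control (acl:) 
vocabulary for illustration; I'm not claiming this is how any particular 
implementation writes its rules.

## Turtle Start ##

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix acl:  <http://www.w3.org/ns/auth/acl#> .

# Two WebIDs that both denote me (hypothetical identifiers):
<https://work.example/people/sandro#me>  foaf:homepage <http://hawke.example/> .
<https://home.example/sandro/card#me>    foaf:homepage <http://hawke.example/> .

# An access rule meant to apply to the first WebID only:
<#rule> a acl:Authorization ;
    acl:agent <https://work.example/people/sandro#me> ;
    acl:accessTo <#protected-resource> ;
    acl:mode acl:Read .

# The FOAF vocabulary declares foaf:homepage to be an
# owl:InverseFunctionalProperty, so an OWL RL reasoner that loads
# FOAF will conclude
#
#   <https://work.example/people/sandro#me>
#       owl:sameAs <https://home.example/sandro/card#me> .
#
# At that point the second WebID also satisfies the acl:agent
# condition, even though the rule's author never mentioned it.

## Turtle End ##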

> BTW -- This issue has been raised and discussed over the years re. 
> WebID and WebID-TLS (from the days of FOAF+SSL).
>
>>
>>>> and what happens when machines figure out all my WebIDs denote me? 
>>> Now, we have a WebID-Profile document which describes what a WebID 
>>> denotes. That document is comprised of claims which may or may not 
>>> indicate co-reference via owl:sameAs and/or IFP based relations 
>>> (e.g., foaf:mbox). None of this means a WebID denotes an Account.
>>
>> I'm not saying it DOES denote an account, just that it SHOULD, in 
>> order to get the persona-separation that people demand.
>
> The Persona separation is already in place. You don't seem to want to 
> accept that fact. I say that because your claim is only true if we 
> conflate a WebID (Identifier) with the WebID-Profile (document) and 
> treat all of its RDF claims as gospel. An 
> "Account" is another type of thing. A "Persona" is what's discerned 
> from the claims in an Identity Card or Profile Document.
>
> I can make an ACL rule in our system that decides you are only trusted 
> if your homepage is referenced in at least one blog post or a blog post 
> that is associated with the tag "#WebID" or whatever. The ACL system 
> processes relations expressed using RDF. It doesn't have any 
> hard-coded behavior and it has the option to override certain relation 
> semantics.
>
> All things being equal, you will see a live online shopping system 
> based on WebID, WebID-Profile, WebID-TLS, and our ACL system. I would 
> be happy to see you break it.
>
>>
>> It seems clear to me that using WebIDs to denote people is an 
>> actively dangerous and harmful design.
>
> Using an HTTP URI to denote an Agent is an actively dangerous and 
> harmful design?
>
>> Either it should be fixed or WebIDs should be scrapped.    Or, of 
>> course, you can show how I'm wrong.
>
> How can I show you that you are wrong when you don't seem to be 
> willing to make an actual example, i.e., negate an existing WebID-TLS 
> based system by getting access to a protected resource? Would you be 
> ready to try something as practical as that, bearing in mind current 
> ACL systems aren't even supposed to support owl:sameAs and IFP 
> relations based reasoning by default?
>
>
>>
>>>
>>> The fact that tools can figure out that an IFP based relation with a 
>>> mailto: scheme URI as its object is a way to triangulate coreference 
>>> still has no bearing on the case for a WebID denoting an Account.
>>>
>>>> Are you really being careful with every triple you write using 
>>>> WebIDs to make sure it will still be exactly what you want to say 
>>>> when a reasoner adds more triples just like it using my other WebIDs?
>>>
>>> Absolutely!!
>>>
>>> Even when dealing with owl:sameAs, we implement a verifier that 
>>> won't even invoke any reasoning and inference if those statements are 
>>> signed by the WebID-Profile document author. Or if those claims 
>>> aren't part of the certificate (e.g., multiple WebIDs in SAN or 
>>> using the Data extension to embed Turtle Notation based RDF claims 
>>> in the certificate).
>>>
>>>>
>>>> It sounds to me like you are not.   It sounds to me like you're 
>>>> just assuming that certain valid inferences will never be made.
>>>
>>> Of course not, as per comment above.
>>>
>>
>> You're saying the inferences will never be made because the reasoners 
>> will never get hold of the data that would support the conclusion 
>> that both my WebIDs denote the same person?
>
> I am saying even if it did, subject to modality, it wouldn't necessarily 
> perform the inference and reasoning in question (i.e., on the claims 
> expressed in the RDF statements it receives). To us, good design 
> includes understanding that more often than not, stuff actually goes 
> wrong. I am enthusiastic about open standards but very pessimistic in 
> regards to actual design and code implementation.
>
>>   I don't think systems should ever be built on assumptions like 
>> that.  It's not just insecure, but it forces us to carefully limit 
>> the flow of information between systems which trust each other and 
>> operate on behalf of the same persona.
>
> Again, that isn't what I am saying. I am saying: claims that are 
> *logically truthful* aren't *necessarily factual* in the context of an 
> ACL system. 

To my understanding, "logically truthful" and "necessarily factual" are 
synonyms.

What I think you're saying is: some claims that are logically truthful 
will never be present in the ACL system's knowledge base.
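
For concreteness, I mean claims like this one (with made-up WebIDs), which 
can be perfectly true of my identifiers while never appearing in the 
knowledge base the ACL engine consults:

## Turtle Start ##

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://work.example/people/sandro#me>
    owl:sameAs <https://home.example/sandro/card#me> .

## Turtle End ##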

> An ACL system operator should be the final arbiter as to what's 
> factual. Thus, owl:sameAs and IFP relations aren't gospel, they too 
> are claims which may or may not have sway in regards to actual Trust 
> determination.
>

In my view, a system should always "trust" information that it knows to 
be "true".

I'm uncomfortable with a design where a system has to consider some 
triples to be untrusted, even though they are true.  I think that's what 
you're proposing.

I know systems do have to sometimes disregard triples they know to be 
true for performance reasons, but I'd rather keep that to being just 
about performance, not about security.


> I believe in "context fluidity" and "context lenses" above all else. 
> As far as I know, that's how the real-world tends to work too.
>

Combining information from different contexts is very hard, and one of 
the great things about RDF is that you don't have to do it nearly as much.

So, to summarize where I think we are:

* We agree it's important to have the functionality where a person can 
log in different ways and get access to different things
* You say those are different WebIDs and different Personas; I say those 
are different Accounts
* You keep them distinct by limiting inference
* I think it's better to keep them distinct by having them be distinct 
resources in the RDF modelling (see the sketch below)
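
As a rough sketch of what I mean by that last bullet, re-using the made-up 
WebIDs from my earlier sketch: each WebID denotes a distinct account/persona 
resource rather than the person behind it. The choice of foaf:OnlineAccount 
and foaf:account here is just illustrative; nothing depends on that 
particular vocabulary.

## Turtle Start ##

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Each WebID denotes an account/persona resource (hypothetical identifiers):
<https://work.example/people/sandro#me>
    a foaf:OnlineAccount ;
    foaf:accountName "sandro-at-work" .

<https://home.example/sandro/card#me>
    a foaf:OnlineAccount ;
    foaf:accountName "sandro-at-home" .

# The person is a separate resource linked to each account.  A reasoner
# may correctly learn that both accounts belong to the same person, but
# that never makes the two account resources owl:sameAs each other, so
# an ACL rule naming one account still applies only to that account.
[] a foaf:Person ;
    foaf:account <https://work.example/people/sandro#me> ,
                 <https://home.example/sandro/card#me> .

## Turtle End ##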

        -- Sandro


Received on Monday, 19 May 2014 13:23:27 UTC