
Measuring usability (was Re: Responses to "Draft of charter for NextWebOnt (Proposed) Working Group")

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Tue, 16 Jan 2007 12:41:07 +0000
Message-Id: <681072A4-57E3-4AF0-B192-0CCF75E6CC9F@cs.man.ac.uk>
Cc: public-owl-dev@w3.org
To: Phillip Lord <phillip.lord@newcastle.ac.uk>

On 15 Jan 2007, at 11:44, Phillip Lord wrote:
[snip]
>   JH> if the WG wants my opinion, they should remove that topic from
>   JH> the scope - but if that's not viable, then expect that there
>   JH> will be those of us who insist that the WG pay attention to
>   JH> things that are not "measurable" and require us trusting
>   JH> people's instincts and experiences, which makes many formalists
>   JH> very nervous.
>
> Usability is just as measurable as tractability, I would say.

I agree that Uli overstated the case a bit, but I think this  
overstates things the other way.

Usability is more *difficult* to measure than worst case complexity  
because it is highly multidimensional, and typically requires  
experiments. Usability is sensitive to, among many other things:  
task, user, and support environment (tools, books, accessible gurus).  
Even when one has good data, it can be difficult to interpret and
even more difficult to generalize.

This doesn't mean that usability considerations can't be brought into  
play, but that one has to be very careful about the sorts of claims  
one makes and with what degree of confidence. *Language* usability is  
a particularly tough nut to achieve consensus on.

(Of course, the connection between worst case complexity and
effective scalability is also highly sensitive to particular
implementations and implementation techniques, as well as to the
problem, which is itself sensitive to application area, users, etc.
Worst case complexity is just some data, and it needs interpretation.)

> As far
> as I understand it, the complexity of solving a DL is determined by
> its expressivity. But as a user of DLs I don't actually care about
> the complexity, rather how fast the reasoner runs, which is just not
> the same thing

If you look at the tractable fragments document:
	<http://owl1_1.cs.manchester.ac.uk/tractable.html>

You'll see that it's not merely a presentation of the worst case
complexity of various DLs. (Compare with the description logic
complexity navigator:
	<http://www.cs.man.ac.uk/~ezolin/logic/complexity.html>)

It selects several DLs that seem useful and usable, and whose
complexity suggests that certain implementation techniques will be
both straightforward and effective. E.g., DL Lite can be realized
*on top* of a relational database system without heroics. This is
interesting, I think, because it *suggests* that a relational
database vendor will be able to provide a *robust* DL Lite
implementation *as a part* of their relational system, and thus make
use of all the implementation effort they put into it as well as
adapt the middleware. *If* DL Lite is expressive enough for a
sizable class of uses, then this is a big win.
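(As a rough illustration of the "on top of a relational database"
point -- nothing like this appears in the TF document, and the table
layout, axioms, and function names below are made up -- DL Lite query
answering is often implemented by query rewriting: an instance query
over a class becomes plain SQL over the tables holding asserted
instances of its entailed subclasses.)

```python
# Hypothetical sketch of DL Lite-style query answering via SQL rewriting.
# Assumes each class's asserted instances live in a table named after it.

def subclasses(axioms, cls):
    """All classes entailed to be subclasses of cls (including cls),
    given axioms as (sub, super) pairs. Naive fixpoint computation."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result

def rewrite_instance_query(axioms, cls):
    """Rewrite 'find all instances of cls' into one SQL UNION over the
    tables storing each entailed subclass's asserted instances."""
    parts = [f"SELECT individual FROM {c}"
             for c in sorted(subclasses(axioms, cls))]
    return " UNION ".join(parts)

axioms = [("Professor", "Teacher"), ("Lecturer", "Teacher")]
print(rewrite_instance_query(axioms, "Teacher"))
# SELECT individual FROM Lecturer UNION SELECT individual FROM Professor UNION SELECT individual FROM Teacher
```

The point is that the reasoning is pushed into the rewriting step, so
the database's existing query machinery does all the heavy lifting.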

So, I think when evaluating the usability of a species, one has to  
evaluate the probability of good tool support. The TF document  
provides *some* (non-conclusive) information about that.

BTW, I don't think any "theorists", at least anyone involved in
these debates, conflate worst case complexity with effective
implementability.

> (or if they work at all, which took quite a while for
> OWL-DL).
>
> There are a number of ways in which usability could be tested. We
> could gather up a set of curated ontologies and find out which
> constructs are used most often; naive users could be surveyed with
> descriptions of the constructs and questions about the implications,
> to see which attract most frequent confusion.

Data like this is often useful (though I'm not sure that focusing on
naive users, for example, is the best choice; how naive is naive?
For some classes of users you want to avoid the notion of
"constructs" altogether, and teachability is only one aspect of
usability).
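(A minimal sketch of the corpus survey Phillip suggests -- the
construct list, the regex-over-RDF/XML approach, and the inline
corpus are all illustrative assumptions, not a real study design:)

```python
# Hypothetical sketch: tally how often each OWL construct's opening tag
# appears across a corpus of RDF/XML ontology documents.
from collections import Counter
import re

CONSTRUCTS = ["owl:someValuesFrom", "owl:allValuesFrom",
              "owl:cardinality", "owl:intersectionOf", "owl:unionOf"]

def construct_counts(documents):
    """Count opening tags for each construct across all documents."""
    counts = Counter()
    for doc in documents:
        for c in CONSTRUCTS:
            counts[c] += len(re.findall("<" + re.escape(c), doc))
    return counts

corpus = [
    "<owl:Restriction><owl:someValuesFrom .../></owl:Restriction>",
    "<owl:someValuesFrom .../> <owl:cardinality>1</owl:cardinality>",
]
print(construct_counts(corpus).most_common(1))
# [('owl:someValuesFrom', 2)]
```

A serious version would parse the ontologies with a proper RDF/OWL
toolkit rather than pattern-match on syntax, but the tallying idea is
the same.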

> Of course, these measures will not be perfect. They will fail to give
> an exact answer about what is good and what is not good expressivity
> for the users; but, then, provable complexity of the different
> expressivities doesn't give you an exact answer as to what is going to
> run fast in practice, which is what most people actually care about.

No one said that. However, it does give you a more exact answer, esp.  
with regard to implementation choices. And it's a good starting  
place. And that work (with implementation and experiments) is further  
along.

When people present comparable usability data, I'll be comparably  
impressed :) (And I've worked on surveys and user studies, fwiw, in  
this area.)

However, in both cases, if vendors (i.e., any tool builder) aren't  
willing to support the fragment, it's not really helpful to identify it.

> As for making formalists nervous, hey, well, some things you just have
> to live with. They can take beta-blockers or something.

I don't see that this sort of language is helpful, esp. as it's
entirely gratuitous. If you look at the quote you quoted, the
alternative to "making the formalists nervous" was "trusting people's
instincts and experiences," *not* trusting formal usability results.
(I *think* Jim was overstating for rhetorical effect, but I also
think it was rather unfair to Uli's point -- after all, I think it's
perfectly correct to say that leaving out qualified cardinality
restrictions was a usability-motivated *usability* bug). I would
prefer in these discussions that we didn't blithely work in
caricatures of groups of people's psychological states. Am I a
formalist? I've had as much input as anyone into these documents, and
I've worked on reasoner implementations and optimization, complexity
results, surface syntax design, visualization, editor design and
implementation, taught naive users, etc. etc., oh, and built
ontologies (for real, well, for real-ish -- mostly at the conceptual
modeling scope, e.g., OWL-S).

Designing a useful language requires marshalling a lot of data *and*
making some educated, inconclusively supported, judgments. Often, the
data just won't be enough to even make such a judgment (but only, at
best, a guess), in which case I prefer to let what people are
willing to materially support determine those aspects of design
(since such support is another critical aspect of getting something
usable). The TF document is a *starting* place for such discussion. I
don't see that the information there is so very scorn-worthy.

Cheers,
Bijan.
Received on Tuesday, 16 January 2007 12:40:57 GMT
