Dilbert example - defining hasCubicle from Dan Brickley on 2011-10-13 (public-rdf-wg@w3.org from October 2011)

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 13 Oct 2011 17:34:54 +0100
To: Pat Hayes <phayes@ihmc.us>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <CAFNgM+bDqRd4FsXxzvdYTXcGO7EWdenfabU-J_Fb0oAHHg8_hw@mail.gmail.com>
Pat, (well, everyone; but triggered by Pat's comments)

You're suggesting, if I read you right, that RDF shouldn't be written
in ways that make its truth context-dependent; e.g. that a 'date of
birth' property is by far preferable to an 'age' property.

Below is a sketch of a reasonably common descriptive scenario. Could
you perhaps suggest a modelling / descriptive idiom that avoids these
problems? I hope it anchors some of the issues we've discussed in an
example small enough to be turned into concrete decision test cases or
example documentation.

Dan

-----

Theory and Practice

Consider an RDF vocabulary for describing office assignments in the
cartoon universe inhabited by Dilbert <http://en.wikipedia.org/wiki/Dilbert>.
First I describe the universe, then some ways in which we might
summarise what's going on using RDF graph descriptions. I would love
to get a sense of any 'best practice' claims here. Personally I see no
single best way to deal with this, only different and annoying
tradeoffs.

So: this is a fictional, highly simplified company in which each
worker is assigned to occupy exactly one cubicle, and in which every
cubicle has at most one assigned worker. Cubicles may also sometimes
be empty.

* Every 3 months, the Pointy-haired boss
  <http://en.wikipedia.org/wiki/List_of_Dilbert_characters#Pointy-haired_boss>
  has a strategic re-organization, and re-assigns workers to cubicles.
* He does this in a memo dictated to Dogbert, who will take the boss's
  vague and forgetful instructions and compare them to an Excel
  spreadsheet. This, cleaned up, eventually becomes an emailed Word
  .doc sent to the all-staff@ mailing list. The Word document is
  basically a table of room moves; it is headed with a date and, in
  bold type, "EFFECTIVE IMMEDIATELY". It is usually mailed out
  mid-evening and read by staff the next morning.
* In practice, employees move their stuff to the new cubicles over the
  course of a few days; longer if they're on holiday or off sick.
  Phone numbers are fixed later, hopefully, as are name badges etc.
* But generally the move takes place the day after the Word file is
  circulated, and at any one point a given cubicle can fairly be said
  to have at most one official occupant.

So let's try to model this in RDF/RDFS/OWL.

First, we can talk about the employees. Let's make a class, 'Employee'.

In the company systems, each employee has an ID, which is 'e-' plus an
integer. Once assigned, these are never re-assigned, even if the
employee leaves or dies.

We also need to talk about the office space units, the cubes or
'Cubicles'. Let's forget for now that the furniture is movable, and
treat each Cubicle as if it lasts forever. Maybe they are even somehow
symbolic cubicle names, and the furniture that embodies them can be
moved around to different office locations. But we don't try
modelling that for now.

In the company systems, each cubicle has an ID, which is 'c-' plus an
integer. Once assigned, these are never re-assigned, even if the
cubicle becomes in any sense de-activated.

Let's represent these as IRIs. Three employees, three cubicles.

 * http://example.com/e-1
 * http://example.com/e-2
 * http://example.com/e-3
 * http://example.com/c-1000
 * http://example.com/c-1001
 * http://example.com/c-1002

We can describe the names of employees. Cubicles also have informal
names. Let's say that neither ever changes.

 * e-1 name 'Alice'
 * e-2 name 'Bob'
 * e-3 name 'Charlie'
 * c-1000 name 'The Einstein Suite'
 * c-1001 name 'The doghouse'
 * c-1002 name 'Helpdesk'

Describing these in RDF is pretty straightforward.
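As a rough illustration, the following Python sketch (standard library only) prints those descriptions as N-Triples. It assumes that the eg: prefix stands for http://example.com/, consistent with the hasCubicle IRIs used later in this message, and it invents a hypothetical eg:name property; a real vocabulary might reuse something like foaf:name instead.

```python
EG = "http://example.com/"  # assumed expansion of the eg: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def literal(text):
    """Render a plain string literal in N-Triples syntax (backslash and quote escaped)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

# (local ID, class local name, informal name); none of these ever change.
things = [
    ("e-1", "Employee", "Alice"),
    ("e-2", "Employee", "Bob"),
    ("e-3", "Employee", "Charlie"),
    ("c-1000", "Cubicle", "The Einstein Suite"),
    ("c-1001", "Cubicle", "The doghouse"),
    ("c-1002", "Cubicle", "Helpdesk"),
]

triples = []
for local_id, cls, name in things:
    subject = f"<{EG}{local_id}>"
    triples.append((subject, f"<{RDF_TYPE}>", f"<{EG}{cls}>"))
    # eg:name is a hypothetical property used only for this sketch
    triples.append((subject, f"<{EG}name>", literal(name)))

for s, p, o in triples:
    print(s, p, o, ".")
```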

Let's now describe room assignments.

At the beginning of 2011 Alice (e-1) is in c-1000, Bob (e-2) is in
c-1001, and Charlie (e-3) is in c-1002. How can we represent this in
RDF?

We define an RDF/RDFS/OWL relationship type, a.k.a. property, called
eg:hasCubicle.

Let's say our corporate ontologist comes up with this schematic
description of cubicle assignments:

 * eg:hasCubicle has a domain of eg:Employee and a range of eg:Cubicle.
 * It is an owl:FunctionalProperty, because any Employee has at most
   one Cubicle related via hasCubicle.
 * It is an owl:InverseFunctionalProperty, because any Cubicle is the
   value of hasCubicle for no more than one Employee.
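Written out as triples, that schematic description might look like the Python sketch below. The rdf:, rdfs: and owl: namespace IRIs are the standard W3C ones; eg:hasCubicle, eg:Employee and eg:Cubicle again assume eg: = http://example.com/. The owl:ObjectProperty declaration is an extra assumption, added because OWL 2 DL expects inverse-functional properties to be object properties.

```python
EG = "http://example.com/"  # assumed expansion of the eg: prefix
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

HAS_CUBICLE = EG + "hasCubicle"

# The ontologist's schema for eg:hasCubicle as (subject, predicate, object) IRIs.
schema = [
    (HAS_CUBICLE, RDF + "type", OWL + "ObjectProperty"),  # assumed extra declaration
    (HAS_CUBICLE, RDFS + "domain", EG + "Employee"),
    (HAS_CUBICLE, RDFS + "range", EG + "Cubicle"),
    (HAS_CUBICLE, RDF + "type", OWL + "FunctionalProperty"),
    (HAS_CUBICLE, RDF + "type", OWL + "InverseFunctionalProperty"),
]

# Print as N-Triples, one statement per line.
for s, p, o in schema:
    print(f"<{s}> <{p}> <{o}> .")
```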

So... at the beginning of 2011 it would be truthy to assert these RDF claims:

 * <http://example.com/e-1> <http://example.com/hasCubicle> <http://example.com/c-1000> .
 * <http://example.com/e-2> <http://example.com/hasCubicle> <http://example.com/c-1001> .
 * <http://example.com/e-3> <http://example.com/hasCubicle> <http://example.com/c-1002> .

Now, come March 10th, everyone at the company receives an all-staff
email from Dogbert with cubicle reassignments. Amongst other changes,
Alice and Bob are swapping cubicles, and Charlie stays in c-1002.

Within a week or so (let's say by March 20th, to be sure) the cubicle
moves are all made real, in terms of where people are supposed to be
based, where they are, and where their stuff and phone line routings
are.

The fictional world by March 20th 2011 is now truthily described by
the following claims:

 * <http://example.com/e-1> <http://example.com/hasCubicle> <http://example.com/c-1001> .
 * <http://example.com/e-2> <http://example.com/hasCubicle> <http://example.com/c-1000> .
 * <http://example.com/e-3> <http://example.com/hasCubicle> <http://example.com/c-1002> .
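To make the merge problem concrete, here is a minimal Python sketch that builds the start-of-2011 graph and the March 20th graph, merges them (for ground graphs like these, with no blank nodes, an RDF merge is just the set union of triples), and runs a closed-world cardinality check. That check is an assumption of this sketch, closer to a database integrity constraint than to OWL reasoning.

```python
from collections import defaultdict

EG = "http://example.com/"
HAS_CUBICLE = EG + "hasCubicle"

def snapshot(pairs):
    """Build a set of hasCubicle triples from (employee, cubicle) IDs."""
    return {(EG + e, HAS_CUBICLE, EG + c) for e, c in pairs}

jan_2011 = snapshot([("e-1", "c-1000"), ("e-2", "c-1001"), ("e-3", "c-1002")])
mar_20_2011 = snapshot([("e-1", "c-1001"), ("e-2", "c-1000"), ("e-3", "c-1002")])

def cardinality_violations(triples, prop):
    """Closed-world check, NOT OWL reasoning: returns (subjects with more than
    one value for prop, values shared by more than one subject)."""
    by_subject, by_object = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            by_subject[s].add(o)
            by_object[o].add(s)
    multi_valued = sorted(s for s, vs in by_subject.items() if len(vs) > 1)
    shared = sorted(o for o, ss in by_object.items() if len(ss) > 1)
    return multi_valued, shared

# Ground graphs: merging is set union of triples.
merged = jan_2011 | mar_20_2011

print(cardinality_violations(jan_2011, HAS_CUBICLE))  # → ([], [])
print(cardinality_violations(merged, HAS_CUBICLE))
# → (['http://example.com/e-1', 'http://example.com/e-2'],
#    ['http://example.com/c-1000', 'http://example.com/c-1001'])
```

Each snapshot passes on its own; the merged graph gives Alice and Bob two cubicles each, and c-1000 and c-1001 two occupants each. An OWL reasoner would not report this as an error: OWL makes no unique-name assumption, so the FunctionalProperty axiom instead entails that c-1000 owl:sameAs c-1001, and the InverseFunctionalProperty axiom that e-1 owl:sameAs e-2.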


Questions / view from Named Graphs.

1. Was it a mistake, bad modelling style, etc., to describe things with
'hasCubicle'? Should we instead have described a date-stamped
'CubicleAssignmentEvent' that mentions, for example, the roles of
Dogbert, Alice, and some Cubicle? Is there a 'better' way to describe
things? Is this an acceptable way to describe things?

2. How, then, should we express the notion that each employee has at
most one cubicle and vice versa? Is this appropriate material to try to
capture in OWL?

3. How should a SPARQL store or TriG++ document capture the different
graphs describing the evolving state of the company's office-space
allocations?
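One possible shape for this, sketched in Python under stated assumptions: keep each dated snapshot in its own named graph and serialize the dataset as N-Quads (subject, predicate, object, graph). The graph names http://example.com/graphs/2011-01-01 and http://example.com/graphs/2011-03-20, and the "latest graph on or before a date" lookup rule, are inventions of this sketch.

```python
EG = "http://example.com/"
HAS_CUBICLE = EG + "hasCubicle"

# Hypothetical graph names: one named graph per "valid from" date.
dataset = {
    EG + "graphs/2011-01-01": [("e-1", "c-1000"), ("e-2", "c-1001"), ("e-3", "c-1002")],
    EG + "graphs/2011-03-20": [("e-1", "c-1001"), ("e-2", "c-1000"), ("e-3", "c-1002")],
}

def to_nquads(dataset):
    """Serialize every snapshot graph as N-Quads lines."""
    lines = []
    for graph, pairs in sorted(dataset.items()):
        for e, c in pairs:
            lines.append(f"<{EG}{e}> <{HAS_CUBICLE}> <{EG}{c}> <{graph}> .")
    return lines

def graph_as_of(dataset, iso_date):
    """Name of the latest snapshot graph dated on or before iso_date
    (ISO 8601 dates compare correctly as strings)."""
    dated = [g for g in dataset if g.rsplit("/", 1)[1] <= iso_date]
    return max(dated, key=lambda g: g.rsplit("/", 1)[1]) if dated else None

for line in to_nquads(dataset):
    print(line)
print(graph_as_of(dataset, "2011-03-15"))  # → http://example.com/graphs/2011-01-01
```

On March 15th the lookup still returns the January graph, even though the memo has gone out; deciding which graph is "current" between March 10th and 20th is exactly the judgment the scenario raises, and the sketch simply picks the last completed snapshot.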

4. Can we offer any practical but machine-readable metadata that helps
indicate to consuming applications the potential problems that might
come from merging different graphs that use this modelling style? For
example, can we write any useful definition for a class of property,
"TimeVolatileProperty", that could help people understand the risk of
merging different RDF graphs that use 'hasCubicle'?
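As a very rough sketch of how such a flag might be consumed, the Python below types eg:hasCubicle with a hypothetical class eg:TimeVolatileProperty and reports any flagged property used in more than one of the graphs about to be merged. The class, its IRI, and the "more than one source graph" heuristic are all assumptions; here the flag carries no formal semantics, only a hint to applications.

```python
from collections import Counter

EG = "http://example.com/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TIME_VOLATILE = EG + "TimeVolatileProperty"  # hypothetical class
HAS_CUBICLE = EG + "hasCubicle"

schema = {(HAS_CUBICLE, RDF_TYPE, TIME_VOLATILE)}

def risky_properties(graphs, schema):
    """Volatile properties that appear in more than one of the graphs to be merged."""
    volatile = {s for s, p, o in schema if p == RDF_TYPE and o == TIME_VOLATILE}
    usage = Counter()
    for graph in graphs:
        # count each volatile property at most once per graph
        usage.update({p for _, p, _ in graph} & volatile)
    return sorted(p for p, n in usage.items() if n > 1)

jan = {(EG + "e-1", HAS_CUBICLE, EG + "c-1000")}
mar = {(EG + "e-1", HAS_CUBICLE, EG + "c-1001")}
names = {(EG + "e-1", EG + "name", '"Alice"')}

print(risky_properties([jan, mar], schema))    # → ['http://example.com/hasCubicle']
print(risky_properties([jan, names], schema))  # → []
```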

5. Can the 'snapshot of the world-as-it-now-is' view and the
'transaction / event log' view be equal citizens, stored in the same
RDF store, and can metadata / manifest / table-of-contents info for
that store be used to make the information usefully exploitable and
reasonably truthy?
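A minimal sketch of one way the two views could coexist: treat the event log as primary and derive the snapshot from it on demand, by replaying dated assignments up to a given day and emitting the resulting hasCubicle triples. The log format, and the choice to date the March moves by the memo (March 10th) rather than by March 20th, are assumptions of the sketch.

```python
from datetime import date

EG = "http://example.com/"

# Hypothetical event log: (effective date, employee ID, cubicle ID).
log = [
    (date(2011, 1, 1), "e-1", "c-1000"),
    (date(2011, 1, 1), "e-2", "c-1001"),
    (date(2011, 1, 1), "e-3", "c-1002"),
    (date(2011, 3, 10), "e-1", "c-1001"),  # dated by the memo
    (date(2011, 3, 10), "e-2", "c-1000"),
]

def snapshot_as_of(log, when):
    """Replay assignments effective on or before `when`; the latest one wins per employee."""
    current = {}
    for effective, employee, cubicle in sorted(log):
        if effective > when:
            break
        current[employee] = cubicle
    return current

def snapshot_triples(log, when):
    """Materialize the derived snapshot as hasCubicle N-Triples lines."""
    return sorted(
        f"<{EG}{e}> <{EG}hasCubicle> <{EG}{c}> ."
        for e, c in snapshot_as_of(log, when).items()
    )

for line in snapshot_triples(log, date(2011, 3, 20)):
    print(line)
```

The replay only tracks one cubicle per employee; it does not check the at-most-one-occupant rule, which a fuller version might verify after each day's batch of moves.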

Received on Thursday, 13 October 2011 16:35:23 UTC