Re: Dilbert example - defining hasCubicle from David Wood on 2011-10-14 (public-rdf-wg@w3.org from October 2011)

From: David Wood <david@3roundstones.com>
Date: Thu, 13 Oct 2011 21:43:39 -0400
To: Dan Brickley <danbri@danbri.org>
Cc: Pat Hayes <phayes@ihmc.us>, RDF WG <public-rdf-wg@w3.org>
Message-Id: <7A81725E-7B94-406A-9CE9-691413C8DA51@3roundstones.com>
Hi Dan,

Unfortunately, your scenario doesn't have an explicit requirement for any temporal data.  Why not just update all the cubicle assignments when they change?  Even Jeremy's sandwich delivery requirement could be satisfied by a trivial SPARQL query that lists current cube assignments.

One could concoct a scenario in which Wally (Dilbert's notoriously vindictive and lazy co-worker) is collecting data with which to report the Pointy Haired Boss to the board for extraneous cubicle arranging, but that seems contrived (because it is).

I propose that the age requirement is a more appropriate scenario to start with.  Each employee has their birthday recorded.  Dogbert wishes to send de-motivational birthday greetings and constructs a SPARQL query to discover which employees should get a card on any given day.  The query would use some SPARQL 1.1 features to calculate who has a birthday.  We could explain why a birthday is recorded and not employee age.
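
As a rough illustration of the birthday lookup (in plain Python rather than SPARQL 1.1, and with invented birth dates for e-1 to e-3), the sketch below matches on month and day only.  It also shows why the birth date is the better thing to record: an age derived from it changes depending on the day you ask, while the stored date does not.  The names BIRTH_DATES, birthdays_on and age_on are hypothetical, and the leap-day case is ignored.

```python
from datetime import date

# Hypothetical birth dates keyed by employee id (values invented for illustration).
BIRTH_DATES = {
    "e-1": date(1975, 10, 14),
    "e-2": date(1980, 3, 2),
    "e-3": date(1968, 10, 14),
}


def birthdays_on(day, birth_dates):
    """Return the ids of employees whose birthday falls on the given day."""
    return sorted(
        emp
        for emp, born in birth_dates.items()
        if (born.month, born.day) == (day.month, day.day)
    )


def age_on(day, born):
    """Derive an age from the stored birth date; the answer depends on `day`."""
    had_birthday = (day.month, day.day) >= (born.month, born.day)
    return day.year - born.year - (0 if had_birthday else 1)


print(birthdays_on(date(2011, 10, 14), BIRTH_DATES))      # ['e-1', 'e-3']
print(age_on(date(2011, 10, 13), BIRTH_DATES["e-1"]))     # 35
print(age_on(date(2011, 10, 14), BIRTH_DATES["e-1"]))     # 36
```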

It would be beneficial to also do something that is hard or impossible with an RDBMS.  Perhaps the scenario could be extended with another dataset, this one kept by the Pointy Haired Boss.  Eventually, someone (probably Dogbert) would merge the two datasets to satisfy some new query that can only be answered across both datasets.

The Boss's dataset tracks which employees are behind in their work.  This is where temporal data comes in, because he wants to keep a history and query who the worst employee is for all time, not just the current time.  Like Dogbert's dataset, the Boss's dataset uses the same employee ids, making later merging easy.

The Boss might track projects, their start dates, their anticipated end dates and their actual end dates.  This is like a real-world Gantt chart, but simplified.  Each project involves one or more employees.  To follow your lead, each project id starts with p- and an integer:

@prefix eg: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.com/p-120>
  eg:hasStartDate '2011-03-12'^^xsd:date ;
  eg:hasPlannedEndDate '2011-05-28'^^xsd:date ;
  eg:hasActualEndDate '2011-08-31'^^xsd:date ;
  eg:assignedEmployee <http://example.com/e-1> .

Additional employees could be listed by repeating the assignedEmployee property.
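
To make the "worst employee for all time" query concrete, here is a minimal Python sketch over the same kind of project records.  The p-120 dates are the ones given above; p-121, the dictionary layout and the function names are invented for illustration, and "worst" is assumed to mean the largest total number of days late.

```python
from datetime import date

# p-120 uses the dates from the example; p-121 is a hypothetical second project.
PROJECTS = {
    "p-120": {
        "planned_end": date(2011, 5, 28),
        "actual_end": date(2011, 8, 31),
        "employees": ["e-1"],
    },
    "p-121": {
        "planned_end": date(2011, 6, 30),
        "actual_end": date(2011, 7, 10),
        "employees": ["e-1", "e-2"],
    },
}


def days_late(project):
    """Days between planned and actual end; early finishes count as zero."""
    return max(0, (project["actual_end"] - project["planned_end"]).days)


def lateness_by_employee(projects):
    """Sum the lateness of every project each employee was assigned to."""
    totals = {}
    for project in projects.values():
        for emp in project["employees"]:
            totals[emp] = totals.get(emp, 0) + days_late(project)
    return totals


totals = lateness_by_employee(PROJECTS)
worst = max(totals, key=totals.get)
print(days_late(PROJECTS["p-120"]))  # 95
print(worst)                         # e-1
```

Because the history is kept as dated project records, the same data answers both "who is late now" and "who has been late overall".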

I think your same questions to Pat apply, but the scenario seems less contrived to me.

The merged data from the two datasets could be used to justify, e.g., the firing of older employees by Dogbert before they could claim a pension.
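
A sketch of that merge, again in plain Python with invented values: because both datasets key their records on the same employee ids, the join needs no mapping step.  The names dogbert, boss, merge_by_id and older_and_late are hypothetical, as is the cut-off date.

```python
from datetime import date

# Dogbert's dataset (hypothetical values): employee id -> birth date.
dogbert = {"e-1": date(1950, 1, 5), "e-2": date(1980, 3, 2)}
# The Boss's dataset (hypothetical values): employee id -> total days late.
boss = {"e-1": 95, "e-3": 12}


def merge_by_id(births, lateness):
    """Join the two datasets on the shared employee id, keeping gaps as None."""
    merged = {}
    for emp in births.keys() | lateness.keys():
        merged[emp] = {
            "born": births.get(emp),
            "days_late": lateness.get(emp),
        }
    return merged


def older_and_late(merged, born_before):
    """Employees born before the cut-off who also have recorded lateness."""
    return sorted(
        emp
        for emp, rec in merged.items()
        if rec["born"] is not None
        and rec["born"] < born_before
        and rec["days_late"]
    )


merged = merge_by_id(dogbert, boss)
print(older_and_late(merged, date(1955, 1, 1)))  # ['e-1']
```

Neither dataset can answer this question alone; only the merged view has both the birth date and the lateness for e-1.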

Just my two cents (pence) at the end of a long day.  Please ignore if I am no longer making sense.

Regards,
Dave


On Oct 13, 2011, at 12:34, Dan Brickley <danbri@danbri.org> wrote:

> Pat, (well, everyone; but triggered by Pat's comments)
> 
> You're suggesting, if I read you right, that RDF shouldn't be written
> in ways that make its truth context-dependent; e.g. that a 'date of
> birth' property is preferable by far to an 'age' property.
> 
> Below is a sketch of a reasonably common descriptive scenario. Could
> you maybe suggest a modelling / descriptive idiom that avoids these
> problems? I hope it anchors some of the
> issues we've discussed in a small enough example that might be turned
> into concrete decision test cases or example documentation.
> 
> Dan
> 
> -----
> 
> Theory and Practice
> 
> Consider an RDF vocabulary for describing office assignments in the cartoon
> universe inhabited by Dilbert <http://en.wikipedia.org/wiki/Dilbert>.
> First I describe the universe, then some ways in
> which we might summarise what's going on using RDF graph descriptions.
> I would love to get a sense for any
> 'best practice' claims here. Personally I see no single best way to
> deal with this, only different and annoying tradeoffs.
> 
> So --- this is a fictional highly simplified company in which workers
> each are assigned to occupy exactly one cubicle,
> and in which every cubicle has at most one assigned worker. Cubicles
> may also sometimes
> be empty.
> 
> * Every 3 months, the Pointy-haired boss
> <http://en.wikipedia.org/wiki/List_of_Dilbert_characters#Pointy-haired_boss>
>  has a strategic re-organization, and re-assigns workers to cubicles.
> * He does this in a memo dictated to Dogbert, who will take the boss's
> vague and forgetful instructions and compare them
>  to an Excel spreadsheet. This, cleaned up, eventually becomes an
> emailed Word .doc sent to the all-staff@ mailing list.
>  The Word document is basically a table of room moves; it is headed
> with a date and in bold type "EFFECTIVE
>  IMMEDIATELY", usually mailed out mid-evening and read by staff the
> next morning.
> * In practice, employees move their stuff to the new cubicles over the
> course of a few days; longer if they're
>  on holiday or off sick. Phone numbers are fixed later, hopefully. As
> are name badges etc.
> * But generally the move takes place the day after the Word file is
> circulated, and at any one point, a given
>  cubicle can be fairly said to have at most one official occupant worker.
> 
> So let's try to model this in RDF/RDFS/OWL.
> 
> First, we can talk about the employees. Let's make a class, 'Employee'.
> 
> In the company systems, each employee has an ID, which is 'e-' plus an
> integer. Once assigned, these are
> never re-assigned, even if the employee leaves or dies.
> 
> We also need to talk about the office space units, the cubes or
> 'Cubicles'. Let's forget for now that
> the furniture is movable, and treat each Cubicle as if it lasts
> forever. Maybe they are even somehow symbolic
> cubicle names, and the furniture that embodies them can be moved
> around to different office locations. But we
> don't try modelling that for now.
> 
> In the company systems, each cubicle has an ID, which is 'c-' plus an
> integer. Once assigned, these are
> never re-assigned, even if the cubicle becomes in any sense de-activated.
> 
> Let's represent these as IRIs. Three employees, three cubicles.
> 
> * http://example.com/e-1
> * http://example.com/e-2
> * http://example.com/e-3
> * http://example.com/c-1000
> * http://example.com/c-1001
> * http://example.com/c-1002
> 
> We can describe the names of employees. Cubicles also have informal
> names. Let's say that neither change, ever.
> 
> * e-1 name 'Alice'
> * e-2 name 'Bob'
> * e-3 name 'Charlie'
> * c-1000 name 'The Einstein Suite'
> * c-1001 name 'The doghouse'
> * c-1002 name 'Helpdesk'
> 
> Describing these in RDF is pretty straightforward.
> 
> Let's now describe room assignments.
> 
> At the beginning of 2011 Alice (e-1) is in c-1000; Bob (e-2) is in
> c-1001; Charlie (e-3) is in c-1002. How can
> we represent this in RDF?
> 
> We define an RDF/RDFS/OWL relationship type aka property, called eg:hasCubicle
> 
> Let's say our corporate ontologist comes up with this schematic
> description of cubicle assignments:
> 
> * eg:hasCubicle has a domain of eg:Employee, a range of eg:Cubicle.
> * it is an owl:FunctionalProperty, because any Employee has at most
> one Cubicle related via hasCubicle.
> * it is an owl:InverseFunctionalProperty, because any Cubicle is the
> value of hasCubicle for no more than one Employee.
> 
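
The two cardinality rules above can be sketched as a closed-world check in Python.  This is only an illustration of the intended constraint, not of OWL semantics: an OWL reasoner given two cubicles for one employee would, absent statements that the cubicles are different, infer that they are the same individual rather than report an error.  The function name violations is hypothetical.

```python
from collections import defaultdict


def violations(pairs):
    """Check (employee, cubicle) pairs against both 'at most one' rules.

    Returns (employees with more than one cubicle,
             cubicles with more than one employee).
    """
    cubicles_of = defaultdict(set)
    employees_in = defaultdict(set)
    for emp, cube in pairs:
        cubicles_of[emp].add(cube)
        employees_in[cube].add(emp)
    too_many_cubicles = sorted(e for e, cs in cubicles_of.items() if len(cs) > 1)
    too_many_occupants = sorted(c for c, es in employees_in.items() if len(es) > 1)
    return too_many_cubicles, too_many_occupants


# One employee per cubicle: nothing to report.
print(violations([("e-1", "c-1000"), ("e-2", "c-1001")]))  # ([], [])
# A double-booked cubicle breaks the inverse-functional rule.
print(violations([("e-1", "c-1000"), ("e-2", "c-1000")]))  # ([], ['c-1000'])
```
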
> So... at beginning of 2011 it would be truthy to assert these RDF claims:
> 
> * <http://example.com/e-1> <http://example.com/hasCubicle>
> <http://example.com/c-1000> .
> * <http://example.com/e-2> <http://example.com/hasCubicle>
> <http://example.com/c-1001> .
> * <http://example.com/e-3> <http://example.com/hasCubicle>
> <http://example.com/c-1002> .
> 
> Now, come March 10th, everyone at the company receives an all-staff
> email from Dogbert, with cubicle reassignments.
> Amongst other changes, Alice and Bob are swapping cubicles, and
> Charlie stays in c-1002.
> 
> Within a week or so (let's say by March 20th to be sure) the cubicle
> moves are all made real, in terms
> of where people are supposed to be based, where they are, and where
> their stuff and phone line routings are.
> 
> The fictional world by March 20th 2011 is now truthily described by
> the following claims:
> 
> * <http://example.com/e-1> <http://example.com/hasCubicle>
> <http://example.com/c-1001> .
> * <http://example.com/e-2> <http://example.com/hasCubicle>
> <http://example.com/c-1000> .
> * <http://example.com/e-3> <http://example.com/hasCubicle>
> <http://example.com/c-1002> .
> 
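
A small Python sketch of why the two sets of claims cannot simply be pooled.  Each set is kept under its own label, loosely analogous to a named graph; the labels and function names are invented for illustration.  Each snapshot satisfies the "one cubicle per employee" rule on its own, but their union does not.

```python
# The two sets of claims above, stored as (employee, cubicle) pairs
# under hypothetical labels.
snapshots = {
    "start-2011": {("e-1", "c-1000"), ("e-2", "c-1001"), ("e-3", "c-1002")},
    "2011-03-20": {("e-1", "c-1001"), ("e-2", "c-1000"), ("e-3", "c-1002")},
}


def cubicles_per_employee(pairs):
    """Group the cubicles claimed for each employee."""
    result = {}
    for emp, cube in pairs:
        result.setdefault(emp, set()).add(cube)
    return result


def is_functional(pairs):
    """True when every employee has exactly one cubicle in these claims."""
    return all(len(cubes) == 1 for cubes in cubicles_per_employee(pairs).values())


# A naive merge that forgets which snapshot each claim came from.
merged = set().union(*snapshots.values())

print(all(is_functional(s) for s in snapshots.values()))  # True
print(is_functional(merged))                              # False
print(sorted(cubicles_per_employee(merged)["e-1"]))       # ['c-1000', 'c-1001']
```
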
> 
> Questions / view from Named Graphs.
> 
> 1. Was it a mistake, bad modelling style etc, to describe things with
> 'hasCubicle'? Should we have instead
> described a date-stamped 'CubicleAssignmentEvent' that mentions for
> example the roles of Dogbert, Alice,
> and some Cubicle? Is there a 'better' way to describe things? Is this
> an acceptable way to describe things?
> 
> 2. How should we express then the notion that each employee has at
> most one cubicle and vice versa? Is this
> appropriate material to try to capture in OWL?
> 
> 3. How should a SPARQL store or TriG++ document capture the different
> graphs describing the evolving state of the
> company's office-space allocations?
> 
> 4. Can we offer any practical but machine-readable metadata that helps
> indicate to consuming applications
> the potential problems that might come from merging different graphs
> that use this modelling style?
> For example, can we write any useful definition for a class of
> property "TimeVolatileProperty" that could help
> people understand risk of merging different RDF graphs using 'hasCubicle'?
> 
> 5. Can the 'snapshot of the world-as-it-now-is' view and the
> 'transaction / event log view' be equal citizens, stored in the same
> RDF store, and can metadata / manifest / table of contents info for
> that store be used to make the information usefully exploitable and
> reasonably truthy?
>
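
One possible reading of question 5, sketched in Python: keep the dated assignment events as the record, and derive the "world as it now is" snapshot for any day by replaying them.  This is an illustration only, not a proposal for how an RDF store should do it.  The name snapshot_on is hypothetical, 1 January stands in for "the beginning of 2011", and 20 March is used as the date by which the moves are real.

```python
from datetime import date

# (effective date, employee, cubicle), following the scenario above.
EVENTS = [
    (date(2011, 1, 1), "e-1", "c-1000"),
    (date(2011, 1, 1), "e-2", "c-1001"),
    (date(2011, 1, 1), "e-3", "c-1002"),
    (date(2011, 3, 20), "e-1", "c-1001"),
    (date(2011, 3, 20), "e-2", "c-1000"),
]


def snapshot_on(day, events):
    """Replay the log in date order, keeping each employee's latest cubicle."""
    current = {}
    for effective, emp, cube in sorted(events):
        if effective <= day:
            current[emp] = cube
    return current


print(snapshot_on(date(2011, 2, 1), EVENTS))
# {'e-1': 'c-1000', 'e-2': 'c-1001', 'e-3': 'c-1002'}
print(snapshot_on(date(2011, 4, 1), EVENTS))
# {'e-1': 'c-1001', 'e-2': 'c-1000', 'e-3': 'c-1002'}
```
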
Received on Friday, 14 October 2011 02:32:45 UTC