Re: Dilbert example - defining hasCubicle from Jeremy Carroll on 2011-10-14 (public-rdf-wg@w3.org from October 2011)

From: Jeremy Carroll <jeremy@topquadrant.com>
Date: Fri, 14 Oct 2011 13:56:11 -0700
To: public-rdf-wg@w3.org
Message-ID: <4E98A1EB.20105@topquadrant.com>
I think Alex's example is an excellent case where HTTP caching is a 
sufficient solution. The copy of the assignments is a cache, and if you 
make an HTTP cache without following the HTTP recommended caching 
mechanism, and it goes wrong .... fix the damn code, not the model.
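
To make the caching point concrete, here is a minimal sketch of the decision a well-behaved HTTP cache makes before reusing its copy. It assumes the origin sent a Cache-Control max-age and an ETag; the function names and the sample values are hypothetical, and the freshness rule is deliberately simplified (it ignores the Age header and heuristic freshness).

```python
from datetime import datetime, timedelta, timezone


def is_fresh(fetched_at, max_age_seconds, now):
    """Return True while the cached copy is still within its max-age lifetime."""
    return now - fetched_at < timedelta(seconds=max_age_seconds)


def choose_action(fetched_at, max_age_seconds, etag, now):
    """Decide whether to reuse the cached copy or revalidate it.

    Returns ("reuse", {}) while the copy is fresh. Once it is stale, returns
    ("revalidate", headers), where headers carries If-None-Match so the
    origin can reply 304 Not Modified if nothing changed.
    """
    if is_fresh(fetched_at, max_age_seconds, now):
        return "reuse", {}
    headers = {"If-None-Match": etag} if etag else {}
    return "revalidate", headers
```

A directory that copied the assignments in January and still serves them unchecked in March has skipped the "revalidate" branch.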

Jeremy




On 10/14/2011 8:13 AM, David Wood wrote:
> Hi Alex,
>
> On Oct 14, 2011, at 11:06, Alex Hall wrote:
>
>> On Thu, Oct 13, 2011 at 9:43 PM, David Wood <david@3roundstones.com>
>> wrote:
>>
>>     Hi Dan,
>>
>>     Unfortunately, your scenario doesn't have an explicit requirement
>>     for any temporal data.  Why not just update all the cubicle
>>     assignments when they change?  Even Jeremy's sandwich delivery
>>     requirement could be satisfied by a trivial SPARQL query that
>>     lists current cube assignments.
>>
>>
>> I think there's a very clear requirement for temporal data here.
>> Sure, you could update all the cubicle assignments *in your own
>> database* when they change, but once you publish those assignments
>> you can't un-publish them.  If somebody copies those assignments and
>> stores them somewhere else (e.g. corporate headquarters collects
>> cubicle assignments from all departments to create a company-wide
>> directory) then you now have the very real possibility of that copy
>> of the data becoming out of sync with the most current version.
>>
>> Recording that on Jan 1, Alice was assigned cube 1000 and on March 1, 
>> Alice was assigned cube 1001 is much more resistant to becoming out 
>> of sync like this, because the context is in the data so you don't 
>> have to carry it around as metadata.  The obvious downside is that it 
>> makes queries to find the current state of things more complicated.
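
A minimal sketch of that trade-off, assuming each assignment is stored as a dated event. The tuple layout, the year 2011 and the function name are assumptions for illustration only; the thread gives just "Jan 1" and "March 1".

```python
from datetime import date

# Hypothetical event records: (employee, cubicle, effective date).
EVENTS = [
    ("e-1", "c-1000", date(2011, 1, 1)),
    ("e-1", "c-1001", date(2011, 3, 1)),
]


def cubicle_on(events, employee, day):
    """Return the cubicle from the latest event for `employee` on or before `day`."""
    candidates = [
        (when, cube)
        for emp, cube, when in events
        if emp == employee and when <= day
    ]
    if not candidates:
        return None  # no assignment recorded yet on that day
    # Tuples sort by date first, so max() picks the most recent event.
    return max(candidates)[1]
```

The events themselves never go stale when copied, but every "where is Alice now?" question has to search for the most recent event instead of reading a single value.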
>
>
> Yes, good point.  So you suggest a cubicleChangeEvent class and the 
> recording of each event?  I agree.
>
> Regards,
> Dave
>
>
>>
>> I've seen this sort of event-based modeling in real-world systems, 
>> e.g. an HR system defines a person's salary as the value of the 
>> salary property of the most recent SalaryEvent recorded for that person.
>>
>> -Alex
>>
>>
>>     One could concoct a scenario in which Wally (Dilbert's
>>     notoriously vindictive and lazy co-worker) is collecting data
>>     with which to report the Pointy Haired Boss to the board for
>>     extraneous cubicle arranging, but that seems contrived (because
>>     it is).
>>
>>     I propose that the age requirement is a more appropriate scenario
>>     to start with.  Each employee has their birthday recorded.
>>     Dogbert wishes to send de-motivational birthday greetings and
>>     constructs a SPARQL query to discover which employees should get
>>     a card on any given day.  The query would use some SPARQL 1.1
>>     features to calculate who has a birthday.  We could explain why a
>>     birthday is recorded and not employee age.
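
The birthday scenario can be sketched outside SPARQL as well. The following is a rough Python illustration, not the SPARQL 1.1 query itself; the birth dates and function names are invented for the example, and Feb 29 birthdays are not handled.

```python
from datetime import date

# Hypothetical birth dates; the thread does not give any.
BIRTHDAYS = {
    "e-1": date(1970, 10, 14),
    "e-2": date(1982, 3, 1),
    "e-3": date(1975, 10, 14),
}


def birthdays_on(birthdays, day):
    """Return the employees whose birthday falls on the month and day of `day`."""
    return sorted(
        emp for emp, born in birthdays.items()
        if (born.month, born.day) == (day.month, day.day)
    )


def age_on(born, day):
    """Age in whole years on `day`, derived from the birth date, not stored."""
    had_birthday = (day.month, day.day) >= (born.month, born.day)
    return day.year - born.year - (0 if had_birthday else 1)
```

A stored birth date stays true forever, while a stored age would need updating once a year; `age_on` shows that the age can always be recomputed when needed.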
>>
>>     It would be beneficial to also do something that is hard or
>>     impossible with an RDBMS.  Perhaps the scenario could be extended
>>     with another dataset, this one kept by the Pointy Haired Boss.
>>     Eventually, someone (probably Dogbert) would merge the two
>>     datasets to satisfy some new query that can only be answered
>>     across both datasets.
>>
>>     The Boss's dataset tracks which employees are behind in their
>>     work.  This is where temporal data comes in, because he wants to
>>     keep a history and query who the worst employee is for all time,
>>     not just the current time.  Like Dogbert's dataset, the Boss's
>>     dataset uses the same employee ids, making later merging easy.
>>
>>     The Boss might track projects, their start dates, their
>>     anticipated end dates and their actual end dates.  This is like a
>>     real-world Gantt chart, but simplified.  Each project involves
>>     one or more employees.  To follow your lead, each project id
>>     starts with p- and an integer:
>>
>>     <http://example.com/p-120>
>>      hasStartDate '2011-03-12'^^xsd:date ;
>>      hasPlannedEndDate '2011-05-28'^^xsd:date ;
>>      hasActualEndDate '2011-08-31'^^xsd:date ;
>>      assignedEmployee <http://example.com/e-1> .
>>
>>     Additional employees could use duplicated assignedEmployee
>>     properties.
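
As a rough sketch of the "who is furthest behind" question, the project data above can be summed per employee. Project p-120 uses the dates given in the example; p-121, the dictionary layout and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import date

PROJECTS = [
    {   # dates taken from the p-120 example
        "id": "p-120",
        "planned_end": date(2011, 5, 28),
        "actual_end": date(2011, 8, 31),
        "employees": ["e-1"],
    },
    {   # hypothetical second project, finished on time
        "id": "p-121",
        "planned_end": date(2011, 6, 30),
        "actual_end": date(2011, 6, 30),
        "employees": ["e-1", "e-2"],
    },
]


def days_late_by_employee(projects):
    """Sum the overrun in days of every project each employee was assigned to."""
    totals = defaultdict(int)
    for project in projects:
        overrun = (project["actual_end"] - project["planned_end"]).days
        overrun = max(overrun, 0)  # finishing early earns no credit here
        for emp in project["employees"]:
            totals[emp] += overrun
    return dict(totals)
```

With this data `days_late_by_employee(PROJECTS)` gives 95 days for e-1 and 0 for e-2. Because the dates are recorded rather than a "currently late" flag, the history stays queryable for all time.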
>>
>>     I think your same questions to Pat apply, but the scenario seems
>>     less contrived to me.
>>
>>     The merged data from the two datasets could be used to justify,
>>     e.g., the firing of older employees by Dogbert before they could
>>     claim a pension.
>>
>>     Just my two cents (pence) at the end of a long day.  Please
>>     ignore if I am no longer making sense.
>>
>>     Regards,
>>     Dave
>>
>>
>>     On Oct 13, 2011, at 12:34, Dan Brickley <danbri@danbri.org>
>>     wrote:
>>
>>     > Pat, (well, everyone; but triggered by Pat's comments)
>>     >
>>     > You're suggesting, if I read you right, that RDF shouldn't be
>>     > written in ways that make its truth context-dependent; e.g. that
>>     > a 'date of birth' property is preferable by far to an 'age'
>>     > property.
>>     >
>>     > Below is a sketch of a reasonably common descriptive scenario.
>>     > Could you maybe suggest a modelling / descriptive idiom that
>>     > avoids these problems? I hope it anchors some of the issues we've
>>     > discussed in a small enough example that might be turned into
>>     > concrete decision test cases or example documentation.
>>     >
>>     > Dan
>>     >
>>     > -----
>>     >
>>     > Theory and Practice
>>     >
>>     > Consider an RDF vocabulary for describing office assignments in
>>     > the cartoon universe inhabited by Dilbert
>>     > <http://en.wikipedia.org/wiki/Dilbert>.
>>     > First I describe the universe, then some ways in which we might
>>     > summarise what's going on using RDF graph descriptions. I would
>>     > love to get a sense for any 'best practice' claims here.
>>     > Personally I see no single best way to deal with this, only
>>     > different and annoying tradeoffs.
>>     >
>>     > So --- this is a fictional, highly simplified company in which
>>     > workers are each assigned to occupy exactly one cubicle, and in
>>     > which every cubicle has at most one assigned worker. Cubicles may
>>     > also sometimes be empty.
>>     >
>>     > * Every 3 months, the Pointy-haired boss
>>     >   <http://en.wikipedia.org/wiki/List_of_Dilbert_characters#Pointy-haired_boss>
>>     >   has a strategic re-organization, and re-assigns workers to
>>     >   cubicles.
>>     > * He does this in a memo dictated to Dogbert, who will take the
>>     >   boss's vague and forgetful instructions and compare them to an
>>     >   Excel spreadsheet. This, cleaned up, eventually becomes an
>>     >   emailed Word .doc sent to the all-staff@ mailing list. The Word
>>     >   document is basically a table of room moves; it is headed with
>>     >   a date and, in bold type, "EFFECTIVE IMMEDIATELY", and is
>>     >   usually mailed out mid-evening and read by staff the next
>>     >   morning.
>>     > * In practice, employees move their stuff to the new cubicles
>>     >   over the course of a few days; longer if they're on holiday or
>>     >   off sick. Phone numbers are fixed later, hopefully. As are name
>>     >   badges etc.
>>     > * But generally the move takes place the day after the Word file
>>     >   is circulated, and at any one point, a given cubicle can be
>>     >   fairly said to have at most one official occupant worker.
>>     >
>>     > So let's try to model this in RDF/RDFS/OWL.
>>     >
>>     > First, we can talk about the employees. Let's make a class,
>>     > 'Employee'.
>>     >
>>     > In the company systems, each employee has an ID, which is 'e-'
>>     > plus an integer. Once assigned, these are never re-assigned, even
>>     > if the employee leaves or dies.
>>     >
>>     > We also need to talk about the office space units, the cubes or
>>     > 'Cubicles'. Let's forget for now that the furniture is movable,
>>     > and treat each Cubicle as if it lasts forever. Maybe they are even
>>     > somehow symbolic cubicle names, and the furniture that embodies
>>     > them can be moved around to different office locations. But we
>>     > don't try modelling that for now.
>>     >
>>     > In the company systems, each cubicle has an ID, which is 'c-'
>>     > plus an integer. Once assigned, these are never re-assigned, even
>>     > if the cubicle becomes in any sense de-activated.
>>     >
>>     > Let's represent these as IRIs. Three employees, three cubicles.
>>     >
>>     > * http://example.com/e-1
>>     > * http://example.com/e-2
>>     > * http://example.com/e-3
>>     > * http://example.com/c-1000
>>     > * http://example.com/c-1001
>>     > * http://example.com/c-1002
>>     >
>>     > We can describe the names of employees. Cubicles also have
>>     > informal names. Let's say that neither changes, ever.
>>     >
>>     > * e-1 name 'Alice'
>>     > * e-2 name 'Bob'
>>     > * e-3 name 'Charlie'
>>     > * c-1000 name 'The Einstein Suite'
>>     > * c-1001 name 'The doghouse'
>>     > * c-1002 name 'Helpdesk'
>>     >
>>     > Describing these in RDF is pretty straightforward.
>>     >
>>     > Let's now describe room assignments.
>>     >
>>     > At the beginning of 2011 Alice (e-1) is in c-1000; Bob (e-2) is in
>>     > c-1001; Charlie (e-3) is in c-1002. How can
>>     > we represent this in RDF?
>>     >
>>     > We define an RDF/RDFS/OWL relationship type, aka property,
>>     > called eg:hasCubicle.
>>     >
>>     > Let's say our corporate ontologist comes up with this schematic
>>     > description of cubicle assignments:
>>     >
>>     > * eg:hasCubicle has a domain of eg:Employee, a range of eg:Cubicle.
>>     > * it is an owl:FunctionalProperty, because any Employee has at most
>>     >   one Cubicle related via hasCubicle.
>>     > * it is an owl:InverseFunctionalProperty, because any Cubicle is
>>     >   the value of hasCubicle for no more than one Employee.
>>     >
>>     > So... at the beginning of 2011 it would be truthy to assert these
>>     > RDF claims:
>>     >
>>     > * <http://example.com/e-1> <http://example.com/hasCubicle>
>>     > <http://example.com/c-1000> .
>>     > * <http://example.com/e-2> <http://example.com/hasCubicle>
>>     > <http://example.com/c-1001> .
>>     > * <http://example.com/e-3> <http://example.com/hasCubicle>
>>     > <http://example.com/c-1002> .
>>     >
>>     > Now, come March 10th, everyone at the company receives an all-staff
>>     > email from Dogbert, with cubicle reassignments.
>>     > Amongst other changes, Alice and Bob are swapping cubicles, and
>>     > Charlie stays in c-1002.
>>     >
>>     > Within a week or so (let's say by March 20th to be sure) the
>>     > cubicle moves are all made real, in terms of where people are
>>     > supposed to be based, where they are, and where their stuff and
>>     > phone line routings are.
>>     >
>>     > The fictional world by March 20th 2011 is now truthily described by
>>     > the following claims:
>>     >
>>     > * <http://example.com/e-1> <http://example.com/hasCubicle>
>>     > <http://example.com/c-1001> .
>>     > * <http://example.com/e-2> <http://example.com/hasCubicle>
>>     > <http://example.com/c-1000> .
>>     > * <http://example.com/e-3> <http://example.com/hasCubicle>
>>     > <http://example.com/c-1002> .
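
The two sets of claims above are each consistent with the functional and inverse-functional description of hasCubicle, but their naive union is not. A minimal sketch of that check follows; the pair representation and the function name are assumptions, while the data is taken from the two lists of triples.

```python
from collections import defaultdict

# (employee, cubicle) pairs for eg:hasCubicle, from the two sets of claims.
JAN = {("e-1", "c-1000"), ("e-2", "c-1001"), ("e-3", "c-1002")}
MAR = {("e-1", "c-1001"), ("e-2", "c-1000"), ("e-3", "c-1002")}


def violations(pairs):
    """Return (employees with more than one cubicle,
               cubicles with more than one employee)."""
    cubes_of = defaultdict(set)
    staff_of = defaultdict(set)
    for emp, cube in pairs:
        cubes_of[emp].add(cube)
        staff_of[cube].add(emp)
    multi_cube = sorted(e for e, cs in cubes_of.items() if len(cs) > 1)
    multi_staff = sorted(c for c, es in staff_of.items() if len(es) > 1)
    return multi_cube, multi_staff
```

`violations(JAN)` and `violations(MAR)` both return two empty lists, while `violations(JAN | MAR)` flags e-1 and e-2 and cubicles c-1000 and c-1001. Note that this sketch treats distinct names as distinct things. An OWL reasoner makes no such assumption, so rather than reporting an error it would conclude that c-1000 and c-1001 are the same cubicle, which is arguably worse.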
>>     >
>>     >
>>     > Questions / view from Named Graphs.
>>     >
>>     > 1. Was it a mistake, bad modelling style etc, to describe things
>>     > with 'hasCubicle'? Should we have instead described a date-stamped
>>     > 'CubicleAssignmentEvent' that mentions for example the roles of
>>     > Dogbert, Alice, and some Cubicle? Is there a 'better' way to
>>     > describe things? Is this an acceptable way to describe things?
>>     >
>>     > 2. How should we then express the notion that each employee has at
>>     > most one cubicle and vice versa? Is this appropriate material to
>>     > try to capture in OWL?
>>     >
>>     > 3. How should a SPARQL store or TriG++ document capture the
>>     > different graphs describing the evolving state of the company's
>>     > office-space allocations?
>>     >
>>     > 4. Can we offer any practical but machine-readable metadata that
>>     > helps indicate to consuming applications the potential problems
>>     > that might come from merging different graphs that use this
>>     > modelling style? For example, can we write any useful definition
>>     > for a class of property "TimeVolatileProperty" that could help
>>     > people understand the risk of merging different RDF graphs using
>>     > 'hasCubicle'?
>>     >
>>     > 5. Can the 'snapshot of the world-as-it-now-is' view and the
>>     > 'transaction / event log view' be equal citizens, stored in the
>>     > same RDF store, and can metadata / manifest / table of contents
>>     > info for that store be used to make the information usefully
>>     > exploitable and reasonably truthy?
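
One possible reading of questions 3 and 5, sketched in Python rather than in any RDF syntax: keep each snapshot as its own graph, tagged with the date from which it holds, and let that tag select the graph to query. The valid-from dates (Jan 1 standing in for "beginning of 2011", and March 20), the list layout and the function name are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import date

# Each entry: (valid-from date, snapshot of employee -> cubicle).
# The list must stay sorted by date for the lookup below to work.
SNAPSHOTS = [
    (date(2011, 1, 1), {"e-1": "c-1000", "e-2": "c-1001", "e-3": "c-1002"}),
    (date(2011, 3, 20), {"e-1": "c-1001", "e-2": "c-1000", "e-3": "c-1002"}),
]


def snapshot_for(snapshots, day):
    """Return the latest snapshot whose valid-from date is on or before `day`."""
    starts = [start for start, _ in snapshots]
    i = bisect_right(starts, day)
    return snapshots[i - 1][1] if i else None
```

Here the date plays the role of the manifest metadata: consumers query one snapshot at a time and never merge two of them, so the "at most one cubicle" reading of hasCubicle holds within every graph they see.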
>>     >
>>
>>
>
Received on Friday, 14 October 2011 20:56:35 UTC