Re: Data modelling from Andy Seaborne on 2012-06-19 (public-ldp-wg@w3.org from June 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 19 Jun 2012 11:54:56 +0100
To: public-ldp-wg@w3.org
Message-ID: <4FE05A80.3020308@epimorphics.com>
On 18/06/12 17:16, Erik.Wilde@emc.com wrote:
> hello andy.
>
> thanks a lot for your feedback!
>> On the other hand, "excessive modelling" can produce something unusable.
>>   It is hard balance. Some basic guidance by example would probably be
>> very helpful to developers.
>
> without going into the details of your comments (which are very relevant),
> maybe the interesting question is whether the platform should talk about
> "the things themselves" at all. coming from a SOA perspective, i don't
> think it should even try. we should focus on the service layer, and it is
> up to the service design to decide how to communicate information about
> resources. for example, http://tools.ietf.org/html/rfc4287#section-4.2.15
> is fuzzy ("indicating the most recent instant in time when an entry or
> feed was modified in a way the publisher considers significant.") for a
> reason: as a consumer, i don't know how data is handled in the back-end,
> and that's a feature. i am interested to learn about relevant events
> published through the service i am consuming. if a typo in a news story is
> corrected, please don't notify me, i really don't want to know. if,
> however, major things happen, i want to know. so the question of what
> we're communicating on the platform's service level should be completely
> decoupled from what we're managing in the back-end, and how you translate
> the data layer into the service level is a question of a service's design,
> and not something that the platform should or even can define.

Hi Erik,

Interesting point about SOA.  I think the submission is already in this 
place though, because there is a vocabulary for the BPRs and especially 
the BPCs.  By using dcterms:modified or dcterms:title, we need to be 
clear what the subject resource is.

We came across this when modelling services, where we wanted to talk 
about the service and the information about the service, then link 
across that information (federated - so multiple identifiers for the 
same thing just happen naturally).

A similar style comes up in geospatial databases where there is an intent 
to avoid talking about the real thing, just the virtual abstraction.

The FRBR model [0] has something to say about this as well; hopefully 
someone in the WG knows more about this.

In geospatial databases, at least in the ones I've looked at, there is 
conflation of the record about the thing and the thing itself; indeed 
the language usually avoids the real world item altogether.  It works 
because there is also an assumption that each physical object modelled 
has one and only one record about it.  The fact that some fields of the 
record are about the metadata and some about the object itself does not 
matter (much).

But this has limitations when you consider linking across two databases. 
How can an application take an identifier for the "Bridge of Sighs" 
[1] in one geo DB (say, of global places) and use it to link to another 
(say, UK specific)?  How can it also say that X thinks it is in Venice 
and Y thinks it is in Cambridge [2] or Oxford [3] (which isn't even 
called the Bridge of Sighs!)?

[1] http://en.wikipedia.org/wiki/Bridge_of_Sighs
[2] http://en.wikipedia.org/wiki/Bridge_of_Sighs_%28Cambridge%29
[3] http://en.wikipedia.org/wiki/Bridge_of_Sighs_%28Oxford%29
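
A minimal sketch of the linking problem, in Python.  The two databases, 
their record identifiers (global:1001, uk:42, uk:43) and the field names 
are made up for illustration; only the bridges and cities come from the 
discussion above (the Oxford bridge's formal name, Hertford Bridge, is 
supplied here).  Each statement stays attributed to the source that makes 
it, rather than being merged into one record per name.

```python
# Hypothetical databases: record id -> fields about the thing.
# All identifiers and field names are assumptions for illustration.
GLOBAL_DB = {
    "global:1001": {"name": "Bridge of Sighs", "city": "Venice"},
}
UK_DB = {
    "uk:42": {"name": "Bridge of Sighs", "city": "Cambridge"},
    "uk:43": {"name": "Hertford Bridge", "city": "Oxford"},
}
SOURCES = {"X": GLOBAL_DB, "Y": UK_DB}


def claims_for(name):
    """Return (source, record id, city) for every record with this name.

    Nothing is merged: each claim stays attributed to the database
    that made it, so conflicting cities can sit side by side.
    """
    found = []
    for source, db in sorted(SOURCES.items()):
        for record_id, fields in sorted(db.items()):
            if fields["name"] == name:
                found.append((source, record_id, fields["city"]))
    return found


claims = claims_for("Bridge of Sighs")
for source, record_id, city in claims:
    print(f"{source} ({record_id}) says it is in {city}")
```

Matching on the name finds the Venice and Cambridge records but misses 
the Oxford one, so a name alone is not enough to link across the two 
databases.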

"atom:updated" recognizes the issue that the time may be that of the 
entry or of the feed.  They are closely linked.  But in the wider use of 
an LDP, e.g. the stocks example of the submission, the record about the 
resource and the resource itself are not so closely linked.

So just the need to talk about the BPR/BPC and the resource the BPR is 
about will raise these issues.  We are defining a "service" here in the 
container service.

EricP touches on this - it's about reuse of vocabulary:

EricP wrote:
> Every time I use a service of some sort, I read some specification to
> give me detailed instructions on how to construct the input and parse
> the output. While reading that spec, I have to grok the data model of
> the service architect (or at least of the person who wrote the docs),
> translate my application data into that model, and parse the response
> back into my model. Many services don't even use the same model for
> input and output.

In this case, the specification to read is the DC terms vocabulary.  It 
makes it clear that dcterms:modified and dcterms:title refer to the same 
resource (their word) when it's the same subject (which is an AWWW-ism).

In the examples I gave, the vocabulary is used in different ways at 
different points.  It's conflating the resource being described and the 
metadata record about the resource ("resource" here is in the Dublin 
Core sense - remember the subject of the triples is the same, therefore 
it is the same resource regardless of anything else).
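
To make the conflation concrete, here is a small sketch in Python with 
triples as plain tuples.  It is not one of the examples referred to 
above; the URI, title and dates are invented.  Two dcterms:modified 
statements share one subject, so nothing in the data says which date 
belongs to the record and which to the thing.

```python
# (subject, predicate, object) tuples; the URI and values are made up.
ITEM = "http://example.org/items/1"

triples = [
    (ITEM, "dcterms:title", "Bridge of Sighs"),
    (ITEM, "dcterms:modified", "2012-06-01"),  # intended: the thing changed
    (ITEM, "dcterms:modified", "2012-06-19"),  # intended: the record was edited
]


def values(subject, predicate):
    """All objects stated for this subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


modified = values(ITEM, "dcterms:modified")
print(modified)
```

The intent only survives in the comments; a consumer sees two dates for 
the same resource.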


There are many ways to address this - there may well be others as well 
(my list got longer as I wrote it!)

1/ specifically defined predicates, so that we have ":recordModified" 
and ":resourceModified".  This feels like it will get messy; it does 
preclude reusing other vocabularies not designed for this.

A variation of this might be the "senses" idea that is currently 
emerging for httpRange-14.

2/ weak predicates, like the Atom approach, e.g. ":changed".  Atom is in 
a slightly different position in that the feed and the entry are more 
closely linked, but if we are federating then I'm not sure this will 
stand up.

3/ having two URIs - one for the record, one for the resource. They can 
be in the same message body.

4/ Protocol solutions are out - but the GET/MGET (Patrick Stickler) idea 
might have worked for this.
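
Option 3/ might look something like the following sketch (Python, 
triples as tuples).  The URIs and dates are invented, and using 
foaf:primaryTopic to link the record to the resource is just one 
possible choice, not something the submission defines.  Both URIs appear 
in the same message body, and dcterms:modified keeps its usual meaning 
for whichever subject it is attached to.

```python
# All URIs, dates and the linking predicate are assumptions for illustration.
RECORD = "http://example.org/records/1"
THING = "http://example.org/things/1"

# One message body carrying statements about both URIs.
message = [
    (RECORD, "foaf:primaryTopic", THING),        # the record is about the thing
    (RECORD, "dcterms:modified", "2012-06-19"),  # when the record was edited
    (THING, "dcterms:modified", "2012-06-01"),   # when the thing itself changed
    (THING, "dcterms:title", "Bridge of Sighs"),
]


def values(subject, predicate, triples):
    """All objects stated for this subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(record, triples):
    """Split a record's message into record-level and thing-level dates."""
    (thing,) = values(record, "foaf:primaryTopic", triples)
    return {
        "record_modified": values(record, "dcterms:modified", triples),
        "thing_modified": values(thing, "dcterms:modified", triples),
    }


summary = describe(RECORD, message)
print(summary)
```

Because the two dates hang off different subjects, existing vocabulary 
can be reused unchanged, which is what option 1/ gives up.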

(Hmm - I seem to have settled on saying "record" not "metadata" - analogy 
to libraries with card indexes (records) and books on shelves (resources)?).

	Andy
Received on Tuesday, 19 June 2012 10:55:26 UTC