Re: "shape" as a relationship, not a class from Irene Polikoff on 2015-02-22 (public-data-shapes-wg@w3.org from February 2015)

From: Irene Polikoff <irene@topquadrant.com>
Date: Sat, 21 Feb 2015 19:51:34 -0500
To: "kcoyle@kcoyle.net" <kcoyle@kcoyle.net>
Cc: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-Id: <714D7F1F-6D97-41DB-B243-E602150DAA69@topquadrant.com>
The need to keep versioning/provenance information about the data is well known and exists in many domains and applications.

It is very common for database records to have fields for, let's say, a person's name, birthdate, the date they received their driver's license and so on, alongside 'created by', 'created on', 'modified by' and 'modified on' fields. The latter fields are understood as 'data management' information: they are about the data, not about Alice. Systems have supported this forever, yet I don't believe there has been a need to create another ID to make this distinction apparent.

While there are many objects we keep data about (books, people, weather etc.) and a great variety of properties is needed to describe them, the other type of properties (the data management kind) is pretty standard across all objects and there are not many of them. If more clarity is needed, it could be accomplished by, for example, having a special vocabulary/namespace for the data management properties (like PROV-O?), or by keeping this information in a separate graph that is intended for data management.
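
As a minimal sketch of the namespace option, assuming triples are modelled as plain Python tuples: the namespace URIs and property names below are made up for illustration and are not taken from PROV-O or any real vocabulary.

```python
# Hypothetical namespaces, for illustration only.
EX = "http://example.org/"
DM = "http://example.org/datamgmt#"  # assumed "data management" namespace

# One subject URI carries both kinds of properties.
triples = [
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "birthDate", "1980-01-01"),
    (EX + "alice", DM + "createdBy", EX + "clerk1"),
    (EX + "alice", DM + "modifiedOn", "2015-02-21"),
]


def split_by_namespace(all_triples, management_ns):
    """Split triples into (descriptive, management) by predicate namespace."""
    descriptive = []
    management = []
    for subject, predicate, obj in all_triples:
        if predicate.startswith(management_ns):
            management.append((subject, predicate, obj))
        else:
            descriptive.append((subject, predicate, obj))
    return descriptive, management


descriptive, management = split_by_namespace(triples, DM)
```

Here the subject stays the same; only the predicate's namespace tells a consumer whether a statement is about Alice or about the record.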

If one feels a need to create a separate URI for the purpose of tracking data provenance, one needs to consider that, unlike relational databases where an entire record gets created or updated, RDF is granular. If you want to know who created or changed information about a book, and when, you probably need to know it on a triple-by-triple basis, because I could have modified its title yesterday and someone else could have modified its author today. These are two separate data entry operations with different authors and time stamps.
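
To illustrate the granularity point, here is a rough sketch that keys provenance on the whole triple rather than on a record-level URI. The URIs, user names and the `record_edit` / `last_edits` helpers are all hypothetical, and a plain dictionary stands in for whatever mechanism a real store would use.

```python
from datetime import date

EX = "http://example.org/"  # hypothetical namespace

# Provenance is stored per triple, not per subject.
provenance = {}


def record_edit(store, triple, who, when):
    """Remember who last changed this particular triple, and when."""
    store[triple] = {"modified_by": who, "modified_on": when}


def last_edits(store, subject):
    """Return {predicate: provenance} for every tracked triple about subject."""
    return {t[1]: meta for t, meta in store.items() if t[0] == subject}


title = (EX + "book1", EX + "title", "Some Title")
author = (EX + "book1", EX + "author", EX + "some-author")

# Two separate data entry operations on the same book.
record_edit(provenance, title, "irene", date(2015, 2, 20))
record_edit(provenance, author, "someone-else", date(2015, 2, 21))

edits = last_edits(provenance, EX + "book1")
```

A single 'modified by' value attached to one metadata URI for the book could not represent both of these edits.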

Irene


> On Feb 21, 2015, at 11:41 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> 
> In library data we do run into a problem of "what does the URI identify"
> - and I don't think the philosophical distinction of RWO is necessary to
> understand this.
> 
> My metadata describes a book. A book has an author, a title, various
> topics. There is a URI that identifies the book that is the subject of
> those triples. I also want to say when the metadata was created and by
> whom. That requires a different URI because the subject is different -
> the subject is the metadata, not the book itself. The cataloger who
> creates the metadata is not the creator of the book; the title is the
> title of the book, not the title of the metadata. Because the URI rule
> is that each URI identifies one and only one "thing", I need separate
> URIs for these two things.
> 
> This becomes more difficult when you use the same URI for both, as is
> often the case with web pages. But the principle is the same -- it's the
> difference between what the web page is ABOUT and the web page itself.
> Those are two different things, and the fact that the whole
> httpRange-14 thing was developed to respond to that shows me that the web
> page "case" has some particular problems. But I clearly could need to
> distinguish between Alice and the metadata or web page about Alice. The
> date of the creation of the DMV record about Alice is not the same as
> the date of creation of Alice. The creator of the post to a Facebook
> page is not the creator of Alice.
> 
> I see this not as a question of which is the RWO, but what is the
> subject of the data. That sidesteps the philosophical issue, and in my
> mind the distinction is usually quite easy to define.
> 
> kc
> 
>> On 2/20/15 8:49 AM, Irene Polikoff wrote:
>> Thanks Harold. Unfortunately, it doesn’t. I must be very dense :)
>> 
>> I’ve read this before - many times. One reason this doesn’t do anything
>> for me is that I see the distinctions described here as trivial,
>> uninteresting and not realistic.
>> 
>> In the information world (and this is what we are dealing with here - bits
>> and bytes), we never deal with the real, living and breathing Alice. We
>> can’t. We are dealing with some digital identifier of Alice’s information
>> record. Information about Alice can be contained in multiple places - her
>> employer has some data about her, the IRS (if she lives in the US) has some
>> data about her, DMV has some data about her and so on. All these systems
>> have some data validation constraints. One may require that there must be
>> a phone number, another one doesn’t. These are not web documents - yet
>> they have strong data quality requirements.
>> 
>> Also, there is not a single home page that renders some information about
>> Alice. She could have her own web site or a blog. She could also have a
>> Facebook page and a LinkedIn page. Her employer may have a web page for
>> her on their web site. Some pages may be static, some may be generated
>> from the information in some database that has a set of records (or even
>> RDF triples) about Alice. The pages could have some constraints, but they
>> are not in any way more likely to have constraints than the underlying
>> data.
>> 
>> The question for me remains, so what? What is the “therefore”?
>> 
>> Irene
>> 
>>> On 2/20/15, 11:17 AM, "Solbrig, Harold R." <Solbrig.Harold@mayo.edu> wrote:
>>> 
>>> Would:
>>> 
>>> http://www.w3.org/TR/cooluris/#semweb
>>> 
>>> Help?
>>> 
>>> Section 3.1 describes the situation: "Bob may not like the look of the
>>> homepage, but fancy
>>> the person Alice. So two URIs are needed, one for Alice, one for the
>>> homepage or
>>> a RDF document describing Alice."
>>> 
>>> Paraphrasing, "I may think that the shape of Alice's (RDF) home page does
>>> not conform to my requirements.  This is NOT about Alice, it is about the
>>> description"
>>> 
>>> 
>>>> On 2/20/15, 9:53 AM, "Irene Polikoff" <irene@topquadrant.com> wrote:
>>>> 
>>>> I believe that “real world object” in Semantic Web speak doesn’t mean
>>>> that it has a physical representation. It is also a concept.
>>>> 
>>>> In that sense, a user account is as much of a real world thing as a
>>>> person. One can create a class User Account to say that a user account
>>>> can
>>>> be created by someone (system administrator), that it has valid from and
>>>> to dates and that it is an account of some person, etc.
>>>> 
>>>> As for web documents, there can be a web document presenting information
>>>> about a person as much as there can be a web document presenting
>>>> information about a user account. And there could be multiple ways to
>>>> render information about either a person or a user account.
>>>> 
>>>> I have to say that while conceptually I understand the distinction
>>>> between
>>>> “real things” and “information resources”, I still don’t understand the
>>>> practical application of the distinction after much reading. To me, the
>>>> distinction has to do with some very particular viewpoint that is
>>>> somewhat
>>>> esoteric. After all, we are dealing with the world of data and software.
>>>> We can’t process anything but information.
>>>> 
>>>> Since I was struggling with this, I thought that maybe making this
>>>> distinction is really important for dereferencing (not that other, non
>>>> Semantic Web systems don’t display web documents) and I am missing some
>>>> technical knowledge to get the “aha”. So, a year ago I asked three
>>>> separate senior developers/technical architects who had shallow exposure
>>>> to RDF but didn’t come from the Semantic Web community to read up on this
>>>> subject and tell me if they understood it and could explain it. All three
>>>> couldn’t make sense of it. They just thought it was irrelevant. These
>>>> folks are all fairly bright and capable with 7 or more years of technical
>>>> experience.
>>>> 
>>>> This is a limited experiment, for sure, but so far it confirms Holger’s
>>>> view that this is not something people care about or need to understand.
>>>> 
>>>> Irene
>>>> 
>>>>> On 2/20/15, 10:15 AM, "Arthur Ryman" <ryman@ca.ibm.com> wrote:
>>>>> 
>>>>> Holger Knublauch <holger@topquadrant.com> wrote on 02/08/2015 05:36:32
>>>>> PM:
>>>>> 
>>>>>> ... I am afraid the distinction
>>>>>> between real-world objects and their representation drifts into
>>>>>> theoretical realms that nobody outside of the RDF world seems to care
>>>>>> about (and rightfully so).
>>>>> 
>>>>> Holger,
>>>>> 
>>>>> The distinction is important in some cases because if you fail to make
>>>>> the
>>>>> distinction, then when you read the RDF, it sounds like nonsense. The
>>>>> classic example is the distinction between a person and a user account
>>>>> owned by that person. A person is a RWO and should have a URI that is
>>>>> different from the user account, which is an information resource (a web
>>>>> document).
>>>>> 
>>>>> A web document can have properties such as creator (a person), creation
>>>>> date, modification date, etc. It makes sense to say that a user account
>>>>> document has a modification date, but it is nonsense to say that the
>>>>> person who owns the user account has that modification date (barring
>>>>> coincidental plastic surgery on that date). FOAF makes this clear. This
>>>>> whole topic is nicely discussed in [1], which is co-authored by your
>>>>> newest colleague.
>>>>> 
>>>>> [1] http://www.w3.org/TR/cooluris/
> 
> -- 
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet/+1-510-984-3600
>
Received on Sunday, 22 February 2015 00:52:08 UTC