Re: A rant about the terminology debate from Kingsley Idehen on 2012-08-24 (public-rdf-wg@w3.org from August 2012)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 24 Aug 2012 07:32:33 -0400
To: public-rdf-wg@w3.org
Message-ID: <50376651.1020108@openlinksw.com>

On 8/24/12 7:10 AM, Sandro Hawke wrote:
> On 08/24/2012 05:31 AM, Richard Cyganiak wrote:
>> Sandro,
>>
>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>> People will need some way linguistically to distinguish X from Y
>>> People will instinctively say X to clarify they mean Y
>>> People rarely mean to only be talking about X, and when they do, 
>>> they should put the word Y in there
>>> People should be able to use X when they want to clarify they're 
>>> talking about Y
>> That kind of argument is irrelevant. People talk the way they talk. 
>> They won't pay attention to what we write anyways for the most part. 
>> We're not in the business of helping them to express themselves.
>>
>> We are in the business of extending an existing technical 
>> specification with clear definitions of a few additional concepts in 
>> order to promote consistency between W3C specifications and to 
>> promote interoperability between implementations that already use 
>> these concepts in some form.
>>
>> We'll never get anywhere with this discussion if we entertain the 
>> crazy notion that we can magically solve all uncertainty and 
>> enlighten the RDF community with a Great Global Renaming.
>
> That's a strawman.  I'm not saying names will magically solve 
> anything.  It sounds like our disagreement is a matter of degree. I 
> believe our choice of terms will affect the quality & uptake of our 
> specs by X%, while you believe it is a smaller amount Y%. Perhaps X=30 
> and Y=10.
>
> This is complicated by the fact that names interact with mental 
> models, so sometimes when we're discussing names, I think it turns out 
> to be a proxy for mental models.     And as we change our models, our 
> opinions on the best names are likely to change.
>
> So, let's put this back in the box for now, and hopefully when we've 
> got the model solved, we can spend one telecon talking over proposals 
> and make a final decision.   (I suppose we should give it an ISSUE 
> number, or broaden ISSUE-14 to include this.)
>
> I updated http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology and made
> http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology/Options and I'm 
> quite happy to let the matter drop for as long as possible.
>
>       -- Sandro

Sandro,

How would you map g-box, g-snap, and g-text in formal relational DBMS 
terminology? Such a mapping would help many. Basically, a mapping to 
relations, sets of tuples, and notation.

Kingsley
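
As an illustration only, here is a minimal Python sketch of one way the three terms could be modelled. It assumes a g-snap is an immutable set of triples (loosely, a set of tuples), a g-box is an IRI-denoted container whose state is a g-snap (loosely, a relation variable), and a g-text is a serialized form (loosely, notation). The relational analogies, the names GSnap, GBox and to_gtext, and the reading of g-text as a serialization are all assumptions made for this sketch; the thread does not settle any of them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A triple simplified to three IRI strings (hypothetical simplification:
# no literals, no blank nodes).
Triple = Tuple[str, str, str]

# g-snap: an abstract, immutable set of triples.
GSnap = FrozenSet[Triple]


@dataclass
class GBox:
    """A mutable container denoted by an IRI; its current state is a g-snap."""
    iri: str
    state: GSnap = frozenset()

    def add(self, triple: Triple) -> None:
        # The g-snap itself never changes; the box is pointed at a new one.
        self.state = self.state | {triple}


def to_gtext(snap: GSnap) -> str:
    """Serialize a g-snap as N-Triples-like text (IRIs only), one triple per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(snap))
```

In this sketch, adding a triple to a GBox leaves any previously obtained g-snap untouched, which mirrors the "mutable slot that contains an RDF graph" description quoted below.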
>
>> Best,
>> Richard
>>
>>
>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>
>>> On 08/23/2012 11:22 AM, Richard Cyganiak wrote:
>>>> On 23 Aug 2012, at 16:00, Sandro Hawke wrote:
>>>>>> You proposed to redefine "graph" by splitting it into two 
>>>>>> separate concepts, a mutable and an immutable one.
>>>>>>
>>>>>> I propose to instead redefine "named graph" in the same way, by 
>>>>>> splitting it into two separate concepts, a mutable and immutable 
>>>>>> one.
>>>>> You lost me here, sorry.   What's the use case for an immutable 
>>>>> named graph?
>>>> I guess I should have said "abstract named graph", sorry if that 
>>>> caused confusion. Abstract IRI-graph-pairs. The thing that SPARQL 
>>>> queries operate over.
>>>>
>>>>> And it sounds like you're suggesting "mutable named graph" as the 
>>>>> official term for g-box.  Is that right?
>>>> Almost. My definition of "mutable named graph" would be:
>>>>
>>>> "A *mutable named graph* is a resource, denoted by an IRI, that has 
>>>> a mutable association with an (abstract, immutable) RDF graph. The 
>>>> RDF graph is also known as the *state* of the mutable named graph."
>>>>
>>>> The key points are:
>>>>
>>>> 1) we insist that it is a resource, so the kind of thing denoted by 
>>>> IRIs
>>>> 2) we insist that it is actually denoted by some IRI
>>>> 3) it essentially has a mutable slot that contains an RDF graph
>>>>
>>>> This means it can cover both the terms "RDF space/g-box" and the 
>>>> term "(name, slot) pair" from the diagram in [1].
>>>>
>>>> I repeat my assertion that there is no need to ever talk about 
>>>> unnamed g-boxes.
>>> Yeah, this makes sense, but it's not my first or second choice in 
>>> naming proposals.   Probably wouldn't help to go into why/why not at 
>>> this point.
>>>
>>>>>>> I think the key elements are : (1) we stop using "RDF Graph" as the
>>>>>>> canonical, precise term for a g-snap;
>>>>>> I disagree; "RDF graph" is a perfectly fine term.
>>>>> I wish.   I can live with it, but I think it's hardly "fine". 
>>>>> People use it wrong all the time; they say "RDF graph" and mean a 
>>>>> mutable and/or distinct set of RDF triples.
>>>> I think by actually defining proper terms for these other things, 
>>>> and by clarifying that "graph" can mean "any of the above", we make 
>>>> a solid step towards improving the situation.
>>> I think you're saying "RDF graph"==g-snap; "graph"=g-snap/or/g-box.
>>>
>>> I have a problem with this.  I think people will need some way 
>>> linguistically to distinguish "graph" in the RDF world from "graph" 
>>> in the wider world, and the natural way to do that is to add the 
>>> modifier, "RDF".   So people will instinctively say "RDF graph" to 
>>> clarify they mean "graph" in the RDF sense (not a bar chart or 
>>> something).    But with your proposal, they've now accidentally 
>>> changed to talking about g-snaps.
>>>
>>> I think people rarely mean to only be talking about g-snaps, and 
>>> when they do, they can/should put the word "abstract" in there.   I 
>>> also think the presence or absence of the modifier "RDF" shouldn't 
>>> affect the semantics of the term -- people should be able to use it 
>>> when they want to clarify they're talking about RDF, without it 
>>> otherwise affecting the meaning.
>>>
>>>    -- Sandro
>>>
>>>> Best,
>>>> Richard
>>>>
>>>> [1] http://www.w3.org/2012/08/RDFNG.html#fig1
>>>>
>>>>
>>>>
>>>>> I'm not saying we have to solve this problem, or that we can, but 
>>>>> I think it would be helpful if we could and I think this proposal 
>>>>> is our best bet.
>>>>>
>>>>>       -- Sandro
>>>>>
>>>>>> But we can stress that "RDF graph" is an abstract, unnamed, 
>>>>>> immutable graph, and that when we talk about "graphs" in general 
>>>>>> then we may sometimes mean named ones that may or may not be 
>>>>>> mutable.
>>>>>>
>>>>>>> (2) we pick terms for g-box and g-snap that convey the idea that 
>>>>>>> they are two different kinds of "graphs";
>>>>>> I disagree; I believe that there is never any need to talk about 
>>>>>> *unnamed* g-boxes; all the g-boxes we want to talk about are 
>>>>>> named. Therefore, a term like "mutable named graph" is sufficient 
>>>>>> to say all that needs to be said about g-boxes.
>>>>>>
>>>>>>> (3) we use "graph" if/when we don't mind being ambiguous about 
>>>>>>> g-box/g-snap.
>>>>>> I'd rephrase that: We can use "graph" if/when we don't mind being 
>>>>>> ambiguous about g-snap/abstract-named-graph/mutable-named-graph. 
>>>>>> For example when we say, "SPARQL Update can be used to copy data 
>>>>>> from one graph to another". In that case we mean 
>>>>>> mutable-named-graph.
>>>>>>
>>>>>>> On your details....  let me start with:  to you, can you have a 
>>>>>>> named
>>>>>>> graph that's not in a dataset (or graph store)?
>>>>>> As defined in SPARQL (named graph == IRI-graph-pair), no.
>>>>>>
>>>>>> But if we allow a term such as "mutable named graph", then yes. A 
>>>>>> Turtle document on the Web is a "mutable named graph", in that 
>>>>>> sense. It doesn't have to be in any particular dataset. Well, 
>>>>>> it's in the Web, and for me it makes sense to speak of the entire 
>>>>>> web as a "mutable RDF dataset".
>>>>>>
>>>>>> Best,
>>>>>> Richard
>>>>>>
>>>>>>
>>>>>>> I don't usually hear the term used outside SPARQL, so I don't 
>>>>>>> have much of an ear for that usage.
>>>>>>       -- Sandro
>>>>>>
>>>>>>> Best,
>>>>>>> Richard
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 22 Aug 2012, at 18:07, Sandro Hawke wrote:
>>>>>>>
>>>>>>>> On 08/21/2012 03:33 AM, Andy Seaborne wrote:
>>>>>>>>> On 20/08/12 16:30, Sandro Hawke wrote:
>>>>>>>>>> If it wouldn't cause SPARQL too many problems, I'd suggest we 
>>>>>>>>>> should do
>>>>>>>>>> the same with dataset, and even allow a dataset to be a kind 
>>>>>>>>>> of graph, I
>>>>>>>>>> think, so that the world at large can use the term "RDF 
>>>>>>>>>> dataset"
>>>>>>>>>> for any collection of RDF data (whether or not it's segmented 
>>>>>>>>>> into named
>>>>>>>>>> graphs).
>>>>>>>>> That would be problematic.  "RDF Dataset" is a specifically 
>>>>>>>>> defined term.  "Dataset" we can be loose about (c.f. VoiD) ; 
>>>>>>>>> "RDF Dataset" is stressing the tie to a particular 
>>>>>>>>> definition.  You might as well mix properties and triples if 
>>>>>>>>> you're going to mix things of different "shape".
>>>>>>>> In the telecon, I mentioned on irc the term "bacronym" but what 
>>>>>>>> I meant was "retronym".   These are terms like "cow milk" that 
>>>>>>>> arise once some term ("milk") becomes ambiguous (eg because of 
>>>>>>>> soy milk, almond milk, rice milk, etc).  See
>>>>>>>>
>>>>>>>> I take the "radical proposal" to be the recognition that some 
>>>>>>>> terms are ambiguous and we need to make retronyms to 
>>>>>>>> disambiguate them.
>>>>>>>>
>>>>>>>> Here's a revised proposal:
>>>>>>>>
>>>>>>>>    - We pick terms like "Abstract RDF Graph" (gsnap) and 
>>>>>>>> "Maintained RDF Graph" (gbox) that fit the retronym model.   It 
>>>>>>>> makes it easy, when someone says "graph" or "RDF Graph", to 
>>>>>>>> think/ask, "do you mean abstract or maintained?"     (I don't 
>>>>>>>> find these terms quite as ontologically comfortable as g-snap 
>>>>>>>> and g-box/space/data-source, because it makes them both be 
>>>>>>>> subclasses of "graph", but I think this approach  works better 
>>>>>>>> for the community.)
>>>>>>>>
>>>>>>>>    - We clarify that in all W3C specs to date, "RDF Graph" 
>>>>>>>> means "Abstract RDF Graph"
>>>>>>>>
>>>>>>>>    - Going forward, we avoid using the term "RDF Graph", using 
>>>>>>>> either Abstract Graph or Maintained Graph  (with or without 
>>>>>>>> "RDF" in there).   Or just "graph" when we don't care which kind.
>>>>>>>>
>>>>>>>> I think that much of the confusion around the term "named 
>>>>>>>> graph" comes from a lack of clarity around whether what is 
>>>>>>>> meant is a "named abstract graph" or a "named maintained 
>>>>>>>> graph".   I think the latter is much more common; the 
>>>>>>>> difference doesn't manifest in SPARQL 1.0 because it doesn't 
>>>>>>>> consider the idea of data changing. In my mind, this proposal 
>>>>>>>> is our best chance for being able to coherently keep using the 
>>>>>>>> term "named graph", which seems to be very popular.
>>>>>>>>
>>>>>>>> BTW, I think we might also want to define "Frozen" graph, which 
>>>>>>>> is a maintained graph in the sense that it exists in a 
>>>>>>>> computer's storage, but which is required to never change.    
>>>>>>>> This is, I think, mostly what PROV wants to use.
>>>>>>>>
>>>>>>>>      -- Sandro
>>>>>>>>
>>>
>>
>
>
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Friday, 24 August 2012 11:30:58 UTC