Re: A rant about the terminology debate from Sandro Hawke on 2012-08-24 (public-rdf-wg@w3.org from August 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 24 Aug 2012 12:09:30 -0400
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: public-rdf-wg@w3.org
Message-ID: <5037A73A.3040607@w3.org>
On 08/24/2012 07:32 AM, Kingsley Idehen wrote:
> On 8/24/12 7:10 AM, Sandro Hawke wrote:
>> On 08/24/2012 05:31 AM, Richard Cyganiak wrote:
>>> Sandro,
>>>
>>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>>> People will need some way linguistically to distinguish X from Y
>>>> People will instinctively say X to clarify they mean Y
>>>> People rarely mean to only be talking about X, and when they do, 
>>>> they should put the word Y in there
>>>> People should be able to use X when they want to clarify they're 
>>>> talking about Y
>>> That kind of argument is irrelevant. People talk the way they talk. 
>>> They won't pay attention to what we write anyways for the most part. 
>>> We're not in the business of helping them to express themselves.
>>>
>>> We are in the business of extending an existing technical 
>>> specification with clear definitions of a few additional concepts in 
>>> order to promote consistency between W3C specifications and to 
>>> promote interoperability between implementations that already use 
>>> these concepts in some form.
>>>
>>> We'll never get anywhere with this discussion if we entertain the 
>>> crazy notion that we can magically solve all uncertainty and 
>>> enlighten the RDF community with a Great Global Renaming.
>>
>> That's a strawman.  I'm not saying names will magically solve 
>> anything.  It sounds like our disagreement is a matter of degree. I 
>> believe our choice of terms will affect the quality & uptake of our 
>> specs by X%, while you believe it is a smaller amount Y%. Perhaps 
>> X=30 and Y=10.
>>
>> This is complicated by the fact that names interact with mental 
>> models, so sometimes when we're discussing names, I think it turns 
>> out to be a proxy for mental models.     And as we change our models, 
>> our opinions on the best names are likely to change.
>>
>> So, let's put this back in the box for now, and hopefully when we've 
>> got the model solved, we can spend one telecon talking over proposals 
>> and make a final decision.   (I suppose we should give it an ISSUE 
>> number, or broaden ISSUE-14 to include this.)
>>
>> I updated http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology and made
>> http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology/Options and I'm 
>> quite happy to let the matter drop for as long as possible.
>>
>>       -- Sandro
>
> Sandro,
>
> How would you map g-box, g-snap, and g-text in formal relational DBMS 
> terminology? Such a mapping would help many. Basically, mapping to 
> relations, sets of tuples, and notation.
>

I'm not really fluent in RDBMS theory terminology.   I do know the 
terminology database app developers use, though, I think -- the kind of 
stuff you find in the Oracle or MySQL manuals (talking about "tables" 
instead of "relations").   In that terminology, I'd say:

   g-box: table (or view)
   g-text: dump of a table (or view)
   g-snap: not something one normally deals with; either:
       - a state of a table; or
       - a value which is the set of all the rows in a table.

This is more of an analogy than a real correspondence, since a table row 
is not the same thing as an RDF triple, in general.    (You could make a 
Subject/Property/Value table, but the data typing of the value wouldn't 
work right, in general.)
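
To make the analogy a little more concrete, here is a rough Python sketch of 
the three notions.  Everything in it is made up for illustration (the GBox 
class, the snapshot and serialize names, the ex: identifiers), it is not 
taken from any spec or library, and it treats every term as a plain string, 
so it ignores the data typing problem mentioned above.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical stand-in for an RDF triple: three plain strings.
Triple = Tuple[str, str, str]


class GBox:
    """Mutable container of triples, playing the role of a table (or view)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def snapshot(self) -> FrozenSet[Triple]:
        """Return the current state as an immutable value (the g-snap)."""
        return frozenset(self._triples)


def serialize(snap: FrozenSet[Triple]) -> str:
    """Dump a g-snap as text (the g-text), one triple per line, sorted."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(snap))


box = GBox()
box.add(("ex:alice", "ex:knows", "ex:bob"))
before = box.snapshot()            # state of the box at this moment
box.add(("ex:bob", "ex:knows", "ex:carol"))
after = box.snapshot()             # the box changed; "before" did not

print(serialize(after))
```

The point of the sketch is only that the box keeps its identity while its 
state changes, each snapshot is a fixed value, and the text is one of many 
possible ways to write a snapshot down.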

     -- Sandro

> Kingsley
>>
>>> Best,
>>> Richard
>>>
>>>
>>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>>
>>>> On 08/23/2012 11:22 AM, Richard Cyganiak wrote:
>>>>> On 23 Aug 2012, at 16:00, Sandro Hawke wrote:
>>>>>>> You proposed to redefine "graph" by splitting it into two 
>>>>>>> separate concepts, a mutable and an immutable one.
>>>>>>>
>>>>>>> I propose to instead redefine "named graph" in the same way, by 
>>>>>>> splitting it into two separate concepts, a mutable and immutable 
>>>>>>> one.
>>>>>> You lost me here, sorry.   What's the use case for an immutable 
>>>>>> named graph?
>>>>> I guess I should have said "abstract named graph", sorry if that 
>>>>> caused confusion. Abstract IRI-graph-pairs. The thing that SPARQL 
>>>>> queries operate over.
>>>>>
>>>>>> And it sounds like you're suggesting "mutable named graph" as the 
>>>>>> official term for g-box.  Is that right?
>>>>> Almost. My definition of "mutable named graph" would be:
>>>>>
>>>>> "A *mutable named graph* is a resource, denoted by an IRI, that 
>>>>> has a mutable association with an (abstract, immutable) RDF graph. 
>>>>> The RDF graph is also known as the *state* of the mutable named 
>>>>> graph."
>>>>>
>>>>> The key points are:
>>>>>
>>>>> 1) we insist that it is a resource, so the kind of thing denoted 
>>>>> by IRIs
>>>>> 2) we insist that it is actually denoted by some IRI
>>>>> 3) it essentially has a mutable slot that contains an RDF graph
>>>>>
>>>>> This means it can cover both the terms "RDF space/g-box" and the 
>>>>> term "(name, slot) pair" from the diagram in [1].
>>>>>
>>>>> I repeat my assertion that there is no need to ever talk about 
>>>>> unnamed g-boxes.
>>>> Yeah, this makes sense, but it's not my first or second choice in 
>>>> naming proposals.   Probably wouldn't help to go into why/why not 
>>>> at this point.
>>>>
>>>>>>>> I think the key elements are : (1) we stop using "RDF Graph" as 
>>>>>>>> the
>>>>>>>> canonical, precise term for a g-snap;
>>>>>>> I disagree; "RDF graph" is a perfectly fine term.
>>>>>> I wish.   I can live with it, but I think it's hardly "fine". 
>>>>>> People use it wrong all the time; they say "RDF graph" and mean a 
>>>>>> mutable and/or distinct set of RDF triples.
>>>>> I think by actually defining proper terms for these other things, 
>>>>> and by clarifying that "graph" can mean "any of the above", we 
>>>>> make a solid step towards improving the situation.
>>>> I think you're saying "RDF graph"==g-snap; "graph"=g-snap/or/g-box.
>>>>
>>>> I have a problem with this.  I think people will need some way 
>>>> linguistically to distinguish "graph" in the RDF world from "graph" 
>>>> in the wider world, and the natural way to do that is to add the 
>>>> modifier, "RDF".   So people will instinctively say "RDF graph" to 
>>>> clarify they mean "graph" in the RDF sense (not a bar chart or 
>>>> something).    But with your proposal, they've now accidentally 
>>>> changed to talking about g-snaps.
>>>>
>>>> I think people rarely mean to only be talking about g-snaps, and 
>>>> when they do, they can/should put the word "abstract" in there.   I 
>>>> also think the presence or absence of the modifier "RDF" shouldn't 
>>>> affect the semantics of the term -- people should be able to use it 
>>>> when they want to clarify they're talking about RDF, without it 
>>>> otherwise affecting the meaning.
>>>>
>>>>    -- Sandro
>>>>
>>>>> Best,
>>>>> Richard
>>>>>
>>>>> [1] http://www.w3.org/2012/08/RDFNG.html#fig1
>>>>>
>>>>>
>>>>>
>>>>>> I'm not saying we have to solve this problem, or that we can, but 
>>>>>> I think it would be helpful if we could and I think this proposal 
>>>>>> is our best bet.
>>>>>>
>>>>>>       -- Sandro
>>>>>>
>>>>>>> But we can stress that "RDF graph" is an abstract, unnamed, 
>>>>>>> immutable graph, and that when we talk about "graphs" in general 
>>>>>>> then we may sometimes mean named ones that may or may not be 
>>>>>>> mutable.
>>>>>>>
>>>>>>>> (2) we pick terms for g-box and g-snap that convey the idea 
>>>>>>>> that they are two different kinds of "graphs";
>>>>>>> I disagree; I believe that there is never any need to talk about 
>>>>>>> *unnamed* g-boxes; all the g-boxes we want to talk about are 
>>>>>>> named. Therefore, a term like "mutable named graph" is 
>>>>>>> sufficient to say all that needs to be said about g-boxes.
>>>>>>>
>>>>>>>> (3) we use "graph" if/when we don't mind being ambiguous about 
>>>>>>>> g-box/g-snap.
>>>>>>> I'd rephrase that: We can use "graph" if/when we don't mind 
>>>>>>> being ambiguous about 
>>>>>>> g-snap/abstract-named-graph/mutable-named-graph. For example 
>>>>>>> when we say, "SPARQL Update can be used to copy data from one 
>>>>>>> graph to another". In that case we mean mutable-named-graph.
>>>>>>>
>>>>>>>> On your details....  let me start with:  to you, can you have a 
>>>>>>>> named
>>>>>>>> graph that's not in a dataset (or graph store)?
>>>>>>> As defined in SPARQL (named graph == IRI-graph-pair), no.
>>>>>>>
>>>>>>> But if we allow a term such as "mutable named graph", then yes. 
>>>>>>> A Turtle document on the Web is a "mutable named graph", in that 
>>>>>>> sense. It doesn't have to be in any particular dataset. Well, 
>>>>>>> it's in the Web, and for me it makes sense to speak of the 
>>>>>>> entire web as a "mutable RDF dataset".
>>>>>>>
>>>>>>> Best,
>>>>>>> Richard
>>>>>>>
>>>>>>>
>>>>>>>> I don't usually hear the term used outside SPARQL, so I don't 
>>>>>>>> have much of an ear for that usage.
>>>>>>>       -- Sandro
>>>>>>>
>>>>>>>> Best,
>>>>>>>> Richard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22 Aug 2012, at 18:07, Sandro Hawke wrote:
>>>>>>>>
>>>>>>>>> On 08/21/2012 03:33 AM, Andy Seaborne wrote:
>>>>>>>>>> On 20/08/12 16:30, Sandro Hawke wrote:
>>>>>>>>>>> If it wouldn't cause SPARQL too many problems, I'd suggest 
>>>>>>>>>>> we should do
>>>>>>>>>>> the same with dataset, and even allow a dataset to be a kind 
>>>>>>>>>>> of graph, I
>>>>>>>>>>> think, so that the world at large can use the term "RDF 
>>>>>>>>>>> dataset"
>>>>>>>>>>> for any collection of RDF data (whether or not it's 
>>>>>>>>>>> segmented into named
>>>>>>>>>>> graphs).
>>>>>>>>>> That would be problematic.  "RDF Dataset" is a specifically 
>>>>>>>>>> defined term.  "Dataset" we can be loose about (c.f. VoiD) ; 
>>>>>>>>>> "RDF Dataset" is stressing the tie to a particular 
>>>>>>>>>> definition. You might as well mix properties and triples if 
>>>>>>>>>> you're going to mix things of different "shape".
>>>>>>>>> In the telecon, I mentioned on irc the term "bacronym" but 
>>>>>>>>> what I meant was "retronym". These are terms like "cow milk" 
>>>>>>>>> that arise once some term ("milk") becomes ambiguous (eg 
>>>>>>>>> because of soy milk, almond milk, rice milk, etc).  See
>>>>>>>>>
>>>>>>>>> I take the "radical proposal" to be the recognition that some 
>>>>>>>>> terms are ambiguous and we need to make retronyms to 
>>>>>>>>> disambiguate them.
>>>>>>>>>
>>>>>>>>> Here's a revised proposal:
>>>>>>>>>
>>>>>>>>>    - We pick terms like "Abstract RDF Graph" (gsnap) and 
>>>>>>>>> "Maintained RDF Graph" (gbox) that fit the retronym model.   
>>>>>>>>> It makes it easy, when someone says "graph" or "RDF Graph", to 
>>>>>>>>> think/ask, "do you mean abstract or maintained?"     (I don't 
>>>>>>>>> find these terms quite as ontologically comfortable as g-snap 
>>>>>>>>> and g-box/space/data-source, because it makes them both be 
>>>>>>>>> subclasses of "graph", but I think this approach  works better 
>>>>>>>>> for the community.)
>>>>>>>>>
>>>>>>>>>    - We clarify that in all W3C specs to date, "RDF Graph" 
>>>>>>>>> means "Abstract RDF Graph"
>>>>>>>>>
>>>>>>>>>    - Going forward, we avoid using the term "RDF Graph", using 
>>>>>>>>> either Abstract Graph or Maintained Graph  (with or without 
>>>>>>>>> "RDF" in there).   Or just "graph" when we don't care which kind.
>>>>>>>>>
>>>>>>>>> I think that much of the confusion around the term "named 
>>>>>>>>> graph" comes from a lack of clarity around whether what is 
>>>>>>>>> meant is a "named abstract graph" or a "named maintained 
>>>>>>>>> graph".   I think the latter is much more common; the 
>>>>>>>>> difference doesn't manifest in SPARQL 1.0 because it doesn't 
>>>>>>>>> consider the idea of data changing. In my mind, this proposal 
>>>>>>>>> is our best chance for being able to coherently keep using the 
>>>>>>>>> term "named graph", which seems to be very popular.
>>>>>>>>>
>>>>>>>>> BTW, I think we might also want to define "Frozen" graph, 
>>>>>>>>> which is a maintained graph in the sense that it exists in a 
>>>>>>>>> computer's storage, but which is required to never change.    
>>>>>>>>> This is, I think, mostly what PROV wants to use.
>>>>>>>>>
>>>>>>>>>      -- Sandro
>>>>>>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
Received on Friday, 24 August 2012 16:10:30 UTC