Re: A rant about the terminology debate from Kingsley Idehen on 2012-08-24 (public-rdf-wg@w3.org from August 2012)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 24 Aug 2012 14:45:53 -0400
To: public-rdf-wg@w3.org
Message-ID: <5037CBE1.10201@openlinksw.com>
On 8/24/12 12:09 PM, Sandro Hawke wrote:
> On 08/24/2012 07:32 AM, Kingsley Idehen wrote:
>> On 8/24/12 7:10 AM, Sandro Hawke wrote:
>>> On 08/24/2012 05:31 AM, Richard Cyganiak wrote:
>>>> Sandro,
>>>>
>>>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>>>> People will need some way linguistically to distinguish X from Y
>>>>> People will instinctively say X to clarify they mean Y
>>>>> People rarely mean to only be talking about X, and when they do, 
>>>>> they should put the word Y in there
>>>>> People should be able to use X when they want to clarify they're 
>>>>> talking about Y
>>>> That kind of argument is irrelevant. People talk the way they talk. 
>>>> They won't pay attention to what we write anyways for the most 
>>>> part. We're not in the business of helping them to express themselves.
>>>>
>>>> We are in the business of extending an existing technical 
>>>> specification with clear definitions of a few additional concepts 
>>>> in order to promote consistency between W3C specifications and to 
>>>> promote interoperability between implementations that already use 
>>>> these concepts in some form.
>>>>
>>>> We'll never get anywhere with this discussion if we entertain the 
>>>> crazy notion that we can magically solve all uncertainty and 
>>>> enlighten the RDF community with a Great Global Renaming.
>>>
>>> That's a strawman.  I'm not saying names will magically solve 
>>> anything.  It sounds like our disagreement is a matter of degree. I 
>>> believe our choice of terms will affect the quality & uptake of our 
>>> specs by X%, while you believe it is a smaller amount Y%. Perhaps 
>>> X=30 and Y=10.
>>>
>>> This is complicated by the fact that names interact with mental 
>>> models, so sometimes when we're discussing names, I think it turns 
>>> out to be a proxy for mental models.     And as we change our 
>>> models, our opinions on the best names are likely to change.
>>>
>>> So, let's put this back in the box for now, and hopefully when we've 
>>> got the model solved, we can spend one telecon talking over 
>>> proposals and make a final decision.   (I suppose we should give it 
>>> an ISSUE number, or broaden ISSUE-14 to include this.)
>>>
>>> I updated http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology and made
>>> http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology/Options and I'm 
>>> quite happy to let the matter drop for as long as possible.
>>>
>>>       -- Sandro
>>
>> Sandro,
>>
>> How would you map g-box, g-snap, and g-text to formal relational DBMS 
>> terminology? Such a mapping would help many. Basically, map them to 
>> relations, sets of tuples, and notation.
>>
>
> I'm not really fluent in RDBMS theory terminology.   I do know the 
> terminology database app developers use, though, I think -- the kind 
> of stuff you find in the Oracle or MySQL manuals (talking about 
> "tables" instead of "relations").   In that terminology, I'd say:
>
>   g-box: table (or view)
>   g-text: dump of a table (or view)
>   g-snap: not something one normally deals with; either:
>       - a state of a table; or
>       - a value which is the set of all the rows in a table.
>
> This is more of an analogy than a real correspondence, since a table 
> row is not the same thing as an RDF triple, in general. (You could 
> make a Subject/Property/Value table, but the data typing of the value 
> wouldn't work right, in general.)
>
>     -- Sandro

Sandro,

Here is a nice article on Relational Model concepts: 
http://www.eng.mu.edu/corlissg/150.07f/ch05.html .

I still believe that mapping your world view to relational model 
concepts will help bring this matter to rest.
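A minimal sketch of the relational analogy above, assuming Python's built-in `sqlite3` module and a hypothetical three-column `triples` table (the Subject/Property/Value layout Sandro mentions). The table plays the part of a g-box, the frozen set of its rows at one moment plays the part of a g-snap, and a text dump of those rows plays the part of a g-text. As Sandro notes, this is only an analogy: every value here is an untyped string, so RDF datatypes, language tags, and blank nodes are not modelled.

```python
import sqlite3

# Hypothetical Subject/Property/Value table standing in for a g-box:
# its contents can change over time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")


def snapshot(conn):
    """g-snap analogue: the immutable set of all rows at this moment."""
    return frozenset(conn.execute("SELECT s, p, o FROM triples"))


def dump(snap):
    """g-text analogue: a line-per-row text dump, loosely modelled on
    N-Triples. Values are written verbatim; no escaping is performed."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(snap))


conn.execute("INSERT INTO triples VALUES (?, ?, ?)",
             ("<http://example.org/a>", "<http://example.org/name>", '"A"'))
before = snapshot(conn)

# Changing the table (g-box) does not affect the earlier snapshot (g-snap).
conn.execute("INSERT INTO triples VALUES (?, ?, ?)",
             ("<http://example.org/b>", "<http://example.org/name>", '"B"'))
after = snapshot(conn)

print(len(before), len(after))  # 1 2
print(dump(before), end="")
```

The point of the sketch is the separation of roles: the same table yields different snapshots over time, and a dump is just one textual rendering of one snapshot.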


Kingsley
>
>> Kingsley
>>>
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>>>
>>>>> On 08/23/2012 11:22 AM, Richard Cyganiak wrote:
>>>>>> On 23 Aug 2012, at 16:00, Sandro Hawke wrote:
>>>>>>>> You proposed to redefine "graph" by splitting it into two 
>>>>>>>> separate concepts, a mutable and an immutable one.
>>>>>>>>
>>>>>>>> I propose to instead redefine "named graph" in the same way, by 
>>>>>>>> splitting it into two separate concepts, a mutable and 
>>>>>>>> immutable one.
>>>>>>> You lost me here, sorry.   What's the use case for an immutable 
>>>>>>> named graph?
>>>>>> I guess I should have said "abstract named graph", sorry if that 
>>>>>> caused confusion. Abstract IRI-graph-pairs. The thing that SPARQL 
>>>>>> queries operate over.
>>>>>>
>>>>>>> And it sounds like you're suggesting "mutable named graph" as 
>>>>>>> the official term for g-box.  Is that right?
>>>>>> Almost. My definition of "mutable named graph" would be:
>>>>>>
>>>>>> "A *mutable named graph* is a resource, denoted by an IRI, that 
>>>>>> has a mutable association with an (abstract, immutable) RDF 
>>>>>> graph. The RDF graph is also known as the *state* of the mutable 
>>>>>> named graph."
>>>>>>
>>>>>> The key points are:
>>>>>>
>>>>>> 1) we insist that it is a resource, so the kind of thing denoted 
>>>>>> by IRIs
>>>>>> 2) we insist that it is actually denoted by some IRI
>>>>>> 3) it essentially has a mutable slot that contains an RDF graph
>>>>>>
>>>>>> This means it can cover both the terms "RDF space/g-box" and the 
>>>>>> term "(name, slot) pair" from the diagram in [1].
>>>>>>
>>>>>> I repeat my assertion that there is no need to ever talk about 
>>>>>> unnamed g-boxes.
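A minimal Python sketch of the definition above, assuming triples are plain tuples; `MutableNamedGraph` and its methods are hypothetical names, not an existing API. The IRI is required at construction time (points 1 and 2), and the state is a `frozenset`, so each change rebinds the slot to a new immutable graph instead of editing the old one (point 3).

```python
class MutableNamedGraph:
    """Hypothetical sketch of a 'mutable named graph': a resource denoted
    by an IRI, with a mutable slot that holds an immutable RDF graph."""

    def __init__(self, iri: str, state=frozenset()):
        if not iri:
            # It must actually be denoted by some IRI.
            raise ValueError("a mutable named graph needs an IRI")
        self.iri = iri
        self._state = frozenset(state)

    @property
    def state(self) -> frozenset:
        """The current state: an abstract, immutable RDF graph (a g-snap)."""
        return self._state

    def add(self, triple) -> None:
        # Rebind the slot to a new immutable graph; earlier states are untouched.
        self._state = self._state | {triple}

    def remove(self, triple) -> None:
        self._state = self._state - {triple}


doc = MutableNamedGraph("http://example.org/doc")
s0 = doc.state
doc.add(("ex:a", "ex:p", "ex:b"))
print(len(s0), len(doc.state))  # 0 1
```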
>>>>> Yeah, this makes sense, but it's not my first or second choice in 
>>>>> naming proposals.   Probably wouldn't help to go into why/why not 
>>>>> at this point.
>>>>>
>>>>>>>>> I think the key elements are: (1) we stop using "RDF Graph" as 
>>>>>>>>> the canonical, precise term for a g-snap;
>>>>>>>> I disagree; "RDF graph" is a perfectly fine term.
>>>>>>> I wish.   I can live with it, but I think it's hardly "fine". 
>>>>>>> People use it wrong all the time; they say "RDF graph" and mean 
>>>>>>> a mutable and/or distinct set of RDF triples.
>>>>>> I think by actually defining proper terms for these other things, 
>>>>>> and by clarifying that "graph" can mean "any of the above", we 
>>>>>> make a solid step towards improving the situation.
>>>>> I think you're saying "RDF graph" == g-snap; "graph" == g-snap or g-box.
>>>>>
>>>>> I have a problem with this.  I think people will need some way 
>>>>> linguistically to distinguish "graph" in the RDF world from 
>>>>> "graph" in the wider world, and the natural way to do that is to 
>>>>> add the modifier, "RDF".   So people will instinctively say "RDF 
>>>>> graph" to clarify they mean "graph" in the RDF sense (not a bar 
>>>>> chart or something).    But with your proposal, they've now 
>>>>> accidentally changed to talking about g-snaps.
>>>>>
>>>>> I think people rarely mean to only be talking about g-snaps, and 
>>>>> when they do, they can/should put the word "abstract" in there.   
>>>>> I also think the presence or absence of the modifier "RDF" 
>>>>> shouldn't affect the semantics of the term -- people should be 
>>>>> able to use it when they want to clarify they're talking about 
>>>>> RDF, without it otherwise affecting the meaning.
>>>>>
>>>>>    -- Sandro
>>>>>
>>>>>> Best,
>>>>>> Richard
>>>>>>
>>>>>> [1] http://www.w3.org/2012/08/RDFNG.html#fig1
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I'm not saying we have to solve this problem, or that we can, 
>>>>>>> but I think it would be helpful if we could and I think this 
>>>>>>> proposal is our best bet.
>>>>>>>
>>>>>>>       -- Sandro
>>>>>>>
>>>>>>>> But we can stress that "RDF graph" is an abstract, unnamed, 
>>>>>>>> immutable graph, and that when we talk about "graphs" in 
>>>>>>>> general then we may sometimes mean named ones that may or may 
>>>>>>>> not be mutable.
>>>>>>>>
>>>>>>>>> (2) we pick terms for g-box and g-snap that convey the idea 
>>>>>>>>> that they are two different kinds of "graphs";
>>>>>>>> I disagree; I believe that there is never any need to talk 
>>>>>>>> about *unnamed* g-boxes; all the g-boxes we want to talk about 
>>>>>>>> are named. Therefore, a term like "mutable named graph" is 
>>>>>>>> sufficient to say all that needs to be said about g-boxes.
>>>>>>>>
>>>>>>>>> (3) we use "graph" if/when we don't mind being ambiguous about 
>>>>>>>>> g-box/g-snap.
>>>>>>>> I'd rephrase that: We can use "graph" if/when we don't mind 
>>>>>>>> being ambiguous about 
>>>>>>>> g-snap/abstract-named-graph/mutable-named-graph. For example 
>>>>>>>> when we say, "SPARQL Update can be used to copy data from one 
>>>>>>>> graph to another". In that case we mean mutable-named-graph.
>>>>>>>>
>>>>>>>>> On your details... let me start with: to you, can you have a 
>>>>>>>>> named graph that's not in a dataset (or graph store)?
>>>>>>>> As defined in SPARQL (named graph == IRI-graph-pair), no.
>>>>>>>>
>>>>>>>> But if we allow a term such as "mutable named graph", then yes. 
>>>>>>>> A Turtle document on the Web is a "mutable named graph", in 
>>>>>>>> that sense. It doesn't have to be in any particular dataset. 
>>>>>>>> Well, it's in the Web, and for me it makes sense to speak of 
>>>>>>>> the entire web as a "mutable RDF dataset".
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Richard
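To make the SPARQL reading concrete, here is a minimal Python sketch, assuming triples are plain string 3-tuples and using a hypothetical `RDFDataset` class (not any library's API). It models a named graph as an IRI-graph pair that exists only inside an immutable dataset, with each name used at most once.

```python
from dataclasses import dataclass

# Assumption: a triple is three strings; an abstract RDF graph is a frozenset.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class RDFDataset:
    """Hypothetical immutable dataset: a default graph plus (IRI, graph) pairs."""
    default_graph: frozenset[Triple]
    named_graphs: frozenset[tuple[str, frozenset[Triple]]]

    def __post_init__(self):
        names = [iri for iri, _ in self.named_graphs]
        if len(names) != len(set(names)):
            raise ValueError("each graph name may appear only once")

    def graph(self, iri: str) -> frozenset[Triple]:
        # In this reading, a named graph is only reachable through a dataset.
        for name, g in self.named_graphs:
            if name == iri:
                return g
        raise KeyError(iri)


g1 = frozenset({("ex:a", "ex:p", "ex:b")})
ds = RDFDataset(default_graph=frozenset(),
                named_graphs=frozenset({("http://example.org/g1", g1)}))
print(ds.graph("http://example.org/g1") == g1)  # True
```

Nothing in this structure can change, which is why a separate notion such as a "mutable named graph" is needed to describe, say, a Turtle document on the Web whose contents evolve.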
>>>>>>>>
>>>>>>>>
>>>>>>>>> I don't usually hear the term used outside SPARQL, so I don't 
>>>>>>>>> have much of an ear for that usage.
>>>>>>>>>       -- Sandro
>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Richard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22 Aug 2012, at 18:07, Sandro Hawke wrote:
>>>>>>>>>
>>>>>>>>>> On 08/21/2012 03:33 AM, Andy Seaborne wrote:
>>>>>>>>>>> On 20/08/12 16:30, Sandro Hawke wrote:
>>>>>>>>>>>> If it wouldn't cause SPARQL too many problems, I'd suggest we 
>>>>>>>>>>>> should do the same with dataset, and even allow a dataset to be 
>>>>>>>>>>>> a kind of graph, I think, so that the world at large can use 
>>>>>>>>>>>> the term "RDF dataset" for any collection of RDF data (whether 
>>>>>>>>>>>> or not it's segmented into named graphs).
>>>>>>>>>>> That would be problematic.  "RDF Dataset" is a specifically 
>>>>>>>>>>> defined term.  "Dataset" we can be loose about (cf. VoID); 
>>>>>>>>>>> "RDF Dataset" is stressing the tie to a particular 
>>>>>>>>>>> definition. You might as well mix properties and triples if 
>>>>>>>>>>> you're going to mix things of different "shape".
>>>>>>>>>> In the telecon, I mentioned on IRC the term "backronym" but 
>>>>>>>>>> what I meant was "retronym". These are terms like "cow milk" 
>>>>>>>>>> that arise once some term ("milk") becomes ambiguous (e.g. 
>>>>>>>>>> because of soy milk, almond milk, rice milk, etc.).  See
>>>>>>>>>>
>>>>>>>>>> I take the "radical proposal" to be the recognition that some 
>>>>>>>>>> terms are ambiguous and we need to make retronyms to 
>>>>>>>>>> disambiguate them.
>>>>>>>>>>
>>>>>>>>>> Here's a revised proposal:
>>>>>>>>>>
>>>>>>>>>>    - We pick terms like "Abstract RDF Graph" (g-snap) and 
>>>>>>>>>> "Maintained RDF Graph" (g-box) that fit the retronym model. 
>>>>>>>>>> This makes it easy, when someone says "graph" or "RDF Graph", 
>>>>>>>>>> to think/ask, "do you mean abstract or maintained?" (I 
>>>>>>>>>> don't find these terms quite as ontologically comfortable as 
>>>>>>>>>> g-snap and g-box/space/data-source, because they make both 
>>>>>>>>>> subclasses of "graph", but I think this approach works 
>>>>>>>>>> better for the community.)
>>>>>>>>>>
>>>>>>>>>>    - We clarify that in all W3C specs to date, "RDF Graph" 
>>>>>>>>>> means "Abstract RDF Graph"
>>>>>>>>>>
>>>>>>>>>>    - Going forward, we avoid using the term "RDF Graph", 
>>>>>>>>>> using either Abstract Graph or Maintained Graph  (with or 
>>>>>>>>>> without "RDF" in there).   Or just "graph" when we don't care 
>>>>>>>>>> which kind.
>>>>>>>>>>
>>>>>>>>>> I think that much of the confusion around the term "named 
>>>>>>>>>> graph" comes from a lack of clarity around whether what is 
>>>>>>>>>> meant is a "named abstract graph" or a "named maintained 
>>>>>>>>>> graph". I think the latter is much more common; the 
>>>>>>>>>> difference doesn't manifest in SPARQL 1.0 because it doesn't 
>>>>>>>>>> consider the idea of data changing. In my mind, this proposal 
>>>>>>>>>> is our best chance for being able to coherently keep using 
>>>>>>>>>> the term "named graph", which seems to be very popular.
>>>>>>>>>>
>>>>>>>>>> BTW, I think we might also want to define a "Frozen" graph, 
>>>>>>>>>> which is a maintained graph in the sense that it exists in a 
>>>>>>>>>> computer's storage, but which is required to never change.    
>>>>>>>>>> This is, I think, mostly what PROV wants to use.
>>>>>>>>>>
>>>>>>>>>>      -- Sandro
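A minimal Python sketch of the abstract/maintained/frozen distinction proposed above, assuming triples are plain tuples; the class names are hypothetical. An abstract graph is simply an immutable `frozenset` of triples, a maintained graph wraps mutable storage and can report its current contents as an abstract graph, and a frozen graph is a maintained graph that refuses every change.

```python
class MaintainedGraph:
    """Hypothetical 'maintained graph' (g-box): stored triples that may change."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def add(self, triple) -> None:
        self._triples.add(triple)

    def discard(self, triple) -> None:
        self._triples.discard(triple)

    def snapshot(self) -> frozenset:
        """Current contents as an abstract graph (g-snap): an immutable set."""
        return frozenset(self._triples)


class FrozenGraph(MaintainedGraph):
    """Hypothetical 'frozen graph': kept in storage like a maintained graph,
    but every mutation is refused after creation."""

    def add(self, triple) -> None:
        raise TypeError("a frozen graph must never change")

    def discard(self, triple) -> None:
        raise TypeError("a frozen graph must never change")


t = ("ex:a", "ex:p", "ex:b")
box = MaintainedGraph()
box.add(t)
frozen = FrozenGraph([t])
print(box.snapshot() == frozen.snapshot())  # True
```

Modelling the frozen graph as a subclass follows the wording of the proposal (a maintained graph that is required never to change); in this sketch the guarantee is only as strong as the overridden methods.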
>>>>>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Friday, 24 August 2012 18:44:23 UTC