Re: naming dataset syntax from Ivan Herman on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 26 Sep 2012 19:46:13 -0400
To: Arnaud Le Hors <lehors@us.ibm.com>
Cc: W3C RDF WG <public-rdf-wg@w3.org>
Message-Id: <267971AF-F3D7-48B4-8BEA-CF707F386ACB@w3.org>
On Sep 26, 2012, at 15:12 , Arnaud Le Hors wrote:

> I realize this group is more interested in technical purity than marketing and that from a technical point of view using two different formats and names can be totally justified but I'd like to ask everyone to think about the bigger picture here. 
> 
> RDF is already plagued with the image of being an overly complicated technology and this is hindering its uptake in the industry. We really don't want to make things worse by introducing a bunch of new formats and names. 
> 
> In a private email Andy wrote to me: 
> 
> > A collection of graphs isn't itself a graph.
> > 
> > A syntax for a collection of graphs isn't a syntax for a graph.
> 
> This certainly makes perfect sense and is very simply put. As an engineer I can certainly appreciate the difference but as someone interested in helping adoption of RDF in the industry I just don't think this is worth introducing a whole new format and name. 
> 
> Turtle is providing us with something everyone can understand (unlike RDF/XML) and the name has been out there for a while now. We should try to build on that rather than start confusing things (again) with the introduction of multiple formats. 
> 
> Could we not simply have two different versions of Turtle with a way for programs to differentiate the two so that we can still only talk about Turtle? 

I am absolutely with you on this, Arnaud. I would vastly prefer to refer basically to one language, called Turtle. For pragmatic reasons (deployed code, etc.) we would have to have the basic Turtle, i.e., one without datasets, and another variant that includes datasets (and we already know that we may have to have different media types), but they should not be named and presented as fundamentally different. I know that C vs. C++ is a dangerous analogy (C++ being vastly more complex than C) but, nevertheless, I would prefer to talk about, say, Turtle++ rather than TriG. (We can have nice discussions on how to name that stuff; I think somebody came up with SuperTurtle :-)

Ivan

> 
> Regards.
> --
> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
> 
> 
> Sandro Hawke <sandro@w3.org> wrote on 09/26/2012 11:18:34 AM:
> 
> > From: Sandro Hawke <sandro@w3.org> 
> > To: David Wood <david@3roundstones.com>, 
> > Cc: Arnaud Le Hors/Cupertino/IBM@IBMUS, W3C RDF WG <public-rdf-wg@w3.org> 
> > Date: 09/26/2012 11:19 AM 
> > Subject: naming dataset syntax 
> > 
> > On 09/26/2012 01:58 PM, David Wood wrote: 
> > Hi Arnaud, 
> > 
> > We agreed quite early (Feb 2011) to "use http://www.w3.org/2010/01/Turtle/
> > as the starting point for the Turtle work" [1] and in April 2011 to 
> > limit syntactic sugar additions to Turtle [2]. 
> > 
> > IIRC, we had substantial conversations regarding the desirability of
> > turning Turtle into a quad language, but we decided (without 
> > resolution) not to do that because: 
> > - Turtle is widely fielded already 
> > - We wished to minimize disruption, as per our charter 
> > - Issues around datasets/quads were (and are) less agreed upon 
> > 
> > 
> > Yes, we agreed to get Turtle out the door as a language for Triples.
> > 
> > So, now, what do we call a language that's like Turtle except it can
> > also include datasets (that is, the triples can be segmented into 
> > named sections)?
> > 
> > Frankly I expect this language to supplant Turtle as soon as it is 
> > well supported, as long as it doesn't do anything to exclude simple 
> > usage.   I think the kind of people who use Turtle (or RDF) are the 
> > kind of people who will want to segment and manage their data.   But
> > (1) I could be wrong, and (2) it may be a long time before it is 
> > well-supported, given how confused we are about it within the WG.
> > 
> > So, myself, I'm split about what to call it.  Compared to me, 
> > however, the WG tends to lean more toward existing users and 
> > experts, over new users and non-experts, so I expect the WG to just 
> > go with "trig" unless someone makes a strong case for something else.
> > 
> > (In my prototype coding, I called the hypothetical trig-like 
> > language "mugl", for MultiGraphLanguage.    If we start from a blank
> > slate, we can probably do better than mugl or trig.)
> > 
> >        -- Sandro
> > 
> > 
> 
> > Regards,
> > Dave
> > 
> > [1] http://www.w3.org/2011/rdf-wg/meeting/2011-02-23#resolution_1 
> > [2] http://www.w3.org/2011/rdf-wg/track/issues/34
> 
> > 
> > On Sep 26, 2012, at 12:42, Arnaud Le Hors <lehors@us.ibm.com> wrote: 
> > 
> > Hi Sandro, 
> > 
> > This discussion had already started when I joined the WG and as I 
> > caught it midstream I thought it was about extending Turtle. I've 
> > since then realized that this wasn't the intent and everybody seems 
> > to agree with that but I must admit that I still don't know why. 
> > Could you please explain or point me to some reference I could read 
> > to catch up on that? 
> > 
> > I have to say that the proliferation of formats for RDF makes me a 
> > bit nervous. This doesn't go along with making RDF simpler for the 
> > masses/industry and facilitating adoption. 
> > 
> > Thanks.
> > --
> > Arnaud  Le Hors - Software Standards Architect - IBM Software Group
> > 
> > 
> > Sandro Hawke <sandro@w3.org> wrote on 09/25/2012 04:14:25 PM:
> > 
> > > From: Sandro Hawke <sandro@w3.org> 
> > > To: W3C RDF WG <public-rdf-wg@w3.org>, 
> > > Date: 09/25/2012 04:14 PM 
> > > Subject: Dataset Syntax - checking for consensus 
> > > 
> > > I'm not sure how much progress we'll be able to make on dataset 
> > > semantics tomorrow, so I thought I'd draft some proposals on dataset 
> > > syntax.   The chairs can put this on the agenda if they like (but it's 
> > > too short notice for these decisions to be binding yet).  I'm thinking 
> > > it would be useful to see how close we are to agreement on these issues.
> > > 
> > > If you followup with votes, please use -1 for Formal Objection, 0 for 
> > > abstain, +1 for approve.   Numbers in between are fine, too.
> > > 
> > > PROPOSED: We will produce a W3C Recommendation for a dataset syntax, 
> > > similar to TriG and to SPARQL's named graph syntax.
> > > 
> > > PROPOSED: We'll request a media-type for this syntax which is different 
> > > from the media-type for Turtle.  (That is, we will not consider this 
> > > language to supplant Turtle and take over the name, becoming the new 
> > > "Turtle", as was once proposed.)
> > > 
> > > PROPOSED: Our dataset syntax will allow for the expression of empty 
> > > named graphs, whatever their semantics might be (to be decided). The 
> > > syntax is an empty curly-braces expression, as in "<g> { }".
> > > 
> > > PROPOSED: Our dataset syntax will have some standard mechanism (to be 
> > > determined within the next few weeks) through which a Dataset 
> > > serialization can include some RDF data about the Dataset (that is, some 
> > > metadata in the form of an RDF graph).
> > > 
> > > 
> > > Below, there are groups of proposals which are alternative solutions to 
> > > a design issue.   If you approve of more than one of the alternatives, 
> > > please vote "+2" for your favorite.
> > > 
> > > * Name of the dataset syntax
> > > 
> > > PROPOSED: We will call our recommended dataset syntax "trig", 
> > > capitalized to Trig as needed.
> > > PROPOSED: We will call our recommended dataset syntax "TriG", but 
> > > informally and in the media type, "trig".
> > > PROPOSED: We will call our recommended dataset syntax "TriG", and use 
> > > that capitalization everywhere.
> > > 
> > > * Use of equals sign, like <g> = { <s> <p> <o> } .  This is not in 
> > > SPARQL but is in traditional TriG, for compatibility with N3.
> > > 
> > > PROPOSED: In our dataset syntax, a "=" MAY appear between the name and 
> > > the graph.
> > > PROPOSED: In our dataset syntax, a "=" MUST appear between the name and 
> > > the graph.
> > > PROPOSED: In our dataset syntax, a "=" MUST NOT appear between the name 
> > > and the graph.
> > > 
> > > * Use of the "graph" keyword, which MUST be used in SPARQL and MUST NOT 
> > > be used in traditional TriG.
> > > 
> > > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph" 
> > > MAY appear before the name, in a name-graph pair.
> > > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph" 
> > > MUST appear before the name, in a name-graph pair.
> > > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph" 
> > > MUST NOT appear before the name, in a name-graph pair.
> > > 
> > > * Use of curly braces { <a> <b> <c> } around the default graphs.   They 
> > > MUST be used in traditional TriG, and MUST NOT be used in SPARQL.
> > > 
> > > PROPOSED: In our dataset syntax, triples of the dataset's default graph 
> > > MAY be surrounded by curly braces.
> > > PROPOSED: In our dataset syntax, triples of the dataset's default graph 
> > > MUST be surrounded by curly braces.
> > > PROPOSED: In our dataset syntax, triples of the dataset's default graph 
> > > MUST NOT be surrounded by curly braces.
> > > 
> > > * Some designs for carrying metadata
> > > 
> > > PROPOSED: In our dataset syntax, we'll say that metadata goes in the 
> > > default graph
> > > PROPOSED: In our dataset syntax, we'll say that the default graph goes 
> > > inside curly braces and the metadata goes outside curly braces
> > > PROPOSED: In our dataset syntax, we'll say that metadata goes inside a 
> > > set of curly braces after a keyword "meta".
> > > PROPOSED: In our dataset syntax, we'll have a keyword "meta" followed by 
> > > "default" or the name of a named graph, to indicate to readers where the 
> > > metadata is.
> > > 
> > >
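
For readers skimming the quoted proposals, here is a minimal Python sketch of the name-graph pair variants under discussion: the optional "graph" keyword, the optional "=", and the empty named graph "<g> { }". The function name render_named_graph and its parameters are hypothetical, made up purely for illustration. It only prints the surface forms; it is not a parser and does not reflect any grammar agreed by the WG.

```python
def render_named_graph(name, triples, use_graph_keyword=False, use_equals=False):
    """Render one name-graph pair as text (illustrative only).

    name              -- graph name as already-serialized text, e.g. "<g>"
    triples           -- list of already-serialized triples, e.g. ["<s> <p> <o>"]
    use_graph_keyword -- put the keyword "GRAPH" before the name (SPARQL style)
    use_equals        -- put "=" between name and graph (traditional TriG / N3 style)
    """
    parts = []
    if use_graph_keyword:
        parts.append("GRAPH")
    parts.append(name)
    if use_equals:
        parts.append("=")
    if triples:
        # Triples inside the braces are separated by " . "
        parts.append("{ " + " . ".join(triples) + " }")
    else:
        # Empty named graph, as in the proposal's "<g> { }"
        parts.append("{ }")
    return " ".join(parts)


if __name__ == "__main__":
    one_triple = ["<s> <p> <o>"]
    print(render_named_graph("<g>", []))
    print(render_named_graph("<g>", one_triple, use_equals=True))
    print(render_named_graph("<g>", one_triple, use_graph_keyword=True))
```

The two boolean flags correspond to the MAY / MUST / MUST NOT choices in the proposals: a MAY outcome would accept both settings of a flag, while MUST or MUST NOT would fix it to one value.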


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 26 September 2012 23:46:44 UTC