Re: naming dataset syntax from Arnaud Le Hors on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Arnaud Le Hors <lehors@us.ibm.com>
Date: Wed, 26 Sep 2012 14:13:25 -0700
To: David Wood <david@3roundstones.com>
Cc: W3C RDF WG <public-rdf-wg@w3.org>
Message-ID: <OFEEB54A9D.1AADC511-ON88257A85.006E0151-88257A85.00749708@us.ibm.com>
I hear you David, and I'm not saying we should try and do everything with 
one format either but we should try and find the right balance. The 
current trend of creating a new format for every use case worries me. Even 
if you want to see it as a feature rather than a bug there is something to 
be said about feature bloat. :-)

I think we could do without RDF/XML and hope it eventually goes away once 
we have Turtle. I think it has done more harm than good.
I think there is a good reason for having RDFa and JSON-LD which are very 
different from Turtle. I can't say the same about TriG vs Turtle.

When users ask where to start, it's never ideal to have to answer "it 
depends" or something equivalent.
--
Arnaud  Le Hors - Software Standards Architect - IBM Software Group




From:   David Wood <david@3roundstones.com>
To:     Arnaud Le Hors/Cupertino/IBM@IBMUS, 
Cc:     W3C RDF WG <public-rdf-wg@w3.org>
Date:   09/26/2012 12:22 PM
Subject:        Re: naming dataset syntax



Hi Arnaud,

I appreciate your need for marketing simplicity.  However, please consider 
this:

RDF used to have one standard format (RDF/XML) which was, as you say, 
overly complicated for many potential users.  Now we have two standard 
formats (RDF/XML and RDFa).  Those serve very different communities 
(enterprise XML developers and some Web developers).  We are now in the 
process of defining either two additional standard formats (Turtle and 
JSON-LD) or three (if we add TriG).  Again, the potential users of those 
formats are different, but in each case we can parse the formats as RDF.

To my mind, that is a feature, not a bug.  We do not need to explain each 
format to all users.  Instead, we need to figure out which kind of user is 
in front of us and tell them about the format that most closely suits 
their needs.

Regards,
Dave




On Sep 26, 2012, at 15:12, Arnaud Le Hors <lehors@us.ibm.com> wrote:

I realize this group is more interested in technical purity than marketing 
and that from a technical point of view using two different formats and 
names can be totally justified but I'd like to ask everyone to think about 
the bigger picture here. 

RDF is already plagued with the image of being an overly complicated 
technology and this is hindering its uptake in the industry. We really 
don't want to make things worse by introducing a bunch of new formats and 
names. 

In a private email Andy wrote to me: 

> A collection of graphs isn't itself a graph.
> 
> A syntax for a collection of graphs isn't a syntax for a graph.

This certainly makes perfect sense and is very simply put. As an engineer 
I can certainly appreciate the difference but as someone interested in 
helping adoption of RDF in the industry I just don't think this is worth 
introducing a whole new format and name. 

Turtle is providing us with something everyone can understand (unlike 
RDF/XML) and the name has been out there for a while now. We should try to 
build on that rather than start confusing things (again) with the 
introduction of multiple formats. 

Could we not simply have two different versions of Turtle with a way for 
programs to differentiate the two so that we can still only talk about 
Turtle? 

Regards.
--
Arnaud  Le Hors - Software Standards Architect - IBM Software Group


Sandro Hawke <sandro@w3.org> wrote on 09/26/2012 11:18:34 AM:

> From: Sandro Hawke <sandro@w3.org> 
> To: David Wood <david@3roundstones.com>, 
> Cc: Arnaud Le Hors/Cupertino/IBM@IBMUS, W3C RDF WG <public-rdf-wg@w3.org>
> Date: 09/26/2012 11:19 AM 
> Subject: naming dataset syntax 
> 
> On 09/26/2012 01:58 PM, David Wood wrote: 
> Hi Arnaud, 
> 
> We agreed quite early (Feb 2011) to "use http://www.w3.org/2010/01/Turtle/
> as the starting point for the Turtle work" [1] and in April 2011 to 
> limit syntactic sugar additions to Turtle [2]. 
> 
> IIRC, we had substantial conversations regarding the desirability of
> turning Turtle into a quad language, but we decided (without 
> resolution) not to do that because: 
> - Turtle is widely fielded already 
> - We wished to minimize disruption, as per our charter 
> - Issues around datasets/quads were (and are) less agreed upon 
> 
> 
> Yes, we agreed to get Turtle out the door as a language for Triples.
> 
> So, now, what do we call a language that's like Turtle except it can
> also include datasets (that is, the triples can be segmented into 
> named sections)?
> 
> Frankly I expect this language to supplant Turtle as soon as it is 
> well supported, as long as it doesn't do anything to exclude simple 
> usage.   I think the kind of people who use Turtle (or RDF) are the 
> kind of people who will want to segment and manage their data.   But
> (1) I could be wrong, and (2) it may be a long time before it is 
> well-supported, given how confused we are about it within the WG.
> 
> So, myself, I'm split about what to call it.  Compared to me, 
> however, the WG tends to lean more toward existing users and 
> experts, over new users and non-experts, so I expect the WG to just 
> go with "trig" unless someone makes a strong case for something else.
> 
> (In my prototype coding, I called the hypothetical trig-like 
> language "mugl", for MultiGraphLanguage.    If we start from a blank
> slate, we can probably do better than mugl or trig.)
> 
>        -- Sandro
> 
> 

> Regards,
> Dave
> 
> [1] http://www.w3.org/2011/rdf-wg/meeting/2011-02-23#resolution_1 
> [2] http://www.w3.org/2011/rdf-wg/track/issues/34

> 
> On Sep 26, 2012, at 12:42, Arnaud Le Hors <lehors@us.ibm.com> wrote: 
> 
> Hi Sandro, 
> 
> This discussion had already started when I joined the WG and as I 
> caught it midstream I thought it was about extending Turtle. I've 
> since then realized that this wasn't the intent and everybody seems 
> to agree with that but I must admit that I still don't know why. 
> Could you please explain or point me to some reference I could read 
> to catch up on that? 
> 
> I have to say that the proliferation of formats for RDF makes me a 
> bit nervous. This doesn't go along with making RDF simpler for the 
> masses/industry and facilitating adoption. 
> 
> Thanks.
> --
> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
> 
> 
> Sandro Hawke <sandro@w3.org> wrote on 09/25/2012 04:14:25 PM:
> 
> > From: Sandro Hawke <sandro@w3.org> 
> > To: W3C RDF WG <public-rdf-wg@w3.org>, 
> > Date: 09/25/2012 04:14 PM 
> > Subject: Dataset Syntax - checking for consensus 
> > 
> > I'm not sure how much progress we'll be able to make on dataset 
> > semantics tomorrow, so I thought I'd draft some proposals on dataset 
> > syntax.   The chairs can put this on the agenda if they like (but it's 
> > too short notice for these decisions to be binding yet).  I'm thinking 
> > it would be useful to see how close we are to agreement on these issues.
> > 
> > If you followup with votes, please use -1 for Formal Objection, 0 for 
> > abstain, +1 for approve.   Numbers in between are fine, too.
> > 
> > PROPOSED: We will produce a W3C Recommendation for a dataset syntax, 
> > similar to TriG and to SPARQL's named graph syntax.
> > 
> > PROPOSED: We'll request a media-type for this syntax which is different 
> > from the media-type for Turtle.  (That is, we will not consider this 
> > language to supplant Turtle and take over the name, becoming the new 
> > "Turtle", as was once proposed.)
> > 
> > PROPOSED: Our dataset syntax will allow for the expression of empty 
> > named graphs, whatever their semantics might be (to be decided). The 
> > syntax is an empty curly-braces expression, as in "<g> { }".
> > 
> > PROPOSED: Our dataset syntax will have some standard mechanism (to be 
> > determined within the next few weeks) through which a Dataset 
> > serialization can include some RDF data about the Dataset (that is, some 
> > metadata in the form of an RDF graph).
> > 
> > 
> > Below, there are groups of proposals which are alternative solutions to 
> > a design issue.   If you approve of more than one of the alternatives, 
> > please vote "+2" for your favorite.
> > 
> > * Name of the dataset syntax
> > 
> > PROPOSED: We will call our recommended dataset syntax "trig", 
> > capitalized to Trig as needed.
> > PROPOSED: We will call our recommended dataset syntax "TriG", but 
> > informally and in the media type, "trig".
> > PROPOSED: We will call our recommended dataset syntax "TriG", and use 
> > that capitalization everywhere.
> > 
> > * Use of equals sign, like <g> = { <s> <p> <o> } .  This is not in 
> > SPARQL but is in traditional TriG, for compatibility with N3.
> > 
> > PROPOSED: In our dataset syntax, a "=" MAY appear between the name and 
> > the graph.
> > PROPOSED: In our dataset syntax, a "=" MUST appear between the name and 
> > the graph.
> > PROPOSED: In our dataset syntax, a "=" MUST NOT appear between the name 
> > and the graph.
> > 
> > * Use of the "graph" keyword, which MUST be used in SPARQL and MUST NOT 
> > be used in traditional TriG.
> > 
> > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph" 
> > MAY appear before the name, in a name-graph pair.
> > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph" 
> > MUST appear before the name, in a name-graph pair.
> > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph" 
> > MUST NOT appear before the name, in a name-graph pair.
> > 
> > * Use of curly braces { <a> <b> <c> } around the default graph. They 
> > MUST be used in traditional TriG, and MUST NOT be used in SPARQL.
> > 
> > PROPOSED: In our dataset syntax, triples of the dataset's default graph 
> > MAY be surrounded by curly braces.
> > PROPOSED: In our dataset syntax, triples of the dataset's default graph 
> > MUST be surrounded by curly braces.
> > PROPOSED: In our dataset syntax, triples of the dataset's default graph 
> > MUST NOT be surrounded by curly braces.
> > 
> > * Some designs for carrying metadata
> > 
> > PROPOSED: In our dataset syntax, we'll say that metadata goes in the 
> > default graph
> > PROPOSED: In our dataset syntax, we'll say that the default graph goes 
> > inside curly braces and the metadata goes outside curly braces
> > PROPOSED: In our dataset syntax, we'll say that metadata goes inside a 
> > set of curly braces after a keyword "meta".
> > PROPOSED: In our dataset syntax, we'll have a keyword "meta" followed by 
> > "default" or the name of a named graph, to indicate to readers where the 
> > metadata is.
> > 
> >
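
To make the syntax alternatives in the quoted proposals concrete, here is a 
rough sketch of a recognizer for name-graph pairs under the permissive 
("MAY") options: the "graph" keyword is optional and case-insensitive, the 
"=" is optional, and an empty body such as "<g> { }" is accepted. Nothing 
here comes from a WG draft. The names NAMED_GRAPH and split_named_graphs are 
made up for illustration, graph names are assumed to be IRIs in angle 
brackets, and the pattern assumes no curly braces occur inside literals.

```python
import re

# Hypothetical pattern for one name-graph pair under the permissive ("MAY")
# alternatives. It does not parse the triples inside the braces.
NAMED_GRAPH = re.compile(
    r"""
    (?:(?P<keyword>graph)\s+)?   # optional keyword, as SPARQL requires
    <(?P<name>[^<>\s]*)>\s*      # graph name written as an IRI reference
    (?P<equals>=)?\s*            # optional equals sign, as in traditional TriG
    \{(?P<body>[^{}]*)\}         # body, possibly empty; no nested braces
    \s*\.?                       # optional trailing dot
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_named_graphs(text):
    """Return a (name, body) tuple for each name-graph pair found in text."""
    return [
        (match.group("name"), match.group("body").strip())
        for match in NAMED_GRAPH.finditer(text)
    ]


if __name__ == "__main__":
    sample = """
    <g1> { }
    GRAPH <g2> { <s> <p> <o> }
    <g3> = { <s> <p> <o> } .
    """
    for name, body in split_named_graphs(sample):
        print(name, "->", repr(body))
```

A stricter variant (MUST or MUST NOT for the keyword or the "=") would only 
need the corresponding optional group made mandatory or removed, which is 
one way to see how small the syntactic differences between the alternatives 
are.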
Received on Wednesday, 26 September 2012 21:14:08 UTC