syntax evolution, was Re: adding inline graphs to TriG from Sandro Hawke on 2013-07-24 (public-rdf-wg@w3.org from July 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 24 Jul 2013 09:34:21 -0400
To: Markus Lanthaler <markus.lanthaler@gmx.net>
CC: public-rdf-wg@w3.org
Message-ID: <51EFD7DD.1040506@w3.org>
On 07/24/2013 03:45 AM, Markus Lanthaler wrote:
>>>> Some of the things that might also be included in a better-than-TriG
>>>> (some of which have nothing to do with named graphs), that I doubt
>>>> we're ready to put in a REC yet:
>>>>   - particular dataset semantics
>>>>   - literal times & dates
>>>>   - a syntax to help with repeated objects (either an
>>>>     inverse-predicate operator or allowing comma between subjects)
>>>>   - things to make turtle usable for people who don't want to handle
>>>>     lots of namespaces (cf json-ld and RDFa)
>>>>   - a syntax for variables (far-fetched, I know)
>>> I'm a bit more wary about the potential problems that untested syntax
>>> might add. I'd say we may go so far as to borrow patterns already
>>> established in Notation3 but leave it at that this time around. An
>>> initial context, based on RDFa's might be interesting, though.
>>
>> My point here is that the idea of the "next time around" is
>> problematic.  I don't believe the market can handle minor changes in a
>> widely deployed data language.  For any change, once it's widely
>> deployed, we need to change the mime type.  And people will only bother
>> doing that if the new type offers significant improvements.  I can
>> see the market handling TriG and one other graph language in the next
>> few years, so if we do both of them now, there won't be a "next time
>> around" for quite a long time.  Of course, if we only do TriG, it'll
>> still be hard for any second language to make it -- but it'll be a bit
>> easier.
> I'm not sure I buy this argument. Look at HTML. It's much more problematic
> to create heaps of different competing syntaxes for essentially the same
> thing than to gradually improve a syntax and to keep implementations up to
> date with those minor changes. It's very costly to support a wide variety of
> syntaxes at the same time and greatly reduces interoperability because the
> chance that two systems find a syntax they both understand goes down with
> every new syntax we produce. Thus, my preferred set of syntaxes would be
> RDFa, Turtle+ (datasets), JSON-LD, RDF/XML (for historic reasons).
>

I'm not suggesting creating heaps of different competing systems.  I 
agree that's bad (which is why I proposed making our quad-testing 
language be a subset of our dataset syntax).  But I think we need a few, 
like maybe two or three in the Turtle family.   That will be a pain, but 
it will allow innovation and improvement.

You say HTML manages with one media type (more or less), but it has some 
advantages here, including the ignore-what-you-don't-recognize rule of 
HTML and JavaScript shim functions.   And it's still rather painful.   
How much do people use <section>?

Let's imagine we wanted to extend Turtle to include a shorthand for ISO 
dates.   Instead of writing 
"2013-07-24T13:57:01Z"^^xsd:dateTimeStamp we want to let people just 
write 2013-07-24T13:57:01Z.  That seems to me like it would be nice.
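
To make the shorthand concrete, here is a minimal Python sketch of what 
such an extension would mean: a bare timestamp token stands for the 
existing typed literal.  The function name, the regular expression, and 
the choice to require a timezone are assumptions for illustration only; 
no such shorthand exists in Turtle, and a real parser would do this in 
its tokenizer rather than on isolated strings.

```python
import re

# Hypothetical pattern for the bare-timestamp shorthand: an ISO 8601
# date-time with a mandatory timezone (Z or an hh:mm offset), which is
# what xsd:dateTimeStamp requires.
_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
)


def expand_token(token: str) -> str:
    """Rewrite a bare timestamp token as a standard Turtle typed literal.

    Any token that is not a bare timestamp is returned unchanged.
    """
    if _TIMESTAMP.fullmatch(token):
        return f'"{token}"^^xsd:dateTimeStamp'
    return token
```

A parser that lacks this step sees the bare token as a syntax error, 
which is exactly the breakage at issue in the rest of this message.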

How could we do this without changing the media type?

Best case: Alice writes a great spec for Turtle with shorthand dates, 
"TurtleD".   Bob and Charlie love the spec and extend their Turtle 
Parsers to handle TurtleD.   Between them they have 60% of the market.  
Dave publishes lots of data, and he also likes TurtleD.   He changes his 
feeds to use it.    Now, of course, everyone consuming his feeds who is 
in the other 40% of the market is very, very annoyed with Dave.  To 
them, he just broke the feed.

Basically, the only way you can change a format without changing the 
media type is to have the vast majority of the market on board and 
willing and able to get the extended parsers out there before people 
lose interest.    That's another thing HTML has going for it - there is 
a small number of players who control the market and are able to update 
nearly everyone's systems without convincing end-users of the merit of 
doing so.   (Of course, saying "IE6" to them is like saying 
"httpRange-14" to us.)

Even if we had that level of participation and agreement among the 
parser writers -- which I think we MIGHT be able to get -- we still rely 
on other people to install upgrades.  I think people are not likely to 
deploy the new software unless it does something for them, and it won't 
do something for them until everyone has already deployed the new 
software.   (Otherwise you're in the people-mad-at-Dave scenario.)  Yes, 
you can try to convince them to upgrade for other reasons, bug fixes and 
other features, but it can take a very, very long time, and until pretty 
much everyone has upgraded, and somehow everyone knows this has 
happened, no one can use the feature in the open.

Media types and content negotiation give us a solution to this 
problem.   With them, people can deploy the new software (with the 
TurtleD parser using text/turtleD or something) and get the benefits 
immediately.  And those benefits grow as other people upgrade.  You 
don't have to wait for everyone to upgrade before using the new features 
among everyone who happens to have implemented them.
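
As a rough illustration of how a publisher like Dave could serve both 
audiences, here is a minimal Python sketch of server-side content 
negotiation.  The media type text/turtleD is the made-up name from 
above, the function names are hypothetical, and the Accept parsing is 
deliberately simplified (it ignores wildcards such as */*, so a client 
gets TurtleD only if it asks for it by name).

```python
from typing import Dict, Optional

# Formats the server can produce, in the server's own preference order.
OFFERED = ["text/turtleD", "text/turtle"]


def parse_accept(header: str) -> Dict[str, float]:
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()  # media types are case-insensitive
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_media_type(accept_header: str) -> Optional[str]:
    """Pick the offered type the client rates highest; ties go to the server's order.

    Returns None when the client accepts none of the offered types.
    """
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for offered in OFFERED:
        q = prefs.get(offered.lower(), 0.0)
        if q > best_q:
            best, best_q = offered, q
    return best
```

With this in place, an old client sending "Accept: text/turtle" keeps 
getting plain Turtle, while an upgraded client sending 
"Accept: text/turtleD, text/turtle;q=0.5" gets the new syntax right away.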

Even with the media types, it's important to get everyone on board, so 
there is a very small number of media types.   I wonder if we need an 
ongoing Turtle Working Group to handle that.

        -- Sandro
Received on Wednesday, 24 July 2013 13:34:29 UTC