Re: CURIEs in Turtle from Eric Prud'hommeaux on 2010-02-05 (semantic-web@w3.org from February 2010)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 5 Feb 2010 10:58:27 -0500
To: Dave Beckett <dave@dajobe.org>
Cc: Steven Pemberton <Steven.Pemberton@cwi.nl>, Toby Inkster <tai@g5n.co.uk>, pfps@research.bell-labs.com, semantic-web@w3.org, sandro@w3.org, Mark Birbeck <mark.birbeck@gmail.com>
Message-ID: <20100205155826.GB27206@w3.org>

* Dave Beckett <dave@dajobe.org> [2010-02-04 06:12-0800]
> (replying to the latest msg in this thread)
> 
> Jeremy Carroll wrote:
> ...
> >
> > ?s?o:n1.?s2?p2:n2 as a single CURIE
> ...
> >
> > ?s?o:n1. ?s2?p2:n2
> 
> OK, that's line noise.  Turtle should be readable and this is why whitespace
> is a good idea to sometimes mandate or VERY strongly suggest.  The turtle
> spec doesn't say that very well and the sparql spec does let you get away
> with this.  I'm tempted to make mandatory spaces between components now.

I don't think there's good ROI on chasing down and eliminating paths
that could allow unpleasantly terse expression. I'd favor backward-
compatibility and compatibility with SPARQL instead. I'd say forward-
compatibility is less of an issue as folks frequently rev their SemWeb
tools.

CURIEs let you replace several near-identical prefix declarations
with one. This can make any graph with two-tier semantics in its
names more readable, e.g. a view on an RDB (<stem>/table/pk.value)
or any system which assigns node names hierarchically
(medications/anticoag/warfarin). In this example, I've tweaked the
Uniprot schema to use the LOD naming convention:

@prefix u: <http://purl.uniprot.org/> .
u:Proteins/P30090#it a u:Protein ;
                     u:mnemonic "UPA3_HUMAN" ;
                     u:annotation u:Annotations/P30090-A1#it .

The nodes ending in "#it" are not expressible as qnames. OTOH,
if we allow '#' in localnames, folks have to put whitespace between
localnames and comment characters.
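
To make the expansion concrete, here is a minimal Python sketch of how a
CURIE such as u:Proteins/P30090#it would resolve against the @prefix
declaration above. The helper name expand_curie and the PREFIXES table
are illustrative assumptions, not part of any Turtle or CURIE library;
the expansion is plain concatenation with no validation of the
reference part.

```python
# Hypothetical prefix table mirroring the @prefix line in the example.
PREFIXES = {"u": "http://purl.uniprot.org/"}

def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' by concatenating the prefix IRI and the reference.

    Illustrative only: the reference is not checked against any grammar.
    """
    # Split at the first ':' so that '/' and '#' stay in the reference.
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference

print(expand_curie("u:Proteins/P30090#it"))
# prints http://purl.uniprot.org/Proteins/P30090#it
```

The '/' and '#' in the reference are exactly the characters a qname
localname cannot carry, which is why the node above needs either a full
URI or a more liberal CURIE-style reference.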

> I don't see the user need to allow such things.  If you are worried about
> the storage or network cost of extra spaces, you should compress.
> 
> We probably don't need to go all the way to
>   "represent any URI in a compact form" (Steven)
> as CURIES need to do in a constrained place - xml/html attribute value,
> since Turtle has a place to write full URIs in all cases, and also has
> additional syntax constraints in order to allow other abbreviated forms.
> The nearest we could get would be any URI that doesn't use the Turtle
> syntax symbols (anywhere) - [];,_. etc.

We could allow full CURIEs by following e.g. :foo. with a space, as
you suggested above. If we don't, we lose the value of calling it a
CURIE (don't get parser/generator re-use, don't get mindshare with
folks reading the spec).
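
As a rough illustration of the whitespace rule, the Python sketch below
treats every maximal run of non-whitespace characters as one token, so
a '.' glued to a name stays part of the name. The function names are
hypothetical, and the sketch ignores string literals, comments, and
every other piece of Turtle syntax; it only shows the :foo. versus
:foo . distinction.

```python
def split_tokens(statement):
    """Split on whitespace only; punctuation glued to a name is kept in the name."""
    return statement.split()

def is_terminated(statement):
    """True if the statement ends with a free-standing '.' token."""
    tokens = split_tokens(statement)
    return bool(tokens) and tokens[-1] == "."

# ':foo.' is read as one CURIE whose reference is 'foo.'
print(split_tokens(":s :p :foo."))   # [':s', ':p', ':foo.']
print(is_terminated(":s :p :foo."))  # False

# ':foo .' is the CURIE ':foo' followed by the statement terminator
print(split_tokens(":s :p :foo ."))  # [':s', ':p', ':foo', '.']
print(is_terminated(":s :p :foo .")) # True
```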

There is still value to liberalizing the grammar for identifiers.
Unfortunately, many identifiers we want to import into the SemWeb
contain [\._,]. Machine generation of readable, valid turtle
containing these identifiers is easier if they are allowed in the
"local name".

I think our choices look like:
• full CURIEs: use whitespace to disambiguate e.g. :foo. from :foo .
• liberalize localname: allow '/'s and other non-punctuating chars
• leave it alone

My preferences are (descending):
  full CURIEs
  leave it alone
  liberalize localname.

-- 
-ericP

Received on Friday, 5 February 2010 15:59:09 UTC