n3 suggestion: comma, semi-colon, and period; Versioning from Sandro Hawke on 2002-04-03 (www-rdf-interest@w3.org from April 2002)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 03 Apr 2002 10:14:56 -0500
To: timbl@w3.org
Cc: www-rdf-interest@w3.org
Message-Id: <200204031515.g33FEu617233@wadimousa.hawke.org>

My biggest practical hassle in using n3 is changing between period and
semi-colon when I add or re-order properties.    

It occurs to me that there is no need for period to be used this way:
one could use semi-colon to mark the end of a tuple, and the parser
would fill in any missing fields in the tuple, on the left, with the
data from the same field of the previous tuple.
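A minimal sketch of the fill-from-the-left rule, assuming triples, semi-colon as the only terminator, and whitespace-separated terms that contain no spaces or semi-colons (so no quoted literals with spaces). The function name `expand_tuples` is hypothetical and not part of any n3 parser:

```python
def expand_tuples(text, width=3):
    """Split on ';' and fill missing leftmost fields from the previous tuple.

    Assumption for illustration: terms are whitespace-separated and never
    contain spaces or ';' themselves.
    """
    result = []
    previous = None
    for chunk in text.split(";"):
        terms = chunk.split()
        if not terms:
            continue  # empty chunk, e.g. after a trailing ';'
        if len(terms) > width:
            raise ValueError(f"too many terms in tuple: {terms}")
        missing = width - len(terms)
        if missing:
            if previous is None:
                raise ValueError(f"no previous tuple to fill from: {terms}")
            # Copy the leftmost `missing` fields from the previous tuple.
            terms = previous[:missing] + terms
        result.append(tuple(terms))
        previous = terms
    return result


doc = ":s :p :o1 ; :o2 ; :p2 :o3 ; :t :p :o4"
for triple in expand_tuples(doc):
    print(triple)
```

Here ":o2" alone plays the role of today's comma (same subject and predicate), ":p2 :o3" plays the role of today's semi-colon (same subject), and a full three-term tuple starts a new subject, so the period is never needed.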

The counter argument is probably that redundancy in a language helps
catch user errors, but when I hit this error, it's always because I've
used the wrong punctuation, not because I've used the wrong number of
terms in a tuple.

Actually, given the format people seem to use for n3, I'd lean towards
the tuple terminator being either semi-colon or newline.  (This is
much like the shell, where semi-colon or newline ends a
space-separated tuple which ends up in argv[].  The idea of repeating
fields on the left is quite different though.)

I've used the word "tuple" instead of "triple" because this algorithm
generalizes to tuples of any size.  It means you could have an n3 file
with only predicate/object pairs (perhaps the default subject is "<>",
or perhaps it depends on context), or even just objects.  I think
that's pretty cool -- a file with one term per line would be a
collection of values for some property of some subject.
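One way to sketch the generalization, assuming the caller supplies default values for the leftmost fields (for example "<>" as the subject) and that either semi-colon or newline ends a tuple. The `defaults` parameter and the function name are hypothetical; where those defaults would really come from ("perhaps it depends on context") is left open above:

```python
import re


def expand_with_defaults(text, defaults, width=3):
    """Fill-left expansion where ';' or a newline ends a tuple.

    `defaults` (hypothetical) seeds the width-1 leftmost fields, acting as
    the "previous tuple" for the very first line of the file.
    """
    if len(defaults) != width - 1:
        raise ValueError("defaults must supply width - 1 fields")
    previous = list(defaults)
    result = []
    for chunk in re.split(r"[;\n]", text):
        terms = chunk.split()
        if not terms:
            continue  # blank line or trailing terminator
        if len(terms) > width:
            raise ValueError(f"too many terms in tuple: {terms}")
        terms = previous[:width - len(terms)] + terms
        result.append(tuple(terms))
        previous = terms
    return result


# A file with one term per line: values of one property of one subject.
print(expand_with_defaults(":a\n:b\n:c", defaults=("<>", ":member")))
```

With predicate/object pairs on each line, only the default subject is used, since every pair overrides the default predicate. Note that this newline rule is exactly what breaks files whose statements span several lines.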

Allowing semi-colon where period and comma are now used would not
break old data files; it would just give meaning to files which
currently have invalid syntax.  Allowing newline to work like
semi-colon would break some files, but not many, and most of those
would be caught in a transitional language where comma and period
retained their current meaning.

In general, I'd like n3 versions to be identified.  My current
favorite approach, which comes out of a much larger analysis [1] is to
say that if the file anywhere contains the text
     -*- formal-language-URI: "something"; -*- 
the "something" MUST be the URI-Reference of a language which the
system can properly understand.  If the pattern occurs more than once,
the first one which can be used MUST be used.  I think this mechanism
allows nearly-arbitrary languages to be correctly understood without
external metadata (Content-Type, filename snooping).  (The form of the
magic string is from Emacs file variables [2]).
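A rough sketch of how a consumer might apply this rule: scan the whole file for every occurrence of the magic string, in order, and pick the first URI it understands. The regular expression, the `understood` set, and the example.org URIs are assumptions for illustration, not a defined format:

```python
import re

# Matches: -*- formal-language-URI: "something"; -*-
# (whitespace tolerance is an assumption of this sketch)
MAGIC = re.compile(r'-\*-\s*formal-language-URI:\s*"([^"]*)"\s*;\s*-\*-')


def identify_language(text, understood):
    """Return the first declared language URI this system understands.

    Occurrences are checked in file order; unknown URIs are skipped, so a
    file can declare a newer language first and an older fallback after it.
    Returns None if no usable declaration is found.
    """
    for match in MAGIC.finditer(text):
        uri = match.group(1)
        if uri in understood:
            return uri
    return None


sample = (
    '# -*- formal-language-URI: "http://example.org/lang/n3-next"; -*-\n'
    '# -*- formal-language-URI: "http://example.org/lang/n3"; -*-\n'
)
print(identify_language(sample, {"http://example.org/lang/n3"}))
```

Skipping unrecognized URIs rather than failing on the first one is what lets a single file carry declarations for several versions of a language.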

    -- sandro

[1] http://www.w3.org/2001/06/blindfold/langIdent under "Content Sniffing"
[2] http://www.delorie.com/gnu/docs/emacs/emacs_439.html

Received on Wednesday, 3 April 2002 10:17:10 UTC