CWM/N3 Meeting Notes (15th November 2004)

CWM and Notation3 Meeting
Date: 15th November 2004, 19:00 UTC.
At: irc://irc.freenode.net/rdfig (#rdfig)

== Agendum A: Roll call and administrivia. ==

Attendees:

  * Dan Connolly, W3C. (Chair)
  * Dave Beckett, ILRT.
  * Tim Berners-Lee, W3C.
  * Graham Klyne, Nine by Nine.
  * Sean B. Palmer. (Scribe)
  * Andy Seaborne, HP.
  * Yosi Scharf.
  * Christopher Schmidt.

Announcement:

  http://lists.w3.org/Archives/Public/public-cwm-talk/2004OctDec/0015
  - Notation3 languages, cwm etc: Scheduled topic chat #RDFIG
  From: Tim Berners-Lee. Date: Tue, 9 Nov 2004 10:33:26 -0500

Original notes:

  http://rdfig.xmlhack.com/2004/11/15/2004-11-15.html
  - Semantic Web Interest Group IRC Scratchpad

Meeting logs:

  http://ilrt.org/discovery/chatlogs/rdfig/2004-11-15.html#T17-09-28
  - Semantic Web Interest Group IRC Chat Logs for 2004-11-15

Next meeting:

  Tentatively scheduled for 2004-12-13T18:30Z (Monday)
  Same place, agenda yet to be decided upon.

Agenda:

  A) Roll call and administrivia.
  B) Is CWM ready for a 1.0 release?
  C) Notation3 identifier syntax: what characters are allowed?
  D) Notation3 formal grammar.
  E) Streaming RDF a la cwm --pipe: should @prefix be allowed
     anywhere in a file?
  F) Notation3 :- idiom.
  G) Notation3 standardization.

Action items:

  * ACTION TimBL: fix questionnaire ACLs!
  * ACTION sbp: scribe notes. (Completed)
  * ACTION Yosi: public-cwm-announce CWM 1.0RC2. (Completed)

Summary: the meeting lasted for two-and-a-half hours, covering a range
of topics, many of which were not "official" agenda items. Roughly half
of the issues up for discussion were resolved (for example, that CWM
should undergo a 1.0 release, and that Notation3 standardisation
should converge on a standard test suite), whereas the others partly
floundered.

Many points were raised that were not further answered by the rest of
the group; this appears to have been mainly due to the rapid pace of
the meeting, and not deliberate avoidance. Please keep this in mind
when reading these notes.

Due to the anfractuous nature of the discussion, the agenda headings
below are only rough guidelines to the topics actually discussed.

== Agendum B: Is CWM ready for a 1.0 release? ==

Issue background: Yosi wants to give CWM its much-deserved 1.0
release; TimBL asks for developer consensus that this is a good idea.
There had been two release candidates prior to the meeting,
culminating with CWM 1.0RC2:

http://lists.w3.org/Archives/Public/public-cwm-announce/2004OctDec/0001
  - Cwm release 1.0.0 Release Candidate 2
  From: Yosi Scharf. Date: Mon, 15 Nov 2004 14:32:15 -0500

Discussion is hampered by Yosi's intermittent attendance, but sbp
notes that 1.0RC2 passes his "smoke test" of "echo | python2.4
./cwm.py" without failure; i.e. that the following bug is now resolved:

  http://lists.w3.org/Archives/Public/public-cwm-bugs/2004Jul/0012
  - CWM 0.8 Distribution is Broken
  From: Sean B. Palmer. Date: Tue, 20 Jul 2004 02:24:14 +0100

crschmidt notes a similar experience with an unidentified bug, which
he says no longer occurs in the latest version. A round of applause
goes to Yosi for his release engineering work. (Well done Yosi!)

sbp would like to put it to use in some major project before being
really happy with it, but is content provided that 1.0 is regarded as
more of a CR milestone than a Rec milestone. DanC notes that it'll just
be "a release with a zero on the end", and a consensus emerges that
CWM 1.0 is to be released. Yosi does so within hours:

http://lists.w3.org/Archives/Public/public-cwm-announce/2004OctDec/0002
  - Cwm Release 1.0
  From: Yosi Scharf. Date: Mon, 15 Nov 2004 22:29:16 -0500

http://www.w3.org/2000/10/swap/cwm-1.0.0.tar.gz
  - CWM 1.0 Source tgz

http://lists.w3.org/Archives/Public/www-archive/2004Nov/0021
  - Cwm 1.0.0 build log
  From: Yosi Scharf. Date: Mon, 15 Nov 2004 17:01:06 -0500

DanC puts up a 750-point bounty for anyone who can reproduce Yosi's
CWM 1.0.0rc2 log results; presumably this extends to the 1.0.0 log
results too.

== Agendum C: Notation3 identifier syntax? ==

Issue background: which characters are to be allowed in the localname
production in Notation3? Specifically, is hyphen-minus (U+002D) to be
permitted? Is the full panoply of XML 1.1 localname characters to be
allowed, or just a small ASCII set?

  * http://lists.w3.org/Archives/Public/www-archive/2002Feb/0025
    - Notation3: The Great QName Survey
    From: Sean B. Palmer. Date: Sun, 17 Feb 2002 17:32:53 -0000.
  * http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/#sec-qnames
    - Turtle issues on qnames, from Dave Beckett.

-> Hyphen-Minus in localnames.

DanC and sbp find the following production in the new formal syntax
for Notation3 (see Agendum D):

    qname: "(([a-zA-Z_][a-zA-Z0-9_]*)?:)?([a-zA-Z_][a-zA-Z0-9_]*)?"
    - http://www.w3.org/2000/10/swap/grammar/n3.n3

But TimBL claims that this is merely a placeholder. dajobe claims that
for Turtle, he went with "what cwm implemented, rather than what N3
defined". gk says that N3 being at odds with XML by not allowing "-"
is an inconvenience. DanC would like to see this issue explicitly
decided by a test case. AndyS notes that "_:a" is a valid bNode in N3
and QName in XML, but sbp doesn't see how that's relevant.
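For concreteness, the placeholder production can be tried out as a
Python regular expression. This is only a sketch: it assumes the
pattern is meant to match a whole token (hence re.fullmatch), which
n3.n3 does not spell out. It shows that a hyphenated localname and a
non-ASCII one are both rejected, and that the empty string is accepted,
since every part of the pattern is optional.

```python
import re

# The placeholder qname production from n3.n3, used verbatim as a regex.
QNAME = re.compile(r"(([a-zA-Z_][a-zA-Z0-9_]*)?:)?([a-zA-Z_][a-zA-Z0-9_]*)?")

def is_qname(text):
    """Return True if the whole string matches the placeholder production."""
    return QNAME.fullmatch(text) is not None

for candidate in ["foaf:name", ":p", "_:a", "dc:date-created", "ex:café", ""]:
    print(repr(candidate), is_qname(candidate))
```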

TimBL outlines his two major points against hyphen-minus:

  a) It means you can't extend the language with math expressions
     without requiring whitespace around operators.
  b) You get an arbitrary question for whether to use - or _ (or just
     camelCase), which adds an extra dimension to identifier
     arbitrariness.
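Point (a) can be illustrated with a toy tokenizer (hypothetical, not
CWM's lexer): once "-" is a name character, "x-y" lexes as a single
name, so a subtraction would need spaces around the operator.

```python
import re

def tokenize(text, allow_hyphen):
    """Split text into names and single-character arithmetic operators.

    Illustrative only: shows how the choice of name characters changes
    the token stream; it is not the N3 tokenizer.
    """
    name_chars = r"[A-Za-z0-9_\-]" if allow_hyphen else r"[A-Za-z0-9_]"
    token = re.compile(r"[A-Za-z_]" + name_chars + r"*|[-+*/]")
    return token.findall(text)

print(tokenize("x-y", allow_hyphen=True))    # ['x-y']: one name
print(tokenize("x-y", allow_hyphen=False))   # ['x', '-', 'y']: a subtraction
print(tokenize("x - y", allow_hyphen=True))  # ['x', '-', 'y']: needs spaces
```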

Neither of these points was addressed by anyone else in the meeting,
though gk said that he appreciates TimBL's position. TimBL notes that
there is a questionnaire on this subject on the W3C site:

  * http://www.w3.org/2002/09/wbs/1/RDF-N3-Syntax/
    - Syntax of QNames in N3 questionnaire

The meeting attendees find it to be improperly ACL'd, and hence
inaccessible to the public. DanC notes that "this would explain why so
few results", adding "phpht". TimBL duly takes an action item to chACL
the page. gk asks where and when the questionnaire was announced, and
TimBL says he thinks it was on #rdfig.

-> Underscore in localnames.

Some discussion ensued about whether "_" should be a valid character
too; sbp notes that in some N3 grammars it is not allowed as an
initial character of a localname, but that DanC kept coining temporary
properties such as _firstName, and so the idiom is to remain. TimBL
notes that he's never seen underscore as an issue.

sbp notes that CWM turns:

    @prefix _: <#> .
    _:p _:q _:r .

into:

    @prefix : <#> .
    :p :q :r .

sbp is unsure whether this is expected behaviour, and whether it has
sufficient developer support. Neither question is addressed within the
meeting. The alternative would be to disallow rebinding "_:" with the
prefix directive.
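A minimal sketch of that alternative, assuming a hypothetical
parse_prefix helper that handles one @prefix directive per call (this
is not notation3.py's code, and the regex is a simplification):

```python
import re

# Toy matcher for one "@prefix p: <uri> ." directive (not notation3.py's grammar).
PREFIX_DIRECTIVE = re.compile(
    r"@prefix\s+([A-Za-z_][A-Za-z0-9_]*)?:\s*<([^>]*)>\s*\.")

def parse_prefix(line, bindings):
    """Record a prefix binding, refusing to rebind the blank-node prefix "_"."""
    match = PREFIX_DIRECTIVE.fullmatch(line.strip())
    if match is None:
        raise SyntaxError("not a prefix directive: %r" % line)
    prefix, uri = match.group(1) or "", match.group(2)
    if prefix == "_":
        raise SyntaxError('"_:" is reserved for blank nodes; it cannot be rebound')
    bindings[prefix] = uri
    return bindings

print(parse_prefix("@prefix : <#> .", {}))  # {'': '#'}
```

Under this rule the first line of sbp's example would be a syntax
error rather than being silently rewritten.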

-> XML 1.1 Panoply vs. ASCII set

AndyS mentions that he supports the internationalization (i18n) of
QNames. dajobe notes that Turtle doesn't yet use the XML 1.1 set,
though TimBL hinted earlier on that he thought this was the case.
TimBL suggests using the XML [Namespaces] 1.1 set and then removing
"." and "-", and then notes that the Predictive Parser (see Agendum D)
has a very loose QName parsing mechanism:

    notNameChars = notQNameChars + ":"  # Assume anything else valid name :-/

== Agendum D: Notation3 formal grammar. ==

Issue background: TimBL is gathering a list of outstanding syntax
issues in order to propel the integration into the SWAP/CWM toolset of
the new formal grammars for Notation3:

  * http://www.w3.org/2000/10/swap/grammar/
  * http://www.w3.org/2000/10/swap/grammar/n3.n3
    - Notation3 Formal Grammar, in RDF/N3 BNF
  * http://www.w3.org/2000/10/swap/test/n3parser.tests
    - Yosi's positive parser tests
  * http://www.w3.org/2000/10/swap/grammar/predictiveParser.py
    - Predictive Parser for RDF BNF.

-> The Predictive Parser.

TimBL notes that doing the context-free grammar in RDF was "quite
fun". He then propounds one of the driving design principles of N3:

    "N3 happens to be a very simple language (by design), specifically
     though one which can be parsed by looking ahead one token only to
     decide which branch of a production to expand a production as."
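The one-token-lookahead property is what lets a predictive (LL(1))
parser work without backtracking. As a minimal illustration, here is a
recursive-descent parser for a hypothetical toy subset (subject,
predicate, comma-separated objects, final "."). It is neither the
n3.n3 grammar nor predictiveParser.py; it only demonstrates the
technique: every branch choice is made by peeking at the single next
token.

```python
import re

# Toy lexical rules for illustration only; they are not N3's.
TOKEN = re.compile(r"\s*(?:(<[^>]*>)|([A-Za-z_]*:[A-Za-z_]*)|([\[\],.]))")

def tokenize(text):
    """Return a list of (kind, value) pairs ending with ("EOF", None)."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        uri, qname, punct = m.groups()
        if uri:
            tokens.append(("URI", uri))
        elif qname:
            tokens.append(("QNAME", qname))
        else:
            tokens.append((punct, punct))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0]  # kind of the single lookahead token

    def take(self, kind):
        tok_kind, value = self.tokens[self.i]
        if tok_kind != kind:
            raise SyntaxError("expected %s, got %s" % (kind, tok_kind))
        self.i += 1
        return value

    def term(self):
        # One token of lookahead selects the branch of the production.
        kind = self.peek()
        if kind in ("URI", "QNAME"):
            return self.take(kind)
        if kind == "[":
            self.take("[")
            self.take("]")
            return "[]"
        raise SyntaxError("unexpected %s" % kind)

    def statement(self):
        subject, predicate = self.term(), self.term()
        objects = [self.term()]
        while self.peek() == ",":  # lookahead decides: another object or stop
            self.take(",")
            objects.append(self.term())
        self.take(".")
        return [(subject, predicate, o) for o in objects]

    def document(self):
        triples = []
        while self.peek() != "EOF":
            triples.extend(self.statement())
        return triples

print(Parser(":a :b :c, <x> .").document())
```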

predictiveParser.py uses n3-selectors.n3 (TimBL says to check the
Makefile at http://www.w3.org/2000/10/swap/grammar/Makefile for
further details) in order to decorate the rules from n3.n3.

Deficiencies:

  * Yosi notes that it doesn't handle Unicode well (specifically: that
    it has "far bigger problems with unicode than what n3.n3 has").
  * It only checks the syntax.
  * It does not yet allow you to actually parse out triples e.g. by
    calling the CWM API.
  * TimBL notes that it's not optimised.
  * TimBL further notes it makes no attempt to recover from errors.

In the discussion about error recovery in the parser, AndyS says that
Jena uses Antlr, which gives line/column counts for errors, whereupon
gk says that his intention in raising the error recovery point is to
make a "reasonable attempt to get back on track so as not to produce
zillions of secondary syntax errors".

DanC notes that he'd like to use the formal grammar as soon as
possible, obsoleting notation3.py, perhaps as part of CWM 1.1. He
subsequently starts to set an action item for someone to reproduce
TimBL's grammar work, but foils himself by thinking that there may not
be any further meetings.

TimBL sees as an objective the alignment of N3 and Turtle. DanC notes
that Turtle is now official DAWG business (kinda; and AndyS has some
non-technical issues that he refrains from raising). dajobe provides a
pointer to the Turtle RDF BNF grammar he made:

  http://lists.w3.org/Archives/Public/www-archive/2004Nov/0004
  - Approximate turtle in BNF n3
  From: Dave Beckett. Date: Wed, 3 Nov 2004 20:55:11 +0000

-> Is N3 LL(1)?

sbp asks whether the bug concerning the optional trailing period in
formulae, which for so long stopped N3 from being an LL(1) language,
has been fixed. TimBL replies that the predictive parser handles an
LL(1) subset, he thinks, which satisfies sbp on the issue.

-> Path Characters

sbp noted a general lack of consensus over the path characters used in
Notation3 at the moment, i.e. "." and "^", but this was not discussed
further in this meeting.

-> Character Encoding (utf-8 vs. ascii)

Character encoding, mentioned in the original meeting notes, was only
briefly discussed. sbp questioned allowing two different escape
mechanisms in N-Triples URIs; dajobe replied that these work at
different levels. sbp raised the issue that a URI containing a Unicode
character, e.g. <http://example.org/\u203D> in N-Triples, has no
meaningful interpretation, so the common-sense approach is to convert
it to the other escape mechanism, namely %HH encoding of the
character's UTF-8 bytes. This point was not challenged or commented on.

This issue remains open.
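Although the issue remains open, the conversion sbp proposes can be
sketched in Python as two steps: decode the N-Triples \uXXXX or
\UXXXXXXXX escape into a real character, then percent-encode that
character's UTF-8 bytes with urllib.parse.quote. The set of ASCII
characters left unescaped (the safe argument) is an assumption of this
sketch; including "%" keeps existing %HH escapes intact.

```python
import re
from urllib.parse import quote

# N-Triples style escapes: \\uXXXX (4 hex digits) or \\UXXXXXXXX (8 hex digits).
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def unescape_ntriples(text):
    """Turn N-Triples unicode escapes into the characters they denote."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

def to_percent_encoded(uri_text):
    """Replace non-ASCII characters with %HH escapes of their UTF-8 bytes."""
    return quote(unescape_ntriples(uri_text), safe=":/?#[]@!$&'()*+,;=%~")

print(to_percent_encoded(r"http://example.org/\u203D"))
# http://example.org/%E2%80%BD
```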

== Agendum E: Streaming RDF a la cwm --pipe. ==

Issue background: should @prefix be allowed anywhere in a Notation3
file, overriding previous uses?

gk notes that the idea of using @prefix anywhere "rather appeals" to
him. He uses a derivation of N3 for the scripting language in his
Script toolkit (see Agendum G) which re-uses a lot of parser software
components, and believes that the idea of @prefix anywhere fits in
"kinda nicely" with that. (The scribe is unsure of what this means.)

== Agendum F: Notation3 :- idiom. ==

Issue background: what does ":-" mean in Notation3?

This issue is barely discussed. gk says that he uses it to "create an
alias for whatever expression follows; especially the head of a list".

== Agendum G: Notation3 standardization. ==

Issue background: though extant for many years now, Notation3 still
lacks a formal specification, impeding uptake. Should Notation3 now be
better specified as it reaches tool maturity?

gk notes early on that "'stabilization' is maybe a more worthy goal
than 'standardization'", suggesting that a gentle evolution not
revolution approach is preferable. Rationale: this would enable
experimentation to continue, and belay possible i18n issues were
Notation3 to go to REC-track. (sbp note: on the other hand, if there
are i18n issues to be addressed, they should not be ignored.)

This turns into a show-and-tell festival incorporating the following
tools:

* CWM
* "dajobe's Turtle stuff"
* Graham Klyne's Swish:
   http://www.ninebynine.org/RDFNotes/Swish/Intro.html
   http://www.ninebynine.org/Software/HaskellRDF/RDF/Graph/N3Parser.hs
* http://eulersharp.sourceforge.net/ (JosD)
* http://www.mindswap.org/~katz/pychinko/ (by Yarden Katz, aka jordan,
   and the mindswap folk, but based on sbp's afon)
* http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/ PHP, using a port
   of one of sbp's old Notation3 parsers (not afon)
* AndyS's grammar for ANTLR: <http://cvs.sourceforge.net/viewcvs.py/
   jena/jena2/src/com/hp/hpl/jena/n3/n3.g?rev=1.14&view=log>

And, of course, Jos De Roo's amazing JavaScript Notation3 work:
http://cvs.sourceforge.net/viewcvs.py/eulermoz/eulermoz/js/parser/n3/
http://cvs.sourceforge.net/viewcvs.py/eulermoz/eulermoz/rdfinf/

DanC notes that an IG Note seems worthwhile, but after discussing
somewhat whether any consensus has been reached on whether to create a
specification or a test suite, or even, as TimBL asks, what the
difference is, sbp notes that there is "no emergent consensus".

DanC, sbp, JosD, and gk all agree that a centralised or distributed
test suite for Notation3 (presumably independent of the SWAP tools)
would be a worthy objective.

== Other Discussion Points ==

A number of other items were discussed in depth that don't fit into
any of the specified agendae.

-> Extensible Builtins.

sbp asks whether CWM can have a more extensible builtin-modules
mechanism, since to add a builtin module at the moment involves
modifying the CWM source code. TimBL proposes using a configuration
file accessible through an environment variable, e.g. CWM_CONFIG=~/cwmrc.n3, whereupon
DanC requests that it be available through a command line option too,
e.g. cwm --config ~/cwmrc.n3.

For the configuration file, TimBL proposes:

    </home/sbp/lib/cwm/complex.py> a cwm:UserBuiltIn.

and sbp:

    </home/sbp/lib/cwm/builtins> a cwm:UserBuiltinsDir .
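Neither the CWM_CONFIG variable nor a --config option existed at the
time; both were proposals. A sketch of how the lookup might work,
assuming (not decided at the meeting) that the command-line option
takes precedence over the environment variable; find_config is a
hypothetical name.

```python
import argparse
import os

def find_config(argv, environ=os.environ):
    """Pick the config file: --config wins, then CWM_CONFIG, else None.

    Both sources are the meeting's proposals, not existing cwm features.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--config")
    options, _rest = parser.parse_known_args(argv)
    path = options.config or environ.get("CWM_CONFIG")
    return os.path.expanduser(path) if path else None
```

Using parse_known_args lets the sketch ignore every other cwm option
rather than having to redeclare them.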

At this point, there is some branching off into a conversation about
whether builtins should be a part of antecedents, or whether they
should be separated from the query data.

-> Builtins, or Not?

sbp raises a long-standing issue: "I asked once what happens when you
want to query for something that is also a builtin triple, and the
reply was that if the triple is in the store, CWM'll treat the builtin
triple in the query as just a regular triple. So now I'm wondering
what happens if you've a builtin triple in the store, yet you want the
query builtin to act as a builtin. Or, even more absurdly, if you
wanted some builtins to remain special, and some to be treated as just
other triples to query." TimBL replies that CWM does *both* a triple
match and a builtin invocation.

gk notes some alternative approaches that he's been working on. When
pressed, he explains that it's "under the general area of
datatype-aware inferencing", and that he's also done a write-up of
some of his explorations:

  http://www.ninebynine.org/RDFNotes/RDF-Datatype-inference.html
  - Using datatype-aware inferences with RDF
  CVS version: v 1.1 2003/12/12 20:35:50 graham (Graham Klyne)

When questioned about CWM's algorithm for invoking builtins, TimBL
explains that they're first separated into Light and Heavy builtins:
the Light ones are "done iff and whenn", whereas the Heavy ones are
done after the query.
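TimBL's one-line description leaves the details open. The following
toy sketch assumes that "light" means a builtin checked as soon as its
subject and object are bound during matching (pruning the search
early) and "heavy" means one deferred until every ordinary pattern has
matched. It is not llyn.py's algorithm: among other things it never
also matches a builtin triple against the store, which TimBL says CWM
does. Predicate names in the usage are illustrative.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def resolve(term, binding):
    """Substitute a variable's value if it is bound; otherwise return term."""
    return binding.get(term, term) if is_var(term) else term

def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        p = resolve(p, new)
        if is_var(p):
            new[p] = t
        elif p != t:
            return None
    return new

def query(store, patterns, builtins):
    """Return every binding satisfying all patterns.

    builtins maps a predicate to (test, weight): test(subject, object)
    returns a bool and weight is "light" or "heavy".
    """
    plain = [p for p in patterns if p[1] not in builtins]
    light = [p for p in patterns if builtins.get(p[1], (None, ""))[1] == "light"]
    heavy = [p for p in patterns if builtins.get(p[1], (None, ""))[1] == "heavy"]
    results = []

    def ready(call, binding):
        return not is_var(resolve(call[0], binding)) and not is_var(resolve(call[2], binding))

    def passes(call, binding):
        test = builtins[call[1]][0]
        return test(resolve(call[0], binding), resolve(call[2], binding))

    def search(i, binding, pending):
        # Light builtins: evaluated as soon as both arguments are bound.
        waiting = []
        for call in pending:
            if not ready(call, binding):
                waiting.append(call)
            elif not passes(call, binding):
                return  # prune this branch early
        if i == len(plain):
            # Heavy builtins (and any light ones still waiting) run last.
            final = waiting + heavy
            if all(ready(c, binding) and passes(c, binding) for c in final):
                results.append(binding)
            return
        for triple in store:
            extended = match(plain[i], triple, binding)
            if extended is not None:
                search(i + 1, extended, waiting)

    search(0, {}, light)
    return results

store = {(":a", ":age", 30), (":b", ":age", 12)}
patterns = [("?x", ":age", "?n"), ("?n", "math:greaterThan", 18)]
print(query(store, patterns, {"math:greaterThan": (lambda s, o: s > o, "light")}))
```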

-> SOURCE

The issue is whether CWM/SWAP and Notation3 can do SOURCE:

    "SOURCE ?var (?s ?p ?o) - When SOURCE ?var is given
     before a triple, the variable will be bound to all
     of the known *Graph Names* for that triple."

  - http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0287
    Re: SOURCE - [...] querying the origin of statements
    From: Seaborne, Andy. Date: Fri, 12 Nov 2004 11:38:10 -0000
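Stripped of query syntax, the quoted SOURCE semantics amount to a
lookup from a triple to the names of the graphs that contain it. A
minimal Python sketch, assuming the named graphs are already loaded
into a dictionary from graph name to a set of triples (the data below
is hypothetical); how those names are obtained, and whether inferred
graphs get names, is exactly what the discussion leaves open.

```python
def source(triple, graphs):
    """Return, sorted, the names of every graph containing the triple."""
    return sorted(name for name, triples in graphs.items() if triple in triples)

graphs = {
    "<a.n3>": {(":s", ":p", ":o"), (":s", ":p", ":x")},
    "<b.n3>": {(":s", ":p", ":o")},
    "<c.n3>": {(":s", ":q", ":o")},
}
print(source((":s", ":p", ":o"), graphs))  # ['<a.n3>', '<b.n3>']
```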

JosD provides the following possibility:

    @forAll :A, :S, :P, :O.
    [] q:select {(:O :A) a q:Answer};
       q:where {:A.log:semantics log:includes {:S :P :O}}.

He notes that he'll have difficulty binding :A, which moves the
discussion towards where a list of the inputs can be found. sbp
suggests passing them from the command line (which JosD needs to
implement in Euler). DanC notes that dajobe proposes making the list
of sources available, but doesn't provide an adequate citation, and
then goes on to answer sbp affirmatively on whether inferred graphs
need to be named too:

http://lists.w3.org/Archives/Public/public-rdf-dawg/2004JulSep/0363
  - Re: Test cases: source of a triple [on SOURCE and inference]
  From: Dan Connolly. Date: Fri, 27 Aug 2004 17:27:49 -0500

Other tidbits from this conversation: the #_formula hack is no longer
being used, and log:semantics is treated as a FunctionalProperty by
CWM since, as TimBL puts it, "there is the assumption that all cwm
processing will take a negligible time compared to the change of the web".
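Treating log:semantics as functional means that, within one run, a
given document URI yields exactly one formula, so it is safe to
retrieve each document once and reuse the result. A sketch of that
design choice using functools.lru_cache; fetch_and_parse is a
hypothetical stand-in for retrieval and parsing, not a CWM function.

```python
from functools import lru_cache

fetches = []  # records every (simulated) retrieval

def fetch_and_parse(uri):
    """Hypothetical stand-in for retrieving and parsing the document at uri."""
    fetches.append(uri)
    return frozenset({(uri, ":retrievedAs", "fetch #%d" % len(fetches))})

@lru_cache(maxsize=None)
def semantics(uri):
    # Functional: one URI maps to one formula per run, so cache the result.
    return fetch_and_parse(uri)

first = semantics("http://example.org/doc")
second = semantics("http://example.org/doc")
print(first is second, len(fetches))  # True 1
```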

The only point that seems to offer any way forward on the SOURCE issue
is TimBL's note that he "wondered about providing access to the
metalevel", which reminds sbp of META_NS_URI in llyn.py: as TimBL
explains, the "URI of such a meta formula which records cwm's
personal experience".

== Colophon: Thanks ==

* gk thinks it's time to rejoin my family. Thanks everyone.
<sbp> thanks gk
<timbl> Thanks everyone for participating BTW
<sbp> a pleasure as usual
* JosD also taking a bit of family time - thanks a lot for the
meeting, was good
* sbp waves to timbl, AndyS, and JosD
<sbp> thanks
* timbl waves .. thanks everyone
<DanC> yes, thanks all. enjoyed it.
* AndyS waves goodbye
<adrianw> Lots to think about. Thanks to all and bye.

Thinks and thanks galore: I declare the meeting a success!

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Tuesday, 16 November 2004 07:38:58 UTC