See also: IRC log
ChrisW: There are 2 kinds of issues with PS
... 1. presentability/readability
... 2. ambiguity, which makes parsing difficult
... We'll start with Hassan's email about issues he encountered with parsing
Hassan: This has been discussed in previous emails also.
... I propose to put IRIs in quotes in Prefix and Base directives
Sandro: You could use whitespace as a delimiter
<ChrisW> Prefix ::= 'Prefix' '(' Name IRI ')'
Hassan: It's better to make a context-free tokenizer
<Hassan> Prefix ::= 'Prefix' '(' Name STRING ')'
<sandro> <http:...>
MichaelK: Why can't we use an IRI constant here, with angle brackets?
Hassan: OK, we can use angle brackets instead of quotes as delimiters
Sandro, ChrisW: We will use either quotes or angle brackets, and will decide which later
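A minimal sketch of the context-free tokenizer idea, assuming the angle-bracket option is chosen for Prefix. The token names and the `tokenize` and `parse_prefix` helpers are hypothetical, made up for illustration, and not taken from any RIF document. Because the IRI is delimited, characters such as parentheses or `#` inside it never reach the rest of the tokenizer.

```python
import re

# Hypothetical token set for illustration; not taken from the RIF documents.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<IRI><[^<>\s]*>)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs without any parser context."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":          # whitespace only separates tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_prefix(text):
    """Parse Prefix ::= 'Prefix' '(' Name IRI ')' with an angle-bracketed IRI."""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if kinds != ["NAME", "LPAREN", "NAME", "IRI", "RPAREN"] or tokens[0][1] != "Prefix":
        raise SyntaxError("expected: Prefix(Name <IRI>)")
    # Strip the angle brackets from the IRI lexeme.
    return tokens[2][1], tokens[3][1][1:-1]
```

With this sketch, `parse_prefix("Prefix(ex <http://example.org/a(b)#>)")` yields the pair of the name `ex` and the IRI string, while an undelimited IRI is rejected at the first `:`.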
<sandro> sandro: let's disallow hyphen in identifiers
<sandro> switch to: camelcase or underscore
<Hassan> In the EBNF for the Rule Language in the RIF specification, it is specified that:
<Hassan> IRIMETA ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
<Hassan> Frame ::= TERM '[' (TERM '->' TERM)* ']'
<Hassan> TERM ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
<Hassan> Const ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
<Hassan> SYMSPACE ::= ANGLEBRACKIRI | CURIE
<Hassan> where CONSTSHORT, ANGLEBRACKIRI, and CURIE are defined (in the DTB shorthand notation for RIF constants) by:
<Hassan> CURIE ::= PNAME_LN | PNAME_NS
<Hassan> CONSTSHORT ::= ANGLEBRACKIRI // shorthand for "..."^^rif:iri
<Hassan> | CURIE // shorthand for "..."^^rif:iri
<Hassan> | '"' UNICODESTRING '"' // shorthand for "..."^^xs:string
<Hassan> | NumericLiteral // shorthand for "..."^^xs:integer,xs:decimal,xs:double
<Hassan> | '_' LocalName // shorthand for "..."^^rif:local
<Hassan> where:
<sandro> I suggest using http://www.w3.org/TeamSubmission/turtle/ wherever reasonable.
<Hassan> ANGLEBRACKIRI ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
<Hassan> PNAME_LN ::= PNAME_NS PN_LOCAL
<Hassan> PNAME_NS ::= PN_PREFIX? ':'
<Hassan> PN_LOCAL ::= (PN_CHARS_U | [0-9]) ((PN_CHARS|'.')* PN_CHARS)?
<Hassan> PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
<Hassan> PN_CHARS_U ::= PN_CHARS_BASE | '_'
<Hassan> PN_CHARS ::= PN_CHARS_U
<Hassan> | '-'
<Hassan> | [0-9]
<Hassan> | #x00B7
<Hassan> | [#x0300-#x036F]
<Hassan> | [#x203F-#x2040]
<Hassan> PN_CHARS_BASE ::= [A-Z]
<Hassan> | [a-z]
<Hassan> | [#x00C0-#x00D6]
<Hassan> | [#x00D8-#x00F6]
<Hassan> | [#x00F8-#x02FF]
<Hassan> | [#x0370-#x037D]
<Hassan> | [#x037F-#x1FFF]
<Hassan> | [#x200C-#x200D]
<Hassan> | [#x2070-#x218F]
<Hassan> | [#x2C00-#x2FEF]
<Hassan> | [#x3001-#xD7FF]
<Hassan> | [#xF900-#xFDCF]
<Hassan> | [#xFDF0-#xFFFD]
<Hassan> | [#x10000-#xEFFFF]
Hassan: DTB points to other specs; parts of the above grammar are from those referenced specs
ChrisW: The - is allowed in the curie notation?
... we are inheriting dash inside identifiers from curie syntax
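A rough sketch of how the dash enters identifiers through the curie productions, assuming the SPARQL forms of PN_PREFIX and PN_LOCAL and keeping only the ASCII part of PN_CHARS_BASE. The `is_curie` helper is hypothetical and covers far less than the full Unicode ranges.

```python
import re

# ASCII-only approximation of the character classes (an assumption of this sketch).
PN_CHARS_BASE = "A-Za-z"
PN_CHARS_U = PN_CHARS_BASE + "_"
PN_CHARS = PN_CHARS_U + "0-9" + r"\-"   # the '-' in PN_CHARS is what admits foo-bar

# PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
PN_PREFIX = rf"[{PN_CHARS_BASE}](?:[{PN_CHARS}.]*[{PN_CHARS}])?"
# PN_LOCAL ::= (PN_CHARS_U | [0-9]) ((PN_CHARS|'.')* PN_CHARS)?
PN_LOCAL = rf"[{PN_CHARS_U}0-9](?:[{PN_CHARS}.]*[{PN_CHARS}])?"
# CURIE ::= PNAME_LN | PNAME_NS, i.e. an optional prefix, ':', and an optional local part.
CURIE = re.compile(rf"(?:{PN_PREFIX})?:(?:{PN_LOCAL})?")


def is_curie(text):
    """Return True if the whole string matches the ASCII-only CURIE sketch."""
    return CURIE.fullmatch(text) is not None
```

Under these assumptions `func:numeric-subtract` is a single curie, whereas `ex:-` is not: the local part cannot start with a dash, so a tokenizer would have to read it as `ex:` followed by something else.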
Sandro: We will inherit additional difficult things from curie grammar also
Hassan: I'm not aiming for my prototype to cover all possible valid cases, but rather most of them
Sandro: We could give a restricted definition of identifiers
<sandro> http://www.w3.org/TeamSubmission/turtle/#name
ChrisW: Then we have to write our own syntax for these, and not use curie
Sandro: Could possibly use the one I pasted above in the IRC
<sandro> nameStartChar | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
<sandro> (turtle)
AxelP points to SPARQL grammar
ChrisW: What if we remove dash from other places, outside of identifiers?
Sandro: I can't imagine readable rules without infix subtraction
<ChrisW> ex:-
<ChrisW> -
<Hassan> foo-bar
<sandro> External(func:numeric-subtract(2 1)) 2 - 1 "+" "-" "*" "/" as in programming languages
you could expand all "shortcuts" before parsing
<sandro> (quoting http://www.w3.org/TR/rif-ucr/)
<Harold> The "-" in the abridged 17-4 would not be part of an identifier.
but without the infix operator which is not currently part of the grammar, you can preprocess
<Harold> So omitting "-" from identifiers helps introducing abridged PS.
Sandro: So can we remove - from identifiers?
ChrisW: Then we have to reproduce syntax for curies
Hassan: If minus signs are allowed in identifiers in turtle, etc and they allow infix operators, how do they handle that?
ChrisW: They don't have - as a separate operator
... I think it's more important to be in line with the other standards than it is to have pretty subtraction syntax
<Harold> External(func:subtract-dateTimes(?deliverydate ?scheduledate)) OR External(?deliverydate-?scheduledate)
ChrisW: Let's think about this one, and come back to it
<Harold> External(func:subtract-dateTimes(?deliverydate ?scheduledate)) OR External(?deliverydate - ?scheduledate)
Sandro: Another option (that some others do) is to require spaces around the - when used as subtraction
Hassan: That is a problem for a bottom-up parser
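A small sketch of the "spaces around the -" option, assuming the distinction is made entirely in the lexer. The token names are hypothetical and the name pattern is ASCII-only. A dash counts as MINUS only when it has whitespace on both sides; otherwise it stays part of the identifier.

```python
import re

# Hypothetical token set; MINUS must come before WS so that " -" is tried first.
TOKEN_RE = re.compile(r"""
    (?P<VAR>\?[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<NUMBER>[0-9]+)
  | (?P<MINUS>\s+-(?=\s))
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs; a '-' is an operator only between spaces."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            # strip() drops the leading whitespace captured by MINUS.
            tokens.append((m.lastgroup, m.group().strip()))
        pos = m.end()
    return tokens
```

In this sketch `?deliverydate - ?scheduledate` gives three tokens with a MINUS in the middle, while `?deliverydate-?scheduledate` gives two variables, the first ending in a dash. This only illustrates the lexical rule; it does not address how a bottom-up parser would handle the resulting grammar.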
Hassan: Could allow mixing at the ebnf level, and then check for additional syntax errors (mixing) later?
Is it the prefix notation that causes this ambiguity? Doesn't a "Name" look different than a "TERM"?
ChrisW: It's confusing that syntax for named arguments in uniterms is the same as syntax for slots in frames
<sandro> foo( [color]red ) would work
<sandro> foo( color : red ) would work
<sandro> (solving Chris's problem, not Hassan.)
Sandro: We are discussing 2 different issues about named args
... Hassan's re parsing, and ChrisW's re readability
<ChrisW> foo{ color::red size::big }
<ChrisW> foo! color::red size::big !
<sandro> foo((a->b))
<sandro> f((a)->b)
<sandro> foo( named color->red)
<sandro> foo(NAMED color->red)
Sandro, why do you think color can be confused as positional argument??
<sandro> StellaMitchell, it's not truly ambiguous, it just requires more look-ahead.
why does it require more lookahead?
For a uniterm with no arguments, it is truly ambiguous in terms of the grammar, but that shouldn't matter in a practical sense, since the parser will just pick one and that will be OK. If there are arguments, I'm not clear on where the ambiguity comes in.
<mkifer> foo(! color-> red !)
<Harold> We implemented this in ANTLR.
<sandro> foo(-> color -> red ->) :-) :-)
ChrisW: Useful for readability to distinguish named arg predicates from frames
<Hassan> foo ( bar baz -> foo boo -> goo )
<sandro> foo ( color -> red name -> rover ) ?
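A simplified sketch of the look-ahead question, assuming every argument is a single token (real TERMs can be nested, which is where the extra look-ahead would come from). The `classify_args` helper is hypothetical. It accepts both forms at the grammar level, peeks at the second token to choose, and reports mixing as a later check.

```python
def classify_args(tokens):
    """Classify the tokens between '(' and ')' of a uniterm.

    Returns ("empty", []), ("named", [(name, value), ...]) or
    ("positional", [term, ...]). Assumes one token per term.
    """
    if not tokens:
        # No arguments: both readings apply, so just report the empty case.
        return "empty", []
    if len(tokens) >= 2 and tokens[1] == "->":
        # The second token decides: this is the one-token peek past the name.
        if len(tokens) % 3 != 0:
            raise SyntaxError("named arguments must be name -> value triples")
        pairs = []
        for i in range(0, len(tokens), 3):
            name, arrow, value = tokens[i:i + 3]
            if arrow != "->":
                raise SyntaxError("positional and named arguments are mixed")
            pairs.append((name, value))
        return "named", pairs
    if "->" in tokens:
        raise SyntaxError("positional and named arguments are mixed")
    return "positional", list(tokens)
```

For `color -> red name -> rover` this returns the named reading with two pairs; for `bar baz -> foo boo -> goo` it raises, since the first argument is positional and the rest are named.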
<Harold> PS is Human-Oriented Syntax.
<Harold> So it's ok if parsers have some harder time.
<Harold> After all, parsing is done only once, before the machine has to start its real (deductive) work.
<Harold> It's not on the XML level.
Hassan: Cannot tell the difference between special case and general case
Hassan: The ambiguity is that you have a grammar that gives general rules and then within that grammar there is another grammar that gives more specific rules
MichaelK: Where is the difficulty? is it the ^^?
Hassan: We are talking about the rif:iri suffix
... already past the "..."^^curie
... I am talking specifically about rif:iri, which complicates things at the lexical level
Sandro: I think a RIF parser should never be looking inside an IRI?
ChrisW: An IRI or a string?
Sandro: Either
Hassan: At the lexical level we require that it be "rif:iri" when it is an IRI const, and that complicates lexical parsing
... it would be cleaner to accept any curie there and check that it's rif:iri later
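A minimal sketch of the "accept any curie, check for rif:iri later" approach, assuming a two-pass design. The `lex_const` and `check_const` helpers and the simplified patterns are hypothetical; escapes inside the quoted string are ignored here.

```python
import re

# Lexical pass: "..."^^SYMSPACE, where SYMSPACE is <IRI> or a simplified ASCII curie.
TYPED_CONST = re.compile(
    r'"(?P<lexical>[^"]*)"\^\^(?P<symspace><[^<>\s]*>|[A-Za-z][\w.-]*:[\w.-]+)'
)


def lex_const(text):
    """Accept any symbol space; do not special-case rif:iri at this level."""
    m = TYPED_CONST.fullmatch(text)
    if m is None:
        raise SyntaxError(f"not a typed constant: {text!r}")
    return m.group("lexical"), m.group("symspace")


def check_const(lexical, symspace):
    """Later pass: only now look at the symbol space and validate accordingly."""
    if symspace == "rif:iri" and re.search(r"[\s<>]", lexical):
        raise ValueError(f"not a plausible IRI: {lexical!r}")
    return True
```

Here `"not an iri"^^rif:iri` passes the lexer and is rejected only by `check_const`, so the tokenizer never has to know that `rif:iri` is special.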
I will send an email example to make the problem clearer
ChrisW: Another meeting same time next week?
... works for everyone
MichaelK: We need to discuss the escape symbols, e.g. for double quotes within double-quoted strings
Hassan: And we should discuss the APS