- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 12 Nov 2008 12:22:56 -0500
- To: Chris Welty <cawelty@gmail.com>
- cc: "Public-Rif-Wg (E-mail)" <public-rif-wg@w3.org>
> Sorry for the continued delay getting this started - but let's go.  First
> presentation syntax task force meeting will be this Friday (Nov. 14) at 11AM
> EST (that's the usual RIF telecon time, but a different day).
>
> Please let me know if you plan to attend so I can reserve enough zakim ports.

+1   (in the sense of rsvp += Sandro)

    - s

> I'd like to start by reviewing Hassan's comments below.
>
> -Chris
>
> Hassan Ait-Kaci wrote:
> > Hello,
> >
> > This is an update on my on-going efforts to produce a working parser and
> > XML serializer for the BLD Presentation Syntax: Action 564, due on
> > October 31, 2008 (http://www.w3.org/2005/rules/wg/track/actions/564).
> >
> > I had already produced such a thing for the original specs - i.e., before
> > several changes were made that have had the effect of introducing several
> > rather nasty ambiguities and context sensitivity, both at the lexical and
> > syntactic levels - even just for the canonical PS (i.e., even w/o the DTB
> > shortcuts and Adrian's Abridged PS).
> >
> > I have been struggling trying to find workarounds to whatever snags have
> > popped up whenever I could figure any. However, there still remain some
> > tricky situations that require our attention (at least so that we produce
> > specs that are not so uselessly complicated to implement without ad hoc
> > hacks).
> >
> > It would be good that the PS Task Force convene sometime soon to discuss
> > these issues and how to resolve them.
> >
> > Here are some examples of what I have puzzled over (this is non exhaustive):
> >
> > 1) Tokenizing the argument of the Prefix and base directives is made
> >    uselessly complex by not enclosing the IRI in double quotes (viz., it
> >    forces a lexer to *parse* IRI's - as opposed to just read them off -
> >    for no purpose whatsoever, making the lexical nature of some characters
> >    context-sensitive; for example, ':' is used as a delimiter for CURIE's
> >    but not within IRI's; or, '#' is used as class membership, but not
> >    within IRI's; etc, ...).
> >
> >    A possible workaround is simply to double-quote them in the directives.
> >
> > 2) The minus sign ('-') now appears in some identifiers (e.g., ?diffdays =
> >    External(func:days-from-duration(?diffduration))). This would be no
> >    problem if we just considered '-' to be part of identifiers like '_',
> >    but it must also be seen as a literal character in order to recognize
> >    tokens such as "->" and ":-". While this is not a major hitch, it is
> >    unnecessary. (Not to mention the fact that '-' is the subtraction
> >    operator in the APS.)
> >
> >    A possible workaround is simply to disallow '-' in identifiers (say,
> >    using '_' instead) - as is the case in most programming languages.
> >
> > 3) The ANGLEBRACKIRI notation can be dealt with declaring '<' and '>' as
> >    quote chars, but this precludes them from being used as operators or
> >    punctuation.
> >
> > 4) UNITERM's are defined to be either positional or attributed, but not
> >    both:
> >
> >       UNITERM ::= Const '(' (TERM* | (Name '->' TERM)*) ')'
> >
> >    This creates an inherently unliftable reduce/reduce syntactic ambiguity:
> >
> >    =============================
> >    STATE NUMBER: 54
> >    =============================
> >    This state has conflicts:
> >
> >    Unresolved R/R conflict: choosing R82 over R84, on input 'IDENTIFIER'
> >    Unresolved R/R conflict: choosing R82 over R84, on input 'CLOSEPAR'
> >    -----------------------------
> >    [45] UniTerm --> Const 'OPENPAR' . UniTermBody 'CLOSEPAR'
> >         Preceding states: {22, 51, 95, 120, 127, 148}
> >         Follow set: {'CLOSEPAR'}
> >    [66] UniTermBody --> . Term_star
> >         Preceding states: {54}
> >    [67] UniTermBody --> . TermAttribute_star
> >         Preceding states: {54}
> >    [82] Term_star --> .
> >         Preceding states: {54}
> >         Lookahead set: {'EXTERNAL', 'NUMBER', 'LOCALNAME', 'VARIABLE', 'STRING', 'IDENTIFIER', 'ANGLEBRACKIRI', 'CLOSEPAR', 'OPENMETA', 'COLON'}
> >    [83] Term_star --> . Term_star Term
> >         Preceding states: {54}
> >    [84] TermAttribute_star --> .
> >         Preceding states: {54}
> >         Lookahead set: {'IDENTIFIER', 'CLOSEPAR'}
> >    [85] TermAttribute_star --> . TermAttribute_star TermAttribute
> >         Preceding states: {54}
> >    -----------------------------
> >    With UniTermBody, go to state 55
> >    With Term_star, go to state 56
> >    With TermAttribute_star, go to state 57
> >
> >    A possible workaround is to modify the rule to:
> >
> >       UNITERM ::= Const '(' (TERM | (Name '->' TERM))* ')'
> >
> >    (i.e., accepting mixed positional and attributed term bodies), and
> >    perform a check ex post facto.
> >
> > 5) According to http://www.w3.org/TR/rif-bld/#sec-ebnf-condition-language:
> >
> >       An IRICONST is the special case of a Const with the symbol
> >       space rif:iri, again permitting the shortcut forms defined in
> >       http://www.w3.org/TR/rif-bld/#ref-rif-dtb.
> >       One such specialization
> >       is '"' IRI '"^^' 'rif:iri' from the Const production, where IRI is a
> >       sequence of Unicode characters that forms an internationalized
> >       resource identifier as defined by
> >       http://www.w3.org/TR/rif-bld/#ref-rfc-3987.
> >
> >    However, this definition complicates tokenizing as it becomes
> >    impossible to distinguish the special case from the general one.
> >
> >    A possible workaround is to see an IRICONST as just a fully qualified
> >    constant; i.e., accepting even symbol spaces other than "rif:iri" and
> >    performing the check ex post facto.
> >
> > Again, this is not an exhaustive list of issues. Be those as they may, I
> > will continue working on trying to produce a working [A]PS parser as my
> > time permits while on the road (I have been traveling and will be until
> > Nov. 11).
> >
> > It will be good that the PS Task Force discuss and find resolutions to all
> > such issues.
> >
> > Regards,
> >
> > -hak
> > --
> > Hassan Aït-Kaci  *  ILOG, Inc. - Product Division R&D
> > http://koala.ilog.fr/wiki/bin/view/Main/HassanAitKaci
> >
>
> --
> Dr. Christopher A. Welty                    IBM Watson Research Center
> +1.914.784.7055                             19 Skyline Dr.
> cawelty@gmail.com                           Hawthorne, NY 10532
> http://www.research.ibm.com/people/w/welty
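
For illustration, the workaround proposed in item 4 of the quoted message (accept a mixed argument list, then check it ex post facto) might look roughly like the sketch below. It is not taken from Hassan's parser. The names tokenize, atom and parse_uniterm are hypothetical, the token pattern is a stand-in rather than the BLD lexical grammar, and only flat (non-nested) terms are handled. Because a single loop consumes both kinds of argument, the parser never has to decide which of two empty lists an empty body represents, which is the choice behind the reduce/reduce conflict in the state dump.

```python
import re

# Hypothetical token pattern for this sketch only: '->', parentheses, or a
# run of word characters, '?' and ':'. It is NOT the BLD lexical grammar.
TOKEN = re.compile(r"\s*(->|[()]|[?\w:]+)")


def tokenize(text):
    """Split text into tokens, rejecting anything the pattern cannot match."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def atom(tok):
    """Accept a flat term or name; punctuation is not allowed here."""
    if tok in ("->", "(", ")"):
        raise ValueError(f"unexpected token {tok!r}")
    return tok


def parse_uniterm(text):
    """Parse Const '(' (TERM | Name '->' TERM)* ')' and validate afterwards."""
    toks = tokenize(text)
    if len(toks) < 3 or toks[1] != "(" or toks[-1] != ")":
        raise ValueError("expected Const '(' ... ')'")
    const, body = atom(toks[0]), toks[2:-1]
    positional, named = [], []
    i = 0
    while i < len(body):
        # One token of lookahead tells a named argument from a positional one.
        if i + 1 < len(body) and body[i + 1] == "->":
            if i + 2 >= len(body):
                raise ValueError("missing term after '->'")
            named.append((atom(body[i]), atom(body[i + 2])))
            i += 3
        else:
            positional.append(atom(body[i]))
            i += 1
    # Ex post facto check: the relaxed grammar admits mixed bodies, so the
    # "positional or attributed, but not both" rule is enforced here.
    if positional and named:
        raise ValueError("mixed positional and named arguments")
    return const, positional, named
```

With this sketch, parse_uniterm("p(a b)") and parse_uniterm("p(x -> 1 y -> 2)") are accepted, while parse_uniterm("p(a x -> 1)") is parsed by the relaxed rule and then rejected by the final check.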
Received on Wednesday, 12 November 2008 17:23:19 UTC