- From: Chris Welty <cawelty@gmail.com>
- Date: Fri, 21 Nov 2008 10:31:32 -0500
- To: "Public-Rif-Wg (E-mail)" <public-rif-wg@w3.org>
All -

I forgot to send out the agenda/reminder for a PS TF telecon, and I can't
make it today anyway. I spoke to a couple of you who had also forgotten
about it (we did agree at the last telecon to have one this week). So,
cancelled for this week. I will try to send out a summary of the issues so
we can progress by email.

Sandro indicated he *can* make a telecon next Friday - it's a major holiday
in the US so many of us won't - is there anyone else who can make it next
Friday? Otherwise we'll have the next meeting on the 5th.

-Chris

Hassan Ait-Kaci wrote:
> Hello,
>
> This is an update on my on-going efforts to produce a working parser and
> XML serializer for the BLD Presentation Syntax: Action 564, due on
> October 31, 2008 (http://www.w3.org/2005/rules/wg/track/actions/564).
>
> I had already produced such a thing for the original specs - i.e., before
> several changes were made that have had the effect of introducing several
> rather nasty ambiguities and context sensitivity, both at the lexical and
> syntactic levels - even just for the canonical PS (i.e., even w/o the DTB
> shortcuts and Adrian's Abridged PS).
>
> I have been struggling to find workarounds to whatever snags have
> popped up, whenever I could figure any out. However, there still remain
> some tricky situations that require our attention (at least so that we
> produce specs that are not so uselessly complicated to implement without
> ad hoc hacks).
>
> It would be good for the PS Task Force to convene sometime soon to discuss
> these issues and how to resolve them.
>
> Here are some examples of what I have puzzled over (this is non-exhaustive):
>
> 1) Tokenizing the argument of the Prefix and Base directives is made
>    uselessly complex by not enclosing the IRI in double quotes (viz., it
>    forces a lexer to *parse* IRI's - as opposed to just reading them off -
>    for no purpose whatsoever, making the lexical nature of some characters
>    context-sensitive (for example, ':' is used as a delimiter for CURIE's
>    but not within IRI's; or, '#' is used as class membership, but not
>    within IRI's; etc.)).
>
>    A possible workaround is simply to double-quote them in the directives.
>
> 2) The minus sign ('-') now appears in some identifiers (e.g., ?diffdays =
>    External(func:days-from-duration(?diffduration))). This would be no
>    problem if we just considered '-' to be part of identifiers like '_',
>    but it must also be seen as a literal character in order to recognize
>    tokens such as "->" and ":-". While this is not a major hitch, it is
>    unnecessary. (Not to mention the fact that '-' is the subtraction
>    operator in the APS.)
>
>    A possible workaround is simply to disallow '-' in identifiers (say,
>    using '_' instead) - as is the case in most programming languages.
>
> 3) The ANGLEBRACKIRI notation can be dealt with by declaring '<' and '>'
>    as quote chars, but this precludes them from being used as operators
>    or punctuation.
>
> 4) UNITERM's are defined to be either positional or attributed, but not
>    both:
>
>      UNITERM ::= Const '(' (TERM* | (Name '->' TERM)*) ')'
>
>    This creates an inherently unliftable reduce/reduce syntactic ambiguity:
>
>    =============================
>    STATE NUMBER: 54
>    =============================
>    This state has conflicts:
>
>    Unresolved R/R conflict: choosing R82 over R84, on input 'IDENTIFIER'
>    Unresolved R/R conflict: choosing R82 over R84, on input 'CLOSEPAR'
>    -----------------------------
>    [45] UniTerm --> Const 'OPENPAR' . UniTermBody 'CLOSEPAR'
>         Preceding states: {22, 51, 95, 120, 127, 148}
>         Follow set: {'CLOSEPAR'}
>    [66] UniTermBody --> . Term_star
>         Preceding states: {54}
>    [67] UniTermBody --> . TermAttribute_star
>         Preceding states: {54}
>    [82] Term_star --> .
>         Preceding states: {54}
>         Lookahead set: {'EXTERNAL', 'NUMBER', 'LOCALNAME', 'VARIABLE',
>                         'STRING', 'IDENTIFIER', 'ANGLEBRACKIRI', 'CLOSEPAR',
>                         'OPENMETA', 'COLON'}
>    [83] Term_star --> . Term_star Term
>         Preceding states: {54}
>    [84] TermAttribute_star --> .
>         Preceding states: {54}
>         Lookahead set: {'IDENTIFIER', 'CLOSEPAR'}
>    [85] TermAttribute_star --> . TermAttribute_star TermAttribute
>         Preceding states: {54}
>    -----------------------------
>    With UniTermBody, go to state 55
>    With Term_star, go to state 56
>    With TermAttribute_star, go to state 57
>
>    A possible workaround is to modify the rule to:
>
>      UNITERM ::= Const '(' (TERM | (Name '->' TERM))* ')'
>
>    (i.e., accepting mixed positional and attributed term bodies), and
>    perform a check ex post facto.
>
> 5) According to http://www.w3.org/TR/rif-bld/#sec-ebnf-condition-language:
>
>      An IRICONST is the special case of a Const with the symbol
>      space rif:iri, again permitting the shortcut forms defined in
>      http://www.w3.org/TR/rif-bld/#ref-rif-dtb. One such specialization
>      is '"' IRI '"^^' 'rif:iri' from the Const production, where IRI is a
>      sequence of Unicode characters that forms an internationalized
>      resource identifier as defined by
>      http://www.w3.org/TR/rif-bld/#ref-rfc-3987.
>
>    However, this definition complicates tokenizing as it becomes
>    impossible to distinguish the special case from the general one.
>
>    A possible workaround is to see an IRICONST as just a fully qualified
>    constant; i.e., accepting symbol spaces other than "rif:iri" as well,
>    and performing the check ex post facto.
>
> Again, this is not an exhaustive list of issues. Be that as it may, I
> will continue working on trying to produce a working [A]PS parser as my
> time permits while on the road (I have been traveling and will be until
> Nov. 11).
>
> It would be good for the PS Task Force to discuss and find resolutions
> to all such issues.
>
> Regards,
>
> -hak
> --
> Hassan Aït-Kaci * ILOG, Inc. - Product Division R&D
> http://koala.ilog.fr/wiki/bin/view/Main/HassanAitKaci
>
>

--
Dr. Christopher A. Welty                    IBM Watson Research Center
+1.914.784.7055                             19 Skyline Dr.
cawelty@gmail.com                           Hawthorne, NY 10532
http://www.research.ibm.com/people/w/welty
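
Illustration for issue 4 of the quoted message: the proposed workaround is
to let the grammar accept mixed positional and named arguments, and to
reject the mix in a separate pass afterwards. The Python sketch below shows
one way that could look. It is not Hassan's parser and not part of the RIF
specs. The token set is a simplifying assumption (bare alphanumeric names
and ?variables only; no CURIEs, IRIs, or '-' inside names), and the names
tokenize, parse_uniterm, parse_term, check_uniterm and parse are
hypothetical.

```python
import re

# Assumed, simplified token set: '->', parentheses, ?variables, bare names.
TOKEN_RE = re.compile(r"\s*(->|[()]|\?\w+|\w+)")
PUNCT = ("(", ")", "->")


def tokenize(text):
    """Split the input into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def peek(tokens, i):
    """Return tokens[i], or None when i is past the end."""
    return tokens[i] if i < len(tokens) else None


def parse_term(tokens, i):
    """TERM: a nested uniterm, or a single name/variable token."""
    tok = peek(tokens, i)
    if tok is None or tok in PUNCT:
        raise SyntaxError(f"term expected at token {i}")
    if peek(tokens, i + 1) == "(":
        return parse_uniterm(tokens, i)
    return tok, i + 1


def parse_uniterm(tokens, i=0):
    """Relaxed rule: Const '(' (TERM | Name '->' TERM)* ')'.

    Each argument is stored as (name, value); name is None when the
    argument is positional. Mixing is accepted here on purpose.
    """
    head = peek(tokens, i)
    if head is None or head in PUNCT or peek(tokens, i + 1) != "(":
        raise SyntaxError(f"uniterm expected at token {i}")
    i += 2
    args = []
    while peek(tokens, i) != ")":
        if peek(tokens, i) is None:
            raise SyntaxError("missing ')'")
        if peek(tokens, i + 1) == "->":      # two-token lookahead: named
            name = tokens[i]
            value, i = parse_term(tokens, i + 2)
            args.append((name, value))
        else:                                # otherwise: positional
            value, i = parse_term(tokens, i)
            args.append((None, value))
    return (head, args), i + 1


def check_uniterm(tree):
    """Ex post facto check: an argument list is all positional or all named."""
    head, args = tree
    kinds = {name is not None for name, _ in args}
    if len(kinds) > 1:
        raise ValueError(f"{head}: mixes positional and named arguments")
    for _, value in args:
        if isinstance(value, tuple):         # nested uniterm
            check_uniterm(value)
    return tree


def parse(text):
    """Parse one uniterm, then apply the separate well-formedness check."""
    tokens = tokenize(text)
    tree, end = parse_uniterm(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing input after uniterm")
    return check_uniterm(tree)
```

With these assumptions, parse("p(a ?x)") and parse("p(n -> a m -> f(?x))")
both succeed, while parse("p(a n -> b)") is accepted by the grammar and
then rejected by check_uniterm with a ValueError. The design choice is that
the grammar itself never has to decide up front which kind of argument list
it is reading, which is where the reduce/reduce conflict in the quoted
state dump arises.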
Received on Friday, 21 November 2008 15:32:30 UTC