Re: Re[2]: Testcases for HTTP location grammar [CR130] from John Kaputin (gmail) on 2007-01-12 (www-ws-desc@w3.org from January 2007)

From: John Kaputin (gmail) <jakaputin@gmail.com>
Date: Fri, 12 Jan 2007 15:00:07 +0000
To: "georgi.georgiev.pv@hitachi.com" <georgi.georgiev.pv@hitachi.com>
Cc: www-ws-desc@w3.org, matsuki.yoshino.pw@hitachi.com, plh@w3.org
Message-ID: <4c2ae8f80701120700i2b90db79i47ffd0408d95d6c6@mail.gmail.com>

Georgi,
You asked if I could elaborate on my statement:

...it's not as simple as 'find a left curly brace, check for a double brace,
then scan for a right curly brace'.

This relates to the unexpected results I found with the approach that scans
left to right, with double curly braces taking precedence over single braces
(i.e. not the inner-most pair approach, although I don't think that's
perfect either). The second group of examples in my original post shows some
of these results.

For example:
"{{{town}}}"   > {{,{,town,}},}   > "{{town}}"  > invalid

With this approach, it's not possible to write a location template that
resolves to an element value surrounded by literal curly braces - e.g.
{Paris} after substitution - because {{{town}}} would resolve to {{town}},
not {Paris}.

Problems tend to occur when you have 3 or more consecutive left or right
braces. As I said, these examples produced some unexpected results which led
me towards my preference for the inner-most pair approach.
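
To make the behaviour concrete, here is a minimal Python sketch of the
'double braces first, then pairs of singles left-to-right' approach as I
have described it. It is an illustration only, not the Woden code: the
function names are made up for this mail, and the pairing rule (a single
"{" expands only when it is followed directly by a name and a single "}")
is my reading of the examples, not something the spec states.

```python
BRACES = ("{{", "}}", "{", "}")


def tokenize_doubles_first(template):
    """Scan left to right; a double brace always wins over a single one."""
    tokens, text, i = [], "", 0
    while i < len(template):
        pair = template[i:i + 2]
        if pair in ("{{", "}}"):
            tok = pair
        elif template[i] in ("{", "}"):
            tok = template[i]
        else:
            text += template[i]      # ordinary character, keep collecting
            i += 1
            continue
        if text:                     # flush text collected before this brace
            tokens.append(text)
            text = ""
        tokens.append(tok)
        i += len(tok)
    if text:
        tokens.append(text)
    return tokens


def expand_doubles_first(template, values):
    """Return (result, valid); valid is False if an unpaired single brace
    ends up in the result. Unknown names raise KeyError in this sketch."""
    tokens = tokenize_doubles_first(template)
    out, valid, i = [], True, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("{{", "}}"):
            out.append(tok[0])       # double brace becomes one literal brace
        elif (tok == "{" and i + 2 < len(tokens)
              and tokens[i + 1] not in BRACES and tokens[i + 2] == "}"):
            out.append(values[tokens[i + 1]])   # {name} is substituted
            i += 3
            continue
        else:
            if tok in ("{", "}"):
                valid = False        # unpaired single brace stays in output
            out.append(tok)
        i += 1
    return "".join(out), valid


print(expand_doubles_first("{{{town}}}", {"town": "Paris"}))
```

With town=Paris this prints ('{{town}}', False), which is the result shown
in the example above: the template cannot be made to yield {Paris}.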

I didn't want to overload my original post with examples of parsing problems,
but I think you have captured more of these with some of your own examples.

E.g.
- inner-most pair first method:
 There is no way of writing "{town}" or "{randomstring}" as any matching
braces will be tried for expansion.
- double braces first method:
 There is no way of writing "Paris}" as the closing brace of "{town}" would
be matched to a following "}".
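
For comparison, the inner-most pair first method can be sketched as below.
Again this is only an illustration under my own assumptions: the names are
hypothetical, an inner-most pair is taken to be "{", one or more non-brace
characters, then "}", and whatever braces are left over are read as doubles
first, then as unpaired singles.

```python
import re

# An inner-most pair: "{", one or more non-brace characters, "}".
INNER_PAIR = re.compile(r"\{([^{}]+)\}")


def unescape_leftovers(segment):
    """Turn '{{' / '}}' into literal braces; flag any remaining single brace."""
    out, valid, i = [], True, 0
    while i < len(segment):
        if segment[i:i + 2] in ("{{", "}}"):
            out.append(segment[i])
            i += 2
        else:
            if segment[i] in ("{", "}"):
                valid = False        # unpaired single brace
            out.append(segment[i])
            i += 1
    return "".join(out), valid


def expand_innermost_first(template, values):
    """Return (result, valid). Unknown names raise KeyError in this sketch."""
    # split() with one capturing group alternates: text, name, text, name, ...
    parts = INNER_PAIR.split(template)
    out, valid = [], True
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(values[part])             # substituted element value
        else:
            text, ok = unescape_leftovers(part)  # braces around the pairs
            out.append(text)
            valid = valid and ok
    return "".join(out), valid


print(expand_innermost_first("{{{town}}}", {"town": "Paris"}))
print(expand_innermost_first("{{town}}", {"town": "Paris"}))
```

With town=Paris, "{{{town}}}" now gives {Paris} with no unpaired braces,
but "{{town}}" also gives {Paris} (with unpaired braces), so the literal
text {town} cannot be written at all.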

I guess what we need is a set of examples that should be possible in the
http location string (like "{randomstring}" ) and these should be applied to
any new grammar to ensure it works.
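
One way such a check might look, as a rough Python sketch: take a list of
results that ought to be writable and search small templates for one that
produces each of them without unpaired braces. Everything here is
hypothetical - toy_expand is just a stand-in for whatever grammar is under
test (it expands {town} first, then reads double braces as literals), and
the brute-force search is only practical for short templates.

```python
from itertools import product


def toy_expand(template, town="Paris"):
    """Stand-in grammar (an assumption): expand {town} first, then treat
    '{{' and '}}' as literal braces; leftover single braces are invalid."""
    text = template.replace("{town}", town)
    out, valid, i = [], True, 0
    while i < len(text):
        if text[i:i + 2] in ("{{", "}}"):
            out.append(text[i])
            i += 2
        else:
            if text[i] in ("{", "}"):
                valid = False
            out.append(text[i])
            i += 1
    return "".join(out), valid


def reachable(expand, targets, pieces=("{", "}", "town"), max_pieces=7):
    """Map each target to the first valid template found for it, or None."""
    found = {target: None for target in targets}
    for n in range(1, max_pieces + 1):
        for combo in product(pieces, repeat=n):
            template = "".join(combo)
            output, valid = expand(template)
            if valid and output in found and found[output] is None:
                found[output] = template
    return found


for target, template in reachable(
        toy_expand, ["{town}", "Paris}", "{Paris}"]).items():
    print(target, "<-", template)
```

For this stand-in grammar the search finds templates for Paris} and
{Paris} but none for {town}; a grammar that passes should leave no target
without a template.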

These cases might be considered so obscure and unlikely to occur in real
WSDL that we can just ignore them and not worry about defining a specific
grammar for handling them in the spec - i.e. if they never or hardly
ever occur then maybe it doesn't really matter how different WSDL 2.0
processors handle them. I only raised this issue because to implement
anything at all in Apache Woden I had to consider these questions and make
some decisions, and it occurred to me that those decisions should really be
in the spec, so that other implementors don't have to answer the same
questions (and maybe with different answers than mine).

I think the discussion that this thread has generated, and the various views
the respondents have on what they expect the parsing results to be,
highlight the ambiguity of the curly brace syntax as it's currently
specified and the possible problems this might cause for implementors. I
don't envy Philippe the challenge of writing a grammar that meets everyone's
expectations!

regards,
John Kaputin.


On 1/12/07, georgi.georgiev.pv@hitachi.com <georgi.georgiev.pv@hitachi.com>
wrote:
>
> When deciding on the grammar, please consider a method that would not
> restrict the possible content of the processed result in any way.
>
> With John's town=Paris example (quoted near the end of this mail):
>
> - inner-most pair first method:
>   There is no way of writing "{town}" or "{randomstring}" as any matching
> braces will be tried for expansion.
> - double braces first method:
>   There is no way of writing "Paris}" as the closing brace of "{town}"
> would be matched to a following "}".
>
> And regarding Tony's method with the stacking:
>
> - Does "{{" have to be stacked? Double braces do not have to come in
> pairs.
> - Similar to the above, should a lone "}}" without an opening equivalent
> be really treated as an error? Input like "/foo}}bar" is pretty legal.
> - How would "{coun{town}try}" be parsed? This should be illegal input.
> - If nested braces are not allowed, why is the stack necessary? Could its
> use be avoided if mismatched braces are treated as errors?
>
> To me (if worth anything) left-to-right greedy parsing sounds like the
> obvious approach but as John mentioned it is "not as simple as 'looking
> for...'". It is very likely that I am overlooking something.
>
> John, could you please elaborate on your statement?
>
> Of course, when I say "left-to-right greedy parsing" I assume the
> following:
> - Nested braces are not allowed
> - The parsing is performed from the left to the right. Therefore:
>   - If a "{" is encountered, it is considered to be either of the
> following (in this order)
>     1) an error if a "{" has already been encountered (and is not the
> previous character)
>     2) the first or second of two braces "{{"
>     3) an opening brace
>   - If a "}" is encountered, it is considered to be either of the
> following (in this order)
>     1) a match for a previous "{"
>     2) the first or second of double braces "}}"
>     3) an error (no matching opening brace)
> So, "{{{town}" and "{town}}}" are O.K. but "{town{{}" is invalid (the
> brace after the "n" is illegal).
>
> I am starting to have the feeling that using a backslash to escape literal
> braces would have been less confusing...
>
> >I think the parser needs to have a stack for braces - I don't believe even
> a state machine can hold all the information we need - when we match up a
> pair we need to know what our state was before we opened that pair. My
> sketch of the processing would go:
> >
> >if the next character is {
> >a. if previous character was { and top of stack is { then change top of
> stack to {{
> >b. otherwise stack {  (remembering where it was seen)
> >
> >if the next character is }
> >a. if top of stack is {{ look for another } immediately following
> >    i. if next char is }, unstack the {{  - we have a matching pair  {{}}
> >    ii. if next char is not }, throw error or treat as literal }
> >b. if top of stack is {, unstack the {  - we have a matching pair {}
> >c. if stack is empty, throw error or treat as literal }
> >
> >at the end, the stack should be empty, assuming all { matched },
> otherwise unstack the extras and treat as literals (which is why we
> remembered their locations)
> >
> >To put it into words, I see } or }} as matching to the nearest unpaired {
> or {{, but always respecting nesting. I also see longer sequences of { taken
> as pairs until there's one or none left.
> >
> >So to my mind {{{{X}}}} parses as {{  {{  X  }}  }}  - even though that's
> a questionable construct.  Or do we want to add another rule saying that {{
> cannot be nested inside {{ ?
> >
> >How does that sound?
> >
> >Tony Rogers
> >CA, Inc
> >Senior Architect, Development
> >tony.rogers@ca.com
> >co-chair UDDI TC at OASIS
> >co-chair WS-Desc WG at W3C
> >
> >________________________________
> >
> >From: www-ws-desc-request@w3.org on behalf of John Kaputin (gmail)
> >Sent: Fri 12-Jan-07 8:50
> >To: Philippe Le Hegaret; www-ws-desc@w3.org
> >Subject: Testcases for HTTP location grammar [CR130]
> >
> >
> >Philippe,
> >Today's working group call concluded that a grammar should define how the
> http location is parsed and you have the action, so as discussed I'm sending
> you some of my testcases. My post [1] is now captured as CR130.
> >
> >In deciding on the grammatical rules, things to consider include the
> precedence of double curly braces versus single braces and how to match
> pairs of single braces - e.g. by scanning from left to right, by 'inner
> most pair' (or whatever the terminology is), etc.
> >
> >When trying several approaches in Woden I found it's not as simple as
> 'find a left curly brace, check for a double brace, then scan for a right
> curly brace'. Also, it appeared from my initial interpretation of the spec
> that double curly braces should take precedence over single braces, but this
> produced some unexpected results. A better approach seems to be 'inner most
> pair' takes precedence, then double curly braces, then other single braces.
> >
> >Below are some test cases using different approaches. "Valid/invalid"
> simply indicates whether non-paired single braces end up in the parsed
> string (literal single braces are okay).
> >
> >Inner-most pair, then doubles, then unpaired singles. town=Paris:
> >
> >"{town}"       > {town}           > "Paris"     > valid
> >"{{town}}"     > {,{town},}       > "{Paris}"   > invalid
> >"{{{town}}}"   > {{,{town},}}     > "{Paris}"   > valid
> >"{{{{town}}}}" > {{,{,{town},}},} > "{{Paris}}" > invalid
> >"{{town}"      > {,{town}         > "{Paris"    > invalid
> >"{{{town}"     > {{,{town}        > "{Paris"    > valid
> >"{town}}"      > {town},}         > "Paris}"    > invalid
> >"{town}}}"     > {town},}}        > "Paris}"    > valid
> >
> >Double braces first, then pairs of singles left-to-right. town=Paris:
> >
> >"{town}"       > {town}           > "Paris"     > valid
> >"{{town}}"     > {{,town,}}       > "{town}"    > valid
> >"{{{town}}}"   > {{,{,town,}},}   > "{{town}}"  > invalid
> >"{{{{town}}}}" > {{,{{,town,}},}} > "{{Paris}}" > invalid
> >"{{town}"      > {{,town,}        > "{town}"    > invalid
> >"{{{town}"     > {{,{town}        > "{Paris"    > valid
> >"{town}}"      > {,town,}}        > "{town}"    > invalid
> >"{town}}}"     > {,town,}},}      > "{town}}"   > invalid
> >
> >Other test cases:
> >
> >""                      (is an empty string location valid?)
> >"/temperature/"
> >"/temperature/{town}/"
> >"/temperature/{town}/{state}/{country}"
> >"/temperature/{town}/{{{state}}}/{country}"
> >
> >It would be good if the spec could include similar examples and/or if the
> test suite covered the grammar.
> >
> >regards,
> >John Kaputin
> >
> >[1] http://lists.w3.org/Archives/Public/www-ws-desc/2007Jan/0045.html
> >
> >
>
> --
> Best regards,
> Georgi Georgiev
>
Received on Friday, 12 January 2007 15:00:13 UTC