
predictiveParser.py seems to have problems with unicode

From: Yosi Scharf <syosi@mit.edu>
Date: Mon, 08 Nov 2004 15:41:14 -0500
Message-ID: <418FD9EA.7040904@mit.edu>
To: public-cwm-bugs@w3.org

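predictiveParser.py seems to have problems with Unicode. The first run below fails only because I gave the wrong grammar URI (n3.n3#document instead of n3#document). With the right URI, parsing ../test/i18n/hiragana.n3 dies with a UnicodeDecodeError while writing the verbose progress output.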


[syosi@yosi grammar]$ PYTHONPATH=`pwd`/../.. python predictiveParser.py n3-selectors.n3 http://www.w3.org/2000/10/swap/grammar/n3.n3#document ../test/i18n/hiragana.n3
  Loading n3-selectors.n3
  Loaded 1450 statements in 0.730000s, ie 1986.301370/s.
   ##### ERROR:  No definition of document
  ###### FAILED with 1 errors.
        No definition of document
[syosi@yosi grammar]$ less n3.n3
[syosi@yosi grammar]$ PYTHONPATH=`pwd`/../.. python predictiveParser.py n3-selectors.n3 http://www.w3.org/2000/10/swap/grammar/n3#document ../test/i18n/hiragana.n3
  Loading n3-selectors.n3
  Loaded 1450 statements in 0.730000s, ie 1986.301370/s.
   
EOF
        Can start with: [":", "_", "a"]
        Can start with: ["<"]
        Can start with: ["_", "a"]
   WARNING: for n3:verb, <= indicates li23, but  < indicates li1
   WARNING: for n3:verb, => indicates li25, but  = indicates li24
   WARNING: for n3:verb, = indicates li24, but  => indicates li25
   WARNING: for n3:verb, < indicates li1, but  <= indicates li23
        Can start with: ["?"]
        Can start with: ["+", "-", "0"]
        Can start with: ["""]
   WARNING: for n3:dtlang, @ indicates li27, but  @is indicates ()
   WARNING: for n3:dtlang, @ indicates li27, but  @a indicates ()
   WARNING: for n3:dtlang, @ indicates li27, but  @of indicates ()
   WARNING: for n3:dtlang, @ indicates li27, but  @this indicates ()
   WARNING: for n3:dtlang, @ indicates li27, but  @has indicates ()
   WARNING: for n3:dtlang, ^^ indicates li48, but  ^ indicates ()
   WARNING: for n3:dtlang, @is indicates (), but  @ indicates li27
   WARNING: for n3:dtlang, @a indicates (), but  @ indicates li27
   WARNING: for n3:dtlang, ^ indicates (), but  ^^ indicates li48
   WARNING: for n3:dtlang, @of indicates (), but  @ indicates li27
   WARNING: for n3:dtlang, @this indicates (), but  @ indicates li27
   WARNING: for n3:dtlang, @has indicates (), but  @ indicates li27
        Can start with: ["a"]
  Ok for predictive parsing
    6) Looking at:  ....rdf
     $@prefix s:...
    6  @prefix means expand n3:document as [_g0, _g1, _g2, 
n3:statements_optional, eof]
     6  @prefix means expand _g0 as [n3:declaration, _g0]
      6  @prefix means expand n3:declaration as [u'@prefix', n3:qname, 
n3:explicituri, u'.']
       6) Looking at:  ...  @prefix $s: <http:/...
      Token matched to <s:> as pattern 
<(([a-zA-Z_][a-zA-Z0-9_]*)?:)?([a-zA-Z_][a-zA-Z0-9_]*)?>
       6) Looking at:  ...prefix s: $<http://ww...
      Token matched to <<http://www.w3.org/2000/01/rdf-schema#>> as 
pattern <<[^>]*>>
       6) Looking at:  ...-schema#> $.
     @pr...
       7) Looking at:  ...#> .
     $@prefix rd...
      7  @prefix means expand _g0 as [n3:declaration, _g0]
       7  @prefix means expand n3:declaration as [u'@prefix', n3:qname, 
n3:explicituri, u'.']
        7) Looking at:  ...  @prefix $rdf: <http...
       Token matched to <rdf:> as pattern 
<(([a-zA-Z_][a-zA-Z0-9_]*)?:)?([a-zA-Z_][a-zA-Z0-9_]*)?>
        7) Looking at:  ...efix rdf: $<http://ww...
       Token matched to <<http://www.w3.org/1999/02/22-rdf-syntax-ns#>> 
as pattern <<[^>]*>>
        7) Looking at:  ...ntax-ns#> $.
     @pr...
        8) Looking at:  ...#> .
     $@prefix : ...
       8  @prefix means expand _g0 as [n3:declaration, _g0]
        8  @prefix means expand n3:declaration as [u'@prefix', n3:qname, 
n3:explicituri, u'.']
         8) Looking at:  ...  @prefix $: <#>.
 
  ...
        Token matched to <:> as pattern 
<(([a-zA-Z_][a-zA-Z0-9_]*)?:)?([a-zA-Z_][a-zA-Z0-9_]*)?>
         8) Looking at:  ...@prefix : $<#>.
 
    ...
        Token matched to <<#>> as pattern <<[^>]*>>
         8) Looking at:  ...efix : <#>$.
 
      [...
         10) Looking at:  ...>.
 
      $[ s:label ...
        10  [ means expand _g0 as []
     10  [ means expand _g1 as []
     10  [ means expand _g2 as []
     10  [ means expand n3:statements_optional as [n3:statement, u'.', 
n3:statements_optional]
      10  [ means expand n3:statement as [n3:subject, n3:propertylist]
       10  [ means expand n3:subject as [n3:path]
        10  [ means expand n3:path as [n3:node, n3:pathtail]
         10  [ means expand n3:node as [u'[', n3:propertylist, u']']
          10) Looking at:  ...
 
      [ $s:label "M...
          10  a means expand n3:propertylist as [n3:verb, n3:object, 
n3:objecttail, n3:propertylisttail]
           10  a means expand n3:verb as [n3:path]
            10  a means expand n3:path as [n3:node, n3:pathtail]
             10  a means expand n3:node as [n3:symbol]
              10  a means expand n3:symbol as [n3:qname]
              Token matched to <s:label> as pattern 
<(([a-zA-Z_][a-zA-Z0-9_]*)?:)?([a-zA-Z_][a-zA-Z0-9_]*)?>
               10) Looking at:  ...[ s:label $"Martin J ...
             10  " means expand n3:pathtail as []
           10  " means expand n3:object as [n3:path]
            10  " means expand n3:path as [n3:node, n3:pathtail]
             10  " means expand n3:node as [n3:literal]
              10  " means expand n3:literal as [n3:string, n3:dtlang]
              Token matched to <"Martin J D\u00fcrst"> as pattern 
<("""[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*""")|("[^"\\]*(?:\\.[^"\\]*)*")([a-z]+(-[a-z0-9]+)*)?>
               10) Looking at:  ...u00fcrst" $; :script ...
               10  ; means expand n3:dtlang as []
             10  ; means expand n3:pathtail as []
           10  ; means expand n3:objecttail as []
           10  ; means expand n3:propertylisttail as [u';', n3:propertylist]
            Traceback (most recent call last):
  File "predictiveParser.py", line 400, in ?
    p.parse(str)
  File "predictiveParser.py", line 300, in parse
    return parser.parseProduction(parser.top, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 324, in parseProduction
    tok, here = parser.parseProduction(term, str, tok, here)
  File "predictiveParser.py", line 332, in parseProduction
    tok, here = parser.token(str, next)  # Next token
  File "predictiveParser.py", line 255, in token
    if parser.verb: progress( "%i) Looking at:  ...%s$%s..." % (
  File "/home/syosi/cvs-trunk/WWW/2000/10/swap/diag.py", line 14, in progress
    sys.stderr.write(utf_8_encode("%s " % (a,))[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 40: ordinal not in range(128)
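
A guess at what is going on, not a confirmed diagnosis: byte 0xe3 is the lead byte of UTF-8 encoded hiragana, so the string handed to progress() probably still holds raw UTF-8 bytes, and Python 2 tries to decode it as ASCII before utf_8_encode can encode it. The sketch below reproduces that failure mode in present-day Python with the implicit decode written out. The function names and the sample text are made up for illustration and are not taken from diag.py.

```python
# Hypothetical stand-ins for diag.progress(); not the real swap code.

def format_progress_ascii(raw: bytes) -> bytes:
    """Decode as ASCII first, which is what Python 2 does implicitly
    when a byte string is passed to a unicode encoder."""
    text = raw.decode("ascii")  # fails on any byte >= 0x80
    return ("%s " % (text,)).encode("utf-8")


def format_progress_utf8(raw: bytes) -> bytes:
    """Decode the bytes as UTF-8 explicitly before formatting."""
    text = raw.decode("utf-8")
    return ("%s " % (text,)).encode("utf-8")


# Assumed sample: one hiragana character, whose UTF-8 form starts with 0xe3.
sample = "\u3042".encode("utf-8")

try:
    format_progress_ascii(sample)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

safe_output = format_progress_utf8(sample)
```

If that guess is right, decoding the input explicitly as UTF-8 before it reaches progress(), or passing unicode objects all the way through, should avoid the traceback.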
Received on Monday, 8 November 2004 20:42:07 UTC
