Message-Id: <9212010342.AA06729@pixel.convex.com>
To: "Tony Johnson (415) 926 2278" <TONYJ@scs.slac.stanford.edu>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: quotes around tags and escape sequences
In-Reply-To: Your message of "Mon, 30 Nov 92 18:59:00 PDT."
             <69FDBB0140801933@SCS.SLAC.STANFORD.EDU>
Date: Mon, 30 Nov 92 21:42:47 CST
From: Dan Connolly <connolly@pixel.convex.com>

>Three questions,
>
> 1) If we now expect quotes around tags, are we still meant to understand % as
> an escape character within tags?

In short, I think so.

These dang things get parsed twice: once by the SGML parser, and once by
the URL parser.

After the HREF=, the SGML parser is looking for an attribute value, which
may be a token or a literal. The syntax of a URL conflicts with the syntax
of a token, so you've got to use a literal, i.e. you've got to put quotes
around it.

To compute the value of the HREF attribute, the SGML parser grabs
everything between ""s (or between ''s, actually; in fact, it expands
&entity; references too!).

Then you hand the value of the HREF attribute to the URL parser. It had
better be a legal URL at this point.

I don't know whether the URL parsing code can handle spaces in a URL. If
not, they've got to be represented by the %nn construct.

NOTE: There's an SGML construct, &#SPACE; or &#32;, designed for the same
purpose. We might want to remove the quoting mechanism from the URL spec,
and say that you use whatever quoting mechanisms the enclosing data format
requires.

> 2) Which of the following do I need to support, and which is the "approved"
> method of accessing gopher?
>
> href="gopher://gopher.micro.umn.edu:70/00/Some Stuff"

This is legal SGML -- dunno if it's a legal URL.

> href="gopher://gopher.micro.umn.edu:70/00/Some%20Stuff"

This is probably your best bet for the current linemode code.

> href=gopher://gopher.micro.umn.edu:70/00/Some%20Stuff

SGML parsers won't grok this.
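The two passes described above (SGML attribute value first, URL second) can be sketched roughly as follows. This is a minimal illustration, not code from the WWW library or linemode browser: every function name is hypothetical, the entity tables are deliberately tiny, and the gopher parser is assumed to reject raw spaces (the message leaves open whether the real URL code does). An unquoted value must be an SGML name token (letters, digits, '.', '-'), which is why the unquoted gopher URL fails in pass 1.

```python
import re

# Hypothetical sketch: none of these names come from the WWW library.

# Name characters in the SGML reference concrete syntax: letters, digits, '.', '-'.
NAME_TOKEN = re.compile(r"[A-Za-z0-9.\-]+")

# Assumed, deliberately tiny tables; a real parser gets entities from the DTD.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}
FUNCTION_CHARS = {"SPACE": " "}

REF = re.compile(r"&(#?)([A-Za-z0-9]+);")


def expand_refs(text):
    """Expand &name;, &#nn; and &#SPACE; inside an attribute value literal."""
    def sub(m):
        is_char_ref, name = m.group(1) == "#", m.group(2)
        if is_char_ref and name.isdigit():
            return chr(int(name))                        # e.g. &#32; -> " "
        if is_char_ref:
            return FUNCTION_CHARS.get(name, m.group(0))  # e.g. &#SPACE; -> " "
        return ENTITIES.get(name, m.group(0))            # e.g. &amp; -> "&"
    return REF.sub(sub, text)


def sgml_attribute_value(spec):
    """Pass 1: what the SGML parser makes of the text after HREF=."""
    if spec[:1] in ('"', "'"):
        end = spec.index(spec[0], 1)  # raises ValueError if unterminated
        return expand_refs(spec[1:end])
    if NAME_TOKEN.fullmatch(spec):
        return spec
    raise ValueError("not a name token, so it must be quoted: %r" % spec)


def percent_decode(s):
    """Replace each %nn (two hex digits) with the character it encodes."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)


def parse_gopher_url(url):
    """Pass 2: a simplified gopher URL parser (assumed to reject raw spaces)."""
    m = re.fullmatch(r"gopher://([^/:]+)(?::(\d+))?(/.*)?", url)
    if m is None:
        raise ValueError("not a gopher URL: %r" % url)
    host, port, path = m.group(1), int(m.group(2) or 70), m.group(3) or "/"
    if " " in path:
        raise ValueError("raw space in URL; write it as %20")
    return host, port, percent_decode(path)


if __name__ == "__main__":
    href = sgml_attribute_value('"gopher://gopher.micro.umn.edu:70/00/Some%20Stuff"')
    print(parse_gopher_url(href))
    # ('gopher.micro.umn.edu', 70, '/00/Some Stuff')
```

Keeping the passes separate means the SGML layer only ever undoes SGML quoting (&#32;, &amp;) and the URL layer only ever undoes URL quoting (%20), so neither has to know about the other's escapes.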
For starters, you've got kind of a bad design for handling SGML
attributes: you parse them twice, once to stick them in the param
resource, and once to take them out of the param resource and stick them
in the href and name resources. Rather than a param resource, the parsing
code should build an Xt ArgList with the attribute names and values. Then
it can just call XtSetValues when it's done parsing the start tag.

This would be a minor modification to my current version of the MidasWWW
code using my HTML parsing library.

> 3) Is the % meant to act as an escape character in search strings? ie
>
> href="http://slacvm.slac.stanford.edu/FIND/PARTICLE?PI%nn"
>
> meant to find entries for PI+ ? (where nn is the ascii code for +).

Yeah... I've got a bunch of questions like this one. My understanding is
that everything after the scheme: is defined by the individual scheme.
It's not safe to just replace %nn by the corresponding ASCII character in
all URLs. The %nn quoting mechanism is specific to the gopher scheme. (It
might be used by other schemes too, but it's not a universal mechanism.)

I've got some design ideas for the WWW library that I think would obviate
the need for implementors like Tony to even mess with this stuff. Details
as they develop...

Tony: I'll send you my HTML parsing work separately.

Dan
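The answer to question 3, that everything after "scheme:" belongs to the individual scheme, suggests a library-side dispatch in which each scheme decides whether %nn means anything. The sketch below is only an illustration of that idea: the `SCHEME_DECODERS` table and `scheme_specific_part` function are hypothetical, and only gopher is registered for %nn decoding because that is the one scheme the message names. %2B is the encoding of '+', as in Tony's PI+ example.

```python
import re


def percent_decode(s):
    """Replace each %nn (two hex digits) with the character it encodes."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)


# Hypothetical registry of schemes that have opted in to %nn decoding.
# Only gopher is listed, following the message; other schemes are not assumed.
SCHEME_DECODERS = {"gopher": percent_decode}


def scheme_specific_part(url):
    """Split off 'scheme:' and let that scheme decide what the rest means."""
    scheme, sep, rest = url.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme in %r" % url)
    scheme = scheme.lower()
    decode = SCHEME_DECODERS.get(scheme)
    # A scheme with no registered decoder gets its text back verbatim.
    return scheme, (decode(rest) if decode else rest)


if __name__ == "__main__":
    print(scheme_specific_part("gopher://gopher.micro.umn.edu:70/00/Some%20Stuff"))
    # ('gopher', '//gopher.micro.umn.edu:70/00/Some Stuff')
    print(scheme_specific_part("http://slacvm.slac.stanford.edu/FIND/PARTICLE?PI%2B"))
    # ('http', '//slacvm.slac.stanford.edu/FIND/PARTICLE?PI%2B')
```

With this arrangement a browser never calls `percent_decode` itself; whether the search string arrives as "PI%2B" or "PI+" is left to whatever handler the scheme registers.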