- From: Terje Bless <link@tss.no>
- Date: Fri, 9 Mar 2001 11:43:35 +0100
- To: Nick Kew <nick@webthing.com>
- cc: "Michael D. Crawford" <crawford@goingware.com>, W3C Validator <www-validator@w3.org>, OpenJade-Devel <openjade-devel@lists.sourceforge.net>
[ Could the Grand Poobahs of Jade-Devel yell at me a bit when they ]
[ feel I'm going too far off topic? This is being cross-posted from ]
[ the www-validator@w3.org list because I'm trying to recruit C++ ]
[ programmers to do OpenSP development (for my own nefarious ]
[ purposes, obviously ;D), but it's very oriented towards use in ]
[ the W3C HTML Validation Service <URL:http://validator.w3.org/> so ]
[ it may not be all that interesting to OpenJade people. ]
[ ]
[ Any objections? Adam? Brandon? Anyone? ]

On 06.03.01 at 22:24, Nick Kew <nick@webthing.com> wrote:

>On Tue, 6 Mar 2001, Terje Bless wrote:
>
>>* Specify all environment stuff on the command line.
>>  -- Right now we're setting SP_ENCODING et al in the environment; this
>>     is messy and falls apart in mod_perl land.
>
>nsgmls from mod_perl??? Rather you than me!

Well, we'll blow up spectacularly under mod_perl anyway so why not go for
broke...? :-)

>But again, I've had to do that to run it from the Web, so a commandline
>variant should be equally straightforward (though a hack).

Well, the point is that (Open)SP expects some things to be specified in
environment variables instead of as switches on the command line. This is a
nice feature for the command line -- as you won't have to specify the
switches every time -- and kinda works in CGI land -- because the
environment goes away with each invocation -- but it's a sordid mess when
you move to mod_perl or other persistent interpreters, where the lifetime
of the environment (parent process) spans several invocations of SP.

We're currently futzing around with SGML_CATALOG_FILES, SGML_SEARCH_PATH,
SP_CHARSET_FIXED, and SP_ENCODING. In particular, SP_CHARSET_FIXED and
SP_ENCODING are "magical" in that they are necessary to enable XML mode.

>>* Ability to say: "use this SGML Declaration and this DTD".
>>  -- SGML Open Catalogs are fine and dandy and all, but for some things it
>>     would be much less painful to say "use this" on the command line.
>
>I'd like to, but that'll be a longer-term thing. I'd like it still better
>if someone with a much deeper knowledge of SGML than mine looked at it.

Et tu, Brute? Aren't there any real SGML gurus around that could help my
poor tortured brain -- and Nick's, apparently :-) -- tackle SGML issues? I
barely understand half of what the SP man pages are trying to tell me
because they speak in SGML-alese (i.e. in tongues, for all the good it does
me) and XML is "double the fun" (that was your cue Sean! ;D).

>> * Blue Sky: A Perl (XS) Module Interface
>
>I don't see myself getting involved with that. If I'm hacking SP,
>that's because I don't want to wrap it in something else - like Perl.

There is already a Perl interface, but it's not useful for us. It mainly
deals with building groves and SAX and whatnot. Getting it to churn through
a file and report all the errors and something ESIS-ish doesn't appear to
be possible. I'll go bug the Perl-XML gurus again; maybe they've come up
with something clever since I last checked.

If anyone else should feel inclined to become my new hero -- :-) -- I'm
looking for the equivalent of the current "onsgmls" executable converted
into a Perl XS module. Or, more specifically, I'd like a Perl module that
provides a Perl interface to libsp -- that would enable me to rewrite
onsgmls in pure Perl -- and another that inherits from said module and adds
the features that are implemented in "onsgmls" today.

Just say... No to SAX! ...No to Groves! :-)

In consolation, it should be a pretty easy task if you A) know C++ and B)
know a little about Perl XS modules. Perl comes with tools to automate parts
of the job and man pages that describe the process fairly well (at least,
it looked pretty good to my untrained eyes even if I can't do it myself
;D). To all appearances it's mainly a mechanical job of doing data type
conversions from/to Perl/C++ and similar things.
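About the SP_ENCODING / SP_CHARSET_FIXED environment problem raised earlier: one common workaround in a persistent interpreter is to build a private copy of the environment for each child process, instead of setting those variables on the long-lived parent. Below is a minimal sketch, in Python for illustration; the same idea should carry over to mod_perl. It assumes `SP_CHARSET_FIXED=YES` and `SP_ENCODING=XML` are the values that switch SP into XML mode and that `onsgmls` is on the PATH. The helper names `sp_environment` and `run_onsgmls` are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Variables that (Open)SP reads from the environment, per the discussion above.
SP_VARS = ("SP_CHARSET_FIXED", "SP_ENCODING", "SGML_CATALOG_FILES", "SGML_SEARCH_PATH")


def sp_environment(xml_mode: bool,
                   catalog_files: Sequence[str] = (),
                   base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a fresh environment dict for a single SP invocation.

    The base environment (os.environ by default) is copied, never modified,
    so settings cannot leak from one request into the next.
    """
    env = dict(os.environ if base is None else base)
    # Start from a clean slate for the SP-specific variables.
    for key in SP_VARS:
        env.pop(key, None)
    if xml_mode:
        # Assumed values for XML mode; verify against the SP documentation.
        env["SP_CHARSET_FIXED"] = "YES"
        env["SP_ENCODING"] = "XML"
    if catalog_files:
        # Catalog list joined with the platform path separator (':' on Unix).
        env["SGML_CATALOG_FILES"] = os.pathsep.join(catalog_files)
    return env


def run_onsgmls(document: str, xml_mode: bool,
                catalog_files: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Run onsgmls on one file with a per-call environment (hypothetical helper)."""
    return subprocess.run(
        ["onsgmls", "-s", document],  # -s: suppress ESIS output, keep error messages
        env=sp_environment(xml_mode, catalog_files),
        capture_output=True,
        text=True,
    )
```

Because `env=` is passed explicitly to each child, the parent process's own environment is never touched. The configuration of one validation request therefore cannot bleed into the next.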
>> * Blue Sky: Configurable Error Format
>>   -- The error messages are an exception for most SGML Processors, but
>>      for the Validator they are the norm. Being able to play tricks with
>>      the format and fields of the error output would be useful.
>>      Reporting context a bonus!
>
>I've looked at that a little, and I've implemented a compile-time option
>to switch between JJC's format and HTML-ised format for Code Valet.
>I'll be doing some more work in this field in due course.

Well, the reason I'm so gung ho on switching to OpenSP is that it has a
switch, "-n", that outputs message numbers ("relevant clauses") with error
messages:

    "onsgmls:OSPF<0>:1:1:1:E: DOCTYPE Missing"

meaning it detected that there was no DOCTYPE on line one, character one,
and this violates clause number one (of some ISO standard, presumably).
Since we're wrapping SP in a Perl CGI app, it's much easier to parse out
the error _number_ (or some other semi-unique identifier) than the
free-form text message.

Other useful things to have in an easily parseable format are the
containing element (last opened element) and the option to ask for
warnings about "expected foo, but got bar, assuming baz", so we have a way
to report when someone forgets to close their TD or puts weird stuff in the
HEAD section that will implicitly close HEAD and open BODY.

BTW, since I'm yelling about SGML gurus and C++... Did anyone ever have any
ideas about why some errors get reported only once with an HTML 4.01 Strict
DTD, but multiple times with the HTML 4.01 Transitional DTD? Either this is
an intentional difference in the two DTDs -- one that I can't find or
understand the point of (I didn't even know this was possible to express in
a DTD!) -- or a bug in all SP-based parsers. In particular, a bogus
attribute on the IMG element gets reported only once with strict.dtd, but
at every occurrence in loose.dtd, using lq-nsgmls, JJC/SP nsgmls, and
OpenJade's OpenSP.

PS. Michael, how is your Python these days? How about XML?
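Why the "-n" message numbers help becomes clearer once you try parsing the output. Below is a minimal sketch in Python. It assumes the colon-separated layout of the example message above: program, file, line, column, an optional message number, a one-letter type, and the text. It also assumes the type letters are I, W, Q, E and X. Both assumptions should be checked against real onsgmls output. `parse_sp_message` and `SPMessage` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: prog:file:line:col[:number]:type: text
# The non-greedy file group tolerates file names that contain colons.
_SP_MESSAGE = re.compile(
    r"^(?P<prog>[^:]+):(?P<file>.*?):(?P<line>\d+):(?P<col>\d+):"
    r"(?:(?P<num>\d+):)?(?P<kind>[IWQEX]):\s*(?P<text>.*)$"
)


class SPMessage(NamedTuple):
    file: str
    line: int
    col: int
    number: Optional[int]  # None when -n was not given
    kind: str              # one-letter message type, e.g. "E"
    text: str


def parse_sp_message(raw: str) -> Optional[SPMessage]:
    """Split one onsgmls message line into fields; None if it doesn't match."""
    m = _SP_MESSAGE.match(raw.rstrip("\n"))
    if m is None:
        return None  # e.g. an unexpected or continuation line
    num = m.group("num")
    return SPMessage(
        file=m.group("file"),
        line=int(m.group("line")),
        col=int(m.group("col")),
        number=int(num) if num is not None else None,
        kind=m.group("kind"),
        text=m.group("text"),
    )
```

A lookup table of explanations keyed on `number` rather than on `text` would keep working even if SP's message wording changes, which is the point of preferring the number.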
Goingware's singularly impressive resume suggests your advice and
understanding would be a treasure trove, even if you don't have the time
for direct code contributions. In particular, I'm looking at Xerces to
provide some up-to-date XML support; specifically the XML Schema support
that's due Real Soon Now. I'm stumbling on the fact that A) I don't know
XML and B) Xerces-P is in limbo, Xerces-C has fallen behind, and Xerces-J
is utterly incomprehensible to me. :-) The Xalan, Jakarta, Foo and Bar
gizmos that pop up as intrinsic to Xerces, but which I know nothing about,
don't exactly help either. Mayhap you could shed some light?

Since you know both XML and Python, perhaps you've looked at the W3C/LTG
XML Schema validator XSV? Since I understand neither of those I'm at a bit
of a disadvantage when it comes to figuring out what I might be able to use
it for and how. Ideas?

<DISCLAIMER>
I have no affiliation with the W3C other than as an occasional contributor
of code to the Validator. When I talk about moving to OpenSP, or adding this
or that feature, I'm talking about what _I_ want to do locally. Any and all
changes at the W3C end are subject to Gerald's approval and the priorities
set by the W3C. I don't propose to speak for anyone but myself and it costs
$5K/year to speak for the W3C. :-)

IOW, don't shoot me if the W3C thinks I'm talking out of my backside; and
don't shoot the W3C if I really _am_ talking out of said orifice. :-)
</>

Anyone that actually bothered to read this far probably has asocial
tendencies and, quite fairly, blames _me_ for developing them. For penance
I'll go out on a Pub Crawl. I just have to add One More Feature... :-)

--
Terje, you are a sick and twisted individual, and I think I speak for all
of us when I say, "Thank you!"
                               -- John Gruber <gruber@barebones.com>
Received on Friday, 9 March 2001 06:27:22 UTC