Re: Any C++ programmers around? (was: Unix --> NT (source code stuff))

From: Terje Bless (link@tss.no)
Date: Fri, Mar 09 2001

    Date: Fri,  9 Mar 2001 11:43:35 +0100
    From: Terje Bless <link@tss.no>
    To: Nick Kew <nick@webthing.com>
    cc: "Michael D. Crawford" <crawford@goingware.com>, W3C Validator <www-validator@w3.org>, OpenJade-Devel <openjade-devel@lists.sourceforge.net>
    Message-ID: <20010309122712-r01010600-e9a94c36@10.0.0.2>
    Subject: Re: Any C++ programmers around? (was: Unix --> NT (source code stuff))
    
    [ Could the Grand Poobahs of Jade-Devel yell at me a bit when they  ]
    [ feel I'm going too far off topic? This is being cross-posted from ]
    [ the www-validator@w3.org list because I'm trying to recruit C++   ]
    [ programmers to do OpenSP development (for my own nefarious        ]
    [ purposes, obviously ;D), but it's very oriented towards use in    ]
    [ the W3C HTML Validation Service <URL:http://validator.w3.org/> so ]
    [ it may not be all that interesting to OpenJade people.            ]
    [                                                                   ]
    [ Any objections? Adam? Brandon? Anyone?                            ]
    
    
    On 06.03.01 at 22:24, Nick Kew <nick@webthing.com> wrote:
    
    >On Tue, 6 Mar 2001, Terje Bless wrote:
    >
    >>* Specify all environment stuff on the command line.
    >>  -- Right now we're setting SP_ENCODING et al in the environment; this
    >>     is messy and falls apart in mod_perl land.
    >
    >nsgmls from mod_perl??? Rather you than me!
    
    Well, we'll blow up spectacularly under mod_perl anyway so why not go for
    broke...? :-)
    
    
    >But again, I've had to do that to run it from the Web, so a commandline
    >variant should be equally straightforward (though a hack).
    
    Well, the point is that (Open)SP expects some things to be specified in
    environment variables instead of as switches on the command line. This is a
    nice feature addition for the command line -- as you won't have to specify
    the switches every time -- and kinda works in CGI land -- because the
    environment goes away with each invocation -- but it's a sordid mess when
    you move to mod_perl or other persistent interpreters where the lifetime of
    the environment (parent process) spans several invocations of SP.
    
    We're currently futzing around with SGML_CATALOG_FILES, SGML_SEARCH_PATH,
    SP_CHARSET_FIXED, and SP_ENCODING. In particular, SP_CHARSET_FIXED and
    SP_ENCODING are "magical" in that they are necessary to enable XML mode.
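    To make the persistent-interpreter problem concrete, here is a minimal
    sketch in Python (not the Validator's actual Perl code) of handing each
    SP run its own private copy of the environment instead of mutating the
    long-lived parent's. The helper names build_sp_env and run_parser are
    hypothetical, and any values passed for the variables are placeholders.

```python
import os
import subprocess


def build_sp_env(overrides, base_env=None):
    """Return a private environment for one parser invocation.

    `overrides` maps variable names (e.g. SP_ENCODING, SGML_CATALOG_FILES)
    to values. The values are the caller's business; nothing here assumes
    what the Validator really sets them to.
    """
    # Copy, so the parent process environment is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env.update(overrides)
    return env


def run_parser(argv, env):
    """Run the parser as a child process with its own environment.

    `argv` is the full command line as a list; it is supplied by the
    caller, so no parser switches are assumed here.
    """
    return subprocess.run(argv, env=env, capture_output=True, text=True)
```

    Because the child gets a copy, whatever one request sets cannot leak
    into the next one served by the same interpreter, which is the failure
    mode described above.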
    
    
    >>* Ability to say: "use this SGML Declaration and this DTD".
    >>  -- SGML Open Catalogs are fine and dandy and all, but for some things it
    >>     would be much less painful to say "use this" on the command line.
    >
    >I'd like to, but that'll be a longer-term thing.  I'd like it still better
    >if someone with a much deeper knowledge of SGML than mine looked at it.
    
    Et tu, Brute? Aren't there any real SGML gurus around that could help my
    poor tortured brain -- and Nick's, apparently :-) -- tackle SGML issues? I
    barely understand half of what the SP man pages are trying to tell me
    because they speak in SGML-alese (i.e. in tongues for the good it does me)
    and XML is "double the fun" (that was your cue Sean! ;D).
    
    
    >> * Blue Sky: A Perl (XS) Module Interface
    >
    >I don't see myself getting involved with that.  If I'm hacking SP,
    >that's because I don't want to wrap it in something else - like Perl.
    
    There is already a Perl interface, but it's not useful for us. It mainly
    deals with building groves and SAX and whatnot. Getting it to churn through
    a file and report all the errors and something ESIS-ish doesn't appear to
    be possible. I'll go bug the Perl-XML gurus again; maybe they've come up
    with something clever since last I checked.
    
    If anyone else should feel inclined to become my new hero -- :-) -- I'm
    looking for the equivalent of the current "onsgmls" executable converted
    into a Perl XS module. Or rather more specifically, I'd like a Perl module
    that provides a Perl interface to libsp -- that would enable me to rewrite
    onsgmls in pure Perl -- and another that inherits from said module and adds
    the features that are implemented in "onsgmls" today.
    
    Just say... No to SAX! ...No to Groves! :-)
    
    In consolation, it should be a pretty easy task if you know A) C++ and B) a
    little about Perl XS modules. Perl comes with tools to automate parts of
    the job and man pages that describe the process fairly well (at least, it
    looked pretty good to my untrained eyes even if I can't do it myself ;D).
    To all appearances it's mainly a mechanical job of doing data type
    conversions from/to Perl/C++ and similar things.
    
    
    >> * Blue Sky: Configurable Error Format
    >>   -- The error messages are an exception for most SGML Processors, but
    >>      for the Validator they are the norm. Being able to play tricks with
    >>      the format and fields of the error output would be useful.
    >>      Reporting context a bonus!
    >
    >I've looked at that a little, and I've implemented a compile-time option
    >to switch between JJC's format and HTML-ised format for Code Valet.
    >I'll be doing some more work in this field in due course.
    
    Well, the reason I'm so gung ho on switching to OpenSP is that it has a
    switch "-n" that outputs message numbers ("relevant clauses") with error
    messages: "onsgmls:OSPF<0>:1:1:1:E: DOCTYPE Missing" meaning it detected
    that there was no DOCTYPE on line one, character one, and this violates
    clause number one (of some ISO standard presumably). Since we're wrapping
    SP in a Perl CGI app, it's much easier to parse out the error _number_ (or
    some other semi-unique identifier) than the free-form text message.
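
    As a rough illustration of why a numbered, colon-delimited message is
    easier to handle than free-form text, here is a minimal Python sketch
    that splits the sample line quoted above into fields. The field order
    (program, source, line, column, number, severity, text) is an
    assumption read off that one example, not a documented format, and
    SPMessage and parse_message are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class SPMessage(NamedTuple):
    program: str
    source: str
    line: int
    column: int
    number: str    # the message number; kept as a string
    severity: str  # single letter, e.g. "E"
    text: str


# Assumed layout, inferred from the single sample message only:
#   program:source:line:column:number:severity: text
_MESSAGE_RE = re.compile(
    r"^(?P<program>[^:]+):(?P<source>.*?):(?P<line>\d+):(?P<column>\d+):"
    r"(?P<number>[\d.]+):(?P<severity>[A-Z]):\s*(?P<text>.*)$"
)


def parse_message(raw: str) -> Optional[SPMessage]:
    """Split one message line into fields, or return None if it doesn't fit."""
    m = _MESSAGE_RE.match(raw)
    if m is None:
        return None
    return SPMessage(
        program=m["program"],
        source=m["source"],
        line=int(m["line"]),
        column=int(m["column"]),
        number=m["number"],
        severity=m["severity"],
        text=m["text"],
    )
```

    A wrapper could then key its own explanations on the number field and
    treat the text as purely informational.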
    
    Other useful things to have in an easily parseable format include the
    containing element (last opened element) and warnings along the lines of
    "expected foo, but got bar, assuming baz", so we have a way to report when
    someone forgot to close their TD or put weird stuff in the HEAD section
    that implicitly closes HEAD and opens BODY.
    
    
    
    BTW, since I'm yelling about SGML gurus and C++... Did anyone ever have any
    ideas about why some errors get reported only once with an HTML 4.01 Strict
    DTD, but multiple times with the HTML 4.01 Transitional DTD? Either this is
    an intentional difference in the two DTDs -- one that I can't find or
    understand the point of (I didn't even know this was possible to express in
    a DTD!) -- or a bug in all SP-based parsers. In particular, a bogus
    attribute on the IMG element gets reported only once with strict.dtd, but
    at every occurrence in loose.dtd, using lq-nsgmls, JJC/SP nsgmls, and
    OpenJade's OpenSP.
    
    
    
    PS. Michael, how is your Python these days? How about XML? Goingware's
    singularly impressive resume suggests your advice and understanding would
    be a treasure trove, even if you don't have the time for direct code
    contributions. In particular, I'm looking at Xerces to provide some
    up-to-date XML support; specifically the XML Schema support that's due Real
    Soon Now. I'm stumbling on the fact that A) I don't XML and B) Xerces-P is
    in limbo, Xerces-C is fallen behind, and Xerces-J is utterly
    incomprehensible to me. :-) The Xalan, Jakarta, Foo and Bar gizmos that pop
    up as intrinsic to Xerces, but which I know nothing about, don't exactly
    help either. Mayhap you could shed some light?
    
    Since you know both XML and Python, perhaps you've looked at the W3C/LTG
    XML Schema validator XSV? Since I understand neither of those I'm at a bit
    of a disadvantage when it comes to figuring out what I might be able to use
    it for and how. Ideas?
    
    
    
    
    <DISCLAIMER>
      I have no affiliation with the W3C other than as an occasional
      contributor of code to the Validator. When I talk about moving
      to OpenSP, or adding this or that feature, I'm talking about
      what _I_ want to do locally. Any and all changes at the W3C end
      are subject to Gerald's approval and the priorities set by the
      W3C. I don't propose to speak for anyone but myself and it costs
      $5K/year to speak for the W3C. :-)
    
      IOW, don't shoot me if the W3C thinks I'm talking out of my
      backside; and don't shoot the W3C if I really _am_ talking out
      of said orifice. :-)
    </>
    
    
    
    
    Anyone that actually bothered to read this far probably has asocial
    tendencies and, quite fairly, blames _me_ for developing them. For penance
    I'll go out on a Pub Crawl. I just have to add One More Feature... :-)
    
    
    
    -- 
    Terje, you are a sick and twisted individual, and I
    think I speak for all of us when I say, "Thank you!"
    
                   -- John Gruber <gruber@barebones.com>