Re: Any C++ programmers around? (was: Unix --> NT (source code stuff))

[ Could the Grand Poobahs of Jade-Devel yell at me a bit when they  ]
[ feel I'm going too far off topic? This is being cross-posted from ]
[ the www-validator@w3.org list because I'm trying to recruit C++   ]
[ programmers to do OpenSP development (for my own nefarious        ]
[ purposes, obviously ;D), but it's very oriented towards use in    ]
[ the W3C HTML Validation Service <URL:http://validator.w3.org/> so ]
[ it may not be all that interesting to OpenJade people.            ]
[                                                                   ]
[ Any objections? Adam? Brandon? Anyone?                            ]


On 06.03.01 at 22:24, Nick Kew <nick@webthing.com> wrote:

>On Tue, 6 Mar 2001, Terje Bless wrote:
>
>>* Specify all environment stuff on the command line.
>>  -- Right now we're setting SP_ENCODING et al in the environment; this
>>     is messy and falls apart in mod_perl land.
>
>nsgmls from mod_perl??? Rather you than me!

Well, we'll blow up spectacularly under mod_perl anyway so why not go for
broke...? :-)


>But again, I've had to do that to run it from the Web, so a commandline
>variant should be equally straightforward (though a hack).

Well, the point is that (Open)SP expects some things to be specified in
environment variables instead of as switches on the command line. This is a
nice feature addition for the command line -- as you won't have to specify
the switches every time -- and kinda works in CGI land -- because the
environment goes away with each invocation -- but it's a sordid mess when
you move to mod_perl or other persistent interpreters where the lifetime of
the environment (parent process) spans several invocations of SP.

We're currently futzing around with SGML_CATALOG_FILES, SGML_SEARCH_PATH,
SP_CHARSET_FIXED, and SP_ENCODING. In particular, SP_CHARSET_FIXED and
SP_ENCODING are "magical" in that they are necessary to enable XML mode.
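
To make the persistent-interpreter problem concrete, here is a minimal
sketch (in Python rather than Perl, purely for illustration) of building a
private environment for each parser run instead of touching the
process-wide one. The helper names, the "YES"/"XML" values, and the use of
"onsgmls -s" are my assumptions for the example; check the SP documentation
before relying on any of them.

```python
import os
import subprocess


def build_sp_env(base, catalog_files, search_path, xml_mode=False):
    """Return a fresh environment dict for one parser run.

    `base` (for example os.environ) is copied, never modified, so nothing
    set here survives into the next request of a persistent interpreter.
    """
    env = dict(base)
    env["SGML_CATALOG_FILES"] = catalog_files
    env["SGML_SEARCH_PATH"] = search_path
    if xml_mode:
        # Assumed values for switching on XML mode (hypothetical here).
        env["SP_CHARSET_FIXED"] = "YES"
        env["SP_ENCODING"] = "XML"
    else:
        # Drop anything inherited from the parent so stale settings
        # from an earlier request cannot leak into this run.
        env.pop("SP_CHARSET_FIXED", None)
        env.pop("SP_ENCODING", None)
    return env


def run_parser(document_path, env, parser="onsgmls"):
    """Run the parser with its own environment; returns CompletedProcess."""
    return subprocess.run(
        [parser, "-s", document_path],  # -s: suppress normal output
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
```

The point of the design is only that the environment travels with the
child process and is thrown away afterwards; real command-line switches
for these settings would still be the cleaner fix.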


>>* Ability to say: "use this SGML Declaration and this DTD".
>>  -- SGML Open Catalogs are fine and dandy and all, but for some things it
>>     would be much less painful to say "use this" on the command line.
>
>I'd like to, but that'll be a longer-term thing.  I'd like it still better
>if someone with a much deeper knowledge of SGML than mine looked at it.

Et tu, Brute? Aren't there any real SGML gurus around that could help my
poor tortured brain -- and Nick's, apparently :-) -- tackle SGML issues? I
barely understand half of what the SP man pages are trying to tell me
because they speak in SGML-alese (i.e. in tongues for the good it does me)
and XML is "double the fun" (that was your cue Sean! ;D).


>> * Blue Sky: A Perl (XS) Module Interface
>
>I don't see myself getting involved with that.  If I'm hacking SP,
>that's because I don't want to wrap it in something else - like Perl.

There is already a Perl interface, but it's not useful for us. It mainly
deals with building groves and SAX and whatnot. Getting it to churn through
a file and report all the errors and something ESIS-ish doesn't appear to
be possible. I'll go bug the Perl-XML gurus again; maybe they've come up
with something clever since last I checked.

If anyone else should feel inclined to become my new hero -- :-) -- I'm
looking for the equivalent of the current "onsgmls" executable converted
into a Perl XS module. Or rather more specifically, I'd like a Perl module
that provides a Perl interface to libsp -- that would enable me to rewrite
onsgmls in pure Perl -- and another that inherits from said module and adds
the features that are implemented in "onsgmls" today.

Just say... No to SAX! ...No to Groves! :-)

In consolation, it should be a pretty easy task if you A) know C++ and B) a
little about Perl XS modules. Perl comes with tools to automate parts of
the job and man pages that describe the process fairly well (at least, it
looked pretty good to my untrained eyes even if I can't do it myself ;D).
To all appearances it's mainly a mechanical job of doing data type
conversions from/to Perl/C++ and similar things.


>> * Blue Sky: Configurable Error Format
>>   -- The error messages are an exception for most SGML Processors, but
>>      for the Validator they are the norm. Being able to play tricks with
>>      the format and fields of the error output would be usefull.
>>      Reporting context a bonus!
>
>I've looked at that a little, and I've implemented a compile-time option
>to switch between JJC's format and HTML-ised format for Code Valet.
>I'll be doing some more work in this field in due course.

Well, the reason I'm so gung ho on switching to OpenSP is that it has a
switch "-n" that outputs message numbers ("relevant clauses") with error
messages: "onsgmls:OSPF<0>:1:1:1:E: DOCTYPE Missing" meaning it detected
that there was no DOCTYPE on line one, character one, and this violates
clause number one (of some ISO standard presumably). Since we're wrapping
SP in a Perl CGI app, it's much easier to parse out the error _number_ (or
some other semi-unique identifier) than the free-form text message.
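
As a rough illustration of why the numbered form is easier to handle, here
is a sketch (Python again, not the Validator's actual Perl) that splits a
line shaped like the sample above into fields. The field order (program,
source, line, column, message number, severity, text) is inferred from that
one sample only and is an assumption; the names SpMessage and parse_message
are made up for the example.

```python
import re
from typing import NamedTuple, Optional


class SpMessage(NamedTuple):
    program: str
    source: str
    line: int
    column: int
    number: int
    severity: str
    text: str


# Assumed layout: prog:source:line:column:number:severity: free text
_MESSAGE_RE = re.compile(
    r"^(?P<program>[^:]+):(?P<source>[^:]+):"
    r"(?P<line>\d+):(?P<column>\d+):(?P<number>\d+):"
    r"(?P<severity>[A-Z]):\s*(?P<text>.*)$"
)


def parse_message(raw: str) -> Optional[SpMessage]:
    """Split one message line into fields; None if it does not match."""
    m = _MESSAGE_RE.match(raw.strip())
    if m is None:
        return None
    return SpMessage(
        program=m["program"],
        source=m["source"],
        line=int(m["line"]),
        column=int(m["column"]),
        number=int(m["number"]),
        severity=m["severity"],
        text=m["text"],
    )


msg = parse_message("onsgmls:OSPF<0>:1:1:1:E: DOCTYPE Missing")
```

With something like this the wrapper can key its own explanations on
msg.number and treat msg.text as display-only, so a reworded message does
not break the lookup.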

Other useful things to have in an easily parseable format are the
containing element (last opened element) and warnings along the lines of
"expected foo, but got bar, assuming baz", so we have a way to report when
someone forgot to close their TD or puts weird stuff in the HEAD section
that will implicitly close HEAD and open BODY.



BTW, since I'm yelling about SGML gurus and C++... Did anyone ever have any
ideas about why some errors get reported only once with a HTML 4.01 Strict
DTD, but multiple times with the HTML 4.01 Transitional DTD? Either this is
an intentional difference in the two DTDs -- one that I can't find or
understand the point of (I didn't even know this was possible to express in
a DTD!) -- or a bug in all SP-based parsers. In particular, a bogus
attribute on the IMG element gets reported only once with strict.dtd, but
at every occurrence in loose.dtd, using lq-nsgmls, JJC/SP nsgmls, and
OpenJade's OpenSP.



PS. Michael, how is your Python these days? How about XML? Goingware's
singularly impressive resume suggests your advice and understanding would
be a treasure trove, even if you don't have the time for direct code
contributions. In particular, I'm looking at Xerces to provide some
up-to-date XML support; specifically the XML Schema support that's due Real
Soon Now. I'm stumbling on the fact that A) I don't XML and B) Xerces-P is
in limbo, Xerces-C has fallen behind, and Xerces-J is utterly
incomprehensible to me. :-) The Xalan, Jakarta, Foo and Bar gizmos that pop
up as intrinsic to Xerces, but which I know nothing about, don't exactly
help either. Mayhap you could shed some light?

Since you know both XML and Python, perhaps you've looked at the W3C/LTG
XML Schema validator XSV? Since I understand neither of those I'm at a bit
of a disadvantage when it comes to figuring out what I might be able to use
it for and how. Ideas?




<DISCLAIMER>
  I have no affiliation with the W3C other than as an occasional
  contributor of code to the Validator. When I talk about moving
  to OpenSP, or adding this or that feature, I'm talking about
  what _I_ want to do locally. Any and all changes at the W3C end
  are subject to Gerald's approval and the priorities set by the
  W3C. I don't propose to speak for anyone but myself and it costs
  $5K/year to speak for the W3C. :-)

  IOW, don't shoot me if the W3C thinks I'm talking out of my
  backside; and don't shoot the W3C if I really _am_ talking out
  of said orifice. :-)
</>




Anyone that actually bothered to read this far probably has asocial
tendencies and, quite fairly, blames _me_ for developing them. For penance
I'll go out on a Pub Crawl. I just have to add One More Feature... :-)



-- 
Terje, you are a sick and twisted individual, and I
think I speak for all of us when I say, "Thank you!"

               -- John Gruber <gruber@barebones.com>

Received on Friday, 9 March 2001 06:27:22 UTC