Problems with HTML 4.0 decl on Unix

Hi,

I've run into some problems trying to get the HTML 4.0 SGML stuff
(DTDs and HTML4.decl) to work on Unix systems.

I'm using the most recent version of everything (SP v1.2.1, and the
HTML 4.0 materials as taken from the CVS repository in html4-src as
of about an hour ago), and it seems that the UCS-4 decl uses character
numbers that are too large for modern Unix systems to handle.

I've tested it using SP 1.2.1 on Solaris 5.5.1 and on Red Hat Linux with
kernel 2.0.30; in both cases SP is compiled with -DSP_MULTI_BYTE.

When I run "nsgmls -s -c sgml/HTML4.cat sgml/HTML4.decl ~/file.html"
using the files produced in the "sgml" directory after a "make" in
html4-src, I get:

nsgmls:HTML4.decl:21:29:W: characters in the document character set with
    numbers exceeding 65535 not supported

I discussed this with Ian on IRC, and he noted that a "make check"
checks the spec against its own DTDs, and that works fine.

However, I see that "make check" does this:

    check: all
        @for i in $(MAINOBJS) $(APPENDIXES) $(REFS) $(INDEXES) ; \
        do echo checking $$i...; $(NSGMLS) -s -c sgml/HTML4.cat $$i; done; \
        echo checking done.

so it isn't using HTML4.decl; it must be using the SGML declaration
that's compiled into nsgmls?

To use the HTML4.decl, I believe the command is:

    nsgmls -s -c sgml/HTML4.cat sgml/HTML4.decl ~/file.html

and that produces the error I quoted above.

The line it's complaining about in HTML4.decl is:

                  160 1113952 160

(It's complaining that 1113952 is too large.)

I understand that you recently changed this number from what it was
previously (2147483486), to get around the NAMELEN problem (which
requires these numbers to be 8 characters or fewer). But it seems that
this new number, 1113952, is still too large on all the Unix systems
I've tried it on.
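
To make the arithmetic behind that line concrete, here is a minimal
Python sketch. It assumes the three fields are read in the usual
DESCSET order (first described character number, number of characters,
first base-set character number), takes 65535 from the nsgmls warning
and the 8-character ceiling from the NAMELEN remark above. The function
names are hypothetical and are not part of SP or the HTML 4.0 materials.

```python
LIMIT = 65535      # ceiling named in the nsgmls warning
MAX_DIGITS = 8     # the "8 characters or fewer" constraint mentioned above


def last_described(first, count):
    """Highest character number covered by a 'first count base' triple."""
    return first + count - 1


def check_triple(first, count, base):
    """Report where the range ends and which of the two limits it breaks."""
    last = last_described(first, count)
    return {
        "last": last,
        "exceeds_limit": last > LIMIT,
        "fits_digits": all(
            len(str(n)) <= MAX_DIGITS for n in (first, count, base)
        ),
    }


if __name__ == "__main__":
    # The current line and the previous value quoted in this message.
    for triple in [(160, 1113952, 160), (160, 2147483486, 160)]:
        print(triple, check_triple(*triple))
```

Under these assumptions the current line describes characters up to
1114111, so it passes the digit-length check but is still well above
65535; the previous value fails both checks.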

My somewhat uninformed diagnosis of this is: modern Unix systems are
not capable of handling UCS-4; they can only do UCS-2.

So: any ideas? I need to get this to work for the HTML validation service;
currently I'm using the HTML4.decl that was shipped with the 970708
snapshot of the HTML 4.0 materials:

    http://www.w3.org/TR/WD-html40-970708/sgml/HTML4.decl

but apparently that's UCS-2, not UCS-4.

Maybe you need to ship two decls with the HTML 4.0 materials, one which
is UCS-4 and one which is UCS-2 for systems that aren't capable of UCS-4?
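
As a rough illustration of what the UCS-2 variant of that line might
look like, the sketch below clamps the character count so the described
range stops at 65535. This is only an assumption about what SP needs:
it has not been tested against nsgmls, and clamp_count is a hypothetical
helper, not something from the HTML 4.0 materials.

```python
LIMIT = 65535  # ceiling named in the nsgmls warning


def clamp_count(first, count, limit=LIMIT):
    """Shrink 'count' so the range starting at 'first' ends at or below 'limit'."""
    last = min(first + count - 1, limit)
    return max(last - first + 1, 0)


if __name__ == "__main__":
    first, count, base = 160, 1113952, 160
    # Print the triple in the same "first count base" layout as the decl.
    print(first, clamp_count(first, count), base)
```

For the line quoted above this yields a count of 65376, i.e. a range
running from 160 to 65535.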

Or maybe it's not necessary to use HTML4.decl at all? Is it only for
use on systems which can support it, and optional on others? (How
important is it that this HTML4.decl be used and not some other one?)

If it *is* important, I believe something needs to change in the
HTML4.decl for it to be useful to people using nsgmls on Unix systems.
I'm in way over my head here, but I've tested this pretty thoroughly.

If you like, I can bring this up on comp.text.sgml to see if any of
the SGML gurus there can help.

Thanks,

Gerald
-- 
Gerald Oskoboiny            <gerald@w3.org>  +1 617 253 2920
System Administrator, W3C   http://www.w3.org/People/Gerald/
World Wide Web Consortium, MIT Laboratory for Computer Science
545 Technology Square, Room NE43-353  Cambridge MA 02139 USA

Received on Friday, 24 October 1997 17:01:59 UTC