
Alternate SGML.c module in wwwlib 5.0a...?

From: Markku Savela <msa@msa.tte.vtt.fi>
Date: Mon, 11 Nov 1996 16:46:48 +0200 (EET)
Message-Id: <199611111446.QAA13166@msa.tte.vtt.fi>
To: www-lib@w3.org
Cc: msa@msa.tte.vtt.fi
Hi,

Some months ago I was asking for a simplified SGML.c, which wouldn't
attempt to fix incorrect SGML.

I have now made a rather quick hack of a simplified SGML.c. I took the
standard SGML.c from libwww 5.0a and made the following changes:

- no element stack is maintained (=> support for </> is gone),

- no error checking for tags: the code only parses the tags and
  attributes, but has no idea which tags are legal in which
  contexts. All tags and attributes listed in HTMLPDTD.h are just
  passed upstream as is through start_element/end_element calls,

- comment declarations (actually, any declarations <! .. >) are skipped,

- text content is passed upstream in bigger chunks instead of one
  character at a time (where possible),

- fixed the missing entity termination problem (e.g. "&foo<P>" no
  longer loses the leading "<"),

- I do not allocate string storage for the attribute values; instead
  the HTChunk string is used as temporary storage, and in start_element
  the attribute values are just pointers into this chunk.
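A rough sketch of this kind of scanner, written only to illustrate the list above and not taken from the attachment: one switch over an enum of states, no element stack, text handed upstream in runs, <! ... > declarations skipped, and entity names ended by any non-name character, which is then re-examined so that "&foo<P>" keeps its "<". The Sink callbacks, scan(), scan_to_log() and the state names are hypothetical; attributes are skipped rather than parsed, and a '>' inside a quoted attribute value is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical callback interface (libwww's SGML.c reports to an
 * HTStructured stream via start_element/end_element instead). */
typedef struct {
    void (*text)(void *ctx, const char *s, size_t len);   /* a text run  */
    void (*tag)(void *ctx, const char *name, int is_end); /* <x> or </x> */
    void (*entity)(void *ctx, const char *name);          /* &x or &x;   */
    void *ctx;
} Sink;

enum State { S_TEXT, S_LT, S_NAME, S_ATTRS, S_DECL, S_ENTITY };

void scan(const char *p, size_t n, const Sink *out)
{
    enum State state = S_TEXT;
    char name[64];
    size_t nlen = 0, start = 0, i = 0; /* start: first char of pending text */
    int is_end = 0;

    assert(p != NULL && out != NULL);
    while (i < n) {
        char c = p[i];
        switch (state) {
        case S_TEXT:                  /* collect text, flush on '<' or '&' */
            if (c == '<' || c == '&') {
                if (i > start) out->text(out->ctx, p + start, i - start);
                state = (c == '<') ? S_LT : S_ENTITY;
                nlen = 0;
                is_end = 0;
            }
            i++;
            break;
        case S_LT:
            if (c == '!') { state = S_DECL; i++; }            /* <! ... > */
            else if (c == '/') { is_end = 1; state = S_NAME; i++; }
            else state = S_NAME;      /* re-examine c as a name char */
            break;
        case S_NAME:
        case S_ENTITY:
            if (isalnum((unsigned char)c)) {
                if (nlen < sizeof name - 1) name[nlen++] = c;
                i++;
                break;
            }
            name[nlen] = '\0';
            if (state == S_NAME) {    /* no stack, no legality checks */
                if (nlen > 0) out->tag(out->ctx, name, is_end);
                state = S_ATTRS;      /* c (maybe '>') is re-examined */
            } else if (nlen == 0) {   /* lone '&': keep it as text */
                state = S_TEXT;
                start = i - 1;
            } else {                  /* any non-name char ends the entity */
                out->entity(out->ctx, name);
                if (c == ';') i++;    /* ';' belongs to the entity */
                state = S_TEXT;
                start = i;            /* '<' etc. is re-examined, not lost */
            }
            break;
        case S_ATTRS:                 /* attributes are skipped here */
        case S_DECL:                  /* declarations are skipped */
            if (c == '>') { state = S_TEXT; start = i + 1; }
            i++;
            break;
        }
    }
    if (state == S_TEXT && n > start) {
        out->text(out->ctx, p + start, n - start);
    } else if (state == S_ENTITY && nlen > 0) {
        name[nlen] = '\0';
        out->entity(out->ctx, name);
    } else if (state == S_ENTITY) {
        out->text(out->ctx, "&", 1);
    }
}

typedef struct { char buf[256]; } Log;  /* usage demo: events as text */

static void put(Log *l, const char *s, size_t len)
{
    size_t used = strlen(l->buf);
    if (len > sizeof l->buf - 1 - used) len = sizeof l->buf - 1 - used;
    memcpy(l->buf + used, s, len);
    l->buf[used + len] = '\0';
}
static void on_text(void *c, const char *s, size_t len)
{ put(c, "T[", 2); put(c, s, len); put(c, "]", 1); }
static void on_tag(void *c, const char *name, int is_end)
{ put(c, "</", is_end ? 2 : 1); put(c, name, strlen(name)); put(c, ">", 1); }
static void on_entity(void *c, const char *name)
{ put(c, "&", 1); put(c, name, strlen(name)); put(c, ";", 1); }

void scan_to_log(const char *html, Log *log)
{
    Sink sink = { on_text, on_tag, on_entity, log };
    log->buf[0] = '\0';
    scan(html, strlen(html), &sink);
}
```

For example, scanning "a&foo<P>b" with scan_to_log() records T[a]&foo;<P>T[b], while "</>" produces no event at all, since there is no element stack to close.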

Icky things...

- some may not like the way I used enumeration symbols as labels in
  the switch statement (I got perverse satisfaction out of it), but
  one probably should not really do such things; maybe I also used
  gotos too much,

- in general, it may not be easy to follow what goes on with the
  switch...

- it seems to work for me, but it has not been tested very extensively.

and things to watch...

- if your upstream application assumed that the attribute values
  are malloced strings and "grabbed" them off the attribute value
  list, it will fail miserably (as the values are now just pointers
  into the internal buffer),

- anything upstream that relied on SGML.c to do some basic
  consistency checking and correction of the parsed HTML will fail,
  as this parser does none of that: tags (those listed in
  HTMLPDTD.h) are passed on as is.
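If an upstream module needs an attribute value after start_element returns, it now has to make its own copy. A minimal sketch, assuming a simplified hook in which present[] and value[] are parallel arrays indexed by attribute number (modelled loosely on, but not identical to, the attribute arrays passed to start_element); Anchor, anchor_start() and N_ATTRS are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N_ATTRS 8   /* hypothetical number of attributes for this element */

/* Hypothetical upstream object that wants to keep an anchor's HREF. */
typedef struct {
    char *href;     /* owned copy; the upstream module must free() it */
} Anchor;

/* Hypothetical start_element-style hook: value[i] points into the parser's
 * shared chunk and is only valid during this call, so keep a copy. */
void anchor_start(Anchor *a, const int present[], const char *value[],
                  int href_index)
{
    assert(a != NULL && href_index >= 0 && href_index < N_ATTRS);
    free(a->href);
    a->href = NULL;
    if (present[href_index] && value[href_index] != NULL) {
        size_t len = strlen(value[href_index]) + 1;
        a->href = malloc(len);
        if (a->href != NULL)
            memcpy(a->href, value[href_index], len);  /* copy incl. '\0' */
    }
}
```

Copying only the values that are really kept puts the allocation cost in the few places that need it, instead of on every attribute of every tag; values that are merely inspected during the call can use the pointer directly.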

I'm attaching the compressed source below. Any comments and bug fixes
are welcome...

begin 644 SGML.c.gz
M'XL( /$-AS(" ]T\:U?C1I:?Q:^XT-,@&6,:,IF=V&TZ/-P-66BS8&8G)\GQ
MR'+95I E1Y*;,(__OO?>JI)*LF2@N^?LGNTS$VRIZM9]OZK*^PU+_X/;#U>7
M+6^CT; ^]#[V;HXO^0E<']_<]F[@M'_6PW?TVO8<_'K]X\W%A_,!7%T,X."[
M[[YMT:OK0+B)@(D?)RG$PAU#.L.ORR  +UH\QOYTED*2NJF8BS %/Y3O_4!D
M$!G.]_8K!_YP,6XKK)J?X*#UQP-:Z$_[;_YC_\TA'!ZVO_VV?7@ D_@Q21_O
MH??[ OZ@4!S,_ 3FT7B)@/WY(N#E$G!#.!_<IHC8'*+1K\)+6S"(8.'&B+0;
MTDRFF1!J@H?C4H$8(BPY&AYFOC<#_.[*23'.1P+D6YJ.KVQO&<>X7/#H*!!C
M&#W"2/CA%&<E"7YWX6QPAHR(EUZZC$63IKHA/4_=>"I2B:9\.8:(UW;U\FF$
M.,71 _..T1@CJ.5DPJRC_\.?X+T8P7?? %@G?NC&CX"XNMY,)+#$X2VX")$+
M$]<3Q"5_XN,SGOAG^&$9P'=_!'A_95V$R3(@%DQB(6P'_T1S&-*7''40DKD\
M_6/TB44$ //$M9 $?P'CZ"',48T)_;D?^G,W0"J1V6ET+T+_[R)F+E@X*UJ 
M&P21YZ;$LF1!:$ZBF(&X*0(=+1&I3VZP%(F:A%3!(O+#5,2)9!#C2/.]V3*\
M1U5+4M1'1'-_8V._ 9?^*":^^*$7+,=2!Q/ EZ_TDZWD,1F+16NV93P[']RE
M?I"4'][R4N6GI[1R\2&K,S[9>#46$S\4</'Q+\>7%V=@[QTXA)A%^G1+%H**
M%"$]OZ<.1!.#@::*X_^41CWXZ0P$BA@\Y!TSV;T7)M]1;"AT-TP9P/Y&^K@0
MB 2(<(E23:;S8,B6N8'2@W]L6+=#6KP)M\/ 1[:Z 7U,W:GZ,YRZBR;0,!()
M/:2_\NGM4/RV=(.$/K&8^/4$H0RCA0B;-"MD>&% L")^(N*('GGX!SDV(=PN
M;OO#PS>'A\,?KGE$XM&(<10$+B^)I"$X@A.%;N+YOL1YXY4(4:MI2O+;,D(+
MY&GY1WPM_Z1^^DB??D5),7$T9\XOYV.<G*A/8_G)B^;# _5!?S]DAOT+<@YV
M-EB2%Z2-(:KYJ90CG+FI"YE=DQ#WUO\C,4E+@Z%V7%H\J!Q)FKFSTP#5 !I^
MXG8LU&X_G(G8)\_#-ILY/82G)I(B#L?I&!KXG\Z&93H<!<SU4A_',D#EECQ^
MPV *'JHAWW<L8ZQRF61O.'C@3J&A/",Q&I=$:P7])#/K(6KC2,3XFC!DSM$P
M^3>1"ZA)^BFY,*%P8I.#AC1]M08[&)Q)4Q=IC,Q!ZU#.881N$VV#)N?R R5%
MZZ3?OX1%+!)<Y:>KX[\.CP>#FXN3NT'O]A=FR_O G29MPRF10<KQ[Q@HK<\6
M4#F]/YDDR*D:QX7S6;-(G;2[N+X;G-K>S ';;BCWL'>D!+5WM%BF0V_FHJ='
M<(Z=#9 2P8@V<QP3U(D]:@9K8(W0"=]7P!DU(4!(&:C!S?%I[\">-%T'?MZP
MQA'\PY^ S3K&[QQ4P0&B)>PMF52HX+D%-*>#UH.A+1#V&Z<(\Y!@-D>? Q5G
M%>"21;[W,<8>9\+ZR*I6:X=D?=<W%W\Y'O1(9]AD"$(. &Q6;%)Y=(O2L(C_
MT(#$T89*RM'(522!+@U'3F=/4,"L*#-,CY"ST4,3_":@ YM(?<]"N!NC?Z=(
MF"M<Z,ZE[N-C&Z=VWS093I?7D,8TC"9#<S6+,*-!<$2K@7I""\);> /O@" A
MGO[N@0-ML'DL?G7 D4.1+,O')SQN5P[8P\_._J&#T"P&U25%]C ?1"^YL',$
M?O)_:1':3602#2:ATH1N%\6$9AH+="HA^&PBR=+S1((V-HF6*#R?'0KC\*\-
M/7+OH",M1?K=<\RD,._+Q%0E8)(M0K^^.[F\."U*3GI&3(Z608HT;&UU$/0[
M-N=,'3Y%_AAFO([IND@6=NYL01E.K6HHMTCZ@"ME9E9PDQO$A8\8N-J C,)'
MW>['N\O+)CQ@,H3Y5W+O+Q;D+M#QA9QJN=-6JR6]#W(6OYI"HZP;UUK199M5
M.!>(#T==8'GHJ=8*?F6G32K2*8S4OM/_!=_]V+OMY/+M*- H1DO9^M:=(B%7
M[]>)S/R00:^3G\.MIIJ%_Y)F)<=0Z5$,3B=3DN>@K;(PUKB+$!TVBE>[WX)2
ME=+/^AAN.H]J;6$ 5>J2J0<B<R8P?^#<SG @41@\*F6@@,,\:S*]_C2,2"?0
MYU$! A&&D_C!3X0,PR38*IZ9"E(U9(5CFQG+G(*\9:A[<CYJ TW+Y\D0C<]$
MD AZI2+*UG&1WZ0/DL:Q4@:M#1DH&3SWCL:4;.U"<0F'[.DS] (HZW##'?31
M[B?,JA\BP$0(,>&Z<M/0E5Q59'*YHB$R?8=^2 /B1ZY?$D;&Q1J%X[^<*5T[
MUD68%(@82R:J).O42DU9ITVF#VK(-7R.1ADC,!/<.Y*0V)=1J#!G%0:;?%X3
MP%8"T\J">8S2.'U)A'IY@,+8I!=&/\4^D"SOE!H9Z+P2?/-)L(A70A6O5!^E
M+*LNL9),KLBL?*?:1:H_Y)XF6C]4FAA&J5R6.)XL O<1=1.XX" DM"5IWZHF
M2V\J'3XGE3O;.]II$O-,N2\($7+#]J*;=.@[+'9WV?)Y:F.1N]O<#'K("-4<
M*!G!BA)C.3940ZOCIPJ4[*H4AIHN6@;@[?[KY(@I*H2 *OX;BQ$)JS(@M[I7
M4E)\1JS*J<,2/4ZIH51-8C6="4U:1VE&'<=I,C\60'4%D>4/Z],'S2B)\=M*
M/N5S5:&ED]2L\.(\9,-"SW&R] /9UT.68I6_4KY(3RVU,\&_Y*U5JX\!R(:,
MK+Q:8%\M4=&H?Q-Q-93(E!W)"0*L'TG#,Z!)DP&,A.?2A)4FSQQUGTI/S#K\
MO^.BXV6LYVMP4<@@J-FVB(@TWPTP4,:"^TUB[+3PO4RG*0G"'*B#"=-;*#*_
M [N[/FF/E RG-Z4@B,^DIZ)L#=J%,%D7H_3,&LTMJ$^E[I+#J%-??EG.S/BA
MG1M[H^$P$E@[<<R+1.PAHQ\7N@/G(YN]4FXDZRI2190R-38)A<!/V"BL^N;&
M:B!$9"U6)U8F:D_(]H>/28WLH9$\97YF+B(9:DSEV!FI%"F$WU!W6/;YFK_[
M*2\I76W29EQ)5I9%DS*ORL\I+;$L=SQ&KB4:;MX#Q1405]/<=6:O\VSZ9A<[
M+PV:4RX-F(Z"%WA.3.V6 BF)^VL$4=:!0@-9A<&7E7Z9"AJ5GR0TC[28^^A8
M6QEJUT59509NF^MTRD4BB=9PX,.O]$^VGZ^7H\#WX$JDLVB<<(^9#.,T6J*K
M]&;"NT?3H0:^K-E&+CV(8!2E:30G?4+3]NXWX?OON;E6[CH,)\$RF<':B*'(
MK/(;/'LEV<C#V>IRU.)_,CY1EVR9Z$9>F'94A6'+%Z0C%<@P\%5D'*HIS@?#
M_G^2M!4QV0JJL8=E>2"P5BTY4H='#-_?]'I9S[R3<82!UM/JCB(,BW5IQR4Z
M&'PB"H0K6JNH8V@5>9UPOIR*WLU-_Z:>D =J^#ZC_3!J\IR@5!AHMV110[B4
MI.>X6ZJUNEH)E&H%SC^[,-+J@1:;<C3=L#8L&>'M8&\/7=,;(V7GN1Z.:XQV
M=\F;)%C[>C.351BDS:X$_9U&653D_86V]!?T7W3VM*$2N",14*W\ZW*^H-V2
MF9![4CM'.V1^KJR[5"HGYW&T94<_PWQZ)'#&&,7FI;1_QKLO9_W>+7SL#^#]
M,8;X05\O>-[C[=+BGH>LVA=Q1"Y,R$*/@&,FHA>DY$?F-'+[!Y(9.Q ,7;R!
MRB6B3*-X@BX+90*DVKYZLXA@VTB>P]Y3.LJZ^A^VM]?U4KA6+R0?IH(6\AI"
MO%L@G$>,4"/OR6987A08"F.* I.[7^9[DIR@'),V0XF-E(LD2':@69&3S[-,
M<0)!X#00":44%5,!-<N=4LV=0B@> C\4+9 [EE&([(OQ5<#UE7S)"^MYO(:&
M&^+X^0)K*KT5"IA_D3FUX6XZV\S9;]@$,2VW""49Z'9AY^=PA\1AKZ;DF[)M
M-^Q=70]^E&+->G-5,J#O#%RSW\H[;IB59:_1?*)LN!*1_-K>6 _9$*8<7K5M
M5R#NS3??[#R-N4@\B?CN+K.IF@JYR4<IBK$>&)K.2VZ7UY-&0,R7OL>29>R)
MK7TEOI)Z;>D\G:J)3%25",=1D;V4,$*.Q-NOC@2E0\H[>+&[@,N+ 9W9:')E
M0\71Z=GQX%A[%UKY1CY!GSM/7,6F%5HD)A7*UU7*I]9QX)T<:F7;PYA!WJJ=
MO7I&H'9GG.#&MC*N/=VZQ*$C,:%/*%TRV<TZ5'GWN*/76%&72I<3!NVB=KS]
MMVO'EW)4[9"O<K6(=\%2"A9MSJMA"J_0)GF<N^/,X;E2 .SGU!ZI9AVZHIU]
MR3K5/2+!<INTS\UI*4P-24GT[;X"4N52E.(PZJJIJ9Y6>Q6# '07N5B[W9T_
M*,PJEI%'!SI%S<0I=OT4/F-0U+-USO9)'90H%!#^?@?^^4^@3R?U>!2..'Q%
M?)B^ CHG&3H_U*/SE;$H4->NBQKK@D7%$K410A64%R%H4Z,S!$UX</TT,5*,
MA?#\"19VVA>I^-^7 WS,5;&^Q01LY ?45.6-FHA\LH@_B3AI9;'?H%2MR#3J
MBF"Q3#U;^A/T,4Z6$F!2>7=]W;NA(W>;76W*MEUT/5W ^OT=662V(=)>D\_]
M9,[>._REG$JHWJ,.,Q<3II)*%IB[F(TWF1V:<<H[<^$GQ84Y)V8P.NR O?E2
M5%"G\AZM<HK:@38K]Z^DV\LT,:=%$7#E)XQZ&V+A12@:U8!H:?2E%^-XT#&<
MOMFHTTT+B:P:EGG6]0F8S&UQ=*O5PC2?,W9V_E<7?^V=H7^D9@H?*QSYF+!.
MU.C[8#F>HEJ.EBF^FD;1&*9+ZD!Q,R$[9ZA&/T3Q?4()P".UZ:9R\X^&AH5 
MDXVFY('FT&YQ359@M(-S="5)A62VE,U6Q)L\O3_-4O583#"E#SW,K6/HR>VR
M"H/!Y*KD$%Z5HS9*&6#[%;&0P,,-AHL^%0Z-?:B3CJ=RMJKLN,K-,'Z:EH:J
M^<YO^G<?SM%,LQ&PB1E+(_,QBFBU)]A3&TV59/+TC%(_<8-P.4?+ES:QSE6L
MI@-Z=+9S:.=-!OQ7V#4L%G-K!6N$_X[RR(:M&37%:@[R3%.I5AVK0FEX=)F)
MGJ$K90ZBDKB>MYS+,ZVN43-^O+LZZ=UH[7\9I[,37IWG,)[P2A+/#2<EW[+U
M>KS5A&W9?.]V#QS#+=F$J>[+UWHZ=EA;VZ\0S.$SG=C>P8H;^S\G?;/&=*=M
M%B/FE9R99M(W$\8OMR!*&&A;4?7YLR-6#$.=V.FLA,K</>VO,H?E;@;MS6Y6
M7F0[='][^SK9WY'=[4U]OB$_XF#*40NXTE&-U<O<L1GL+Q5DFY^)*Q"RFU^,
M['PMKD\84V4KRSC<A**R>8^E IG:5A@RA;8*%+V:7/3HD.V@J[Y2=B!I+=7,
M;[F;V%5FA*-I2[$F+:HZL@>^VF\WY51]RNICO\QM8X^23LK56".?CY$;-^NJ
MVF<<EBF4H/*@^#KCS@:UZTM##:/H#'@2.X3>[PNU5YP?T$*6'QE%ZW^?8_J3
M^0*) DW]X"XP_TH?J+]KG+%2 3PWE"-E*$Q8N>-<+(04[03-H)*^KC0ZX[2\
MT5TZVDE%N!JA9OE\;T;"RPN,RB8L([#:J5@]=F5V936:)M.X'E1LR#]WRRPI
M'H4L,F6=]RTJQ"JD<D,?T[RLHZ_[PMS2+S%/-^YUDUCRD?KWU[(/3\PD#C,0
MXPH#&:R:E)T24.<:J"QDN:D^>T@!,UO7$$BM\P*FN?+TZ$:AU(&GSK(]1[1U
MBE%A4OIQ>XU)=5]B5[)/GL__3*,RPE5W31^ KYS46R)%]0\B+0E,45*9;TB(
MS(WCC!+:G))%Q3KJ3V3'JW1<M)8!\B"95<P[5,JA0Y#<].YR,1A&$B"?Y.%P
MU7F"C[4-V9UZCJJ[,U5RV%K38BO-,@61)\KE&,!OZB) /NTEGBP'6>?*G,\I
MFDHG>$O5T]IX^G1H?(F[M*PBO4I<K*__Q1^K]*_86=OY7^+!^C#_F?PH]5E?
MSHZM_T?<*#<7)".R%OP7%4BFIU*MT:HJ*565T3-JXD8A<Y8);/K,5+Y0##-F
M<C=6(:8[?+0W$?M3GV[D>71C;!$MZ"S #-,'VO+-$GF^,J?VX;W[%O4/^0[P
M1+A\;T^/PT<QA@@_ICN%%^ID> 3(/;U'S\,8"A_Y66VT,8WR9)+FQ&9:*CWR
M@[N2GNRHZW,JC]J:H5#R&"B8K=<\$4E7^@1?J4S(.PI'U87H9C'"YG=%_D9\
M>.T]KP+-U+C2UO3ESZI2N8Q3L3]2;I"L[P)GD&LK(8U)72 T,36LNS"M.KEX
MNKGS\JZ<+F,\-^2#$+8\$!P\9I>2'!@++W!CEX]'574]Y^,2TGOUF07?OGUA
M-B+O[G9>F/G(N[]5L_ZM/"WPA?!N4])Z>\^WZ)>C0(",:<8-T:KXM:;14K=6
M8JXE3]8\L=9Z]M4OQD*4:U$9Q04&;3#-LSNS%2!SW8!W$@;OC3^Q#J_2:M&N
M'%;,:HT50IY0N,/UI!RV5;>P0 @\EY+YF G!:<7,KGCM8^500M6)A,KCCJO'
M!)4XGS@G:-Y25(#S0X9&@=K$+ *7)R4)1&@GCK-FZ;SA7KTZ'_Y[SK+;7A,.
MG*]_FE?].(1EW&SORUOL?"5^S8U]\^QWU<U\HD)>3T8=V%!)U(:UE3^G\)6?
M]<V^Q$+HSWRV5'_)>,F_P9 +5K]GCC75!7*0IY'E[YGP5>E>B(E0S?7%NH/Y
M*!'5:93'%,\&9TUP@PBUB4R,?\-$GH.UZ'=0.(.BZP+ZISNXB^'2)B==>L?4
M,8Z6:/\B::E?I%"W<7/=8$I"\6"7?[- 'ITO_@2!^MV!E2LT&ERF:1UU3%E]
M):/,EH0&72H?GAY?7O9/[0/4<,QOHDDVP'$<LT-,QG8WZ+^_ZEW).^C#D4#&
M;O$]P\SV,<_&1;9S69LW;K+SM#I+)GH/#O\LS\9_H-]Y&3W*!'2^5 ?PS;.Y
M.%.>T"V=.>8;//RC#,8K)9^,9/,''QJ.;6=T-AS%3<9>7^]ZZA\B?#>=M:D]
IT>\7,5T)@'2%@3KBX+\MWZU1S>ZJ!O:;W-%ETI1>X'\ D<M?P\M(  "=
 
end


--
Markku Savela (msa@hemuli.tte.vtt.fi),     Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203,FIN-02044 VTT,http://www.vtt.fi/tte/staff/msa/
Received on Monday, 11 November 1996 09:46:53 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 23 April 2007 18:18:27 GMT