Alternate SGML.c module in wwwlib 5.0a...?
Hi,
Some months ago I asked for a simplified SGML.c that wouldn't
attempt to fix incorrect SGML.
I have now made a rather quick hack of a simplified SGML.c. I took the
standard SGML.c from libwww 5.0a and made the following changes:
- no element stack is maintained (=> support for </> is gone),
- no error checking for tags: the code only parses the tags and
attributes, but has no idea which tags are legal in which
contexts. All tags and attributes listed in HTMLPDTD.h are just
passed upstream as is by start_element/end_element calls,
- comment declarations (actually, any declarations <! .. >) are skipped,
- text content is passed upstream in bigger chunks instead of a
character at a time (when possible),
- fixed the missing entity termination problem (e.g. "&foo<P>" no
longer loses the "<" that starts the tag),
- I do not allocate string storage for the attribute values; instead
the HTChunk string is used as temporary storage, and in start_element
the attribute values are just pointers into this chunk.
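To make some of the list above concrete, here is a rough sketch in C. It is not the attached code, and every name in it (Sink, emit, flush_text, scan) is made up for illustration. It shows three of the listed behaviours: text between markup is flushed as one chunk, any <! ... > declaration is skipped, and an entity name ends at the first non-name character, which is left in place so that "&foo<P>" still yields the <P> tag. Tags are reported as raw text, with no element stack and no validity checks.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event sink: appends a readable trace to a fixed buffer. */
typedef struct {
    char log[1024];
} Sink;

static void emit(Sink *s, const char *kind, const char *p, size_t n)
{
    size_t len = strlen(s->log);
    snprintf(s->log + len, sizeof s->log - len, "%s(%.*s)", kind, (int)n, p);
}

/* Flush the pending text run [start, end) as one chunk, not char by char. */
static void flush_text(Sink *s, const char *start, const char *end)
{
    if (end > start)
        emit(s, "T", start, (size_t)(end - start));
}

void scan(const char *p, Sink *s)
{
    const char *text = p;               /* start of the pending text run */

    s->log[0] = '\0';
    while (*p) {
        if (p[0] == '<' && p[1] == '!') {
            /* Declaration or comment: skip up to the next '>'.
               Simplification: a '>' inside a comment ends it early. */
            flush_text(s, text, p);
            const char *gt = strchr(p, '>');
            p = gt ? gt + 1 : p + strlen(p);
            text = p;
        } else if (p[0] == '<' && (isalpha((unsigned char)p[1]) || p[1] == '/')) {
            /* Tag: passed on as raw text, attributes are not split here. */
            flush_text(s, text, p);
            const char *name = ++p;
            while (*p && *p != '>')
                p++;
            emit(s, "TAG", name, (size_t)(p - name));
            if (*p)
                p++;
            text = p;
        } else if (p[0] == '&' && isalpha((unsigned char)p[1])) {
            flush_text(s, text, p);
            const char *name = ++p;
            while (isalnum((unsigned char)*p))
                p++;
            emit(s, "ENT", name, (size_t)(p - name));
            /* A ';' terminator is consumed; any other character (e.g. '<')
               is left in place so the main loop still sees it. */
            if (*p == ';')
                p++;
            text = p;
        } else {
            p++;                        /* ordinary text: keep extending the run */
        }
    }
    flush_text(s, text, p);
}
```

For example, scanning "x&foo<P>y<!-- c -->z" leaves "T(x)ENT(foo)TAG(P)T(y)T(z)" in s.log: the entity stops at "<" without eating it, and the comment disappears.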
Icky things...
- some may not like the way I used enumeration symbols as labels in
the switch statement (I got perverse satisfaction out of it), but I
probably should not really do such things, and maybe I even used
gotos too much,
- in general, it may not be easy to follow what goes on in the
switch...
- it seems to work for me, but it has not been tested very extensively.
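For anyone wondering what "enumeration symbols as labels" could look like: one possible reading (an assumption, not necessarily what the attached code does) is that each state constant serves both as a case label and as a goto target with the same name. This is legal C because statement labels live in their own name space, separate from enum constants. A toy sketch with hypothetical names that just counts tags:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parser states. */
enum state { S_TEXT, S_TAG };

/* Count tags in p, starting in the given state. Each state name is used
   twice: as a case label (enum constant) and as a goto label (label name
   space), so any state can jump straight to another one. */
int count_tags(const char *p, enum state start)
{
    int tags = 0;

    switch (start) {
    case S_TEXT:
    S_TEXT:
        for (; *p; p++) {
            if (*p == '<') {
                p++;
                goto S_TAG;
            }
        }
        return tags;
    case S_TAG:
    S_TAG:
        tags++;
        while (*p && *p != '>')
            p++;
        if (*p)
            p++;
        goto S_TEXT;
    }
    return tags;
}
```

For example, count_tags("a<b>c<d>", S_TEXT) returns 2. The switch is only used to enter the right state; after that, control moves between states purely by goto, which is exactly what makes such code hard to follow.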
and things to watch...
- if your upstream application assumed that the attribute
values are malloced strings and "grabbed" them off the attribute
value list, it will fail miserably (as the values are now just
pointers into the internal buffer),
- any upstream module that relied on SGML to do some basic consistency
checking and correction of the parsed HTML will fail, as this parser
does none of that: tags (which are listed in HTMLPDTD) are passed on
as is.
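The first warning can be illustrated with a small sketch. The static buffer below is only a stand-in for the HTChunk, and parse_href and copy_value are hypothetical names. A value pointer handed upstream stays valid only until the parser reuses the buffer, so an upstream module that wants to keep a value must copy it, and must never free() the borrowed pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser-owned scratch buffer, standing in for the HTChunk. */
static char chunk[64];

/* Simulates what start_element receives: the returned pointer points into
   'chunk' and is only valid until the parser overwrites it for the next tag.
   The caller must not free() it. */
static const char *parse_href(const char *value)
{
    strncpy(chunk, value, sizeof chunk - 1);
    chunk[sizeof chunk - 1] = '\0';
    return chunk;
}

/* What an upstream module should do if it needs the value later:
   make its own malloced copy (and free it when done). */
static char *copy_value(const char *v)
{
    size_t n = strlen(v) + 1;
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, v, n);
    return copy;
}
```

After a second parse_href("b.html") call, a pointer kept from the first call reads "b.html", while a copy made with copy_value still holds "a.html".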
I'm attaching the compressed source below. Any comments and bug fixes
are welcome...
begin 644 SGML.c.gz
M'XL( /$-AS(" ]T\:U?C1I:?Q:^XT-,@&6,:,IF=V&TZ/-P-66BS8&8G)\GQ
MR'+95I E1Y*;,(__OO?>JI)*LF2@N^?LGNTS$VRIZM9]OZK*^PU+_X/;#U>7
M+6^CT; ^]#[V;HXO^0E<']_<]F[@M'_6PW?TVO8<_'K]X\W%A_,!7%T,X."[
M[[YMT:OK0+B)@(D?)RG$PAU#.L.ORR +UH\QOYTED*2NJF8BS %/Y3O_4!D
M$!G.]_8K!_YP,6XKK)J?X*#UQP-:Z$_[;_YC_\TA'!ZVO_VV?7@ D_@Q21_O
MH??[ OZ@4!S,_ 3FT7B)@/WY(N#E$G!#.!_<IHC8'*+1K\)+6S"(8.'&B+0;
MTDRFF1!J@H?C4H$8(BPY&AYFOC<#_.[*23'.1P+D6YJ.KVQO&<>X7/#H*!!C
M&#W"2/CA%&<E"7YWX6QPAHR(EUZZC$63IKHA/4_=>"I2B:9\.8:(UW;U\FF$
M.,71 _..T1@CJ.5DPJRC_\.?X+T8P7?? %@G?NC&CX"XNMY,)+#$X2VX")$+
M$]<3Q"5_XN,SGOAG^&$9P'=_!'A_95V$R3(@%DQB(6P'_T1S&-*7''40DKD\
M_6/TB44$ //$M9 $?P'CZ"',48T)_;D?^G,W0"J1V6ET+T+_[R)F+E@X*UJ
M&P21YZ;$LF1!:$ZBF(&X*0(=+1&I3VZP%(F:A%3!(O+#5,2)9!#C2/.]V3*\
M1U5+4M1'1'-_8V._ 9?^*":^^*$7+,=2!Q/ EZ_TDZWD,1F+16NV93P[']RE
M?I"4'][R4N6GI[1R\2&K,S[9>#46$S\4</'Q+\>7%V=@[QTXA)A%^G1+%H**
M%"$]OZ<.1!.#@::*X_^41CWXZ0P$BA@\Y!TSV;T7)M]1;"AT-TP9P/Y&^K@0
MB 2(<(E23:;S8,B6N8'2@W]L6+=#6KP)M\/ 1[:Z 7U,W:GZ,YRZBR;0,!()
M/:2_\NGM4/RV=(.$/K&8^/4$H0RCA0B;-"MD>&% L")^(N*('GGX!SDV(=PN
M;OO#PS>'A\,?KGE$XM&(<10$+B^)I"$X@A.%;N+YOL1YXY4(4:MI2O+;,D(+
MY&GY1WPM_Z1^^DB??D5),7$T9\XOYV.<G*A/8_G)B^;# _5!?S]DAOT+<@YV
M-EB2%Z2-(:KYJ90CG+FI"YE=DQ#WUO\C,4E+@Z%V7%H\J!Q)FKFSTP#5 !I^
MXG8LU&X_G(G8)\_#-ILY/82G)I(B#L?I&!KXG\Z&93H<!<SU4A_',D#EECQ^
MPV *'JHAWW<L8ZQRF61O.'C@3J&A/",Q&I=$:P7])#/K(6KC2,3XFC!DSM$P
M^3>1"ZA)^BFY,*%P8I.#AC1]M08[&)Q)4Q=IC,Q!ZU#.881N$VV#)N?R R5%
MZZ3?OX1%+!)<Y:>KX[\.CP>#FXN3NT'O]A=FR_O G29MPRF10<KQ[Q@HK<\6
M4#F]/YDDR*D:QX7S6;-(G;2[N+X;G-K>S ';;BCWL'>D!+5WM%BF0V_FHJ='
M<(Z=#9 2P8@V<QP3U(D]:@9K8(W0"=]7P!DU(4!(&:C!S?%I[\">-%T'?MZP
MQA'\PY^ S3K&[QQ4P0&B)>PMF52HX+D%-*>#UH.A+1#V&Z<(\Y!@-D>? Q5G
M%>"21;[W,<8>9\+ZR*I6:X=D?=<W%W\Y'O1(9]AD"$(. &Q6;%)Y=(O2L(C_
MT(#$T89*RM'(522!+@U'3F=/4,"L*#-,CY"ST4,3_":@ YM(?<]"N!NC?Z=(
MF"M<Z,ZE[N-C&Z=VWS093I?7D,8TC"9#<S6+,*-!<$2K@7I""\);> /O@" A
MGO[N@0-ML'DL?G7 D4.1+,O')SQN5P[8P\_._J&#T"P&U25%]C ?1"^YL',$
M?O)_:1':3602#2:ATH1N%\6$9AH+="HA^&PBR=+S1((V-HF6*#R?'0KC\*\-
M/7+OH",M1?K=<\RD,._+Q%0E8)(M0K^^.[F\."U*3GI&3(Z608HT;&UU$/0[
M-N=,'3Y%_AAFO([IND@6=NYL01E.K6HHMTCZ@"ME9E9PDQO$A8\8N-J C,)'
MW>['N\O+)CQ@,H3Y5W+O+Q;D+M#QA9QJN=-6JR6]#W(6OYI"HZP;UUK199M5
M.!>(#T==8'GHJ=8*?F6G32K2*8S4OM/_!=_]V+OMY/+M*- H1DO9^M:=(B%7
M[]>)S/R00:^3G\.MIIJ%_Y)F)<=0Z5$,3B=3DN>@K;(PUKB+$!TVBE>[WX)2
ME=+/^AAN.H]J;6$ 5>J2J0<B<R8P?^#<SG @41@\*F6@@,,\:S*]_C2,2"?0
MYU$! A&&D_C!3X0,PR38*IZ9"E(U9(5CFQG+G(*\9:A[<CYJ TW+Y\D0C<]$
MD AZI2+*UG&1WZ0/DL:Q4@:M#1DH&3SWCL:4;.U"<0F'[.DS] (HZW##'?31
M[B?,JA\BP$0(,>&Z<M/0E5Q59'*YHB$R?8=^2 /B1ZY?$D;&Q1J%X[^<*5T[
MUD68%(@82R:J).O42DU9ITVF#VK(-7R.1ADC,!/<.Y*0V)=1J#!G%0:;?%X3
MP%8"T\J">8S2.'U)A'IY@,+8I!=&/\4^D"SOE!H9Z+P2?/-)L(A70A6O5!^E
M+*LNL9),KLBL?*?:1:H_Y)XF6C]4FAA&J5R6.)XL O<1=1.XX" DM"5IWZHF
M2V\J'3XGE3O;.]II$O-,N2\($7+#]J*;=.@[+'9WV?)Y:F.1N]O<#'K("-4<
M*!G!BA)C.3940ZOCIPJ4[*H4AIHN6@;@[?[KY(@I*H2 *OX;BQ$)JS(@M[I7
M4E)\1JS*J<,2/4ZIH51-8C6="4U:1VE&'<=I,C\60'4%D>4/Z],'S2B)\=M*
M/N5S5:&ED]2L\.(\9,-"SW&R] /9UT.68I6_4KY(3RVU,\&_Y*U5JX\!R(:,
MK+Q:8%\M4=&H?Q-Q-93(E!W)"0*L'TG#,Z!)DP&,A.?2A)4FSQQUGTI/S#K\
MO^.BXV6LYVMP4<@@J-FVB(@TWPTP4,:"^TUB[+3PO4RG*0G"'*B#"=-;*#*_
M [N[/FF/E RG-Z4@B,^DIZ)L#=J%,%D7H_3,<MJ$^E[I+#J%-??EG.S/BA
MG1M[H^$P$E@[<<R+1.PAHQ\7N@/G(YN]4FXDZRI2190R-38)A<!/V"BL^N;&
M:B!$9"U6)U8F:D_(]H>/28WLH9$\97YF+B(9:DSEV!FI%"F$WU!W6/;YFK_[
M*2\I76W29EQ)5I9%DS*ORL\I+;$L=SQ&KB4:;MX#Q1405]/<=6:O\VSZ9A<[
M+PV:4RX-F(Z"%WA.3.V6 BF)^VL$4=:!0@-9A<&7E7Z9"AJ5GR0TC[28^^A8
M6QEJUT59509NF^MTRD4BB=9PX,.O]$^VGZ^7H\#WX$JDLVB<<(^9#.,T6J*K
M]&;"NT?3H0:^K-E&+CV(8!2E:30G?4+3]NXWX?OON;E6[CH,)\$RF<':B*'(
MK/(;/'LEV<C#V>IRU.)_,CY1EVR9Z$9>F'94A6'+%Z0C%<@P\%5D'*HIS@?#
M_G^2M!4QV0JJL8=E>2"P5BTY4H='#-_?]'I9S[R3<82!UM/JCB(,BW5IQR4Z
M&'PB"H0K6JNH8V@5>9UPOIR*WLU-_Z:>D =J^#ZC_3!J\IR@5!AHMV110[B4
MI.>X6ZJUNEH)E&H%SC^[,-+J@1:;<C3=L#8L&>'M8&\/7=,;(V7GN1Z.:XQV
M=\F;)%C[>C.351BDS:X$_9U&653D_86V]!?T7W3VM*$2N",14*W\ZW*^H-V2
MF9![4CM'.V1^KJR[5"HGYW&T94<_PWQZ)'#&&,7FI;1_QKLO9_W>+7SL#^#]
M,8;X05\O>-[C[=+BGH>LVA=Q1"Y,R$*/@&,FHA>DY$?F-'+[!Y(9.Q ,7;R!
MRB6B3*-X@BX+90*DVKYZLXA@VTB>P]Y3.LJZ^A^VM]?U4KA6+R0?IH(6\AI"
MO%L@G$>,4"/OR6987A08"F.* I.[7^9[DIR@'),V0XF-E(LD2':@69&3S[-,
M<0)!X#00":44%5,!-<N=4LV=0B@> C\4+9 [EE&([(OQ5<#UE7S)"^MYO(:&
M&^+X^0)K*KT5"IA_D3FUX6XZV\S9;]@$,2VW""49Z'9AY^=PA\1AKZ;DF[)M
M-^Q=70]^E&+->G-5,J#O#%RSW\H[;IB59:_1?*)LN!*1_-K>6 _9$*8<7K5M
M5R#NS3??[#R-N4@\B?CN+K.IF@JYR4<IBK$>&)K.2VZ7UY-&0,R7OL>29>R)
MK7TEOI)Z;>D\G:J)3%25",=1D;V4,$*.Q-NOC@2E0\H[>+&[@,N+ 9W9:')E
M0\71Z=GQX%A[%UKY1CY!GSM/7,6F%5HD)A7*UU7*I]9QX)T<:F7;PYA!WJJ=
MO7I&H'9GG.#&MC*N/=VZQ*$C,:%/*%TRV<TZ5'GWN*/76%&72I<3!NVB=KS]
MMVO'EW)4[9"O<K6(=\%2"A9MSJMA"J_0)GF<N^/,X;E2 .SGU!ZI9AVZHIU]
MR3K5/2+!<INTS\UI*4P-24GT[;X"4N52E.(PZJJIJ9Y6>Q6# '07N5B[W9T_
M*,PJEI%'!SI%S<0I=OT4/F-0U+-USO9)'90H%!#^?@?^^4^@3R?U>!2..'Q%
M?)B^ CHG&3H_U*/SE;$H4->NBQKK@D7%$K410A64%R%H4Z,S!$UX</TT,5*,
MA?#\"19VVA>I^-^7 WS,5;&^Q01LY ?45.6-FHA\LH@_B3AI9;'?H%2MR#3J
MBF"Q3#U;^A/T,4Z6$F!2>7=]W;NA(W>;76W*MEUT/5W ^OT=662V(=)>D\_]
M9,[>._REG$JHWJ,.,Q<3II)*%IB[F(TWF1V:<<H[<^$GQ84Y)V8P.NR O?E2
M5%"G\AZM<HK:@38K]Z^DV\LT,:=%$7#E)XQZ&V+A12@:U8!H:?2E%^-XT#&<
MOMFHTTT+B:P:EGG6]0F8S&UQ=*O5PC2?,W9V_E<7?^V=H7^D9@H?*QSYF+!.
MU.C[8#F>HEJ.EBF^FD;1&*9+ZD!Q,R$[9ZA&/T3Q?4()P".UZ:9R\X^&AH5
MDXVFY('FT&YQ359@M(-S="5)A62VE,U6Q)L\O3_-4O583#"E#SW,K6/HR>VR
M"H/!Y*KD$%Z5HS9*&6#[%;&0P,,-AHL^%0Z-?:B3CJ=RMJKLN,K-,'Z:EH:J
M^<YO^G<?SM%,LQ&PB1E+(_,QBFBU)]A3&TV59/+TC%(_<8-P.4?+ES:QSE6L
MI@-Z=+9S:.=-!OQ7V#4L%G-K!6N$_X[RR(:M&37%:@[R3%.I5AVK0FEX=)F)
MGJ$K90ZBDKB>MYS+,ZVN43-^O+LZZ=UH[7\9I[,37IWG,)[P2A+/#2<EW[+U
M>KS5A&W9?.]V#QS#+=F$J>[+UWHZ=EA;VZ\0S.$SG=C>P8H;^S\G?;/&=*=M
M%B/FE9R99M(W$\8OMR!*&&A;4?7YLR-6#$.=V.FLA,K</>VO,H?E;@;MS6Y6
M7F0[='][^SK9WY'=[4U]OB$_XF#*40NXTE&-U<O<L1GL+Q5DFY^)*Q"RFU^,
M['PMKD\84V4KRSC<A**R>8^E IG:5A@RA;8*%+V:7/3HD.V@J[Y2=B!I+=7,
M;[F;V%5FA*-I2[$F+:HZL@>^VF\WY51]RNICO\QM8X^23LK56".?CY$;-^NJ
MVF<<EBF4H/*@^#KCS@:UZTM##:/H#'@2.X3>[PNU5YP?T$*6'QE%ZW^?8_J3
M^0*) DW]X"XP_TH?J+]KG+%2 3PWE"-E*$Q8N>-<+(04[03-H)*^KC0ZX[2\
MT5TZVDE%N!JA9OE\;T;"RPN,RB8L([#:J5@]=F5V936:)M.X'E1LR#]WRRPI
M'H4L,F6=]RTJQ"JD<D,?T[RLHZ_[PMS2+S%/-^YUDUCRD?KWU[(/3\PD#C,0
MXPH#&:R:E)T24.<:J"QDN:D^>T@!,UO7$$BM\P*FN?+TZ$:AU(&GSK(]1[1U
MBE%A4OIQ>XU)=5]B5[)/GL__3*,RPE5W31^ KYS46R)%]0\B+0E,45*9;TB(
MS(WCC!+:G))%Q3KJ3V3'JW1<M)8!\B"95<P[5,JA0Y#<].YR,1A&$B"?Y.%P
MU7F"C[4-V9UZCJJ[,U5RV%K38BO-,@61)\KE&,!OZB) /NTEGBP'6>?*G,\I
MFDHG>$O5T]IX^G1H?(F[M*PBO4I<K*__Q1^K]*_86=OY7^+!^C#_F?PH]5E?
MSHZM_T?<*#<7)".R%OP7%4BFIU*MT:HJ*565T3-JXD8A<Y8);/K,5+Y0##-F
M<C=6(:8[?+0W$?M3GV[D>71C;!$MZ"S #-,'VO+-$GF^,J?VX;W[%O4/^0[P
M1+A\;T^/PT<QA@@_ICN%%^ID> 3(/;U'S\,8"A_Y66VT,8WR9)+FQ&9:*CWR
M@[N2GNRHZW,JC]J:H5#R&"B8K=<\$4E7^@1?J4S(.PI'U87H9C'"YG=%_D9\
M>.T]KP+-U+C2UO3ESZI2N8Q3L3]2;I"L[P)GD&LK(8U)72 T,36LNS"M.KEX
MNKGS\JZ<+F,\-^2#$+8\$!P\9I>2'!@++W!CEX]'574]Y^,2TGOUF07?OGUA
M-B+O[G9>F/G(N[]5L_ZM/"WPA?!N4])Z>\^WZ)>C0(",:<8-T:KXM:;14K=6
M8JXE3]8\L=9Z]M4OQD*4:U$9Q04&;3#-LSNS%2!SW8!W$@;OC3^Q#J_2:M&N
M'%;,:HT50IY0N,/UI!RV5;>P0 @\EY+YF G!:<7,KGCM8^500M6)A,KCCJO'
M!)4XGS@G:-Y25(#S0X9&@=K$+ *7)R4)1&@GCK-FZ;SA7KTZ'_Y[SK+;7A,.
MG*]_FE?].(1EW&SORUOL?"5^S8U]\^QWU<U\HD)>3T8=V%!)U(:UE3^G\)6?
M]<V^Q$+HSWRV5'_)>,F_P9 +5K]GCC75!7*0IY'E[YGP5>E>B(E0S?7%NH/Y
M*!'5:93'%,\&9TUP@PBUB4R,?\-$GH.UZ'=0.(.BZP+ZISNXB^'2)B==>L?4
M,8Z6:/\B::E?I%"W<7/=8$I"\6"7?[- 'ITO_@2!^MV!E2LT&ERF:1UU3%E]
M):/,EH0&72H?GAY?7O9/[0/4<,QOHDDVP'$<LT-,QG8WZ+^_ZEW).^C#D4#&
M;O$]P\SV,<_&1;9S69LW;K+SM#I+)GH/#O\LS\9_H-]Y&3W*!'2^5 ?PS;.Y
M.%.>T"V=.>8;//RC#,8K)9^,9/,''QJ.;6=T-AS%3<9>7^]ZZA\B?#>=M:D]
IT>\7,5T)@'2%@3KBX+\MWZU1S>ZJ!O:;W-%ETI1>X'\ D<M?P\M( "=
end
--
Markku Savela (msa@hemuli.tte.vtt.fi), Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203,FIN-02044 VTT,http://www.vtt.fi/tte/staff/msa/