- From: Markku Savela <msa@msa.tte.vtt.fi>
- Date: Mon, 11 Nov 1996 16:46:48 +0200 (EET)
- To: www-lib@w3.org
- Cc: msa@msa.tte.vtt.fi
Hi,

Some months ago I asked for a simplified SGML.c that wouldn't attempt to fix incorrect SGML. I have now made a rather quick hack of one. I took the standard SGML.c from libwww 5.0a and made the following changes:

- no element stack is maintained (=> support for </> is gone),
- no error checking for tags: the code only parses the tags and attributes, and has no idea which tags are legal in which contexts. All tags and attributes listed in HTMLPDTD.h are just passed as is to the upstream by start_element/end_element calls,
- comment declarations are skipped (actually, any declarations <! .. >),
- text content is passed upstream in bigger chunks instead of one character at a time (if possible),
- fixed the missing entity termination problem (e.g. "&foo<P>" does not lose the beginning "<"),
- I do not allocate string storage for the attribute values; instead the HTChunk string is used as temporary storage, and in start_element the attribute values are just pointers into this chunk.

Icky things...

- some may not like the way I used enumeration symbols as labels in the switch statement (I got perverse satisfaction out of it), but I probably should not really do such things, and maybe I even used gotos too much,
- in general, it may not be easy to follow what goes on with the switch...
- it seems to work for me, but it has not really been tested too extensively.

and things to watch...

- if your upstream application assumed that the attribute values are malloced strings and "grabbed" them off the attribute value list, then it will fail miserably (as the values are now just pointers into the internal buffer),
- any upstream thing that relied on SGML to do some basic consistency checking and correction on parsed HTML will fail, as this parser does none of that; tags (which are listed in HTMLPDTD) are passed on as is.

I'm attaching the compressed source below. Any comments and bug fixes are welcome...

[Attachment: SGML.c.gz, uuencoded (begin 644 SGML.c.gz)]

-- 
Markku Savela (msa@hemuli.tte.vtt.fi), Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203,FIN-02044 VTT,http://www.vtt.fi/tte/staff/msa/
Received on Monday, 11 November 1996 09:46:53 UTC