- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 12 Mar 2004 17:53:46 +0100
- To: tbray@textuality.com
- Cc: www-archive@w3.org
Hi Tim,

I think it would be helpful if genx provided a means to ease the creation of structured comments and processing instructions. The existing routines genxPI(...) and genxComment(...) require precomposing the string content, which is quite inconvenient to do in C in a flexible, secure and portable manner. I would thus prefer to be able to do something like:

  genxStartPI(w, "xml-stylesheet");
  if (styleType) {
    genxAddText(w, "type='");
    genxAddText(w, styleType);
    genxAddText(w, "' ");
  }
  genxAddText(w, "href='");
  genxAddText(w, styleUri);
  genxAddText(w, "'");
  genxEndPI(w);

Using sprintf/strcat/etc. to precompose the data would likely result in security issues on "reduced-mental-function days" or make developers use non-portable library routines to circumvent such problems. The same applies to comments, of course. Structured comments are often used for metadata in template systems or mailing list archives, for example, so I'd say there is some market for such an addition to genx.

In fact, I had expected genx to provide only the start/end versions for comments and processing instructions, because doing this in one step appears to be high-level functionality (and there is no genxElement() to create an element plus text node child in one step either).

Also, while I can guarantee that all my input to genx is UTF-8 encoded, I would like to avoid pre-processing the text to handle illegal characters such as U+0001. While genxScrubText() provides help, I would prefer to be able to configure a replacement character rather than having such characters stripped from the string. In my particular application, this data is source code from foreign documents and I have line/column pointers into this data; they would break if the line length changes. It would also be helpful to have something visual for debugging.
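
A minimal sketch of such a length-preserving replacement, assuming the caller knows the byte length of the text up front. The name replace_illegal_controls is hypothetical and not part of genx. It only covers the control characters below U+0020 that XML 1.0 disallows, not other illegal code points or malformed UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper, not part of genx: overwrite the control characters
 * below U+0020 that XML 1.0 forbids with a single ASCII replacement byte.
 * In UTF-8 every byte below 0x80 is a complete character, so overwriting
 * it in place cannot split a multi-byte sequence. The buffer length is
 * unchanged, so line/column offsets into the text stay valid.
 * Returns the number of bytes replaced. */
static size_t replace_illegal_controls(unsigned char *buf, size_t len,
                                       unsigned char replacement)
{
    size_t i;
    size_t replaced = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        /* Below U+0020, XML 1.0 allows only TAB, LF and CR. */
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            buf[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

Because the helper takes an explicit length instead of relying on a terminating zero byte, an embedded U+0000 is replaced like any other illegal control character.
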
Since genxScrubText() guarantees that the resulting string is no longer than the input, the choice of replacement characters is limited to US-ASCII, but that would work for me (using '.' or '?').

Hmm, the documentation reads

  int genxScrubText(genxWriter w, utf8 in, utf8 out);

I think this should be

  int genxScrubText(genxWriter w, const utf8 in, utf8 out);

It would be helpful to know from the documentation whether in and out must be distinct or whether I can use

  genxScrubText(w, s, s);

and whether I can safely assume that the return value is the number of bytes scrubbed.

Since most of my untrusted text goes into text nodes, it would also be helpful to have genxAdd*Text(...) routines that allow scrubbing the text more conveniently, say genxAddScrubbedText(...) or genxSetAutoScrubText(...), so I would have to worry less about the well-formedness of the output (or, more precisely, genx runtime errors...).

Hmm, now that I think about this, genxScrubText(...) does not take care of U+0000, since it considers the first occurrence of such a character to terminate the string. Argl, so I need to fix the text myself anyway...

regards.
Received on Friday, 12 March 2004 11:54:09 UTC