- From: Frode Børli <frode@seria.no>
- Date: Tue, 17 Jun 2008 06:09:55 +0200
Hi! I am a new member of this mailing list, and I wish to contribute a couple of specific requirements that I believe should be discussed and perhaps implemented in the final specification. I am unsure whether this is the correct place to post my ideas (or whether my ideas are even new), but if it is not, I am sure somebody will tell me. :) One person told me that the specification was finished and that no new features would be added from now on, but hopefully that is not true.

The challenge: More and more websites have features where users can contribute user-generated content, often in the form of audio, video, images or wiki articles. An older type of contribution is plain text, such as posts in a discussion forum, messages on a mailing list such as this one, and comments on blog articles.

A major challenge for many web developers is validating "untrusted" content such as the message body of a blog comment. Unless the developer has a flawless and future-proof algorithm for ensuring that the message body does not contain any script, they have to resort to plain text only, or to bbCode-style markup languages that let users post text with richer formatting.

If the developer wants to enable rich formatting using bbCode, they still need fairly advanced methods of ensuring that no scripts are executed. Consider this bbCode example: [img]some_image.jpg'onmouseover=maliciousScript()[/img]. The bbCode parser must ensure that there is absolutely no way to inject scripts into user posts, and that is very difficult when browsers at the same time contain parsing errors. The example could easily be validated by not allowing apostrophes or quotation marks in URLs, but then there are multiple entities that could be used instead: &#39; or &#x27;. To make matters worse, some browsers parse &#39, which is an incomplete HTML entity, and all these variations must be considered by the author of the bbCode parser.
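A minimal sketch, in Python, of the usual alternative to blacklisting quote characters and every entity spelling of them: allowlist the URL scheme, then escape every HTML-significant character on output. `render_img` and `ALLOWED_SCHEMES` are hypothetical names, and the choice of allowed schemes is an assumption.

```python
import html
from urllib.parse import urlsplit

# Assumption: only scheme-less (relative) URLs and http(s) URLs are acceptable
# image sources; javascript:, data: and any other scheme are rejected.
ALLOWED_SCHEMES = {"", "http", "https"}


def render_img(url: str) -> str:
    """Translate the body of a bbCode [img]...[/img] tag into an <img> element.

    Hypothetical helper for illustration. Returns "" if the URL is rejected.
    """
    url = url.strip()
    try:
        scheme = urlsplit(url).scheme  # urlsplit lower-cases the scheme
    except ValueError:  # e.g. a malformed bracketed IPv6 host
        return ""
    if scheme not in ALLOWED_SCHEMES:
        return ""
    # Escape &, <, >, " and ' so nothing the user typed can close the
    # attribute. Escaping & also means an entity such as &#39; typed by the
    # user reaches the browser as the literal text "&#39;", not as a quote.
    return '<img src="' + html.escape(url, quote=True) + '">'


print(render_img("some_image.jpg'onmouseover=maliciousScript()"))
# <img src="some_image.jpg&#x27;onmouseover=maliciousScript()">
print(render_img("javascript:maliciousScript()"))  # rejected: prints an empty line
```

Because the output is always escaped, the parser no longer needs to know which entity forms a given browser accepts; the apostrophe in the example ends up as data inside a double-quoted attribute.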
Another problem that makes it hard to future-proof this kind of security is that standards evolve. A few years ago you could safely allow users to apply CSS styles to tags. For example, the bbCode tag [color=blue]Blue text[/color] would be translated to <span style='color: blue'>Blue text</span>. In this case an exploit could be [color=expression(maliciousCode())]Text[/color]. When the algorithm was written it was considered secure, since no script could ever be executed inside a style attribute. With the invention of expressions, behaviours and so on, the knowledge required of web developers is ever increasing, and they have to review all their old code whenever new technologies emerge, because what was once secure suddenly is not secure anymore.

One solution:

<htmlarea>User generated content</htmlarea>

No scripts would ever be allowed to execute inside this tag. Malicious users could potentially submit "</htmlarea> unsafe content <htmlarea>" and get around this. As I see it, there are two solutions to this: either the user-generated content inside the tag must be escaped using HTML entities (but still rendered as HTML by the user agent), or the author must prevent users from submitting the string "</htmlarea>" and all possible variations of that tag. If the first solution is used, browsers should display a strong security warning if unescaped content is seen between htmlarea tags on a website (to educate web developers).

A side note: the tag name I chose is based on the <textarea> tag, whose content should also be entity-escaped to prevent users from inserting the text </textarea>. Failing to do so currently breaks a lot of web pages, so perhaps a strong security warning is also in order if unescaped content is found after the textarea start tag?
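A minimal Python sketch of both mitigations discussed above. `render_color` replaces the naive [color=...] translation with an allowlist, so a value like `expression(maliciousCode())` is never copied into a `style` attribute; the colour names and hex pattern are assumptions, not a complete CSS colour grammar. `wrap_htmlarea` illustrates the first option for the proposed `<htmlarea>` element (a proposal in this message, not an existing HTML element): the server entity-escapes the content so a submitted `</htmlarea>` becomes text. Both function names are hypothetical.

```python
import html
import re

# Assumed allowlist for [color=...]: a few named colours plus #rgb/#rrggbb.
# Not a full CSS colour grammar; extend the set as needed.
NAMED_COLORS = {"black", "blue", "green", "red", "white", "yellow"}
HEX_COLOR = re.compile(r"#(?:[0-9a-f]{3}|[0-9a-f]{6})")


def render_color(value: str, text: str) -> str:
    """Translate [color=value]text[/color] (hypothetical helper).

    An unrecognised value is dropped and only the escaped text is returned,
    so something like expression(...) never reaches a style attribute.
    """
    value = value.strip().lower()
    safe_text = html.escape(text)
    # fullmatch rejects appended declarations such as "#fff;background:..."
    if value in NAMED_COLORS or HEX_COLOR.fullmatch(value):
        return f"<span style='color: {value}'>{safe_text}</span>"
    return safe_text


def wrap_htmlarea(user_html: str) -> str:
    """Entity-escape user content for the proposed <htmlarea> element.

    Escaping < and > turns a submitted "</htmlarea>" into text, so it cannot
    close the element early. Under the proposal, the user agent would then
    render the decoded markup without executing any scripts.
    """
    return "<htmlarea>" + html.escape(user_html) + "</htmlarea>"


print(render_color("blue", "Blue text"))
# <span style='color: blue'>Blue text</span>
print(render_color("expression(maliciousCode())", "Text"))
# Text
print(wrap_htmlarea("</htmlarea> unsafe content <htmlarea>"))
# <htmlarea>&lt;/htmlarea&gt; unsafe content &lt;htmlarea&gt;</htmlarea>
```

The same escaping keeps a literal </textarea> in user input from closing a <textarea> early: inside a textarea the browser decodes &lt; back to <, so the user still sees the original text.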
-- 
Best regards
Frode Børli
Seria.no

Mobile: +47 406 16 637
Company: +47 216 90 000
Fax: +47 216 91 000

Think about the environment. Do not print this e-mail unless you really need to.
Received on Monday, 16 June 2008 21:09:55 UTC