- From: Per-Erik Brodin <per-erik.brodin@ericsson.com>
- Date: Mon, 21 Sep 2009 17:39:14 +0200
- To: "Michael A. Puls II" <shadow2531@gmail.com>
- CC: public-webapps@w3.org
Michael A. Puls II wrote:
> On Fri, 18 Sep 2009 11:37:24 -0400, Per-Erik Brodin wrote:
>
>> When parsing an event stream, allowing carriage return, carriage return
>> line feed, and line feed to denote line endings introduces unnecessary
>> ambiguity into the spec. For example, the sequence "\r\r\n\n" could be
>> interpreted as three or four line endings.
>
> That would always be 3 lines: a mac, a windows and a nix. "\n\r\n\r"
> would be the reverse order, but still 3.

So what you are saying is that "\r\n" will always be a Windows line
ending and never a Mac line ending followed by a Unix line ending?

> Universal newline normalization for input with mixed newline formats:
>
> // normalize newlines to \n
> .replace(/\r\n|\r/g, "\n");
>
> // normalize newlines to \r\n
> .replace(/\r\n|\r|\n/g, "\r\n");
>
> // normalize newlines to \r
> .replace(/\r\n|\n/g, "\r");

While regular expressions are greedy by default, I have been told that
there is no way to express such behavior using ABNF. For what it is
worth, that means that the current ABNF definition of the event stream
format can't stand on its own.

> Ideally, I think it's often best to do the first to normalize to \n for
> processing (like if you need to know line count) and then normalize to a
> different format *if needed* afterwards.
>
> IMO

Keep in mind that we are parsing a continuous stream where data arrives
in chunks. It is entirely possible for a "\r\n" pair to be split up
between two chunks. That could be handled by either 1) dispatching an
event immediately when receiving a carriage return and then, upon
reception of the next chunk, "remembering" that the last character in
the previous chunk was a carriage return and discarding the first
character if it happens to be a line feed, or 2) not dispatching an
event until the next character after the carriage return has been
received, which could lead to delays in event dispatch. Both of these
options are far from ideal.

--
Per-Erik Brodin
Ericsson Research
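
P.S. To make option 1 above concrete, here is a minimal sketch of a chunk-aware line splitter. It only illustrates the "remember the trailing carriage return" approach; the names createLineSplitter, onLine and push are hypothetical and do not come from the spec or from any implementation.

```javascript
// Hypothetical helper (an assumption for illustration, not from the spec):
// splits chunked input into lines, treating "\r\n", "\r" and "\n" each as
// a single line ending, even when "\r\n" is split across two chunks.
function createLineSplitter(onLine) {
  let buffer = "";        // text of the current, not yet terminated line
  let lastWasCR = false;  // was the previously seen character a "\r"?

  return function push(chunk) {
    for (const ch of chunk) {
      if (lastWasCR) {
        lastWasCR = false;
        if (ch === "\n") {
          continue;       // second half of a "\r\n" pair: discard it
        }
      }
      if (ch === "\r") {
        onLine(buffer);   // report the line immediately (option 1)
        buffer = "";
        lastWasCR = true; // remember the CR in case "\n" comes next
      } else if (ch === "\n") {
        onLine(buffer);
        buffer = "";
      } else {
        buffer += ch;
      }
    }
  };
}

// Usage: a "\r\n" pair split between two chunks still ends only one line.
const seen = [];
const push = createLineSplitter((line) => seen.push(line));
push("data: a\r");
push("\ndata: b\n");
```

With this approach the first line is reported as soon as the carriage return arrives, at the cost of carrying one bit of state between chunks. Under the same rules, "\r\r\n\n" is reported as three (empty) lines.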
Received on Monday, 21 September 2009 15:41:08 UTC