- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 23 May 2001 09:02:14 +0900
- To: www-validator@w3.org
Hello everybody,

[It seems I had some terrible problems getting this in right. Many thanks to Terje for catching this!]

I'm not sure the details in this mail are appropriate for this list; please just tell me if you get bored :-).

In sub normalize_newlines, the following two lines

    $file =~ s(\015\012){\n}g; # Turn ASCII CRLF into native newline.
    $file =~ s(\015) {\n}g; # Turn ASCII CR into native newline.

claimed to turn the various line-ending conventions into native newlines, and they indeed did so on Unix systems. But they didn't do so on PCs or Macs. Here is what happened:

    Start     Mac   PC       Unix
    CRLF      CR    CRCRLF   LF
    CR        CR    CRLF     LF
    LF        LF    LF       LF
    desired   CR    CRLF     LF

The desired result can be obtained by replacing the two lines above with

    $file =~ s(\015\012?|\012){\n}g; # Turn CRLF/CR/LF into native newline.

I have checked this change in, together with some tweaks to the comments at the start of the subroutine.

The above regular expression may puzzle some, but it works. It could also be written (\015\012|\015|\012) or (\015\012|\012|\015) [but beware of (\015|\015\012|\012) and similar: Perl tries the alternatives from left to right, so \015 alone would match the CR of a CRLF, and the remaining LF would then become a second newline. For the details, please read Jeffrey Friedl's Mastering Regular Expressions.]

Of course, the whole subroutine, currently reading

    sub normalize_newlines {
      my $file = shift;
      $file =~ s(\015\012?|\012){\n}g; # Turn CRLF/CR/LF into native newline.
      return [split /\n/, $file];
    }

can be further simplified to read

    sub normalize_newlines {
      my $file = shift;
      return [split /\015\012?|\012/, $file];
    }

and then, I guess, further to

    sub normalize_newlines {
      return [split /\015\012?|\012/, shift];
    }

but once we have got this far, we might be able to get rid of the subroutine altogether.

Regards, Martin.
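For anyone who wants to try the regular expressions discussed above, here is a small stand-alone Perl sketch. It is not the validator's code: the sample string and the helpers split_bad_order and to_lf are made up for illustration. to_lf writes \012 explicitly instead of the native \n so that its result does not depend on the platform, which is an assumption of this sketch rather than what the validator does.

```perl
use strict;
use warnings;

# Mirrors the simplified normalize_newlines suggested in the mail:
# split on CRLF, a lone CR, or a lone LF. Because \015\012? is tried
# first and the LF is optional, a CRLF pair counts as ONE separator.
sub normalize_newlines {
    return [split /\015\012?|\012/, shift];
}

# Hypothetical helper showing the pitfall: with \015 listed before
# \015\012, Perl takes the first alternative that matches, so the CR of
# a CRLF is consumed alone and its LF becomes a second separator.
sub split_bad_order {
    return [split /\015|\015\012|\012/, shift];
}

# Hypothetical helper: the substitution from the mail, but producing
# an explicit LF (\012) rather than the platform's native newline.
sub to_lf {
    my $text = shift;
    $text =~ s/\015\012?|\012/\012/g;
    return $text;
}

# Made-up input mixing Unix (LF), old Mac (CR) and DOS/Windows (CRLF).
my $mixed = "unix\012mac\015dos\015\012end";

my $good = normalize_newlines($mixed);
my $bad  = split_bad_order($mixed);

print join('|', @$good), "\n";    # unix|mac|dos|end
print join('|', @$bad),  "\n";    # unix|mac|dos||end
```

Running it prints unix|mac|dos|end for the suggested pattern and unix|mac|dos||end for the badly ordered one; the extra empty field is the spurious line produced when the CR of a CRLF is matched on its own.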
Received on Tuesday, 22 May 2001 20:03:00 UTC