- From: <bugzilla@jessica.w3.org>
- Date: Wed, 16 Jul 2014 20:02:26 +0000
- To: public-browser-tools-testing@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=26278

Andrey Botalov <botalov.andrey@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #3 from Andrey Botalov <botalov.andrey@gmail.com> ---
There are other whitespace and BiDi characters in
http://www.unicode.org/Public/6.3.0/ucd/PropList.txt and
http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode.

If getElementText() should remove only \u200b, \u200e, \u200f, \v and \f from
the string, then I think the spec should also contain an explanation (a note)
of what makes those characters special and why other invisible "spaces"
shouldn't be removed.

I don't know much about Unicode, but IMO these "spaces" also look zero-width:
U+180E
U+200C
U+2060
U+061C
etc.

I also found this line in the gecko-dev repository:
https://github.com/mozilla/gecko-dev/blob/master/browser/base/content/browser.js#L2205:
> value = value.replace(/[\u00ad\u034f\u061c\u115f-\u1160\u17b4-\u17b5\u180b-\u180d\u200b\u200e-\u200f\u202a-\u202e\u2060-\u206f\u3164\ufe00-\ufe0f\ufeff\uffa0\ufff0-\ufff8]|\ud834[\udd73-\udd7a]|[\udb40-\udb43][\udc00-\udfff]/g, encodeURIComponent);

It seems that the implementation in Firefox is a bit more complicated.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
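To make the question in the comment concrete, here is a minimal sketch of how the two character sets behave differently. It is not taken from the spec or from any WebDriver implementation: stripInvisible, NARROW, WIDE and sample are hypothetical names, and WIDE only adds the four code points the comment names, not a complete list of invisible characters.

```javascript
// Hypothetical illustration only; not the spec's or any driver's algorithm.

// The five characters the comment says would be removed.
const NARROW = /[\u200b\u200e\u200f\v\f]/g;

// The same set plus the code points the comment lists as looking zero-width.
const WIDE = /[\u200b\u200e\u200f\v\f\u180e\u200c\u2060\u061c]/g;

// Remove every character matched by the given pattern.
function stripInvisible(text, pattern) {
  return text.replace(pattern, "");
}

// Five visible letters separated by U+200B, U+2060, U+180E and U+200E.
const sample = "a\u200bb\u2060c\u180ed\u200ee";

console.log(sample.length);                         // 9
console.log(stripInvisible(sample, NARROW).length); // 7: U+2060 and U+180E remain
console.log(stripInvisible(sample, WIDE));          // abcde

// Passing encodeURIComponent as the replacer, as the quoted browser.js line
// does, percent-encodes each match instead of deleting it.
console.log("a\u200bb".replace(/\u200b/g, encodeURIComponent)); // a%E2%80%8Bb
```

With the narrow set, text that looks identical on screen can still differ in length and fail an equality check, which is why a note explaining the choice of characters would help.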
Received on Wednesday, 16 July 2014 20:02:28 UTC