- From: <bugzilla@jessica.w3.org>
- Date: Wed, 16 Jul 2014 20:02:26 +0000
- To: public-browser-tools-testing@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=26278
Andrey Botalov <botalov.andrey@gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---
--- Comment #3 from Andrey Botalov <botalov.andrey@gmail.com> ---
There are other whitespace and BiDi characters in
http://www.unicode.org/Public/6.3.0/ucd/PropList.txt and
http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode.
I think that if getElementText() should remove only \u200b, \u200e, \u200f,
\v and \f from the string, then the spec should also contain a note
explaining what makes those characters special and why other invisible
"spaces" shouldn't be removed.
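As a rough illustration of the behaviour in question (a sketch only, not the spec's algorithm), removing exactly those five characters could look like the following. The helper name stripListedCharacters is hypothetical.

```javascript
// Assumption: only the five characters named above are removed.
// \v is U+000B (vertical tab) and \f is U+000C (form feed).
const LISTED_CHARACTERS = /[\u200b\u200e\u200f\v\f]/g;

// Hypothetical helper, not part of any WebDriver implementation.
function stripListedCharacters(text) {
  return text.replace(LISTED_CHARACTERS, "");
}

// U+200B is removed, but U+2060 (WORD JOINER) is left in place.
const sample = "foo\u200bbar\u2060baz";
console.log(JSON.stringify(stripListedCharacters(sample)));
```

Any other invisible character passes through untouched, which is the inconsistency the note in the spec would need to explain.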
I don't know much about Unicode, but IMO these "spaces" also look
zero-width:
U+180E
U+200C
U+2060
U+061C
etc.
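One way to inspect the code points listed above is to ask a JavaScript engine which Unicode binary properties they carry. This is a sketch: the helper name describeCodePoint is hypothetical, and the results depend on the Unicode version built into the engine, so no output is shown.

```javascript
// Code points listed above: U+180E, U+200C, U+2060, U+061C.
const CANDIDATES = [0x180e, 0x200c, 0x2060, 0x061c];

// The five characters discussed earlier in this comment.
const LISTED = /[\u200b\u200e\u200f\v\f]/;

// Hypothetical helper: reports a few Unicode properties for one code point.
function describeCodePoint(cp) {
  const ch = String.fromCodePoint(cp);
  return {
    codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    inListedSet: LISTED.test(ch),
    whiteSpace: /\p{White_Space}/u.test(ch),
    bidiControl: /\p{Bidi_Control}/u.test(ch),
    defaultIgnorable: /\p{Default_Ignorable_Code_Point}/u.test(ch),
  };
}

for (const cp of CANDIDATES) {
  console.log(describeCodePoint(cp));
}
```

None of the four is in the five-character set, so a getElementText() that strips only that set would leave them all in the returned string.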
I also found this line in the gecko-dev repository, at
https://github.com/mozilla/gecko-dev/blob/master/browser/base/content/browser.js#L2205:
> value = value.replace(/[\u00ad\u034f\u061c\u115f-\u1160\u17b4-\u17b5\u180b-\u180d\u200b\u200e-\u200f\u202a-\u202e\u2060-\u206f\u3164\ufe00-\ufe0f\ufeff\uffa0\ufff0-\ufff8]|\ud834[\udd73-\udd7a]|[\udb40-\udb43][\udc00-\udfff]/g, encodeURIComponent);
It seems that the implementation in Firefox is a bit more complicated.
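To see what the quoted line does, the sketch below applies the same regular expression outside the browser. The wrapper name percentEncodeInvisible is hypothetical. Note that the line percent-encodes the matched characters rather than removing them.

```javascript
// Regular expression copied from the quoted browser.js line.
const INVISIBLE = /[\u00ad\u034f\u061c\u115f-\u1160\u17b4-\u17b5\u180b-\u180d\u200b\u200e-\u200f\u202a-\u202e\u2060-\u206f\u3164\ufe00-\ufe0f\ufeff\uffa0\ufff0-\ufff8]|\ud834[\udd73-\udd7a]|[\udb40-\udb43][\udc00-\udfff]/g;

// Hypothetical wrapper around the quoted replace() call.
// encodeURIComponent receives each match as its first argument and
// ignores the extra (offset, string) arguments that replace() passes.
function percentEncodeInvisible(value) {
  return value.replace(INVISIBLE, encodeURIComponent);
}

console.log(percentEncodeInvisible("a\u200bb")); // a%E2%80%8Bb
// U+200C is not covered by the expression, so it passes through unchanged.
console.log(percentEncodeInvisible("a\u200cb") === "a\u200cb"); // true
```

The expression covers U+061C and U+2060 from the list above, but not U+180E or U+200C.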
Received on Wednesday, 16 July 2014 20:02:28 UTC