- From: Simon Pieters <simonp@opera.com>
- Date: Tue, 24 Feb 2015 13:17:29 +0100
- To: www-style@w3.org, fantasai <fantasai.lists@inkedblade.net>
On Mon, 26 Jan 2015 21:00:12 +0100, fantasai <fantasai.lists@inkedblade.net> wrote:

> On 01/26/2015 06:31 AM, Koji Ishii wrote:
>> Thank you for the great summary.
>>
>> On Mon, Jan 26, 2015 at 7:29 AM, fantasai
>> <fantasai.lists@inkedblade.net> wrote:
>>>
>>> 1. text-combine-upright
>>> -----------------------
>>>
>>> Result of text-combine-upright should break as ID, not as U+FFFC.
>>> Current spec requires treating as actual contents for
>>> line-breaking.
>>> So there is some misunderstanding of the text;
>>> unclear whether there is an issue here to fix.
>>>
>>> Proposal A: Leave spec as-is: TCY treated as its own text.
>>> Proposal B: Make TCY always treated as ideographic character.
>>
>> Hm, the change was made in 2012[1]. I merely remember we discussed,
>> but don't remember why we changed.
>>
>> Though I lost that context, thinking now, I think B works the best.
>>
>>> 2. UAX#14 Rules for Atomic Inlines Problematic
>>> ----------------------------------------------
>>>
>>> Changing the rule order for UAX#14 is a difficult tailoring.
>>> Spec should just create a special rule for atomic inlines.
>>>
>>> Proposal A: Change spec wording to fix this.
>>> Proposal B: Change spec wording to fix issue #3.
>>>
>>> Remaining Issue: Should U+FFFC match images?
>>
>> Not very clear the diff between A and B. Can you clarify?
>> [...]
>> Maybe we're talking the same? I couldn't read what you meant by your A
>> and B.
>
> Sorry, I wasn't clear. I meant follow Proposal B for Issue #3, i.e.
>>> Proposal B: Treat all images as ID.
>
>
>> This property is to opt-out the fix and bring back the behavior
>> we defined in the LC, so I think we need this in the Level 3.
>
> The ideal behavior is, I think, to treat as ID. I can't imagine
> anyone intentionally *wanting* the current behavior (ignoring
> nbsp etc.)
>
>
> FWIW, just checked Presto with some of your test cases (using
> comma, period, brackets, etc.), and it seems to treat images
> as ideographic. E.g. it keeps an image together with an
> immediately following close-bracket, comma, or period. This
> means it was Web-compatible enough for Presto, so maybe it's
> Web-compatible enough for everyone.
>
> I propose we treat TCY, U+FFFC, and images all as ID by default.
> What do you think?

I did some research in httparchive.

Not breaking for nbsp around replaced elements has the potential to put a lot of images or form controls on a single line when they were expected to wrap, but this does not appear to be common enough to be easy to find when looking for it. Still, this is something that has been reported as a bug for Opera.

Not breaking for other characters seems unlikely to break pages (at least, no more than implementing UAX14 for text in general). Possibly LB19 can break pages where, e.g., a 100% wide inline image is adjacent to inline heading text with quotes (like http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3426 ).

Replace <...> with the conditions below to get the whole query. I limited the searches to ASCII characters (except nbsp).

SELECT page, COUNT(*) AS num
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE LOWER(mimeType) CONTAINS "html"
<...>
GROUP BY page
ORDER BY num

== LB12 ==

# nbsp before replaced element
AND REGEXP_MATCH(LOWER(body), r'(&nbsp;|&#x0*a0;|&#0*160;)<(embed|iframe|video|canvas|object|applet|audio|img|input|button|meter|progress|select|textarea|keygen)(\s|>)')

8756 pages. I loaded the first 50 in Opera 12 and didn't see anything obviously broken.

# nbsp after replaced element
AND REGEXP_MATCH(LOWER(body), r'<(/?embed|/iframe|/video|/canvas|/object|/applet|/audio|img|input|/button|/meter|/progress|/select|/textarea|/?keygen)\s*/?>(&nbsp;|&#x0*a0;|&#0*160;)')

685 pages. I loaded the first 50 in Opera 12 and found one page that is slightly broken: on http://joboutlook.gov.au/ the "search" buttons overflow in Presto but wrap in other browsers.
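A minimal Python sketch of how the pieces above fit together: it splices one condition into the query template in place of `<...>`, and runs the same pattern locally with Python's `re` module on a few made-up HTML snippets. It assumes the archived nbsp alternatives were originally `&nbsp;`, `&#x0*a0;` and `&#0*160;`. The helpers `build_query` and `matches` are hypothetical, and Python's `re` is only a rough stand-in for BigQuery's RE2 engine (for example, Python's whitespace class also matches U+00A0).

```python
import re

# Query template from the message; "<...>" stands for one condition.
TEMPLATE = (
    "SELECT page, COUNT(*) AS num "
    "FROM [httparchive:runs.2014_08_15_requests_body] "
    'WHERE LOWER(mimeType) CONTAINS "html" <...> '
    "GROUP BY page ORDER BY num"
)

# "nbsp before replaced element", assuming the entity alternatives were
# &nbsp;, &#x0*a0; and &#0*160; before the archive garbled them.
NBSP_BEFORE = (
    r"(&nbsp;|&#x0*a0;|&#0*160;)"
    r"<(embed|iframe|video|canvas|object|applet|audio|img|input|button"
    r"|meter|progress|select|textarea|keygen)(\s|>)"
)


def build_query(pattern: str) -> str:
    """Hypothetical helper: splice one REGEXP_MATCH condition into TEMPLATE."""
    condition = f"AND REGEXP_MATCH(LOWER(body), r'{pattern}')"
    return TEMPLATE.replace("<...>", condition)


def matches(html: str, pattern: str = NBSP_BEFORE) -> bool:
    """Hypothetical local approximation of REGEXP_MATCH(LOWER(body), pattern).

    Python's re is not RE2, so this is only a rough check of what the
    pattern catches, not a reproduction of the BigQuery results.
    """
    return re.search(pattern, html.lower()) is not None


print(build_query(NBSP_BEFORE))
print(matches('Next&nbsp;<img src="arrow.png">'))  # True
print(matches('Next <img src="arrow.png">'))       # False (plain space)
```

The other conditions in this message can be passed to `build_query` the same way.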
# nbsp between two replaced elements
AND REGEXP_MATCH(LOWER(body), r'<(/?embed|/iframe|/video|/canvas|/object|/applet|/audio|img|input|/button|/meter|/progress|/select|/textarea|/?keygen)\s*/?>(&nbsp;|&#x0*a0;|&#0*160;)+<(embed|iframe|video|canvas|object|applet|audio|img|input|button|meter|progress|select|textarea|keygen)(\s|>)')

190 pages. Of the first 50, I only found joboutlook.gov.au again.

== LB13 ==

# } ) ] ! ? , . / after replaced element, possibly spaces between
AND REGEXP_MATCH(LOWER(body), r'<(/?embed|/iframe|/video|/canvas|/object|/applet|/audio|img|input|/button|/meter|/progress|/select|/textarea|/?keygen)\s*/?>\s*[\}\)\]\!\?\,\.\/]')

167 pages. I included \s* between, although UAX14 is inconsistent here. It says "Do not break before ‘]’ or ‘!’ or ‘;’ or ‘/’, even after spaces." but then the grammar is "× CL", not "× SP* CL". Presto prevents breaking even with the space.

== LB14 ==

# ( [ { before replaced element, possibly spaces between
AND REGEXP_MATCH(LOWER(body), r'[\(\[\{]\s*<(embed|iframe|video|canvas|object|applet|audio|img|input|button|meter|progress|select|textarea|keygen)(\s|>)')

127 pages. Presto breaks when there is a space between, e.g.:
http://www.gigposters.com/
http://www.newsonews.com/

Note that the newsonews.com one is in quirks mode, with [<img>] in table cells. WebKit/Blink prevent line breaks around images in table cells in quirks mode. Gecko prevents line breaks around images in table cells in quirks mode only *for the purpose of calculating the width of the table cell*, not when actually laying out. The good news is that the proposal is slightly closer to the quirks-mode behavior, so it is less likely to break such pages.

== LB19 ==

(Searching for " or ' is not useful because of strings in JS.)
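The LB12 to LB14 results above, and the LB19 queries that follow, all turn on whether a break is allowed next to a replaced element once it is classed as ID, as proposed earlier in the thread. A rough Python sketch of that check, applying a handful of UAX #14 rules (LB7, LB12, LB12a, LB13, LB14, LB18, LB19 and the LB31 catch-all) in rule order to a sequence of line-break classes. The functions `break_allowed` and `break_opportunities` are hypothetical, and all other rules are omitted, so this is only meaningful for the classes used in the examples, not an implementation of UAX #14.

```python
# Hypothetical, simplified subset of the UAX #14 line-breaking rules, applied
# in rule order. Most rules (e.g. LB8-LB11, LB15-LB17, LB20-LB30) are omitted.
# A replaced element, TCY or U+FFFC is assumed to be classed as ID.

LB13_CLASSES = {"CL", "CP", "EX", "IS", "SY"}


def break_allowed(classes: list[str], i: int) -> bool:
    """True if a break is allowed between classes[i - 1] and classes[i]."""
    before, after = classes[i - 1], classes[i]
    if after == "SP":                                       # LB7: × SP
        return False
    if before == "GL":                                      # LB12: GL ×
        return False
    if after == "GL" and before not in {"SP", "BA", "HY"}:  # LB12a
        return False
    if after in LB13_CLASSES:                               # LB13: × CL, × CP, ...
        return False
    j = i - 1                                               # LB14: OP SP* ×
    while j >= 0 and classes[j] == "SP":
        j -= 1
    if j >= 0 and classes[j] == "OP":
        return False
    if before == "SP":                                      # LB18: SP ÷
        return True
    if before == "QU" or after == "QU":                     # LB19: × QU, QU ×
        return False
    return True                                             # LB31: break everywhere else


def break_opportunities(classes: list[str]) -> list[int]:
    """Indices i such that a break is allowed just before classes[i]."""
    return [i for i in range(1, len(classes)) if break_allowed(classes, i)]


# Image (ID) after nbsp; two images; image then " )"; "( " then image;
# quote then image; image, space, image.
print(break_opportunities(["GL", "ID"]))        # []
print(break_opportunities(["ID", "ID"]))        # [1]
print(break_opportunities(["ID", "SP", "CL"]))  # []
print(break_opportunities(["OP", "SP", "ID"]))  # []
print(break_opportunities(["QU", "ID"]))        # []
print(break_opportunities(["ID", "SP", "ID"]))  # [2]
```

Under this reading, the nbsp, bracket and quote cases all lose their break opportunity, while two adjacent images (ID ÷ ID) or an image followed by a space keep one.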
# HTML-escaped QU characters before replaced element
AND REGEXP_MATCH(LOWER(body), r'&#(x0*(ab|bb|2018|2019|201[bcdf]|203[9a]|275[bcdef]|2760|2e0[0123456789abcd]|2e2[01]|1f67[678])|0*(171|187|8216|8217|8219|822[013]|8249|8250|1007[56789]|10080|1177[6789]|117[89]\d|1180[01234589]|12863[012]));?<(embed|iframe|video|canvas|object|applet|audio|img|input|button|meter|progress|select|textarea|keygen)(\s|>)')

7 pages.

# HTML-escaped QU characters after replaced element
AND REGEXP_MATCH(LOWER(body), r'<(/?embed|/iframe|/video|/canvas|/object|/applet|/audio|img|input|/button|/meter|/progress|/select|/textarea|/?keygen)\s*/?>&#(x0*(ab|bb|2018|2019|201[bcdf]|203[9a]|275[bcdef]|2760|2e0[0123456789abcd]|2e2[01]|1f67[678])|0*(171|187|8216|8217|8219|822[013]|8249|8250|1007[56789]|10080|1177[6789]|117[89]\d|1180[01234589]|12863[012]))([;<]|\s)')

2 pages.

# Raw QU characters before replaced element, excluding " ' and astral characters
AND REGEXP_MATCH(LOWER(body), r'[«»‘’‛-“”‟‹›❛-❠⸀-⸁⸂⸃⸄⸅⸆-⸈⸉⸊⸋⸌⸍⸜⸝⸠⸡]<(embed|iframe|video|canvas|object|applet|audio|img|input|button|meter|progress|select|textarea|keygen)(\s|>)')

16 pages. (I don't know if this result is accurate; there may be encoding issues.)

# Raw QU characters after replaced element, excluding " ' and astral characters
AND REGEXP_MATCH(LOWER(body), r'<(/?embed|/iframe|/video|/canvas|/object|/applet|/audio|img|input|/button|/meter|/progress|/select|/textarea|/?keygen)\s*/?>[«»‘’‛-“”‟‹›❛-❠⸀-⸁⸂⸃⸄⸅⸆-⸈⸉⸊⸋⸌⸍⸜⸝⸠⸡]')

0 pages. (Again, I don't know if this is accurate.)

--
Simon Pieters
Opera Software
Received on Tuesday, 24 February 2015 12:17:59 UTC