Re: strange behavior? in wpt / css-text line-breaking test issue

   hi all,

   Sorry for taking a while to come back. I now believe the root cause of the test originally
in discussion is missing font configuration (e.g. loading mplus-1p), but I don't yet have solid
text-based evidence. Details of what I've checked and the results are at the bottom.

   Let me file a PR to add loading of a web font resource (one hosted in wpt, like mplus-1p) to the
tests in the css/css-text/line-break directory (the ones pointed out as possibly missing it in the
previous email), and ask for comments/reviews from the wider WPT community.
# of course, without i18n endorsement...



1. I asked Mike (W3C/Keio) whether he has any knowledge about fonts in wpt. No reply yet; when I
mentioned it during the Keio weekly call (mid-Dec), I heard that he is busy with other things and
may take time to reply. The lack of a reply might also be due to the holiday season.

2. I checked the docker image for wpt: https://hub.docker.com/r/webplatformtests/wpt.fyi, which
seems to be used for Chrome and Firefox on Linux.
   Apart from fonts bundled in browser resources (e.g. TwemojiMozilla.ttf in Firefox), I could find
only basic bitmap fonts (xfonts-base) and DejaVu fonts (fonts-dejavu-core and
fonts-dejavu-extra). (Checked in three ways: resources in /usr/share/fonts/, finding files with
font extensions, and the list of installed packages.)
   I believe PCF bitmap fonts will not be used by browsers, so only codepoints covered by
DejaVu can be rendered on these two platforms.
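
   The "files with font extension" check can be approximated with a short Python sketch. The
suffix list and the default /usr/share/fonts path are assumptions (the list is not exhaustive),
and it only looks at files on disk, not at installed packages or fonts bundled inside browsers.

```python
import os
import sys
from pathlib import Path

# Suffixes treated as font files; an assumed, non-exhaustive list.
FONT_SUFFIXES = (".ttf", ".otf", ".ttc", ".woff", ".woff2", ".pcf", ".pcf.gz")
PCF_SUFFIXES = (".pcf", ".pcf.gz")


def find_font_files(root):
    """Return sorted paths, relative to root, of files with a font suffix."""
    root = Path(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(FONT_SUFFIXES):
                found.append(str((Path(dirpath) / name).relative_to(root)))
    return sorted(found)


def split_bitmap(paths):
    """Split paths into PCF bitmap fonts and everything else (outline fonts)."""
    bitmap = [p for p in paths if p.lower().endswith(PCF_SUFFIXES)]
    outline = [p for p in paths if not p.lower().endswith(PCF_SUFFIXES)]
    return bitmap, outline


if __name__ == "__main__":
    # Directory to scan; defaults to the system font directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/fonts"
    bitmap, outline = split_bitmap(find_font_files(root))
    print(f"{len(bitmap)} PCF bitmap file(s), {len(outline)} other font file(s)")
    for path in outline:
        print("    " + path)
```

   Listing PCF files separately mirrors the assumption above that browsers will not use bitmap
fonts, so only the "other" list matters for what a test can actually render.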

3. I checked some other tests that have discrepancies between wpt test results and MDN BCD
(browser compatibility data), and found another case:
   CSS full-size-kana is marked as not supported in Chrome/Edge, but Chrome passes on wpt.fyi.
bcd: https://developer.mozilla.org/en-US/docs/Web/CSS/text-transform#browser_compatibility
wpt results: https://wpt.fyi/results/css/css-text/text-transform/text-transform-full-size-kana-001.html
test case: http://wpt.live/css/css-text/text-transform/text-transform-full-size-kana-001.html

4. Looking at the history of test files in the css/css-text/line-break directory, the tests
>      line-break-loose-hyphens 001 - 003
>      line-break-normal-hyphens 001 - 003
>      line-break-strict-hyphens 001 - 003
were newly added in one PR that also updated existing test files at that time, so it seems
that these were created separately, which could have led the PR to miss some configuration
lines?
https://github.com/web-platform-tests/wpt/commit/d2767c04559c016e04ad43fcc07f63f1153d18bf
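
   The per-font survey of css/css-text/line-break in my earlier email quoted below (counting files
by their font line) could be repeated with a script like the following, which also lists the tests
that reference no font. It is a minimal sketch assuming a local wpt checkout; treating a
case-insensitive match on the font file name as "has a font line" is an approximation, and
KNOWN_FONTS only holds the three resources seen in that survey.

```python
import sys
from collections import defaultdict
from pathlib import Path

# Font resources seen in css/css-text/line-break. Matching the file name as a
# case-insensitive substring is an approximation of "has a font line".
KNOWN_FONTS = ("Ahem.css", "mplus-1p-regular.woff", "NotoNaskhArabic-regular.woff2")


def classify_tests(test_dir):
    """Group top-level *.html files in test_dir by the known font they mention."""
    groups = defaultdict(list)
    for path in sorted(Path(test_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        hits = [font for font in KNOWN_FONTS if font.lower() in text]
        if not hits:
            key = "no font specified"
        elif len(hits) == 1:
            key = hits[0]
        else:
            key = "multiple fonts"
        groups[key].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    # Path inside a local wpt checkout; files under reference/ are not scanned.
    test_dir = sys.argv[1] if len(sys.argv) > 1 else "css/css-text/line-break"
    for key, names in sorted(classify_tests(test_dir).items()):
        print(f"{key}: {len(names)}")
        if key == "no font specified":
            for name in names:
                print("    " + name)
```

   Only the top-level directory is globbed, so reference files are left out of the counts.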




On 2021/12/09 18:05, Atsushi Shimono (W3C Team) wrote:
>    hi Fuqiao,
> 
> On 2021/12/07 14:45, Fuqiao Xue wrote:
>> Hi Atsushi,
>>
>> I'm not quite familiar with fonts in wpt, but I see that Ahem is recommended here:
>>
>>    https://web-platform-tests.org/writing-tests/general-guidelines.html#be-cross-platform
> 
>    Yes, we might need to read
>> Fonts cannot be relied on to be either installed or to have specific metrics. As such, in most cases when a known font is needed,
> as meaning fonts cannot be relied on to be installed for anything other than US-ASCII (or Western scripts?), or something like that...
> 
>> And looking at the test in https://github.com/web-platform-tests/wpt/blob/f294f587fdba42782cf64cbb6f42108fc661387a/infrastructure/assumptions/ahem.html#L291-L311 , it seems to support at least some CJK characters.
> 
>    Ah, yes. Ahem has 278 glyphs in total, 8 of them unmapped, and all available characters are listed
> on the page above (I've just checked with a dump of the TTF), e.g. U+0020 to U+007E except for U+0027,
> and U+00A0 to U+00FF.
>    Most glyphs are simply a 1em black square box, even for US-ASCII, as written at:
> https://web-platform-tests.org/writing-tests/ahem.html
> 
>    Considering our targets...
> 
> 1. line-break property
> 
>    Of the 84 files in css/css-text/line-break, by font line:
> - Ahem.css: 39 (line-break-anywhere, line-break-anywhere-and-white-space, line-break-anywhere-overrides-uax-behavior)
> - mplus-1p-regular.woff: 32 (line-break-loose, line-break-normal, line-break-strict)
> - NotoNaskhArabic-regular.woff2: 1 (line-break-shaping-001)
> - no font specified: 12
>      line-break-anywhere 001 - 003
>      line-break-loose-hyphens 001 - 003
>      line-break-normal-hyphens 001 - 003
>      line-break-strict-hyphens 001 - 003
> 
>    For anywhere 001 to 003 (3 files), the failure of 002 in Firefox seems to be a spurious FAIL,
> and we could update the tests to use Ahem, changing some characters and editing the CSS (box size).
>    For the hyphens tests (9 files), I believe we can write valid tests using Ahem, but this might
> need careful consideration (not just replacing the CJK Han character with one available in Ahem),
> such as changing the width of the box and the references.
> 
> 
> 2. other tests mentioned in previous email
> 
>    Since these tests target specific characters, like punctuation marks, it may be better to change
> them to use the mplus-1p woff?
> 
> 
> 3. letter-spacing tests, under review as i18n tests
> 
>    Most use the mplus-1p woff as their font, where punctuation marks are tested, and should be OK.
> 
> 
>    Still, I'm not sure whether the analysis above is correct, although...
> 
> 
>> ~xfq
>>
>>> On Dec 7, 2021, at 11:24, Atsushi Shimono (W3C Team) <atsushi@w3.org> wrote:
>>>
>>>   hi all, (sorry for writing in English on public-i18n-japanese)
>>>
>>>   I'd like to ask for help or advice from anyone who might know about (or has encountered something
>>> similar to) a possible issue in the periodic runs of wpt.
>>>
>>>   TL;DR: a possible broad issue with fonts in tests ('tofu' rendering)
>>>
>>>
>>> 1) The root issues that initiated this survey are:
>>> https://github.com/w3c/jlreq/issues/274 (for jpan-gap, line-break not working in some browsers)
>>> https://github.com/web-platform-tests/wpt/issues/31021 (results of line-break-loose-hyphens-001 seem not valid)
>>>
>>>   For the first one, even though several invalid results exist in the wpt results, there are
>>> also cases not implemented in browsers; again, failures (in wpt) caused by missing implementation
>>> need to be distinguished from invalid test outcomes (of course!).
>>>
>>>
>>>   For line-break-loose-hyphens-001, the most recent results are:
>>> https://wpt.fyi/results/css/css-text/line-break/line-break-loose-hyphens-001.html?label=experimental&label=master&aligned
>>>   the live test files are:
>>> http://wpt.live/css/css-text/line-break/line-break-loose-hyphens-001.html
>>> http://wpt.live/css/css-text/line-break/reference/line-break-loose-hyphens-001-ref.html
>>>   screenshots of the results in wpt are:
>>> chrome: https://wpt.fyi/analyzer?screenshot=sha1%3A62184ca0e5591687ca98b3702fb02e5078b3a727&screenshot=sha1%3Af948379acf4a4ff38703bceb6d1c737f7638a648
>>>   glyphs are shown as 'tofu', which is not 1em wide (~0.4em?), so the test can never work
>>>   the real issue is also confirmed with local Chrome (Windows)
>>> edge: https://wpt.fyi/analyzer?screenshot=sha1%3Aac50bd0a260a1bfd3a589a9bfb5be8804fa15212&screenshot=sha1%3Aeb0a637f8f1263bb5d4d3a3efd4367edceee2246
>>>   glyphs are shown correctly; the real issue is confirmed with local Edge (Windows)
>>> firefox: https://wpt.fyi/analyzer?screenshot=sha1%3A8dd71c721b7d50aca4e18043d075ed5ba3d6254b&screenshot=sha1%3A496d84adc1707136077d4424f82ee51be2c31ebe
>>>   glyphs are shown as 'tofu', which is not 1em wide (~0.8em?), so the first test can never work (2 x 0.8em + hyphen <= 2em)
>>> safari: https://wpt.fyi/analyzer?screenshot=sha1%3A2e90a49c9fbddfe7666a066e0166266a479d43b6&screenshot=sha1%3A565fe0d1702106ebd2470ce57ce30e75051fa21f
>>>   glyphs are shown correctly; the real issue is confirmed with local Safari (macOS)
>>>
>>>
>>> 2) So, two root causes exist here. One is a real implementation issue (Chrome, Edge, Safari),
>>> and the other is an incorrectly picked font (Chrome, Firefox).
>>>   This test checks hyphens with UAX #14 ID characters, and the definition in css-text-3 is:
>>>> The following breaks are allowed for loose line breaking if the preceding character belongs to the Unicode line breaking class ID [UAX14] (including when the preceding character is treated as ID due to word-break: break-all), and are otherwise forbidden:
>>> which has no condition related to the 'lang' attribute specified on the target element.
>>> So, even with lang="en" on this test (as it is now), this test case is valid and should be handled
>>> correctly by implementations. The character used is U+6587:
>>> https://util.unicode.org/UnicodeJsps/character.jsp?a=6587
>>>
>>> 2a) The second point does not happen on Safari or Edge (both seem to pick a zh-hant font?),
>>> and I thought we could simply change the lang of the html element to something else like
>>> zh-hant/hans or ja, which should have a 1em glyph by default (as a first step).
>>>   But looking into other tests in wpt, there seem to be several similar cases of 'tofu',
>>> with some confusing outcomes...
>>>
>>>
>>> 3) I've also checked several other tests to find a solution and/or possible similar cases.
>>>
>>>   In css-ruby, ruby-intrinsic-isize-001, whose test case has html lang="ja":
>>> test: http://wpt.live/css/css-ruby/ruby-intrinsic-isize-001.html
>>> results: https://wpt.fyi/results/css/css-ruby/ruby-intrinsic-isize-001.html?label=experimental&label=master&aligned
>>> chrome: https://wpt.fyi/analyzer?screenshot=sha1%3A39ea5f5efcf9845ff20643f8e5e9756cff284d6f&screenshot=sha1%3A16cce8d6657ed23bba754e755fed55aa07468d29
>>>   Firefox passes this test (no screenshot provided). The Chrome screenshot shows 'tofu'
>>> glyphs.
>>>
>>>   In css-content, there are several multi-language tests on quotes. quotes-016 tests Japanese
>>> quotes, and all browsers pass.
>>> test: http://wpt.live/css/css-content/quotes-016.html
>>> result: https://wpt.fyi/results/css/css-content/quotes-016.html?label=experimental&label=master&aligned
>>>   This test consists of two lines, one with a 'q' element and one replacing it with &#xXXXX;.
>>>   Given this format, I believe the test should pass if the browser implements it correctly,
>>> even when glyphs are rendered as 'tofu' (the two lines will have identical glyph
>>> widths).
>>>
>>>   In the same suite, a test for fallback with region subtags (like ja to ja-JA) was added as
>>> quotes-034, which fails in Chrome and Firefox; screenshots are available:
>>> test: http://wpt.live/css/css-content/quotes-034.html
>>> result: https://wpt.fyi/results/css/css-content/quotes-034.html?label=experimental&label=master&aligned
>>> chrome screenshot: https://wpt.fyi/analyzer?screenshot=sha1%3A16cb9a1153f028415b7bef0896a945df2a1e2b31&screenshot=sha1%3A730970202b6b9a65dddc5aacaac044e33a102426
>>> firefox screenshot: https://wpt.fyi/analyzer?screenshot=sha1%3A1288b3218b0ab384f993fd89c8be35ffbb479248&screenshot=sha1%3A8b062b8e99ae2fa30dc4a996e8fa14168be301a2
>>>   This test has html lang="en", and lang is specified per line (each wrapped in p). The reference
>>> is written with the &#xXXXX; representation.
>>>   Both screenshots have 'tofu' characters for the Japanese lines.
>>>
>>>
>>> 4) I think some tests contributed by the i18n WG to wpt have lines for web fonts, like Arabic,
>>> N'Ko, or Mongolian, but I thought these were added to avoid side effects from font glyphs
>>> in complex shaping tests.
>>>   Although I haven't encountered any test contributions using CJK, I thought we wouldn't need
>>> similar lines to load web font files if a test has no such complexity...
>>>
>>>
>>>   Does anyone have advice or knowledge? For example, if we rely on non-Latin characters (or
>>> anything beyond, say, what DejaVu covers, though it was widely used in the old days...), do we
>>> need to include web font lines?
>>>   I may have missed a manual covering this area; sorry if something in the wpt manuals clearly
>>> states this.
>>>
>>>
>>

Received on Tuesday, 28 December 2021 07:25:38 UTC