- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sat, 10 Nov 2007 02:06:44 +0900
- To: public-html@w3.org, public-i18n-core@w3.org
... are at http://www.w3.org/2007/11/09-i18n-minutes.html and below as text.

Felix

W3C (http://www.w3.org/)

- DRAFT -

SV_MEETING_TITLE

9 Nov 2007

See also: IRC log (http://www.w3.org/2007/11/09-i18n-irc)

Attendees

Present
Regrets
Chair
       SV_MEETING_CHAIR
Scribe
       fantasai

Contents

  * Topics
      1. Validator checking entity reqs
  * Summary of Action Items
_________________________________________________________

<aphillip_> http://www.w3.org/html/wg/html5/#determining0
<anne> http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html
<aphillip_> http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0088.html
<Hixie> http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing
<Hixie> http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#the-input0
<scribe> ScribeNick: fantasai

Addison: There was a badly titled thread saying something about making windows-1252 the default encoding.
... Our first reaction was: wouldn't it be nice if that were something else, say UTF-8?
... At the same time, we recognize that there's a legacy encoding issue, since previous versions of HTML required iso-????
<hsivonen> http://hsivonen.iki.fi/charmod-checking/
<hsivonen> http://hsivonen.iki.fi/charmod-norm-checking/
Addison: If you actually look at the sections, 8.2 and ....
... It does not in fact say that the default encoding of the universe at large is windows-1252.
... In the sequence, there's looking at byte sequences, then using heuristics, etc.
... At the end of that sequence there's a paragraph that says
... if all else fails, you have to supply some implementation-defined default, and we recommend you do these things.
... And windows-1252 just appears out of nowhere.
... One thought we had was for us to provide some information on why windows-1252 is preferable and how it differs from the standard ISO encodings.
<Hixie> "When a user agent would otherwise use the ISO-8859-1 encoding, it must instead use the Windows-1252 encoding."
Henri: That part is a violation of charmod.
Addison doesn't consider that a violation of charmod.
Addison: There are superset encodings, and they're often tagged with the subset encodings.
... Using the superset interpretation doesn't conflict with using the subset interpretation.
... We're not proposing a substantive change, just providing more justification for what you're doing.
... We also looked at the structure of the paragraph and had some concerns.
... One was the phrasing of "western demographics", etc.
... We had several reactions.
... One, it's not clear what a western demographic is and how you tell when you're talking to one on the Internet.
... We proposed two things, one of which was to turn two things around.
... We have a love of UTF-8, and we'd like you to mention that one first and then the legacy thing.
... We also think the wording on windows-1252 could be changed somewhat to say that "in a legacy context, if you have to guess, you should guess this one".
Ian: I haven't gotten to that issue yet and haven't looked at it in detail; sounds OK.
Richard: Is it purely editorial?
Addison: It doesn't change the result, it just changes how you explain the result.
Ian: Do you have any recommendation for dealing with, say, Japan and other parts of East Asia?
Addison: There are a variety of things in step #7 that allow for various heuristics and sniffing.
Ian: windows-1252 is fine for the US and UK, but what about other places?
Felix: Depends on what device.
Addison: Most implementations use information in the browser, e.g. what the browser uses, or whether a narrower auto-detect is set (as for Japanese).
Ian: So in the Japanese cases, you expect that the rest of the steps would take care of it?
Addison: I think you'd trap those encodings before you get to step 7(?)
... Might want to mention that in some cases, when you get a subset encoding, you use the superset encoding.
... I think we can provide that information.
Ian: I believe when I wrote that section, I checked a browser and that was the only mapping they had.
Addison: Most browsers don't just do GBK, but do ????
... There are some cases, such as in Japan, where the byte patterns are completely different,
... where the encoding schemes are different even though the charset is the same.
... That kind of autodetection is a separate thing.
... I think this is still valid.
... The only question I have is, if you're thinking "what should happen in step 7", whether that is some language-dependent or context-dependent thing ...
Hixie: In this final step, you don't have any information from the content.
Addison: You might want to think about splitting step 7 and doing UTF-8 detection first.
... UTF-8 has recognizable byte patterns; it would be great to put that first before saying "use your favorite legacy encoding".
Hixie: The concern is what happens if the user enters some bytes into the form and then submits it?
Addison: We were just looking at that in the i18n working group.
Hixie: We'd have to make sure that that's what the server was expecting.
Felix: What information are you looking at to guess what encoding the user applies?
Hixie: Typically, different localizations of the browser have different default encodings.
... Well, the email's in my pile. I don't know when I'll get to it.
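The last-resort step discussed above (UTF-8 detection first, as Addison suggested, then a legacy default, with an ISO-8859-1 label read as its windows-1252 superset per the sentence Hixie quoted) can be sketched roughly as follows. This is an illustrative sketch, not the algorithm in the HTML5 draft: the helper names `is_plausible_utf8`, `superset_label` and `guess_encoding` and the `legacy_default` parameter are hypothetical, and the earlier steps (byte-sequence checks, heuristics, and so on) are assumed to have found nothing.

```python
"""Hypothetical sketch of a last-resort encoding guess (not spec text)."""

# Labels decoded as windows-1252 under the superset rule quoted above.
# Only ISO-8859-1 is covered; other subset/superset pairs are out of scope.
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "latin-1", "l1"}


def is_plausible_utf8(data: bytes) -> bool:
    """Return True if data is strict UTF-8 and has at least one non-ASCII byte.

    Pure ASCII is excluded because it decodes identically as UTF-8,
    ISO-8859-1 and windows-1252, so it is no evidence either way.
    """
    if data.isascii():
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def superset_label(label: str) -> str:
    """Map an ISO-8859-1 label to windows-1252; return other labels unchanged."""
    return "windows-1252" if label.strip().lower() in LATIN1_LABELS else label


def guess_encoding(data: bytes, legacy_default: str = "windows-1252") -> str:
    """Final fallback: UTF-8 if the bytes look like UTF-8, else the legacy default."""
    if is_plausible_utf8(data):
        return "utf-8"
    # legacy_default stands in for the implementation-defined default,
    # e.g. one chosen per browser localization (assumption).
    return superset_label(legacy_default)


if __name__ == "__main__":
    legacy = b"\x93caf\xe9\x94"          # curly-quoted "café" in windows-1252
    modern = "café".encode("utf-8")      # b"caf\xc3\xa9"

    print(guess_encoding(legacy))        # windows-1252 (0x93 cannot start a UTF-8 sequence)
    print(guess_encoding(modern))        # utf-8
    print(guess_encoding(legacy, legacy_default="ISO-8859-1"))  # windows-1252
    print(legacy.decode("windows-1252")) # “café”

    # Outside 0x80-0x9F the two encodings agree byte for byte, so reading
    # ISO-8859-1-labelled text as windows-1252 changes nothing there.
    shared = bytes(range(0x20, 0x7F)) + bytes(range(0xA0, 0x100))
    print(shared.decode("iso-8859-1") == shared.decode("windows-1252"))  # True
```

The final check illustrates Addison's point that the superset interpretation does not conflict with the subset one: the two encodings differ only in bytes 0x80 to 0x9F, which ISO-8859-1 maps to C1 control characters and windows-1252 maps mostly to printable punctuation. Treating pure-ASCII input as UTF-8 instead of falling through to the legacy default would be an equally defensible design choice.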
Addison: We'll look at superset encodings and try to write up a document that you can reference.

Introductions

Richard Ishida: W3C Internationalization Lead
Anne van Kesteren: Opera Software
Elika: fantasai, CSSWG Invited Expert, works on international text layout
Addison Phillips: Yahoo, i18n WG
Amit Parashar: something-or-other chair
Henri Sivonen: working on HTML5 conformance checker
Ian Hickson: HTML5 editor
Felix Sasaki: i18n Core, i18n ITS and Web Services Policy WG [W3C]
<plh> Philippe Le Hegaret: W3C, Architecture Domain (XML, Web Services, i18n), and Video
Ishida: Can you explain the alt text issue?
<najib> Najib Tounsi, W3C Morocco Office Mgr.
Ishida: We believe that you should never put human-readable text in an attribute value, because you can't put markup in it,
... which is important for various i18n reasons: bidi, language annotation, ruby, etc.
Hixie: We still have the <img> element; we can't get rid of it. It still has the alt attribute, because it's always had that.
... We can't give it content, because HTML parsers all close it right after the start tag.
... We also have the <object> tag, which has full fallback capabilities.
Ishida: Would the group advise the <object> tag, then?
Hixie: I don't think we'll have a recommendation one way or another; if your fallback content needs element content, then you'll have to use <object>.
... We've been doing some work, e.g. Acid2, on making sure the <object> tag works properly in various browsers.
Ishida asks about some XHTML2 stuff.
Hixie: The XHTML2 group did two things; one was switching some attributes into elements, e.g. title attributes.
... Then they also went and started using RDF for everything: we are certainly not going to do that.
... For the first one, I'm not convinced that the benefits of using an element for these things outweigh the costs.
... We can try not to do things like that in the future, though.
... This problem comes up in many places, e.g. in DOM APIs that take a string.
... There are also places where we can't make such changes, such as the <title> element,
... whose content winds up in places like filenames, where you can't have structured markup anyway.
Ishida: Can you use bidi in filenames?
Hixie: Probably, but I'm not going to recommend it.
Ishida: We might need to start thinking about how to convert text from markup to strings with bidi control characters.
<anne> (I think HTML 5 should get &rlo;, &lro;, and &pdf; (or something in that direction) for BiDi. These are already in IE.)
Hixie: We did consider having a DOM attribute that would pull out, e.g., bidi control characters from the markup and alt text from images
... not sure where that's going.
... I would recommend finding solutions for plain text, since that will work for both.
Discussion of the fact that language tags exist in Unicode but were deprecated as soon as they were added: they were added as deprecated and should never be used.
<anne> (even though the characters they map to are apparently deprecated)
Discussion of the markup vs. plain-text question.
<apppp> The reference to RFC 3066 should point to BCP 47.
Addison notes that the i18n group needs to review the date parsing things.
<najib> +1 to add &rle;, ..., &pdf; in HTML
Henri notes that it's using ISO dates anyway.
najib, if we're adding more entities I want &zwsp; :)
<najib> It depends on usage frequencies. :-)

Validator checking entity reqs

Henri: I don't check that character entities are only used for characters that are unclear,
... because I can't tell mechanically whether the character is unclear.
<anne> fantasai, I think &zwsp; is also supported by IE
cool, let's add it :P all the characters next to it have names: zwnj, zwj, etc.
<najib> I don't have IE on MacOS :-( & :-)
Ishida explains that this part of charmod is about best practices; it's not "should" in the normative sense.
Elika: Maybe you should go through the document and change the wording of "should" sentences that don't match RFC 2119 to something else.
Ishida: Well, we mean it that way for authors. Maybe we need to create different classes and explain which recommendations apply to which.
<fsasaki> http://hsivonen.iki.fi/charmod-norm-checking/
Henri: I documented which constructs in HTML5 result in a continuous string.
... I don't have any other comment there except that I wrote this and it is available :)
... I have another comment, but it's targeted at the Unicode/ICU specs.
Ishida: Might want to post to the Unicode list.
<apppp> Title: I18N / HTML5 break out session
<apppp> Scribe: fantasai
<apppp> ScribeNick: fantasai

Summary of Action Items

[End of minutes]
_________________________________________________________

Minutes formatted by David Booth's scribe.perl version 1.128 (http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm; CVS log: http://dev.w3.org/cvsweb/2002/scribe/)
$Date: 2007/11/09 17:05:50 $
_________________________________________________________
Received on Friday, 9 November 2007 17:07:11 UTC