Joint meeting at TPAC from HTML and i18n core WG minutes 2007-11-09

... are at http://www.w3.org/2007/11/09-i18n-minutes.html and below as text.

Felix

   [1]W3C

      [1] http://www.w3.org/

                               - DRAFT -

                     I18N / HTML5 break out session

9 Nov 2007

   See also: [2]IRC log

      [2] http://www.w3.org/2007/11/09-i18n-irc

Attendees

   Present
   Regrets
   Chair
          SV_MEETING_CHAIR

   Scribe
          fantasai

Contents

     * [3]Topics
         1. [4]Validator checking entity reqs
     * [5]Summary of Action Items
     _________________________________________________________



   <aphillip_> [6]http://www.w3.org/html/wg/html5/#determining0

      [6] http://www.w3.org/html/wg/html5/#determining0

   <anne>
   [7]http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html

      [7] http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html

   <aphillip_>
   [8]http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0088.html

      [8] http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0088.html

   <Hixie>
   [9]http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing

   <Hixie>
   [10]http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#the-input0

      [9] http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing
     [10] http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#the-input0

   <scribe> ScribeNick: fantasai

   Addison: There was a badly-titled thread saying something about
   making windows-1252 the default encoding.
   ... Our first reaction was, wouldn't it be nice if that were
   something else, say utf-8
   ... At the same time we recognize that there's a legacy encoding
   issue, since previous versions of HTML required iso-????

   <hsivonen> [11]http://hsivonen.iki.fi/charmod-checking/

     [11] http://hsivonen.iki.fi/charmod-checking/

   <hsivonen> [12]http://hsivonen.iki.fi/charmod-norm-checking/

     [12] http://hsivonen.iki.fi/charmod-norm-checking/

   Addison: If you actually look at the sections, 8.2 and ....
   ... It does not in fact say that the default encoding of the
   universe at large is windows-1252
   ... In the sequence there's looking at byte sequences, then using
   heuristics, etc.
   ... at the end of that sequence there's a paragraph that says
   ... if all else fails, you have to supply some
   implementation-defined default and we recommend you do these things.
   ... And windows-1252 just appears out of nowhere.
   ... One thought we had was for us to provide some information on why
   windows-1252 is preferable and how it differs from the standard ISO
   encodings.

   <Hixie> "

   <Hixie> When a user agent would otherwise use the ISO-8859-1
   encoding, it must instead use the Windows-1252 encoding."

   Henri: that part is a violation of charmod

   Addison doesn't consider that a violation of charmod

   Addison: There are superset encodings and they're often tagged with
   the subset encodings.
   ... using the superset interpretation doesn't conflict with using
   the subset interpretation
   ... We're not proposing a substantive change, just providing more
   justification for what you're doing.
   ... We also looked at the structure of the paragraph, and had some
   concerns.
   ... one was the phrasing of "western demographics" etc
   ... We had several reactions.
   ... One: it's not clear what a western demographic is, or how you
   tell when you're talking to one on the internet.
   ... We proposed 2 things, one of which was to turn two things
   around.
   ... We have a love of utf-8, and we'd like you to mention that one
   first and then the legacy thing
   ... We also think the wording could be changed somewhat on the
   windows-1252 to say that "in a legacy context, if you have to guess,
   you should guess this one"

   Ian: I haven't gotten to that issue yet, haven't looked at it in
   detail, sounds ok

   Richard: Is it purely editorial?

   Addison: It doesn't change the result, it just changes how you
   explain the result.

   Ian: Do you have any recommendation for dealing with say Japan and
   other parts of East Asia?

   Addison: There are a variety of things in step #7 that allow for
   various heuristics and sniffing.

   Ian: windows-1252 is fine for US and UK, but what about other
   places?

   Felix: Depends on what device.

   Addison: Most implementations use information in the browser, e.g.
   what the browser uses or if a narrower auto-detect is set (as for
   Japanese)

   Ian: So in the Japanese cases, you expect that the rest of the steps
   would take care of it?

   Addison: I think you'd trap those encodings before you get to step
   7(?)
   ... Might want to mention that in some cases where you get a subset
   encoding, you should use the superset encoding.
   ... I think we can provide that information.

   Ian: I believe when I wrote that section that I checked a browser
   and that was the only mapping they had.

   Addison: Most browsers don't just do GBK, but do ????
   ... There are some cases, such as in Japan, where the byte patterns
   are completely different.
   ... where the encoding schemes are different even though the charset
   is the same
   ... that kind of autodetection is a separate thing
   ... I think this is still valid.
   ... The only question I have is, if you're thinking "what should
   happen in step 7" is some language-dependent or context-dependent
   thing ...

   Hixie: In this final step, you don't have any information from the
   content

   Addison: You might want to think about splitting step 7 and doing a
   utf-8 detection first
   ... UTF-8 has recognizable byte patterns, it would be great to put
   that first before saying "use your favorite legacy encoding"
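
   A minimal sketch of the ordering Addison suggests, assuming Python's
   built-in codecs: attempt a strict UTF-8 decode first, and only if
   that fails fall back to a legacy default. The function name and the
   choice of windows-1252 as the fallback are illustrative assumptions,
   not text from the HTML5 draft.

```python
def decode_with_fallback(data: bytes, legacy: str = "windows-1252"):
    """Return (text, encoding_used) for a byte string of unknown encoding.

    Hypothetical helper: strict UTF-8 is tried first because malformed
    UTF-8 byte sequences are rejected, so a successful decode is a
    strong hint. Otherwise the legacy default is used.
    """
    try:
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Bytes the legacy codec leaves unassigned become U+FFFD.
        return data.decode(legacy, errors="replace"), legacy


utf8_text, utf8_label = decode_with_fallback("caf\u00e9".encode("utf-8"))
legacy_text, legacy_label = decode_with_fallback(b"caf\xe9")
```

   Pure ASCII input decodes successfully as UTF-8 in this sketch, which
   gives the same text as any ASCII-compatible legacy encoding would.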

   Hixie: The concern is what happens if the user enters some bytes
   into the form and then submits it?

   Addison: We were just looking at that in the i18n working group

   Hixie: We'd have to make sure that that's what the server was
   expecting.

   Felix: What information are you looking at to guess what encoding
   the user applies?

   Hixie: Typically different localizations of the browser have
   different default encodings.
   ... well, the email's in my pile. I don't know when I'll get to it.

   Addison: We'll look at superset encodings and try to write up a
   document that you can reference.
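
   The superset relationship discussed above can be illustrated with a
   short sketch, assuming Python's built-in codec tables. It compares
   how ISO-8859-1 and windows-1252 decode each byte value. Note that
   Python's windows-1252 codec leaves a few byte values unassigned;
   other implementations may map those bytes differently.

```python
same, differ, undefined = [], [], []

for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")  # defined for all 256 byte values
    try:
        cp1252 = raw.decode("windows-1252")
    except UnicodeDecodeError:
        undefined.append(value)  # unassigned in Python's codec table
        continue
    if latin1 == cp1252:
        same.append(value)
    else:
        differ.append(value)

# Every disagreement falls in 0x80-0x9F, where ISO-8859-1 has only C1
# control characters and windows-1252 has printable characters such as
# curly quotes and the euro sign.
only_c1_range = all(0x80 <= v <= 0x9F for v in differ + undefined)
quoted = b"\x93quoted\x94".decode("windows-1252")
```

   Outside the 0x80-0x9F range the two encodings agree byte for byte,
   which is why treating ISO-8859-1 labelled content as windows-1252
   does not change the result for printable ISO-8859-1 text.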

   Introductions

   Richard Ishida: W3C Internationalization Lead

   Anne van Kesteren: Opera Software

   Elika: fantasai, CSSWG Invited Expert, works on international text
   layout

   Addison Phillips: Yahoo, i18n wg

   Amit Parashar: something-or-other chair

   Henri Sivonen: working on HTML5 conformance checker

   Ian Hickson: HTML5 editor

   Felix Sasaki: i18n Core, i18n ITS and Web Services Policy WG [W3C]

   <plh> Philippe Le Hegaret: W3C, Architecture Domain (XML, Web
   Services, i18n), and Video

   Ishida: Can you explain the alt text issue?

   <najib> Najib Tounsi, W3C Morocco Office Mgr.

   Ishida: We believe that you should never put human-readable text in
   an attribute value because you can't put markup in it
   ... which is important for various i18n reasons: bidi, language
   annotation, ruby, etc.

   Hixie: We still have the <img> element; we can't get rid of it. It
   still has alt attr, because it's had that.
   ... We can't give it content because HTML parsers all close it right
   after the start tag.
   ... We also have the <object> tag, which has full fallback
   capabilities.

   Ishida: Would the group advise the <object> tag then?

   Hixie: I don't think we'll have a recommendation one way or another;
   if your fallback content needs element content, then you'll have to
   use <object>
   ... We've been doing some work, e.g. Acid2, on making sure the
   <object> tag works properly in various browsers.

   Ishida asks about some XHTML2 stuff

   Hixie: The XHTML2 group did two things, one was switching some
   attributes into elements, e.g. title attributes.
   ... Then they also went and started using RDF for everything: we are
   certainly not going to do that.
   ... For the first one, I'm not convinced that the benefits of using
   an element for these things outweigh the costs
   ... We can try not to do things like that in the future though
   ... This problem comes up in many places, e.g. in DOM APIs that take
   a string.
   ... There are also places where we can't make such changes, such as
   the <title> element
   ... whose content winds up in places like filenames where you can't
   have structured markup anyway

   Ishida: Can you use bidi in filenames?

   Hixie: probably, but I'm not going to recommend it

   Ishida: We might need to start thinking about how to convert text
   from markup to strings with bidi control characters.

   <anne> (I think HTML 5 should get &rlo;, &lro;, and &pdf; (or
   something in that direction) for BiDi. These are already in IE.)
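
   A minimal sketch of the markup-to-plaintext conversion Ishida
   raises, using the Unicode bidi control characters that the entity
   names above refer to. The function name and its parameters are
   hypothetical; it handles a single run of text with no nested markup.

```python
# Unicode bidi formatting characters.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def to_plain_bidi(text: str, direction: str, override: bool = False) -> str:
    """Wrap text in control characters equivalent to a dir attribute.

    direction is "ltr" or "rtl"; override=True corresponds to an
    override (as with <bdo>), otherwise an embedding is used.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if override:
        opener = LRO if direction == "ltr" else RLO
    else:
        opener = LRE if direction == "ltr" else RLE
    return opener + text + PDF


wrapped = to_plain_bidi("abc", "rtl", override=True)
```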

   Hixie: We did consider having a DOM attribute that would pull out
   e.g. bidi control characters from the markup and alt text from
   images
   ... not sure where that's going
   ... I would recommend finding solutions for plaintext, since that
   will work for both

   Discussion of that

   language tag characters are in Unicode, but they were deprecated
   from the moment they were added and should never be used

   <anne> (even though the characters they map to are apparently
   deprecated)

   discussion of markup-plaintext thing

   <apppp> reference RFC 3066 should point to BCP 47

   Addison notes that the i18n group needs to review the date parsing
   things

   <najib> +1 to adding &rle;, ..., &pdf; in HTML

   Henri notes that it's using ISO dates anyway

   najib, if we're adding more entities I want &zwsp;

   :)

   <najib> It depends on usage frequencies. :-)

Validator checking entity reqs

   Henri: I don't check that character entities are only used for
   characters that are unclear.
   ... because I can't tell mechanically whether the character is
   unclear

   <anne> fantasai, I think &zwsp; is also supported by IE

   cool

   let's add it :P

   all the characters next to it have names,

   zwnj, zwj etc

   <najib> I don't have IE on MacOS :-( & :-)

   Ishida explains that this part of charmod is about best practices

   it's not should in the normative sense

   Elika: Maybe you should go through the document and change the
   wording of should sentences that don't match RFC2119 to something
   else

   Ishida: Well, we mean it that way for authors. Maybe we need to
   create different classes and explain which recommendations apply to
   which

   <fsasaki> [13]http://hsivonen.iki.fi/charmod-norm-checking/

     [13] http://hsivonen.iki.fi/charmod-norm-checking/

   Henri: I documented which constructs in HTML5 result in a continuous
   string
   ... I don't have any other comment there except that I wrote this
   and it is available :)
   ... I have another comment, but it's targeted at the Unicode/ICU
   specs
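
   A minimal sketch of the kind of check involved, assuming the
   normalization in question is Unicode NFC (as the name of the page
   Henri links suggests) and that each continuous string has already
   been extracted from the document. The helper name is hypothetical.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


precomposed = is_nfc("\u00e9")   # single code point: e with acute
decomposed = is_nfc("e\u0301")   # e followed by combining acute accent
```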

   Ishida: Might want to post to the unicode list

   <apppp> Title: I18N / HTML5 break out session

   <apppp> Scribe: fantasai

   <apppp> ScribeNick: fantasai

Summary of Action Items

   [End of minutes]
     _________________________________________________________


    Minutes formatted by David Booth's [14]scribe.perl version 1.128
    ([15]CVS log)
    $Date: 2007/11/09 17:05:50 $
     _________________________________________________________

     [14] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
     [15] http://dev.w3.org/cvsweb/2002/scribe/


Received on Friday, 9 November 2007 17:07:11 UTC