- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 02 Jul 2001 15:21:11 +0900
- To: puninj@cs.rpi.edu, xmlschema-dev@w3.org
- Cc: puninj@cs.rpi.edu
Hello John,

Two points, the first one generic, and hopefully of use to every schema developer. The second one specific to your proposal:

First point: In appendix A (http://www.cs.rpi.edu/~puninj/LOGML/draft-logml.html#Char) you write:

> Since LOGML is an application of XML, LOGML supports Unicode [UTR20].
> Unicode is a 16 bit encoding for characters. The latest version Unicode
> 3.0 contains 49,194 distinct coded characters. The default character set
> for LOGML is ISO-8859-1 (Latin 1). Appendix B of XML 1.0 document explains
> in more detail what Unicode characters can be used for tag names.

Let's look at this one by one:

> Since LOGML is an application of XML, LOGML supports Unicode

Good.

> [UTR20].

Thanks for referencing this, but I'm not sure this is the best reference. UTR20 should be referenced when there is an issue of how to use Unicode. For the simple fact that XML applications support Unicode, the XML Rec is the crucial reference.

> Unicode is a 16 bit encoding for characters.

Wrong. Unicode now (as of 3.1) supports somewhere around 90,000 characters. That doesn't fit into 16 bits. See http://www.unicode.org/unicode/standard/WhatIsUnicode.html.

> The latest version Unicode 3.0 contains 49,194 distinct coded characters.

This was correct, but is no longer. See http://www.unicode.org/unicode/reports/tr27/. In general, it's a bad idea to mention any specific Unicode version number, as Unicode is evolving (and XML is done so that it can move along, at least for content).

> The default character set for LOGML is ISO-8859-1 (Latin 1).

This is confusing, wrong, or dangerous, or probably all of these together. What does it mean that iso-8859-1 is the default? Does it mean that an unmarked (*) LOGML file is in iso-8859-1? This would clearly be in conflict with the XML Rec, which says that such files are UTF-8. So LOGML wouldn't be XML anymore.
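To make the two factual points above concrete, here is a minimal sketch in Python, assuming the standard library's expat-based xml.etree.ElementTree parser. The element name "log" and the attribute "site" are made up for illustration and are not taken from the LOGML schema. It checks that an unmarked document is read as UTF-8, that the same text in unmarked iso-8859-1 is refused, that a properly marked iso-8859-1 document is accepted, and that a character above U+FFFF needs more than one 16-bit unit.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; "log" and "site" are illustrative
# names only, not part of LOGML.
TEXT = '<log site="caf\u00e9"/>'

# The same text serialized three ways.
UNMARKED_UTF8 = TEXT.encode("utf-8")
UNMARKED_LATIN1 = TEXT.encode("iso-8859-1")
MARKED_LATIN1 = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>' + UNMARKED_LATIN1
)


def site(data: bytes) -> str:
    """Parse the bytes as XML and return the value of the site attribute."""
    return ET.fromstring(data).get("site")


def is_rejected(data: bytes) -> bool:
    """Return True if the parser refuses the bytes as not well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return True
    return False


# U+1D11E MUSICAL SYMBOL G CLEF is one example of a character above U+FFFF.
G_CLEF = "\U0001D11E"


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


if __name__ == "__main__":
    print(site(UNMARKED_UTF8) == site(MARKED_LATIN1))
    print(is_rejected(UNMARKED_LATIN1))
    print(hex(ord(G_CLEF)), utf16_code_units(G_CLEF))
```

With this parser, the unmarked UTF-8 bytes and the marked iso-8859-1 bytes should yield the same attribute value, while the unmarked iso-8859-1 bytes should be refused because the byte 0xE9 is not valid UTF-8 at that position. The last line shows a single character that takes two 16-bit units.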
On the other hand, if you want to say that in addition to UTF-8 and UTF-16 (as required by the XML Rec), LOGML applications should also support properly marked (*) iso-8859-1, then it's better to say so to avoid misunderstandings.

(*) marked means that there is an "encoding" pseudo-attribute on the xml/text declaration, or appropriate info e.g. in an HTTP header.

Second point: [Discussion of this point may not really be appropriate for the xmlschema-dev list. Please move it to a more appropriate place.]

I just had a quick look at your proposal. I didn't see any kind of support for content negotiation (e.g. Accept-Language,...) and related features, and for Content-Type (e.g. if I have images both as .png and as .gif, how many times is each variant served). Maybe I didn't look closely enough; in that case, can you give me a pointer?

Regards, Martin.

At 17:23 01/06/29 -0400, puninj@cs.rpi.edu wrote:
>Hello
>
>I'm glad to announce the draft specification of LOGML (Log Markup Language)
>and Schema at: http://www.cs.rpi.edu/~puninj/LOGML/
>
>[[[
>
>Log Markup Language (LOGML) is an XML 1.0 application designed to describe
>log reports of web servers. Web-data mining is one of the current hot topics
>in computer science. Mining data that has been collected from web server
>logfiles, is not only useful for studying customer choices, but also helps
>in organizing web pages. This is accomplished by knowing which web pages are
>most frequently accessed by the web surfers. The structure of a web site is
>represented as a web graph (see the XGMML draft specification
>http://www.cs.rpi.edu/~puninj/XGMML/ ). In mining the data from the log
>statistics, we use the web graph in annotating the log information. Further
>we give summary reports, comprising of information such as client sites,
>types of browsers and the usage time statistics. We also gather the client
>activity in a web site as a subgraph of the web site graph.
>This subgraph can be used to get better understanding of general user
>activity in the web site.
>
>In LOGML, we create a new XML vocabulary to structurally express the contents
>of the logfile information.
>
>]]]
>
>We provide with a LOGML dtd and LOGML Schema (based on XML Schema W3C
>Recommendation 2 May 2001). Software will be available pretty soon.
>
>LOGML 1.0 Draft Specification:
>http://www.cs.rpi.edu/~puninj/LOGML/draft-logml.html
>LOGML DTD: http://www.cs.rpi.edu/~puninj/LOGML/logml.dtd
>LOGML Schema: http://www.cs.rpi.edu/~puninj/LOGML/logml.xsd
>
>Questions and comments are welcome.
>
>John Punin
>puninj@cs.rpi.edu
>

#-#-# Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org/People/D%C3%BCrst
Received on Monday, 2 July 2001 02:59:33 UTC