
Re: [ANN] LOGML (Log Markup Language) Draft Specification and Schema

From: Martin Duerst <duerst@w3.org>
Date: Mon, 02 Jul 2001 15:21:11 +0900
Message-Id: <4.2.0.58.J.20010702145837.035ae310@sh.w3.mag.keio.ac.jp>
To: puninj@cs.rpi.edu, xmlschema-dev@w3.org
Cc: puninj@cs.rpi.edu

Hello John,

Two points: the first one is generic, and hopefully of use to
every schema developer; the second one is specific to your proposal.

First point:

In appendix A
(http://www.cs.rpi.edu/~puninj/LOGML/draft-logml.html#Char)
you write:


 > Since LOGML is an application of XML, LOGML supports Unicode [UTR20].
 > Unicode is a 16 bit encoding for characters. The latest version Unicode
 > 3.0 contains 49,194 distinct coded characters. The default character set
 > for LOGML is ISO-8859-1 (Latin 1). Appendix B of XML 1.0 document explains
 > in more detail what Unicode characters can be used for tag names.


Let's look at this one by one:

 > Since LOGML is an application of XML, LOGML supports Unicode

Good.

 > [UTR20].

Thanks for referencing this, but I'm not sure this is the best reference.
UTR20 should be referenced when there is an issue of how to use Unicode.
For the simple fact that XML applications support Unicode, the XML
Rec is the crucial reference.


 > Unicode is a 16 bit encoding for characters.

Wrong. Unicode now (as of 3.1) defines somewhere around 90,000 characters.
That doesn't fit into 16 bits, which can distinguish at most 65,536 values.
See http://www.unicode.org/unicode/standard/WhatIsUnicode.html.
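
To make the 16-bit point concrete, here is a minimal sketch, assuming
Python 3. The character used (U+10400) is only an illustrative choice of
a code point beyond the first 65,536 positions; any such character would
behave the same way.

```python
# A code point above U+FFFF cannot be stored in a single 16-bit unit.
ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I, chosen only as an example

code_point = ord(ch)
fits_in_16_bits = code_point <= 0xFFFF

# UTF-16 needs two 16-bit code units (a surrogate pair) for this character.
utf16_units = len(ch.encode("utf-16-be")) // 2
# UTF-8 needs four bytes for the same character.
utf8_bytes = len(ch.encode("utf-8"))

print(hex(code_point), fits_in_16_bits, utf16_units, utf8_bytes)
```

So UTF-16 is a 16-bit *encoding form* of Unicode, but Unicode itself is
not limited to 16 bits.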


 > The latest version Unicode 3.0 contains 49,194 distinct coded characters.

This was correct, but is no longer. See 
http://www.unicode.org/unicode/reports/tr27/.
In general, it's a bad idea to mention any specific Unicode version number,
as Unicode is evolving (and XML is done so that it can move along, at least
for content).

 > The default character set for LOGML is ISO-8859-1 (Latin 1).

This is confusing, wrong, or dangerous, or probably all of these
together. What does it mean that iso-8859-1 is the default?
Does it mean that an unmarked (*) LOGML file is in iso-8859-1?
This would clearly be in conflict with the XML Rec, which
says that such files are UTF-8 (or UTF-16, if they start with a
byte order mark). So LOGML wouldn't be XML anymore.

On the other hand, if you want to say that in addition to
UTF-8 and UTF-16 (as required by the XML Rec), LOGML applications
should also support properly marked (*) iso-8859-1, then it's
better to say so to avoid misunderstandings.

(*) marked means that there is an "encoding" pseudo-attribute
on the xml/text declaration, or appropriate info e.g. in an
HTTP header.
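
The difference between marked and unmarked files can be checked with any
conforming parser. The following is a rough sketch, assuming Python 3 and
its standard xml.etree.ElementTree module; the element name "log" is
made up for the example and is not taken from the LOGML schema.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9"  # contains one non-ASCII character, e-acute

# Hypothetical one-element documents; <log> is not a real LOGML element.
unmarked_utf8 = ("<log>" + text + "</log>").encode("utf-8")
unmarked_latin1 = ("<log>" + text + "</log>").encode("iso-8859-1")
marked_latin1 = (
    '<?xml version="1.0" encoding="iso-8859-1"?><log>' + text + "</log>"
).encode("iso-8859-1")


def parse_text(data):
    """Return the root element's text, or None if the bytes are not well-formed XML."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# No encoding declaration: the parser assumes UTF-8, which is correct here.
utf8_result = parse_text(unmarked_utf8)
# No encoding declaration, but iso-8859-1 bytes: the lone 0xE9 is invalid UTF-8.
latin1_unmarked_result = parse_text(unmarked_latin1)
# Same bytes with an encoding declaration: the document is well-formed.
latin1_marked_result = parse_text(marked_latin1)

print(utf8_result == text, latin1_unmarked_result, latin1_marked_result == text)
```

The unmarked iso-8859-1 file is rejected as not well-formed, while the
same content with an "encoding" pseudo-attribute parses fine.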


Second point:
[Discussion of this point may not really be appropriate for the
xmlschema-dev list. Please move it to a more appropriate place.]
I just had a quick look at your proposal. I didn't see any kind
of support for content negotiation (e.g. Accept-Language,...)
and related features, and for Content-Type (e.g. if I have
images both as .png and as .gif, how many times is each variant
served). Maybe I didn't look closely enough; in that case, can
you give me a pointer?


Regards,   Martin.

At 17:23 01/06/29 -0400, puninj@cs.rpi.edu wrote:
>Hello
>
>I'm glad to announce the draft specification of LOGML (Log Markup Language)
>and Schema at: http://www.cs.rpi.edu/~puninj/LOGML/
>
>[[[
>
>Log Markup Language (LOGML) is an XML 1.0 application designed to describe
>log reports of web servers. Web-data mining is one of the current hot topics
>in computer science. Mining data that has been collected from web server
>logfiles, is not only  useful for studying customer choices, but also helps
>in organizing web pages. This is accomplished by knowing which web pages are
>most frequently accessed by the web surfers. The structure of a web site is
>represented as a web graph (see the XGMML draft specification
>http://www.cs.rpi.edu/~puninj/XGMML/ ). In mining the data from the log
>statistics, we use the web graph in annotating the log information. Further
>we give summary reports, comprising of information such as client sites,
>types of browsers and the usage time statistics. We also gather the client
>activity in a web site as a subgraph of the web site graph. This subgraph
>can be used to get better understanding of general user activity in the web
>site.
>
>In LOGML, we create a new XML vocabulary to structurally express the contents
>of the logfile information.
>
>]]]
>
>We provide with a LOGML dtd and LOGML Schema (based on XML Schema W3C
>Recommendation 2 May 2001). Software will be available pretty soon.
>
>LOGML 1.0 Draft Specification: 
>http://www.cs.rpi.edu/~puninj/LOGML/draft-logml.html
>LOGML DTD: http://www.cs.rpi.edu/~puninj/LOGML/logml.dtd
>LOGML Schema: http://www.cs.rpi.edu/~puninj/LOGML/logml.xsd
>
>Questions and comments are welcome.
>
>John Punin
>puninj@cs.rpi.edu
>


#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst
Received on Monday, 2 July 2001 02:59:33 GMT
