Minutes from telecon July 16, 14:00 EST

From: Johan Hjelm (hjelm@w3.org)
Date: Fri, Jul 16 1999

Message-Id: <4.1.19990716200836.00cf71c0@>
Date: Fri, 16 Jul 1999 21:08:37 +0200
To: www-wca@w3.org
From: Johan Hjelm <hjelm@w3.org>
Subject: Minutes from telecon July 16, 14:00 EST

Johan took the chair in Jim's absence.

Round the table:
Mark: Main thing, 2 things: New implementation of repository, new URL
yesterday (dime.dlib.vt.edu/cgi-bin/cgiwrap/hussein/index.pl). Look at it,
give feedback. Implementation of new UI, back-end local, not OCLC. Control
aspects. Different customizations from OCLC. Refinements to take into account.
2nd, prototype idea of form to describe log file formats, XML description,
certify simple properties, some looked. Presently, suggestions, Bala how to
change UI description of logs. 

Ed: Basically, taking and finishing 3rd web sample, at least preliminary
interesting results, identified 2.5 million public webs. Close to 350
million pages. Sort of latest numbers. Fairly significant growth from last
year. Final numbers next phone call. Analysis in parallel with harvest.
Started 3 weeks ago, apart from megasites. Half-life of a web page 6 weeks,
compared to last sample. Sites rarely disappear. Stuck with identical IP
addresses to be able to compare. See how many sites develop. Few disappear,
those are 2-3 page sites. Seat of the pants numbers. Sites grown
dramatically. Followed robot exclusion, very few do. Didn't check last
year. Hard to compare. Appear to be small even this year. 

Bala: Tutorial at Sigcomm, 5-6 weeks, web and IP workload, HTTP - HTTP 1.1.
Working on protocol compliance test. Ready by next meeting. 

Martin: Also working on compliancy stuff, world cup data. Have reduced
data, go through tests, release. How can I give it to repository? Too much
to submit over the net. Who to contact? Want to send a tape - it is 10-11 GB.
Mark: Forms working for trace submission real soon, try it. Other question,
who will have the repository, where will it reside. 

Joe: No status update. 

Johan: Work on mobile decreasing now, so I should be able to spend more
time in this group. Will be taking over after Henrik as staff contact when
he leaves. 

IETF talk: 
Nobody seems to have been at IETF. No input from JimG. 

Last call discussion on WCA repository, UI, XML encoding, validation, etc
Mark: Email yesterday about the reimplemented repository. Take a look at
it, mail comments to list. (Johan: Or use ETA, I tried to create a forum
but did not succeed - we can still use the general WCA issues forum at
Joe: Thought it was OK so far, look at search form. It is hard to do a real
implementation until you can get anything out of there.
Mark: OCLC done dump of data. Fix, records did not have sufficient fields.
Fixed up when enter in new system. New forms enter data, specific to each sys. 
Bala: Use BIB format? Import them (BibTeX).
Mark: Internal representation is hidden from outside. Good idea to import.
Integrate papers from ext directory. 
Other issue, general philosophy, UI describing trace files, follow up to
discussion to prototype heavily oriented to XML. Jim asked if we were
satisfied, hiding XML. Issue is how general to make repository to hold
trace files. Makes underlying assumptions, about number of records per
line, whitespaces. Is it OK to make a set of assumptions, build on that?
Have to agree on canonical names, in a way that covers the trace files we
are familiar with. 
Bala: Validation? 
(no one spoke up - and I hope I got the following right)
Mark: We did a program to validate properties when they are described in XML.
Bala: Validate a random date format? There are 18 different formats I have
Mark's student (sorry, missed the name): If you specify that that is the date
format, then we can do it. We have set it up for you to specify the date.
Bala: That is incorrect.
Mark's student: Then we have to include all the 18 formats.
Bala: There are all manners of errors, even if all the dates are given in
the same format. What is the description I have to do? My ideal would be to
provide a sample string that was scanned. 
Mark: The validation format would read the XML and verify that the format
matches the description. This is an issue about how the UI works. We had a
pulldown menu in the original. We have n canonical names, and the submitter
selects the name. The string idea we haven't thought about.
[further technical discussion - but we really need to try this out]
Mark: This discussion was for dates - there are a lot of other fields. Our
programme is a first version. We will now integrate it with the new UI, and
get comments back. 
Bala: You also have to do a validation of the time range, etc. 
Mark: Last call - Jim is more of an optimist than me. 
Bala: This can also be extended to other fields. Think about the scanning
parser as a supplement to the menu of canonical names. It is not foolproof,
the interpretation may be dependent on assumptions. 
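
A minimal sketch of the two approaches discussed above, written in Python for
illustration only. It assumes a small table of canonical date-format names
(the names "clf", "iso8601" and "epoch_seconds", the function names and the
sample values are all hypothetical, not taken from the repository or its XML
descriptions). validate_dates() follows the menu approach: the submitter
names the format and every value is checked against it, optionally within a
time range. candidate_formats() follows the sample-string approach: a sample
is scanned and every canonical name that fits is reported.

```python
from datetime import datetime, timezone

# Hypothetical canonical names mapped to strptime patterns.
# None marks a format that is parsed as integer seconds since 1970.
CANONICAL_DATE_FORMATS = {
    "clf": "%d/%b/%Y:%H:%M:%S %z",    # e.g. 16/Jul/1999:14:00:00 -0400
    "iso8601": "%Y-%m-%dT%H:%M:%S",   # e.g. 1999-07-16T14:00:00
    "epoch_seconds": None,            # e.g. 932148000
}


def parse_date(text, name):
    """Parse text under the named format; return a datetime or None."""
    pattern = CANONICAL_DATE_FORMATS[name]
    try:
        if pattern is None:
            return datetime.fromtimestamp(int(text), tz=timezone.utc)
        parsed = datetime.strptime(text, pattern)
    except (ValueError, OverflowError, OSError):
        return None
    # Assumption: values without a time zone are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def validate_dates(values, name, earliest=None, latest=None):
    """Return (index, value) pairs that do not parse or lie outside the range."""
    bad = []
    for index, value in enumerate(values):
        parsed = parse_date(value, name)
        if parsed is None:
            bad.append((index, value))
        elif earliest is not None and parsed < earliest:
            bad.append((index, value))
        elif latest is not None and parsed > latest:
            bad.append((index, value))
    return bad


def candidate_formats(sample):
    """Return every canonical name under which the sample string parses."""
    return [name for name in CANONICAL_DATE_FORMATS
            if parse_date(sample, name) is not None]
```

The scanning approach is not foolproof, as noted: a sample such as "19990716"
is accepted as epoch seconds by this sketch although the submitter probably
meant a calendar date, so the result is best used to narrow the menu of
canonical names rather than to replace it.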

Tools - who's got em, how do we get em.
Ed: What kind of tools do we mean? We have tools for a lot of things. 
Johan: Let's start with the anonymization. 
Martin: Jim was going to check anonymization tools. 

Johan: I just wanted to point out that the WWW9 conference is coming up -
the deadline for papers is November 22. This may seem far away, but after
your vacation it is time to start writing. Also, the organisers have not
quite decided on the conference layout yet, but they are very favourable to
the idea of themed tracks, I am told. I was wondering if we should have a
WCA track. 
Ed: We are tentatively going to submit a paper on our sample. I would
encourage a WCA track. 
Johan: I also had the idea that we might want to put together some
historical data, e.g. a time series (say, one log per month over a period
of a few years from one site) that shows the development of a web site. It
is now five years since the web took off, and 10 years since Tim's original
proposal. We should try to use that to promote the WCA.

Closed at 14:50 EST
Johan Hjelm

                     Johan HJELM
       Ericsson Research, User Applications Group 
         Currently visiting engineer at the W3C
             The World Wide Web Consortium
    Fax +1-617-258 5999, Phone +1-617-253-9630
   MIT/LCS, 545 Tech. Sq. Cambridge MA 02139 USA 
        opinions are personal, always my own, 
  and not necessarily those of Ericsson or the W3C.