Date: Tue, 20 Apr 1999 13:46:04 PDT
To: "firstname.lastname@example.org" <email@example.com>
From: Jim Pitkow <firstname.lastname@example.org>
Message-Id: <99Apr20.134621pdt."361693"@louise.parc.xerox.com>
Subject: Re: repository requirements

Marc, thanks for your responses.

At 11:21 AM 4/20/99, Marc Abrams wrote:
>All log files used by anyone writing a paper should be in the uncertified
>section of the repository. Why? If I read someone's paper and question the
>analysis done, I would like to have access to the data used!

I agree - having access to the logs other researchers report in papers promotes good scientific rigor. If they submit the log, though, we should be able to certify it, no?

>Certainly for papers published in the pre-certification days access to the
>(uncertified) logs used is desirable. (Or do we retroactively certify all
>logs used in the past?)

It'd be nice to do the latter if possible.

>Final point. Suppose I am a researcher and I want to do a study on X. Turns
>out there is no certified log in the wca repository on x. What do I do? (1)
>Wait until a log gets certified and then do my study, or (2) just do the study
>and try to get the log certified later? If the world does (2), and the log
>doesn't wind up certified, we again have a paper in the literature for which
>the original trace data is unavailable.

1) The certification process should be fairly quick (on the order of days, since we are attempting to automate parts of it), no? If so, it seems to me that this delay should not make or break anyone's research time frame.

2) I like your thought experiment of pushing to see in what cases logs will not get certified. Here are some of the reasons I can think of:

a) researcher cannot submit the logs due to their proprietary nature - this already happens a lot today, so there is no realized gain/loss.
b) the data is found to be inconsistent/erroneous - if this is the case, then all results based on this data are suspect at best.

c) incomplete metadata, methodological description, etc., which can be rectified with the submitting researcher.

d) other reasons?

Another issue is what to do with continuous data feeds, for example the daily caching logs from NLANR. Once the process is automated, we should be able to certify each daily set of logs and provide cleaned versions if necessary.