
RE: David's less simple example

From: David Lee <David.Lee@marklogic.com>
Date: Tue, 28 Feb 2012 16:14:17 -0800
To: Noah Mendelsohn <nrm@arcanedomain.com>
CC: Jeni Tennison <jeni@jenitennison.com>, David Carlisle <davidc@nag.co.uk>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADDAAA93@EXCHG-BE.marklogic.com>
Agreed...  "Now we're talking!"
This is a harder problem than writing up a set of possible recovery options.
Talking about how this will work in the real world !!!!

Note that my #2 and #3 have almost exactly opposite expectations ... I suggest both are reasonable.
How to handle that ?


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com

This e-mail and any accompanying attachments are confidential. The information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this e-mail communication by others is strictly prohibited. If you are not the intended recipient, please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.


-----Original Message-----
From: Noah Mendelsohn [mailto:nrm@arcanedomain.com] 
Sent: Tuesday, February 28, 2012 7:11 PM
To: David Lee
Cc: Jeni Tennison; David Carlisle; public-xml-er@w3.org Community Group
Subject: Re: David's less simple example

Thanks, but I think it would be useful to get clarification on some of your proposed use cases. In particular, when we're fixing up suspect data, I think the most important question is: how bad would the consequences be if we guessed wrong?

So, I can imagine scenarios where I import a bunch of suspect stuff into an XML database, but then I have some good way of checking the results and doing further cleanup. In such cases, almost any fixup is safe to try. I can imagine a different scenario in which data is loaded into a database, and then trusted for downstream processing. In such cases, I think it's very important to be conservative. So, for example, accepting an unquoted attribute value like:

	<e a=3>

seems way less risky than fixing up poorly nested elements.
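
To make the low-risk case concrete, here is a minimal sketch in Python. It is not from the thread: the function name quote_attributes and both regular expressions are assumptions made for illustration. It quotes unquoted attribute values inside start tags and returns a list of the fixups it performed, so a caller can report them. It deliberately leaves element nesting alone, and it will misjudge tags whose quoted values contain "<" or ">".

```python
import re

# Hypothetical sketch: only touches text that looks like a start tag.
# Assumes attribute values do not contain "<" or ">".
_START_TAG = re.compile(r"<[A-Za-z_][^<>]*>")

# Either an already-quoted value (skipped unchanged), or name=value
# where the value is unquoted and runs up to whitespace, ">" or "/>".
_ATTR = re.compile(
    r""""[^"]*"|'[^']*'|(?<=\s)([A-Za-z_:][\w:.-]*)=([^\s"'<>=]+?)(?=\s|/?>)"""
)

def quote_attributes(text):
    """Return (fixed_text, fixups); fixups holds (offset, original) pairs."""
    fixups = []

    def fix_tag(tag_match):
        def fix_attr(m):
            if m.group(1) is None:
                # Already-quoted value: leave it exactly as written.
                return m.group(0)
            # Record where the fixup happened, as an offset into the input.
            fixups.append((tag_match.start() + m.start(), m.group(0)))
            return f'{m.group(1)}="{m.group(2)}"'
        return _ATTR.sub(fix_attr, tag_match.group(0))

    return _START_TAG.sub(fix_tag, text), fixups

fixed, fixups = quote_attributes("<e a=3>")
print(fixed)    # <e a="3">
print(fixups)   # [(3, 'a=3')]
```

Returning the fixup list alongside the repaired text is one way to let a conservative consumer reject the result when any guess was made, while a more forgiving consumer just logs the list and carries on.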

Noah

On 2/28/2012 6:57 PM, David Lee wrote:
> Use cases !!!! YES !!!
>
> Whenever I'm stuck on a design my superiors/peers remind me to "Write Use Cases" ... of course I grumble but am never unhappy I did.
>
> I'll start out with a few that *I* would be interested in.   These are personal opinions <disclaimer employer blah blah blah>
>
> 1) As a user of an XML scripting language like, say, XProc or xmlsh, I would like to read in a file that "normal" XML tools won't parse but that is 'close'.
> So I would like to do something like
>      xfixup | xslt ...
> Or
>     xfixup unknown_file.txt > goodfile.xml
>
>
> 2) As a user of an XML database (e.g. eXist or MarkLogic) I would like to "Ingest" (aka "Store" aka "Load") a file into the database.  The system expects well-formed XML.
> But occasionally I run into a broken file.  As a user I would like that file to ingest without stopping the process, maybe generating a list of warnings.
> Predictability is not so important; I just "want it to work".
>
> 3) As a developer of an XML Database I would like to provide the feature for use case #2 in a predictable way, so that users could load "broken" files and something "reasonable" happens to at least let them load.  Predictability is highly important, or else people will get mad and someone will get fired for "doing the wrong thing".     Perhaps if they supplied a schema I could do a better job or report better errors.   At worst I could load the file and provide a list of "fix ups" done so that they could optionally report the errors but 'keep on trucking'.
>
> 4) As the user of an XML Editor (GUI) I would like to load a file given to me by an unknown source and have it displayed more usably than crashing and saying something like "Invalid character encountered at line # 13431415  "  ... At least let me load it, and maybe highlight spots where fixups were performed and let me approve, reject, or edit them.
>
>
> -----Original Message-----
> From: Noah Mendelsohn [mailto:nrm@arcanedomain.com]
> Sent: Tuesday, February 28, 2012 6:44 PM
> To: Jeni Tennison
> Cc: David Carlisle; public-xml-er@w3.org Community Group
> Subject: Re: David's less simple example
>
>
> On 2/28/2012 1:46 PM, Jeni Tennison wrote:
>> Yes, I am arguing that the editor use case is an overwhelming 
>> objection.
>
> As I just wrote in an earlier e-mail, I think the elephant in this particular room is the lack of an agreed list of such use cases. Is the "editor" use case an agreed requirement, a nice to have if it falls out, or not to be worried about at all? Is there such a list that I've missed?
>
> I think we're going to thrash on questions like whether it's worth diverging from HTML5 fixup rules if we don't have some agreement on prioritizing the use cases.
>
> Noah
>
>
Received on Wednesday, 29 February 2012 00:14:39 GMT