- From: Juan Lanus <juan.lanus@gmail.com>
- Date: Sat, 23 Jan 2010 16:04:39 -0300
- To: public-webapps@w3.org
I'm new to this list and to all the W3C work, so I might be completely wrong. That said, here goes.

Dmitry posed a simple question: should a file's blob be kept in sync with the file's content on disk, or not? He did not get a "yes" or "no" answer, but instead triggered a thread of nearly 30 posts that, as I see it, shows a certain lack of definition so far. This is what I think, after having read only the draft and this thread:

** The "mutating blob"

I see the idea of keeping the disk file in sync with its working version, the "mutating blob", as too risky and impractical. IMO doing so will raise a lot of issues while solving none.

What is the scenario that calls for such a feature? I can't see any, but I can see lots of scenarios where data stability is desirable. For example, a disk file holding the data of an active relational database. The scenario is uploading a big file in which many concurrent applications may introduce changes anywhere, every few seconds. I know this example is contrived, but there might be many others with similar characteristics, albeit not so clear and dramatic. In this scenario the UA might be kept completely busy trying to stay current with the changes, as during a DoS attack. Another requirement for a database file is that it has to be consistent, so sending a slice of one version lumped with a slice of a later version is unacceptable.

If, and only if, there is an unavoidable requirement for such a feature, then I strongly suggest that the API specify a flag informing the application that the original file changed during the operation, but without doing anything else. Let the developer decide whether she wants to take any action, instead of trying in advance to solve a problem for her that might not exist.

In one post Dmitry says that he found out that "developers expect Blob to be a 'snapshot'". This is the way to go: talking with developers, and also with software architects who already solved issues like this years ago.
** Locking

What's wrong with file locking? Maybe it was discussed in prior sessions I didn't read, because it seems to have been discarded already. But locking is the universally accepted solution in multitasking operating systems. The API should lock the files to prevent them from being written by other applications, whether for a short while or for a long time. It is a must, to make the read atomic (atomicity is not merely desirable but a must):

1   the UA SHOULD lock the file (a mandatory lock preventing writes by other apps) and open it
  1a  the file refuses to be locked
    1a1 the operation fails with a "file is locked" error
    1a2 the use case fails
2   the UA uses the file
3   the UA unlocks the file by issuing a close method

For small files this does not make a difference. But what happens if the file is huge? In this case, leave the problem to the developer, the one who knows the environment and the particular requirements. For example, the developer could choose to swiftly copy the file into a blob and close it, releasing the brief lock, if it is a busy file (a database ...), or keep it locked during a lengthy transfer operation if the file content is static (video, or a backup ...). It is not possible to solve all the developer's issues at this point; we can only provide tools, the simpler the better, for developers to leverage.

For very special cases there might be an option locking="no" to open a file while allowing other applications to change it. Intuitively I perceive this as a security crack. Such a file could become a communication channel between the computer's contents and the web. A trojan could repeatedly paste information into the file for the UA to send to the bad guy's server. This could be achieved by setting up a trojan listener in the OS to detect when the user selects a file. As I see it, when the user allows the UA to grab a file she means "what the file contains right now", and we MUST not deceive her.
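The lock / use / close flow above can be sketched in a few lines. This is a hedged sketch only: an in-memory Set stands in for a mandatory OS lock, and openLocked() and the path are made-up names, not anything from the File API draft.

```javascript
// Sketch of the lock / use / close flow. An in-memory Set stands in for
// a mandatory OS lock; openLocked() and the paths are illustrative names.
const locks = new Set();

function openLocked(path) {
  if (locks.has(path)) {
    // 1a: the file refuses to be locked -> 1a1: "file is locked" error
    throw new Error("file is locked: " + path);
  }
  locks.add(path); // 1: lock the file and open it
  return {
    read() { return "<bytes of " + path + ">"; }, // 2: use the file
    close() { locks.delete(path); },              // 3: unlock via close
  };
}

// The developer decides how long to hold the lock:
const f = openLocked("/data/movie.avi");
try {
  const bytes = f.read(); // busy file? copy fast and close early;
                          // static file? keep it locked through the transfer
  console.log(bytes.length > 0);
} finally {
  f.close(); // always release, even if the operation fails
}
```

Note the try/finally: releasing the lock on every exit path is what makes the short-lock option safe for busy files.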
** Avoid involving technology limitations in the design

The File API is a sort of impedance adapter between the latency of Internet connections and the speed of disk drives (disks or whatever; think of the future). As such, it must be able to handle any speed difference. In the future the difference might even change sign. Also, the API must consider that what today is regarded as "big" might be "regular" in the future and "small" after a while.

For example, making a memory copy of a 300MB file is possible today, but was not when computers, even mainframes, sported a few MB of RAM. The "virtual memory" that most OSs have is an existing implementation of an in-memory file backed by disk storage. This issue has been solved since the seventies: a program, like the UA, can pump lots of data into RAM and the OS will use the disk to store the bytes in case of a shortage of real RAM. This way computers, like PCs, appear to have twice as much RAM as is physically installed, at the cost of some performance loss that is completely compatible with Internet latency. Many PCs built today have 2 through 4GB of real RAM, so they appear to have 4 to 8GB, providing plenty of headroom to manage fairly big files.

If the files to handle were bigger, then it is the developer's responsibility to manage the issue, for example by telling the user not to upload the file. It should not be a "making a copy" vs. "using the original data" issue. These are different scenarios, and which to use should be up to the developer, who knows the requirements and the environment of her application. In the original RFC1867 specification, if the user uploaded a very big file she had to wait ages for the whole file to upload before getting an error message. The File API comes to the rescue, allowing the UA to say so before the upload.

** The scenario I'm working in

My scenario for the blob functionality is related to image uploading. The UI gets a bunch of files containing images of various sizes.
The client (UA) somehow resizes them to fit some web application limits before the upload, like limiting width to 800px and lowering quality so the size is below 100K, and to do so it stores each image in a blob. The user is looking at the image in the UI during this process, and she does not expect it to change due to local file action. If she wants a new version of some image, she "reloads" it; for example, she modifies the image colors and saves a new version.

** This is the only chance to make changes to the API

Changing the API should not be off the table at this stage. Joshua Bloch, regarded as one of the most important API designers, says that "Public APIs are forever - one chance to get it right", in this context: http://lcsd05.cs.tamu.edu/slides/keynote.pdf

If it is possible to change it for the better, it must be done before it's too late. Otherwise millions of developers in the future will lose parts of their lives struggling against definitions, like the DOM differences that literally swallowed entire lives in time spent doing avoidable work. Bloch publicly shares his knowledge about API design; the documents can be found by searching for "Joshua Bloch API design". I apologize if this information is too obvious for some; I bring it here because I consider it valuable to keep in mind, at least for me. For example, Bloch encourages basing the design on user needs and use cases, and I was unable to find any formal document of that kind (I recognize that I spent limited time on my search). He says "Gather Requirements - with a Healthy Degree of Skepticism", meaning that users MAY propose solutions but that the last word MUST come from a knowledgeable professional. In this thread I didn't see references to users, save for two postings by Dmitry.
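Coming back to the image-upload scenario above: the limits it describes (width capped at 800px, encoded size under 100K) reduce to a little arithmetic plus a quality loop. A hedged sketch with made-up names; fitWidth() is real arithmetic, while encodeAtQuality() is a fake stand-in for a browser encoder such as canvas.toBlob:

```javascript
// Sketch of the client-side resize limits from the scenario: cap width
// at 800 px (preserving aspect ratio), then lower quality until the
// encoded size is below 100 KB. Names and the size model are made up.
const MAX_WIDTH = 800;        // from the scenario
const MAX_BYTES = 100 * 1024; // "below 100K"

function fitWidth(width, height, maxWidth) {
  if (width <= maxWidth) return { width, height };
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}

// Fake encoder: estimated byte size grows with pixel count and quality.
function encodeAtQuality(width, height, quality) {
  return Math.round(width * height * 0.3 * quality); // made-up model
}

function shrinkToLimit(width, height) {
  const dims = fitWidth(width, height, MAX_WIDTH);
  let quality = 0.9;
  let bytes = encodeAtQuality(dims.width, dims.height, quality);
  while (bytes > MAX_BYTES && quality > 0.1) {
    quality -= 0.1; // lower quality until under the size limit
    bytes = encodeAtQuality(dims.width, dims.height, quality);
  }
  return { ...dims, quality, bytes };
}

console.log(shrinkToLimit(3264, 2448)); // e.g. a camera photo
```

The point of the sketch is that all of this runs against a stable snapshot of each file; a mutating blob could change the pixels out from under the preview the user is looking at.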
In short:
- trash the mutating blob,
- lock the file when opening it (by default) and release it upon close,
- let the developer decide between copying the file into memory or slowly reading it and feeding it to the upload,
- let's not design for technological limitations, and
- let's make it right at the first and only attempt.

Respectfully,
--
Juan Lanus
Globant
Received on Saturday, 23 January 2010 19:05:38 UTC