Re: [w3c/FileAPI] Rewrite a lot of the internals of Blobs to make reading more explicit. (#154)

@inexorabletash approved this pull request.

I probably didn't catch everything, but I made it to the end!

>  
 <div algorithm="process-blob-parts">
-To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn> given a sequence of {{BlobPart}}'s |parts|
-and {{BlobPropertyBag}} |options|,
+To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn>
+given a sequence of {{BlobPart}}'s |blobParts| and {{BlobPropertyBag}} |options|,

Nit: `{{BlobPart}}s` (plural s, not a possessive s)

>  
 <div algorithm="process-blob-parts">
-To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn> given a sequence of {{BlobPart}}'s |parts|
-and {{BlobPropertyBag}} |options|,
+To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn>
+given a sequence of {{BlobPart}}'s |blobParts| and {{BlobPropertyBag}} |options|,

Also, maybe list rather than sequence? (I'm hazy on when to switch from IDL terms to Infra terms)

>  
-1. For each |element| in |parts|:
+1. Let |bytes| be an empty [=byte sequence=].

This algorithm concatenates adjacent non-blob parts rather than just ending up with more parts. Although this probably matches implementations, it doesn't seem to be observable and makes the algorithm more complicated. Is there a good reason for it?

> @@ -361,16 +458,179 @@ run the following steps:
   1. If |element| is a {{BufferSource}}, <a lt="get a copy of the buffer source">get

Not new, but would be more readable with "Otherwise, if ..."

> @@ -361,16 +458,179 @@ run the following steps:
   1. If |element| is a {{BufferSource}}, <a lt="get a copy of the buffer source">get
     a copy of the bytes held by the buffer source</a>, and append those bytes to |bytes|.
 
-  1. If |element| is a {{Blob}},
-    append the bytes it represents to |bytes|.
+  1. If |element| is a {{Blob}}:

"Otherwise, if ..."

> +1. Set |result|.[=read algorithm=] to the [=multipart blob read steps=].
+
+1. Return |result|.
+
+</div>
+
+A <dfn>multipart blob read state</dfn> is a [=struct=] consisting of:
+
+<dl dfn-for="multipart blob read state">
+: <dfn>parts</dfn>
+:: A [=queue=] of [=blob data descriptions=], representing the not yet read parts of the blob.
+: <dfn>offset</df>
+:: A number, representing the byte offset in the remaining blob parts
+  from which to start returning data.
+: <dfn>nested blob data</dfn>
+:: `undefined` or a [=blob data description=]. This is `undefined` unless otherwise specified.

I don't think we typically style undefined (or null, true, false) as code ? 

(Likely not consistent across specs)

> +
+## Constructors ## {#constructorBlob}
+
+
+<div algorithm="blob-constructor">
+The <dfn constructor for=Blob lt="Blob(blobParts, options)|Blob(blobParts)|Blob()"><code>new Blob(|blobParts|, |options|)</code></dfn> constructor steps are:
+
+1. Let |blob data| be the result of [=processing blob parts=] given |blobParts| and |options|.
+1. Set [=this=].[=[[data]]=] to |blob data|.
+
+1. Let |type| be an empty string.
+1. If the {{BlobPropertyBag/type}} member of the {{Blob/Blob(blobParts, options)/options}} argument is not the empty string,
+  run the following sub-steps:
+
+  1. Let |type| be the {{BlobPropertyBag/type}} dictionary member.
+    If |type| contains any characters outside the range U+0020 to U+007E,

This could be a separate substep

> +
+
+<div algorithm="blob-constructor">
+The <dfn constructor for=Blob lt="Blob(blobParts, options)|Blob(blobParts)|Blob()"><code>new Blob(|blobParts|, |options|)</code></dfn> constructor steps are:
+
+1. Let |blob data| be the result of [=processing blob parts=] given |blobParts| and |options|.
+1. Set [=this=].[=[[data]]=] to |blob data|.
+
+1. Let |type| be an empty string.
+1. If the {{BlobPropertyBag/type}} member of the {{Blob/Blob(blobParts, options)/options}} argument is not the empty string,
+  run the following sub-steps:
+
+  1. Let |type| be the {{BlobPropertyBag/type}} dictionary member.
+    If |type| contains any characters outside the range U+0020 to U+007E,
+    then set |type| to the empty string and return from these substeps.
+  1. Convert every character in |type| to [=ASCII lowercase=].

This could be an "Otherwise,..." and then "return from these substeps" wouldn't be needed.

> -    Note: Use of the {{Blob/type}} attribute informs the [=package data=] algorithm
-    and determines the `Content-Type` header when [=/fetching=] [=blob URLs=].
-</dl>
+</div>
+
+The <dfn attribute for=Blob id="dfn-size">size</dfn> getter steps are to return [=this=].[=[[data]]=].[=blob data/size=].
+
+<div class="note domintro">
+: |blob| . {{Blob/type}}
+:: The ASCII-encoded string in lower case representing the media type of the {{Blob}},
+  or an empty string if the type cannot be determined.
+
+  The {{Blob/type}} attribute can be set by the web application itself through constructor invocation
+  and through the {{Blob/slice()}} call;
+
+  Note: The type of a {{Blob}} is considered a <a>parsable MIME type</a>,

Not new, but grammar here is awkward. Since this is non-normative, it can also be simplified, e.g.:

Note: The type of a Blob is considered a _parsable MIME type_ [no comma] if performing the _parse a MIME type_ algorithm on [not to] the Blob object's type (as an ASCII-encoded byte sequence) does not return failure.

> +
+1. Let |relativeStart| be 0.
+1. If |start| is not `undefined`:
+  1. If |start| < 0, set |relativeStart| to <code>max([=this=].[=[[data]]=].[=blob data/size=] + |start|, 0)</code>.
+  1. Otherwise, set |relativeStart| to <code>min(|start|, [=this=].[=[[data]]=].[=blob data/size=])</code>.
+
+1. Let |relativeEnd| be [=this=].[=[[data]]=].[=blob data/size=].
+1. If |end| is not `undefined`:
+  1. If |end| < 0, set |relativeEnd| to <code>max([=this=].[=[[data]]=].[=blob data/size=] + |end|, 0)</code>.
+  1. Otherwise, set |relativeEnd| to <code>min(|end|, [=this=].[=[[data]]=].[=blob data/size=])</code>.
+
+1. Let |span| be <code>max((relativeEnd - relativeStart), 0)</code>.
+
+1. Let |relativeContentType| be an empty string.
+1. If |contentType| is not `undefined`:
+  1. If |contentType| does not contain any characters outside the range of U+0x0020 to U+0x007E:

There are enough occurrences of "if not ASCII, empty string. Otherwise, ASCII-lowercase it" that a separate algorithm seems like a good idea.

> +
+<div algorithm="create a file">
+To <dfn export>create a file backed {{File}} object</dfn> for a given |native file|,
+run these steps:
+
+1. Let |snapshot state| be an empty [=map=].
+1. Set |snapshot state|[<code><a for="file blob snapshot state">"file"</a></code>] to |native file|.
+1. Let |last modified| be the last time |native file| was modified,
+  as the number of milliseconds since the [=Unix Epoch=].
+  If this can't be determined, set |last modified| to the current date and time
+    represented as the number of milliseconds since the [=Unix Epoch=].
+1. Set |snapshot state|[<code><a for="file blob snapshot state">"last modified"</a></code>] to |last modified|.
+
+1. Let |name| be the file name of |native file|, converted to a string in a
+  user agent defined manner.
+1. Let |content type| be the mime type of |native file| (as a lowercase ASCII string),

nit: capitalize MIME

> +<div algorithm="create a file">
+To <dfn export>create a file backed {{File}} object</dfn> for a given |native file|,
+run these steps:
+
+1. Let |snapshot state| be an empty [=map=].
+1. Set |snapshot state|[<code><a for="file blob snapshot state">"file"</a></code>] to |native file|.
+1. Let |last modified| be the last time |native file| was modified,
+  as the number of milliseconds since the [=Unix Epoch=].
+  If this can't be determined, set |last modified| to the current date and time
+    represented as the number of milliseconds since the [=Unix Epoch=].
+1. Set |snapshot state|[<code><a for="file blob snapshot state">"last modified"</a></code>] to |last modified|.
+
+1. Let |name| be the file name of |native file|, converted to a string in a
+  user agent defined manner.
+1. Let |content type| be the mime type of |native file| (as a lowercase ASCII string),
+  derived from |name| in a user agent defined manner, or an empty string if no type

Are UAs allowed to determine MIME type in other ways, e.g. from content sniffing, extended attributes, etc ?

i.e. does the "derived from _name_" need to be part of the steps?

> +  If this can't be determined, set |last modified| to the current date and time
+    represented as the number of milliseconds since the [=Unix Epoch=].
+1. Set |snapshot state|[<code><a for="file blob snapshot state">"last modified"</a></code>] to |last modified|.
+
+1. Let |name| be the file name of |native file|, converted to a string in a
+  user agent defined manner.
+1. Let |content type| be the mime type of |native file| (as a lowercase ASCII string),
+  derived from |name| in a user agent defined manner, or an empty string if no type
+  could be determined, taking into account the following <dfn export>file type guidelines</dfn>:
+
+  * User agents must return the {{Blob/type}} as an ASCII-encoded string in lower case,
+    such that when it is converted to a corresponding byte sequence,
+    it is a <a>parsable MIME type</a>,
+    or the empty string &ndash; 0 bytes &ndash; if the type cannot be determined.
+  * When the file is of type <code>text/plain</code>
+    user agents must NOT append a charset parameter to the <i>dictionary of parameters</i> portion of the media type [[!MIMESNIFF]].

Inconsistent capitalization of NOT / not (in next bullet)

> +    represented as the number of milliseconds since the [=Unix Epoch=].
+1. Set |snapshot state|[<code><a for="file blob snapshot state">"last modified"</a></code>] to |last modified|.
+
+1. Let |name| be the file name of |native file|, converted to a string in a
+  user agent defined manner.
+1. Let |content type| be the mime type of |native file| (as a lowercase ASCII string),
+  derived from |name| in a user agent defined manner, or an empty string if no type
+  could be determined, taking into account the following <dfn export>file type guidelines</dfn>:
+
+  * User agents must return the {{Blob/type}} as an ASCII-encoded string in lower case,
+    such that when it is converted to a corresponding byte sequence,
+    it is a <a>parsable MIME type</a>,
+    or the empty string &ndash; 0 bytes &ndash; if the type cannot be determined.
+  * When the file is of type <code>text/plain</code>
+    user agents must NOT append a charset parameter to the <i>dictionary of parameters</i> portion of the media type [[!MIMESNIFF]].
+  * User agents must not attempt heuristic determination of encoding,

Since encoding is not specified for `text/plain` (per previous), this can't apply in that case. Should this be scoped to other text/* types?

>    replacing any "/" character (U+002F SOLIDUS) with a ":" (U+003A COLON).
 
   Note: Underlying OS filesystems use differing conventions for file name;
-  with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte</a> sequences.
-
-3. Process {{FilePropertyBag}} dictionary argument by running the following substeps:
-
-  1. If the {{BlobPropertyBag/type}} member is provided and is not the empty string,
-    let |t| be set to the {{BlobPropertyBag/type}} dictionary member.
-    If |t| contains any characters outside the range U+0020 to U+007E,
-    then set |t| to the empty string and return from these substeps.
-  2. Convert every character in |t| to [=ASCII lowercase=].
-  3. If the {{FilePropertyBag/lastModified}} member is provided,
-    let |d| be set to the  {{FilePropertyBag/lastModified}} dictionary member.
-    If it is not provided,
-    set |d| to the current date and time
+  with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.

spelling: should be "ambiguity"

>  
-2. Let |n| be a new string of the same size as the {{fileName}} argument to the constructor.
-  Copy every character from {{fileName}} to |n|,
+1. Let |n| be a new string of the same size as |fileName|.
+1. Copy every character from |fileName| to |n|,

Not new: use "code point" not "character" here and elsewhere. 

> -
-3. Process {{FilePropertyBag}} dictionary argument by running the following substeps:
-
-  1. If the {{BlobPropertyBag/type}} member is provided and is not the empty string,
-    let |t| be set to the {{BlobPropertyBag/type}} dictionary member.
-    If |t| contains any characters outside the range U+0020 to U+007E,
-    then set |t| to the empty string and return from these substeps.
-  2. Convert every character in |t| to [=ASCII lowercase=].
-  3. If the {{FilePropertyBag/lastModified}} member is provided,
-    let |d| be set to the  {{FilePropertyBag/lastModified}} dictionary member.
-    If it is not provided,
-    set |d| to the current date and time
+  with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.
+
+1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
+  1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.

consistency: "member" or "dictionary member"

> -
-  1. If the {{BlobPropertyBag/type}} member is provided and is not the empty string,
-    let |t| be set to the {{BlobPropertyBag/type}} dictionary member.
-    If |t| contains any characters outside the range U+0020 to U+007E,
-    then set |t| to the empty string and return from these substeps.
-  2. Convert every character in |t| to [=ASCII lowercase=].
-  3. If the {{FilePropertyBag/lastModified}} member is provided,
-    let |d| be set to the  {{FilePropertyBag/lastModified}} dictionary member.
-    If it is not provided,
-    set |d| to the current date and time
+  with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.
+
+1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
+  1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.
+1. Otherwise:
+  1. Let |d| be the current date and time

Current date / time (as millis since epoch) comes up several times in the spec. Factor it out into a definition. 

> -
-  1. If the {{BlobPropertyBag/type}} member is provided and is not the empty string,
-    let |t| be set to the {{BlobPropertyBag/type}} dictionary member.
-    If |t| contains any characters outside the range U+0020 to U+007E,
-    then set |t| to the empty string and return from these substeps.
-  2. Convert every character in |t| to [=ASCII lowercase=].
-  3. If the {{FilePropertyBag/lastModified}} member is provided,
-    let |d| be set to the  {{FilePropertyBag/lastModified}} dictionary member.
-    If it is not provided,
-    set |d| to the current date and time
+  with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.
+
+1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
+  1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.
+1. Otherwise:
+  1. Let |d| be the current date and time
     represented as the number of milliseconds since the <a>Unix Epoch</a>
     (which is the equivalent of <code>Date.now()</code> [[ECMA-262]]).
 
     Note: Since ECMA-262 {{Date}} objects convert to <code>long long</code> values

Maybe move this into the domintro ?

> -
-  1. If the {{BlobPropertyBag/type}} member is provided and is not the empty string,
-    let |t| be set to the {{BlobPropertyBag/type}} dictionary member.
-    If |t| contains any characters outside the range U+0020 to U+007E,
-    then set |t| to the empty string and return from these substeps.
-  2. Convert every character in |t| to [=ASCII lowercase=].
-  3. If the {{FilePropertyBag/lastModified}} member is provided,
-    let |d| be set to the  {{FilePropertyBag/lastModified}} dictionary member.
-    If it is not provided,
-    set |d| to the current date and time
+  with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.
+
+1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
+  1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.
+1. Otherwise:
+  1. Let |d| be the current date and time
     represented as the number of milliseconds since the <a>Unix Epoch</a>
     (which is the equivalent of <code>Date.now()</code> [[ECMA-262]]).
 
     Note: Since ECMA-262 {{Date}} objects convert to <code>long long</code> values

I guess File constructor doesn't have a domintro yet. Add one? Blob too?

That'd be a good place to call out (in giant blinking text...) that the first argument is a array. :)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/w3c/FileAPI/pull/154#pullrequestreview-419541114

Received on Wednesday, 27 May 2020 23:29:22 UTC