- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Tue, 11 Jul 2023 17:28:50 +0100
- To: public-xslt-40@w3.org
- CC: Matthew Patterson <matt@saxonica.com>
- Message-ID: <m2o7kiv2a7.fsf@saxonica.com>
Hello folks,
Here are the minutes. A copy of Matt’s slide deck is also attached.
https://qt4cg.org/meeting/minutes/2023/07-11.html
QT4 CG Meeting 041 Minutes 2023-07-11
Table of Contents
* [1]Draft Minutes
* [2]Summary of new and continuing actions [0/5]
* [3]1. Administrivia
+ [4]1.1. Roll call [11/11]
+ [5]1.2. Accept the agenda
o [6]1.2.1. Status so far...
+ [7]1.3. Approve minutes of the previous meeting
+ [8]1.4. Next meeting
+ [9]1.5. Review of open action items [1/6]
+ [10]1.6. Review of open pull requests and issues
* [11]2. Technical Agenda
+ [12]2.1. PR #533: 413: Spec for CSV parsing with
fn:parse-csv()
* [13]3. Any other business?
* [14]4. Adjourned
[15]Agenda index / [16]QT4CG.org / [17]Dashboard / [18]GH Issues /
[19]GH Pull Requests
Draft Minutes
Summary of new and continuing actions [0/5]
* [ ] QT4CG-002-10: BTW to coordinate some ideas about improving
diversity in the group
* [ ] QT4CG-016-08: RD to clarify how namespace comparisons are
performed.
* [ ] QT4CG-026-01: MK to write a summary paper that outlines the
decisions we need to make on "value sequences"
+ This is related to PR #368: Issue 129 - Context item
generalized to context value and subsequent discussion.
* [ ] QT4CG-029-07: NW to open the next discussion of #397 with a
demo from DN See PR [20]#449
* [ ] QT4CG-039-01: NW to schedule discussion of issue [21]#52, Allow
record(*) based RecordTests
1. Administrivia
1.1. Roll call [11/11]
* [X] Reece Dunn (RD)
* [X] Sasha Firsov (SF)
* [X] Christian Grün (CG)
* [X] Joel Kalvesmaki (JK) [0:05-]
* [X] Michael Kay (MK)
* [X] John Lumley (JL)
* [X] Dimitre Novatchev (DN)
* [X] Ed Porter (EP)
* [X] C. M. Sperberg-McQueen (MSM)
* [X] Norm Tovey-Walsh (NW). Scribe. Chair.
* [X] Matt Patterson (MP)
1.2. Accept the agenda
Proposal: Accept [22]the agenda.
Accepted.
1.2.1. Status so far...
issues-open-2023-07-11.png
Figure 1: "Burn down" chart on open issues
issues-by-spec-2023-07-11.png
Figure 2: Open issues by specification
issues-by-type-2023-07-11.png
Figure 3: Open issues by type
1.3. Approve minutes of the previous meeting
Proposal: Accept [23]the minutes of the previous meeting.
Accepted.
1.4. Next meeting
The next meeting [24]is scheduled for Tuesday, 18 July 2023.
No regrets heard.
Reminder: the CG will take a vacation for four weeks in August. We will
not meet on 1, 8, 15, or 22 August.
1.5. Review of open action items [1/6]
* [ ] QT4CG-002-10: BTW to coordinate some ideas about improving
diversity in the group
* [ ] QT4CG-016-08: RD to clarify how namespace comparisons are
performed.
* [ ] QT4CG-026-01: MK to write a summary paper that outlines the
decisions we need to make on "value sequences"
+ This is related to PR #368: Issue 129 - Context item
generalized to context value and subsequent discussion.
* [X] QT4CG-029-01: RD+DN to draft spec prose for the "divide and
conquer" approach outlined in issue #399
+ Overtaken by events.
* [ ] QT4CG-029-07: NW to open the next discussion of #397 with a
demo from DN See PR [25]#449
* [ ] QT4CG-039-01: NW to schedule discussion of issue [26]#52, Allow
record(*) based RecordTests
1.6. Review of open pull requests and issues
The following editorial or otherwise minor PRs were open when this
agenda was prepared. The chair proposes that these can be merged
without discussion.
* PR [27]#597 : Editorial fixes from #566 (fn:parse-uri)
+ Check for technical comments from CG
* PR [28]#595 : 588: (Editorial, XSLT) minor clarifications regarding
xsl:sort
* PR [29]#594 : 592: (XSLT, Editorial) Add missing description of
exponent-separator
* PR [30]#593 : 591: [XSLT, editorial] Add defaults to XSLT element
syntax summaries
* PR [31]#590 : 343: make $collation uniformly optional
* PR [32]#587 : 365: Allow braces in switch and typeswitch
expressions
* PR [33]#586 : 585: [Editorial] Rearrange text (and grammar) for
dynamic function calls
* PR [34]#584 : Editorial: Correction to map:filter examples
* PR [35]#578 : fn:format-integer: $lang -> $language
* PR [36]#577 : Editorial: improve generator for keyword tests
* PR [37]#555 : 464: Revised narrative of normalization steps for
serialization
* PR [38]#547 : Action QT4CG-036-02: Further elaboration of the rules
for function identity
After discussion, #598 removed.
Proposal: Accept these PRs.
Accepted.
It has been proposed that the following issues be [39]closed without
action.
* Issue [40]#539 FLOWR where clause with a "do when false" option
Proposal: Close these issues.
Accepted.
2. Technical Agenda
2.1. PR #533: 413: Spec for CSV parsing with fn:parse-csv()
* See PR [41]#533
* MP introduces the changes proposed with a slide deck
+ ... (Walks through slide deck)
* RD: Why is there only a record for the top level?
+ MP: So it fits on a single slide; also I have questions about
how to define nested records. Also, I have some questions
about where record types are shared.
* MP continues...
* MSM: Trim trims only leading and trailing whitespace, I assume?
* MP: Yes.
* MP continues...
+ ... Extract column names from the first row: boolean or a map
from integer to string to specify headers for the columns.
+ ... There's an option to filter columns.
+ ... You can specify that the number of columns can be fixed.
They're padded or truncated.
* MSM: If I say nothing?
* MP: Then you get what you get?
* JL: Is there an argument for filter rows?
* MP: There isn't, and I haven't thought of a use for it beyond
removing, say, the first "n" rows. You probably want to evaluate each
row programmatically. Columns are relatively fixed, unlike rows.
* JL: I might just want to test on the first 25 or 40 rows. Some
mechanism that allowed me to truncate parsing might be handy.
* MP: Yes, I think one of the reasons for using a sequence of rows is
that it's easier to generate lazily. And we have a large number of
good tools for extracting "n" rows from a sequence.
* DN: Whenever I see arguments for options, my question is always, is
this a mandatory argument? If it's not, what are the defaults?
* MP: The default is to extract column names from the first row, to
not filter columns, and not to restrict the number of columns.
* MSM: I'd like a way to specify the default behaviors explicitly.
* MP: I'm not sure I have the notation correct, but that's what
you're supposed to be able to do here.
Some discussion of the possible details around specifying defaults,
with enumerated values for example. Whether a keyword is necessary or
if an empty sequence suffices is something of an open question.
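The options and defaults discussed above can be sketched roughly as follows. This is only an illustration in Python, using the standard csv module as a stand-in for the proposed function; the name rows_with_options and its parameters column_names and number_of_columns are hypothetical and are not taken from the PR. It assumes the defaults MP described (names from the first row, no restriction on the number of columns) and returns rows lazily, so the first "n" rows can be taken with ordinary sequence tools.

```python
import csv
import io
from itertools import islice


def rows_with_options(text, column_names=True, number_of_columns=None):
    """Return (names, lazy iterator of rows) for CSV text.

    Hypothetical option names, for illustration only.
    column_names: True  -> take names from the first row;
                  False -> no names, every row is data;
                  dict  -> map from 1-based column position to name,
                           every row is data.
    number_of_columns: None -> leave each row as parsed;
                       int  -> pad with "" or truncate to that width.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    if column_names is True:
        names = next(reader, [])
    elif isinstance(column_names, dict):
        width = max(column_names, default=0)
        names = [column_names.get(i, "") for i in range(1, width + 1)]
    else:
        names = []

    def fit(row):
        # Pad short rows and truncate long ones when a width is fixed.
        if number_of_columns is None:
            return row
        return (row + [""] * number_of_columns)[:number_of_columns]

    return names, (fit(row) for row in reader)


text = "name,amount\nalice,10,extra\nbob\ncarol,30\n"
names, rows = rows_with_options(text, number_of_columns=2)
first_two = list(islice(rows, 2))  # only two data rows are parsed here
```

Passing column_names=True, number_of_columns=None spells out the defaults explicitly, which is the kind of call MSM asked to be able to write.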
* DN: I would like to see exactly these cases in examples in the
spec.
* MP: Yes, exactly.
* MP continues...
+ You can supply column names reliably even if the data doesn't
include them.
* JL: I think it's important that if the boolean is false, the first
row becomes a header row. That needs to be explicit.
* MP: Yes.
* MP continues...
+ filter-columns and number-of-columns ...
+ MP discusses the example on the slide titled "Using
csv-to-xdm()'s response".
+ ... I have questions about how best to deal with namespaces
and cross references.
* JL: The rows are all siblings of each other, but their position
isn't the same as the row position. Having a rows wrapper would
make it more straightforward.
* MP: That makes sense.
* EP: You can supply a boolean or a map. Can you override the
headers? So you want to specify "true" but also specify your own
set.
* MP: Yes. I'm not sure. I think there's an argument that you can
handle that the same way you'd handle the not uncommon case where
there are several rows of header-like data. But maybe there needs
to be another option...
* MSM: I like the idea of saying just apply tail to the sequence of
rows in that case.
* EP: Yes, that would work. I was just pointing out that the way the
option is specified, you can't do both.
* MP continues with csv-to-xml()
+ In a namespace?
+ RD: I wonder if it should use the fn: namespace to be similar
to how analyze-string works.
Some discussion of how this compares to JSON. Consensus: there's a
clear precedent, use the `fn:` namespace.
* MP: The last question is about how to map between fields and column
headers. Either you have id/ref pairs, or you can use the column.
* JK: Why can't we just rely on position?
* MP: You could rely on positionality, but if you have a CSV with 50
or 60 columns and you want the ones with the "name" and the
"amount" then names are better than "columns 2 and 35".
* NW: My preference is the id/ref version.
* MSM: I don't understand why. My gut reaction is "what I'm used to
and what I'm happy with is to have the column names used as element
names." That makes processing the result feel a lot more
convenient.
Some discussion of whether or not column names are likely to contain
strings that don't match cleanly to attribute or element names.
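The concern about column names that are not usable as element names can be illustrated with a small sketch. It is written in Python with the standard xml.etree.ElementTree module and is not the output format proposed in the PR: the row and field element names, the column attribute, and the helper row_to_element are all hypothetical, and the name check is a simplified ASCII-only approximation of the XML name rules.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of an XML element name.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def row_to_element(names, values):
    """Build one <row>, using the column name as the element name when it
    is usable and falling back to <field column="..."> when it is not."""
    row = ET.Element("row")
    for name, value in zip(names, values):
        if NAME.fullmatch(name):
            field = ET.SubElement(row, name)
        else:
            field = ET.SubElement(row, "field", {"column": name})
        field.text = value
    return row


row = row_to_element(["name", "Unit price ($)"], ["widget", "9.99"])
xml = ET.tostring(row, encoding="unicode")
```

Here "name" becomes an element name directly, while "Unit price ($)" can only be carried as data in an attribute.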
* MP: The other argument is that if you have large, long column names
then you're adding a lot more data into each row.
* MSM: I think relying on position would make sense if people are
worried about data size. The added indirection of having to keep a
table and look up the name from the ID doesn't appeal to me.
* MK: (in the chat window) I think the id/idref approach is an
unnecessary extra level of indirection.
* MP: My go-to would be to work with the XDM directly, so maybe my
opinion isn't as relevant.
* MK: I also think if you're worried about space, the number of
attributes is probably more significant than the length of them.
* MP continues with "fn:parse-csv() output"
+ It handles quoting and delimiters. You can build anything you
want from that without having to reimplement the parsing
constraints.
* JL: Isn't there an argument that says this one gives you the header
rows?
* MP: Yes.
* JL: Then the example could be clearer.
* MP: Yes.
* RD: Given that fn:parse-csv is now simple, would it make sense to
have the inverse, "serialize-csv"?
* MP: Yes, I'm hoping to add that. My rough thinking is that you want
a function that generates the field values with quoting and the
rows.
+ ... The record on the "Input options" slide is what you'd hand
to these functions.
+ All the information you'd need to generate them is in there.
* MP continues with "fn:parse-csv data input"
+ The problem with unparsed-text-lines is that it strips the
line endings. We can't be sure there's a 1:1 correspondence
between a row and a line in a file.
* CG: We have parse-json and json-doc, maybe it would be reasonable
to have parse-csv and csv-doc for that purpose?
* MP: Yep.
* MK: (in the chat window) Can't we just let the optimizer cope with
streaming the combination of unparsed-text() => parse-csv() ?
* MK: Maybe. I don't know.
* MSM: I think I understand what MP is driving at, but I'm a little
confused by some details.
+ If I'm understanding correctly, in the simple case, the lines
of the CSV file and the records in the records are 1:1, but
that's not always the case.
* MP: Yes.
* MSM: And the case in which that's not true is the case where there
may be multiple lines. It's 1:n not m:1. Right?
* MP: Yes.
* MSM: So if we want unparsed-text-lines() to be usable this way, we
have to be able to specify that you can begin a multi-line quote in
one string and finish it later.
Some discussion of the problems associated with multi-line fields. If
the line ending is stripped away by the unparsed-text-lines() function,
then you'll lose information. It might be important that the embedded
line ending was CR/LF and not just LF.
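The multi-line field problem can be reproduced with a short sketch. It uses Python's standard csv module as a stand-in for the proposed parser and str.splitlines() as a rough analogue of unparsed-text-lines(); both substitutions are assumptions made only for illustration.

```python
import csv
import io

# Three CSV rows, but four physical lines: the quoted field in the
# second row contains an embedded CR/LF.
text = 'id,note\r\n1,"first line\r\nsecond line"\r\n2,plain\r\n'

# Splitting into lines first strips every line ending, including the
# one inside the quoted field.
lines = text.splitlines()

# Parsing the whole text keeps rows intact and preserves the embedded
# CR/LF (newline="" stops the text layer from translating it).
rows = list(csv.reader(io.StringIO(text, newline="")))

# Rejoining the stripped lines with LF recovers the row structure, but
# the fact that the embedded ending was CR/LF is gone.
rejoined = list(csv.reader(io.StringIO("\n".join(lines), newline="")))
```

In this sketch lines has four entries while rows has three, which is the 1:n relationship MSM describes, and rejoined shows the embedded line ending changed from CR/LF to LF.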
* MSM: I'm willing to say that is a corner case that may arise and
when it does, you'll want to parse it yourself.
* MP: There's a larger question of dealing with error handling.
* JL: We know that parse-csv() is doing something internally that is
like unparsed-text-lines(). So you don't gain anything by using
unparsed-text-lines().
* MSM: I'm guessing about what the JSON parsing functions do.
* RD: It would be useful to add these corner cases as tests in the
test suite.
3. Any other business?
None heard.
4. Adjourned
References
1. https://qt4cg.org/meeting/minutes/2023/07-11.html#minutes
2. https://qt4cg.org/meeting/minutes/2023/07-11.html#new-actions
3. https://qt4cg.org/meeting/minutes/2023/07-11.html#administrivia
4. https://qt4cg.org/meeting/minutes/2023/07-11.html#roll-call
5. https://qt4cg.org/meeting/minutes/2023/07-11.html#agenda
6. https://qt4cg.org/meeting/minutes/2023/07-11.html#so-far
7. https://qt4cg.org/meeting/minutes/2023/07-11.html#approve-minutes
8. https://qt4cg.org/meeting/minutes/2023/07-11.html#next-meeting
9. https://qt4cg.org/meeting/minutes/2023/07-11.html#open-actions
10. https://qt4cg.org/meeting/minutes/2023/07-11.html#open-pull-requests
11. https://qt4cg.org/meeting/minutes/2023/07-11.html#technical-agenda
12. https://qt4cg.org/meeting/minutes/2023/07-11.html#pr-533
13. https://qt4cg.org/meeting/minutes/2023/07-11.html#any-other-business
14. https://qt4cg.org/meeting/minutes/2023/07-11.html#adjourned
15. https://qt4cg.org/meeting/minutes/
16. https://qt4cg.org/
17. https://qt4cg.org/dashboard
18. https://github.com/qt4cg/qtspecs/issues
19. https://github.com/qt4cg/qtspecs/pulls
20. https://qt4cg.org/dashboard/#pr-449
21. https://github.com/qt4cg/qtspecs/issues/52
22. https://qt4cg.org/meeting/agenda/2023/07-11.html
23. https://qt4cg.org/meeting/minutes/2023/07-11.html
24. https://qt4cg.org/meeting/agenda/2023/07-18.html
25. https://qt4cg.org/dashboard/#pr-449
26. https://github.com/qt4cg/qtspecs/issues/52
27. https://qt4cg.org/dashboard/#pr-597
28. https://qt4cg.org/dashboard/#pr-595
29. https://qt4cg.org/dashboard/#pr-594
30. https://qt4cg.org/dashboard/#pr-593
31. https://qt4cg.org/dashboard/#pr-590
32. https://qt4cg.org/dashboard/#pr-587
33. https://qt4cg.org/dashboard/#pr-586
34. https://qt4cg.org/dashboard/#pr-584
35. https://qt4cg.org/dashboard/#pr-578
36. https://qt4cg.org/dashboard/#pr-577
37. https://qt4cg.org/dashboard/#pr-555
38. https://qt4cg.org/dashboard/#pr-547
39. https://github.com/qt4cg/qtspecs/labels/Propose%20Closing%20with%20No%20Action
40. https://github.com/qt4cg/qtspecs/issues/539
41. https://qt4cg.org/dashboard/#pr-533
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Attachments
- application/pdf attachment: parse-csv-update-2023-07-11.pdf
Received on Tuesday, 11 July 2023 16:30:56 UTC