- From: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Date: Fri, 3 Mar 2023 14:00:46 +0100
- To: Gregg Kellogg <gregg@greggkellogg.net>, Ivan Herman <ivan@w3.org>
- Cc: "Phil Archer (W3C Calendar)" <noreply+calendar@w3.org>, RDF Dataset Canonicalization and Hash Working Group <public-rch-wg@w3.org>
- Message-ID: <19429c8f-2b33-be40-d81a-88a3a367818d@w3.org>
On 02/03/2023 02:13, Gregg Kellogg wrote:
>> On Mar 1, 2023, at 12:29 AM, Ivan Herman <ivan@w3.org> wrote:
>>
>> Hi Gregg,
>>
>> all this worries me. I do not want to find the RCH work suspended
>> unnecessarily on this issue, which is not really ours. As you say,
>> the details of the nquads canonicalization do not affect the
>> algorithm, and the goal of this WG is to standardize that one and
>> nothing else.
>
> I don’t see any technical barriers to getting the canonicalization
> work done, once the administrative issues are cleared. I suspect
> RDF-star (and thus N-Quads) can do an FPWD soon, although moving to
> CR, PR, and REC could be delayed due to the complexities of
> standardizing quoted triples and their semantics. The direction of
> “text direction” could also slow things down; both of those should be
> “at risk”. In any case, I think that canonicalizing datasets including
> quoted triples should probably require a transformation to a reified
> form, and that may be true for language-tagged strings having a text
> direction as well.
>
>> The reason this is the problem because, if we are not careful, we may
>> find ourselves hooked on the CR phase unnecessarily. Any change on
>> the nquads canonicalization will have to spread to various RDF
>> frameworks/libraries out there, and that will take some time. On the
>> other hand, the implementation of URDNA usually relies on such
>> general frameworks. I do not want to find ourselves in a state
>> whereby the URDNA implementer would have to re-implement the nquads
>> serialization along the line of the new RDF specification to pass all
>> the tests.
Well, I don't know about others, but my URDNA implementation /does/ implement its own N-Quads serializer, because

- the "regular" serializer I have does not have an option to generate /canonical/ N-Quads, and
- even if it did, I would not trust it, as canonical N-Triples/N-Quads is currently under-specified (as Gregg points out below);
- anyway, writing such a serializer is very simple...

>
> Existing N-Triples canonicalization is under-specified, and I believe
> that implementations likely already vary in the representations of
> control characters in strings, we just don’t know about it. Any
> tightening of this will likely affect existing implementations.
>
>> What I would propose this WG could do is to look at all the tests
>> which could be affected by any of those proposed changes and either
>> remove them or make them optional tests. These tests would not affect
>> our main goal of the CR testing, namely to prove that the URDNA
>> algorithm, and its textual specification, is correct and
>> interoperable (which is the real goal of the CR phase); as a
>> consequence, these tests should not stand in the way of passing to
>> Proposed Rec when the time comes.
>
> My investigation shows just test060 as being affected, as it’s the
> only one which tries to test the character ranges. Dave’s recent
> update stresses this further, and may show up other variations. That
> aside, there’s the basic security consideration of having strings
> including unescaped control characters, as if presented to a user,
> these could be misleading as it is now.

Are you referring to something like this:
https://www.securityweek.com/trojan-source-attack-abuses-unicode-inject-vulnerabilities-code/ ?

I would argue that this is not really an issue for canonical N-Quads, which is not a programming language, and is not really meant to be read by humans (whenever I need to "read" RDF, I convert it to Turtle or TriG first...).
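To illustrate how small the string-escaping part of such a serializer is, and why the escaping rules matter for hashing, here is a minimal Python sketch. It assumes the RDF 1.1 "Canonical N-Triples" rules, under which only U+0022, U+005C, U+000A and U+000D are escaped with ECHAR and UCHAR is never used, so control characters and non-ASCII letters pass through unchanged. For contrast, `escape_ascii_only` is a hypothetical stricter policy (not taken from any specification) that also escapes control and non-ASCII characters with UCHAR. All function names and the `example.org` IRIs are illustrative assumptions.

```python
import hashlib

# ECHAR escapes for STRING_LITERAL_QUOTE under the RDF 1.1 canonical rules
# (assumed rule set; the current spec text should be checked).
ECHAR = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_canonical(value: str) -> str:
    """Escape a literal's lexical form per the assumed RDF 1.1 rules.

    Only the four ECHAR characters are escaped; control characters and
    non-ASCII letters are emitted unchanged (no UCHAR).
    """
    return "".join(ECHAR.get(ch, ch) for ch in value)


def escape_ascii_only(value: str) -> str:
    """Hypothetical stricter policy: additionally escape control and
    non-ASCII characters with UCHAR, using uppercase hex digits."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch in ECHAR:
            out.append(ECHAR[ch])
        elif cp < 0x20 or cp >= 0x7F:
            # \uXXXX inside the BMP, \UXXXXXXXX beyond it
            out.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
        else:
            out.append(ch)
    return "".join(out)


def quad_hash(escaped_literal: str) -> str:
    """SHA-256 of a hypothetical one-line N-Quads document using the literal."""
    line = f'<http://example.org/s> <http://example.org/name> "{escaped_literal}" .\n'
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


name = "Müller"
print(escape_canonical(name))   # Müller
print(escape_ascii_only(name))  # M\u00FCller
# Same RDF term, different serialization, therefore a different hash.
print(quad_hash(escape_canonical(name)) == quad_hash(escape_ascii_only(name)))  # False
```

Both functions describe the same literal, but the byte sequences differ, so any hash computed over the canonical N-Quads output changes with the escaping policy.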
>
>> I realize that the RDF changes may and will affect the URDNA
>> deployment as well, and that also worries me. But our first
>> obligation is to correctly finalize the standard URDNA specification...
>
> One possible outcome would be to leave the existing N-Triples
> canonicalization unchanged (other than the ambiguity of xsd:string)
> and simply extend to N-Quads as would be expected. The security
> considerations of using unescaped strings in these representations
> could simply be noted. I think that is a less-perfect form, but
> deployment and dependency considerations may be more important. My
> guess is that the presence of unescaped in existing data subject to
> canonicalization is pretty minimal, and the changes in escaping likely
> have little practical impact other than in tests.

It depends on which characters you require to be escaped... If that's any non-ASCII character, I would argue that this could have a huge impact (consider someone's family name in a VC, in countries using accented or non-Latin characters...).

pa

>
> Gregg
>
>> Ivan
>>
>> ----
>> Ivan Herman, W3C
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +33 6 52 46 00 43
>> ORCID ID: https://orcid.org/0000-0003-0782-2704
>> On 1 Mar 2023 at 00:13 +0100, Gregg Kellogg <gregg@greggkellogg.net>,
>> wrote:
>>> We might want to spend a little time on N-Quads Canonicalization.
>>> There’s an open pull request in the N-Quads repo [1] that’s facing
>>> some short-term hurdles due to questions of procedure and RDF-star
>>> WG charter. There is also an issue on reconsidering how characters
>>> are escaped [2], which would affect existing test results,
>>> particularly if simple literals are always serialized with the
>>> xsd:string datatype. While they don’t affect the canonicalization
>>> algorithm, they may affect the hashes produced for quads containing
>>> literals. The RDF-star WG will need some feedback from the RCH WG,
>>> and whatever is needed to clear any charter issues.
>>> (Note, it is an
>>> open erratum against N-Quads [3][4], which is in scope, but the
>>> extent of changes to be considered needs clarification).
>>>
>>> Gregg Kellogg
>>> gregg@greggkellogg.net
>>>
>>> [1] https://github.com/w3c/rdf-n-quads/pull/17
>>> [2] https://github.com/w3c/rdf-n-quads/issues/16
>>> [3] https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_32
>>> [4] https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_33
>>>
>>>> On Feb 21, 2023, at 2:21 AM, Phil Archer (W3C Calendar)
>>>> <noreply+calendar@w3.org> wrote:
>>>>
>>>> View this event in your browser
>>>> <https://www.w3.org/events/meetings/15ef939f-4654-4541-959d-51ba50b4d022/20230301T100000>
>>>>
>>>> RDF Canonicalization and Hash Working Group (Upcoming, Confirmed)
>>>>
>>>> 01 March 2023, 10:00-10:55 America/New_York
>>>>
>>>> Event is recurring every other week on Wednesday, starting from
>>>> 2023-02-01, until 2024-07-17
>>>>
>>>> RDF Dataset Canonicalization and Hash Working Group
>>>> <https://www.w3.org/groups/wg/rch/calendar>
>>>>
>>>> Bi-weekly meeting of the RCH group, back after a short break.
>>>>
>>>> Agenda
>>>>
>>>> 1. Scribe list (most recent first) Gregg, pchampin, DLongley,
>>>>    Ahmad, PhilA, AndyS, Manu)
>>>> 2. New introductions
>>>> 3. Round the room including any comments from VCWG F2F in Miami
>>>> 4. Canon Issue 4 <https://github.com/w3c/rdf-canon/issues/4> (What
>>>>    is the output)
>>>> 5. Hash Issue 2 <https://github.com/w3c/rch-rdh/issues/2>
>>>> 6. Issue bashing
>>>>    <https://github.com/w3c/rdf-canon/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-asc>
>>>>
>>>> Joining Instructions
>>>>
>>>> Instructions are restricted to meeting participants. You need to
>>>> log in
>>>> <https://auth.w3.org/?url=https%3A%2F%2Fwww.w3.org%2Fevents%2Fmeetings%2F15ef939f-4654-4541-959d-51ba50b4d022%2F20230301T100000%2Fedit>
>>>> to see them.
>>>>
>>>> Participants
>>>>
>>>> Organizers
>>>>
>>>> * Phil Archer
>>>> * Markus Sabadello
>>>>
>>>> Groups
>>>>
>>>> * RDF Dataset Canonicalization and Hash Working Group
>>>>   <https://www.w3.org/groups/wg/rch> (View Calendar
>>>>   <https://www.w3.org/groups/wg/rch/calendar>)
>>>>
>>>> Report feedback and issues on GitHub
>>>> <https://github.com/w3c/calendar>.
>>>>
>>>> <event.ics>
>>>
>
Received on Friday, 3 March 2023 13:00:50 UTC