Re: Event Updated: RDF Canonicalization and Hash Working Group


On 02/03/2023 02:13, Gregg Kellogg wrote:
>> On Mar 1, 2023, at 12:29 AM, Ivan Herman <ivan@w3.org> wrote:
>>
>> Hi Gregg,
>>
>> all this worries me. I do not want to find the RCH work suspended 
>> unnecessarily on this issue, which is not really ours. As you say, 
>> the details of the nquads canonicalization do not affect the 
>> algorithm, and the goal of this WG is to standardize that one and 
>> nothing else.
>
> I don’t see any technical barriers to getting the canonicalization 
> work done, once the administrative issues are cleared. I suspect 
> RDF-star (and thus N-Quads) can do an FPWD soon, although moving to 
> CR, PR, and REC could be delayed due to the complexities of 
> standardizing quoted triples and their semantics. The direction of 
> “text direction” could also slow things down; both of those should be 
> ‘at risk”. In any case, I think that canonicalizing datasets including 
> quoted triples should probably require a transformation to a reified 
> form, and that may be true for language-tagged strings having a text 
> direction as well.
>
>> The reason this is a problem is that, if we are not careful, we may 
>> find ourselves stuck in the CR phase unnecessarily. Any change to 
>> the nquads canonicalization will have to spread to various RDF 
>> frameworks/libraries out there, and that will take some time. On the 
>> other hand, the implementation of URDNA usually relies on such 
>> general frameworks. I do not want to find ourselves in a state 
>> whereby the URDNA implementer would have to re-implement the nquads 
>> serialization along the lines of the new RDF specification to pass all 
>> the tests.

Well, I don't know about others, but my URDNA implementation /does/ 
implement its own n-quads serializer, because

- the "regular" serializer I have does not have an option to generate 
/canonical/ n-quads;
- even if it did, I would not trust it, as canonical n-triples/n-quads 
is currently under-specified (as Gregg points out below);
- anyway, writing such a serializer is very simple...
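
For illustration, here is a minimal sketch of such a serializer (not my actual code). It assumes a hypothetical in-memory term model (the IRI, BlankNode and Literal classes below) and follows my reading of the RDF 1.1 canonical N-Triples rules: only `"`, `\`, LF and CR are escaped, as ECHAR; UCHAR is never emitted; and an explicit xsd:string datatype is dropped.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING
    language: Optional[str] = None  # when set, `datatype` is ignored


Term = Union[IRI, BlankNode, Literal]

# The only characters escaped inside literals (as ECHAR) in this sketch.
_ECHAR = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_string(s: str) -> str:
    """Escape a lexical form; every other character is written as-is (no UCHAR)."""
    return "".join(_ECHAR.get(ch, ch) for ch in s)


def term_to_nquads(t: Term) -> str:
    if isinstance(t, IRI):
        return f"<{t.value}>"
    if isinstance(t, BlankNode):
        return f"_:{t.label}"
    quoted = f'"{escape_string(t.lexical)}"'
    if t.language is not None:
        return f"{quoted}@{t.language}"
    if t.datatype == XSD_STRING:
        return quoted  # simple literal: ^^xsd:string is omitted
    return f"{quoted}^^<{t.datatype}>"


def quad_to_nquads(s: Term, p: Term, o: Term, g: Optional[Term] = None) -> str:
    """One N-Quads line; the graph term is omitted for the default graph."""
    parts = [term_to_nquads(s), term_to_nquads(p), term_to_nquads(o)]
    if g is not None:
        parts.append(term_to_nquads(g))
    return " ".join(parts) + " .\n"
```

For example, quad_to_nquads(BlankNode("c14n0"), IRI("http://example.org/p"), Literal("x", language="en")) yields the line `_:c14n0 <http://example.org/p> "x"@en .` plus a newline.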

>
> Existing N-Triples canonicalization is under-specified, and I believe 
> that implementations likely already vary in the representations of 
> control characters in strings; we just don’t know about it. Any 
> tightening of this will likely affect existing implementations.
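
A small sketch of the problem, assuming nothing beyond the N-Quads grammar: a literal containing U+0008 (BACKSPACE) can be written raw, as the ECHAR `\b`, or as the UCHAR `\u0008`. The three lines below denote the same quad, yet they hash differently. The example IRIs and the helper names (unescape, literal_value, sha256_hex) are illustrative only.

```python
import hashlib
import re

# Hypothetical example IRIs; only the object literal varies between lines.
PREFIX = "<http://example.org/s> <http://example.org/p> "

# Three spellings of the literal "a<U+0008>b": raw, as the ECHAR \b,
# and as the UCHAR \u0008. All are allowed by the N-Quads grammar.
VARIANTS = [
    PREFIX + '"a\x08b" .\n',
    PREFIX + '"a\\bb" .\n',
    PREFIX + '"a\\u0008b" .\n',
]

_ECHARS = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
           '"': '"', "'": "'", "\\": "\\"}
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))")
_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def unescape(body: str) -> str:
    """Decode ECHAR and UCHAR escapes in a literal body (sketch, no error handling)."""
    def repl(m):
        hex_digits = m.group(1) or m.group(2)
        if hex_digits:
            return chr(int(hex_digits, 16))
        return _ECHARS[m.group(3)]
    return _ESCAPE.sub(repl, body)


def literal_value(line: str) -> str:
    """Return the decoded lexical form of the first quoted literal in a line."""
    return unescape(_LITERAL.search(line).group(1))


def sha256_hex(line: str) -> str:
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print({literal_value(v) for v in VARIANTS} == {"a\bb"})  # True: one literal
    print(len({sha256_hex(v) for v in VARIANTS}))            # 3: three digests
```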
>
>> What I would propose this WG could do is to look at all the tests 
>> which could be affected by any of those proposed changes and either 
>> remove them or make them optional tests. These tests would not affect 
>> our main goal of the CR testing, namely to prove that the URDNA 
>> algorithm, and its textual specification, is correct and 
>> interoperable (which is the real goal of the CR phase); as a 
>> consequence, these tests should not stand in the way of moving to 
>> Proposed Rec when the time comes.
>
> My investigation shows just test060 as being affected, as it’s the 
> only one which tries to test the character ranges. Dave’s recent 
> update stresses this further, and may reveal other variations. That 
> aside, there’s the basic security concern of strings containing 
> unescaped control characters: if presented to a user, these could be 
> misleading under the current rules.

Are you referring to something like this: 
https://www.securityweek.com/trojan-source-attack-abuses-unicode-inject-vulnerabilities-code/ 
?

I would argue that this is not really an issue for canonical N-Quads, 
which is not a programming language, and not really meant to be read by 
humans (whenever I need to "read" RDF, I convert it to Turtle or TriG 
first...).
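
For what it's worth, where literals do get shown to a user, such characters are easy to detect at display time. A minimal sketch (hypothetical helper; the list of bidi controls is my own selection, not taken from any RDF spec) that flags control characters (Unicode category Cc) and the bidi embedding, override and isolate characters:

```python
import unicodedata

# Bidi formatting characters used in "Trojan Source" style attacks:
# LRE, RLE, PDF, LRO, RLO (U+202A..U+202E) and LRI, RLI, FSI, PDI (U+2066..U+2069).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))


def suspicious_code_points(s: str) -> list:
    """List (index, "U+XXXX NAME") for control (Cc) and bidi formatting characters."""
    found = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if unicodedata.category(ch) == "Cc" or cp in BIDI_CONTROLS:
            # Control characters have no Unicode name, hence the default.
            found.append((i, f"U+{cp:04X} {unicodedata.name(ch, '<control>')}"))
    return found
```

This keeps the concern in the display layer, leaving the canonical form itself untouched.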

>
>> I realize that the RDF changes may and will affect the URDNA 
>> deployment as well, and that also worries me. But our first 
>> obligation is to correctly finalize the standard URDNA specification...
>
> One possible outcome would be to leave the existing N-Triples 
> canonicalization unchanged (other than the ambiguity of xsd:string) 
> and simply extend it to N-Quads, as would be expected. The security 
> considerations of using unescaped strings in these representations 
> could simply be noted. I think that is a less-perfect form, but 
> deployment and dependency considerations may be more important. My 
> guess is that the presence of unescaped control characters in existing 
> data subject to canonicalization is pretty minimal, and the changes in 
> escaping would likely have little practical impact other than in tests.
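
The xsd:string ambiguity alone is enough to change the hash of a quad. A minimal sketch, with hypothetical helper names and example IRIs, and with escaping left out on purpose: the same xsd:string literal serialized once without and once with an explicit ^^xsd:string gives two different SHA-256 digests.

```python
import hashlib

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def string_literal(lexical: str, always_typed: bool) -> str:
    """Serialize an xsd:string literal under one of two hypothetical policies.

    Escaping is deliberately omitted: `lexical` is assumed to contain no
    '"', '\\', LF or CR.
    """
    if always_typed:
        return f'"{lexical}"^^<{XSD_STRING}>'
    return f'"{lexical}"'  # RDF 1.1 canonical N-Triples omits ^^xsd:string


def quad_digest(obj: str) -> str:
    # Example subject and predicate IRIs; only the object varies.
    line = f"<http://example.org/s> <http://example.org/p> {obj} .\n"
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = quad_digest(string_literal("foo", always_typed=False))
    b = quad_digest(string_literal("foo", always_typed=True))
    print(a == b)  # False: same quad, different hash
```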

That depends on which characters you require to be escaped... If that's 
any non-ASCII character, I would argue that this could have a huge impact 
(consider someone's family name in a VC, in countries using accented or 
non-Latin characters...).
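
To give an idea of the impact: under a hypothetical "escape every non-ASCII character" policy (UCHAR, uppercase hex; the helper names are mine), a family name like Müller would serialize, and therefore hash, differently from the RDF 1.1 canonical form, which writes the character as-is.

```python
import hashlib


def escape_non_ascii(s: str) -> str:
    """Hypothetical policy: write every non-ASCII code point as a UCHAR.

    Only non-ASCII characters are handled here; the ECHAR escapes for
    '"', '\\', LF and CR would still be applied separately.
    """
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")  # 4-hex-digit form
        else:
            out.append(f"\\U{cp:08X}")  # 8-hex-digit form for supplementary planes
    return "".join(out)


def digest(lexical: str) -> str:
    # Example subject and predicate IRIs; only the literal varies.
    line = f'<http://example.org/s> <http://example.org/p> "{lexical}" .\n'
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    name = "Müller"
    print(escape_non_ascii(name))                          # M\u00FCller
    print(digest(name) == digest(escape_non_ascii(name)))  # False
```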

   pa

>
> Gregg
>
>> Ivan
>>
>> ----
>> Ivan Herman, W3C
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +33 6 52 46 00 43
>> ORCID ID: https://orcid.org/0000-0003-0782-2704
>> On 1 Mar 2023 at 00:13 +0100, Gregg Kellogg <gregg@greggkellogg.net>, 
>> wrote:
>>> We might want to spend a little time on N-Quads Canonicalization. 
>>> There’s an open pull request in the N-Quads repo [1] that’s facing 
>>> some short-term hurdles due to questions of procedure and RDF-star 
>>> WG charter. There is also an issue on reconsidering how characters 
>>> are escaped [2], which would affect existing test results, 
>>> particularly if simple literals are always serialized with the 
>>> xsd:string datatype. While they don’t affect the canonicalization 
>>> algorithm, they may affect the hashes produced for quads containing 
>>> literals. The RDF-star WG will need some feedback from the RCH WG, 
>>> as well as whatever is needed to clear any charter issues. (Note: this 
>>> is an open erratum against N-Quads [3][4], which is in scope, but the 
>>> extent of changes to be considered needs clarification.)
>>>
>>> Gregg Kellogg
>>> gregg@greggkellogg.net
>>>
>>> [1] https://github.com/w3c/rdf-n-quads/pull/17
>>> [2] https://github.com/w3c/rdf-n-quads/issues/16
>>> [3] https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_32
>>> [4] https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_33
>>>
>>>> On Feb 21, 2023, at 2:21 AM, Phil Archer (W3C Calendar) 
>>>> <noreply+calendar@w3.org> wrote:
>>>>
>>>> View this event in your browser 
>>>> <https://www.w3.org/events/meetings/15ef939f-4654-4541-959d-51ba50b4d022/20230301T100000>
>>>>
>>>>   RDF Canonicalization and Hash Working Group
>>>>
>>>> 01 March 2023, 10:00-10:55 America/New_York
>>>>
>>>> Event is recurring every other week on Wednesday, starting from 
>>>> 2023-02-01, until 2024-07-17
>>>>
>>>> RDF Dataset Canonicalization and Hash Working Group 
>>>> <https://www.w3.org/groups/wg/rch/calendar>
>>>>
>>>> Bi-weekly meeting of the RCH group, back after a short break.
>>>>
>>>>
>>>>     Agenda
>>>>
>>>>  1. Scribe list (most recent first): Gregg, pchampin, DLongley,
>>>>     Ahmad, PhilA, AndyS, Manu
>>>>  2. New introductions
>>>>  3. Round the room including any comments from VCWG F2F in Miami
>>>>  4. Canon Issue 4 <https://github.com/w3c/rdf-canon/issues/4> (What
>>>>     is the output)
>>>>  5. Hash Issue 2 <https://github.com/w3c/rch-rdh/issues/2>
>>>>  6. Issue bashing
>>>>     <https://github.com/w3c/rdf-canon/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-asc>
>>>>
>>>>
>>>>     Joining Instructions
>>>>
>>>> Instructions are restricted to meeting participants. You need to 
>>>> log in 
>>>> <https://auth.w3.org/?url=https%3A%2F%2Fwww.w3.org%2Fevents%2Fmeetings%2F15ef939f-4654-4541-959d-51ba50b4d022%2F20230301T100000%2Fedit> 
>>>> to see them.
>>>>
>>>>
>>>>     Participants
>>>>
>>>>
>>>>       Organizers
>>>>
>>>>   * Phil Archer
>>>>   * Markus Sabadello
>>>>
>>>>
>>>>       Groups
>>>>
>>>>   * RDF Dataset Canonicalization and Hash Working Group
>>>>     <https://www.w3.org/groups/wg/rch> (View Calendar
>>>>     <https://www.w3.org/groups/wg/rch/calendar>)
>>>>
>>>> Report feedback and issues on GitHub 
>>>> <https://github.com/w3c/calendar>.
>>>>
>>>
>

Received on Friday, 3 March 2023 13:00:50 UTC