- From: Nader Helmy <creator.nader@gmail.com>
- Date: Fri, 24 Jul 2020 15:33:59 -0500
- To: Orie Steele <orie@transmute.industries>
- Cc: Leonard Rosenthol <lrosenth@adobe.com>, Manu Sporny <msporny@digitalbazaar.com>, "public-credentials@w3.org" <public-credentials@w3.org>
- Message-ID: <CAKTXcdc6Gh2fUcFqVtA_vsOFQZGDW2p=cEu-dfhpu=zqX4Qucw@mail.gmail.com>
Stepping back for a second to ask a naive/simple question:

What is the relationship or the difference between this new DB spec:
https://digitalbazaar.github.io/cbor-ld-spec/

And this existing (and somewhat sparse) draft spec at the W3C:
https://w3c.github.io/json-ld-cbor/

Is the former spec simply an evolution of the latter? What’s the delta between these approaches?

On Fri, Jul 24, 2020 at 11:57 AM Orie Steele <orie@transmute.industries> wrote:

> Sorry I am late to the CBOR-LD Party!
>
> Very excited to have a semantic linked data format that is also usable in
> a compact binary representation, and to have bi-directional
> transformation out of the box... I have been playing with CBOR on the
> weekends, and I have a repo here:
> https://github.com/transmute-industries/decentralized-cbor/blob/master/src/__fixtures__/outputs/table.csv
>
> The repo compares JSON, JSON-LD, CBOR, DAG_CBOR and ZLIB_URDNA2015_CBOR
> (another approach at a compressed linked data format in CBOR)... I am
> eager to add tests for CBOR-LD.
>
> Both DAG_CBOR and CBOR-LD have some benefits over CBOR
> and ZLIB_URDNA2015_CBOR and JSON....
>
> Both are linked data formats where the linked data aspect is preserved at
> the binary level. ZLIB_URDNA2015_CBOR is just a compressed JSON-LD object
> encoded as CBOR, you cannot leverage internal semantics... in much the same
> way you cannot leverage internal semantics of "Pure JSON" and "Pure
> CBOR".... However, ZLIB_URDNA2015_CBOR is MUCH smaller than DAG_CBOR /
> "Pure CBOR" that was built from "Pure JSON", and CBOR-LD is MUCH smaller
> than ZLIB_URDNA2015_CBOR...
>
> Backing up for a second, one way to think about why CBOR-LD is awesome is
> to consider how all software that processes data has some opinion about
> that data... sometimes these opinions are encoded in schema validation of
> incoming data (using tools like JSON Schema or Protobuf)... If you
> consider that changes to data on the wire would cause the software to
> explode...
> you can see why agreeing to a common context is similar to
> agreeing to a data schema....
>
> And by relying on an existing context to build a compressed binary
> representation of a semantic object, we can leverage these "common
> dictionaries / vocabularies" not just for semantic disambiguation, but also
> for compression....
>
> Obviously the IoT space has been waiting for something like this for a
> long time...
>
> - https://www.w3.org/WoT/
> - https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/dtdlv2.md
>
> We are now able to convert all these ontologies and semantic vocabularies
> into compact, interoperable, binary representations for industries that
> have already committed to the semantic web:
> https://github.com/semantalytics/awesome-semantic-web#ontologies
>
> I'm not sure of the potential internal representation benefits for
> services like https://developers.google.com/knowledge-graph but
> obviously, a small IoT device that only speaks CBOR-LD would not need to
> crack out a JSON parser and all the attack surface associated with it, just
> to talk to the knowledge graph service.
>
> OS
>
> On Fri, Jul 24, 2020 at 10:54 AM Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
>> It's not just specific schemas but also the order of the schemas, any
>> other keys you add, plus additional "techniques" you add.
>>
>> Using your presentation as a guide:
>> Slide 11:
>>
>> In that case you have picked a single schema, found all the items, and
>> given each a unique value (let's say 1-10). Now (not shown on the slide,
>> but...), I assume that you then pick another schema and start allocating
>> values for it in the dictionary (eg. 11-20), and so on. At some point the
>> credentials schema is updated (1.1->1.2) - but you can't update the
>> existing entries in the dictionary, so you just add the new ones to the
>> end (eg. 100-105).
>> And then you encode something using that dictionary - how
>> does something downstream know that you are using the 1.2 version of the
>> context? It would simply have a 100 in there - but w/o that in the
>> dictionary, it's not decodable.
>>
>> Slide 14:
>>
>> This is a good example of how to reduce size by switching from a string
>> representation to binary. I assume we will find more of those cases over
>> time. *BUT* a decoder needs to understand this encoding approach - but
>> again, how would they recognize something new?
>>
>> At a minimum, we need a way to encode the version of the CBOR-SC
>> algorithm that is used to encode a given data set. That would go a *long
>> way* to resolving my concerns.
>>
>> Leonard
>>
>> On 7/24/20, 11:19 AM, "Manu Sporny" <msporny@digitalbazaar.com> wrote:
>>
>> On 7/24/20 11:00 AM, Leonard Rosenthol wrote:
>> > However, the main use case that you present in the presentation is
>> > QRCodes - which exist as a mechanism to move from digital to analog
>> > (and back). The analog world is long lived - even if not
>> > necessarily archival - and the data needs to be retrievable. And
>> > that can't happen w/o knowing the right (version of the) dictionary
>> > to use.
>>
>> ... which is why we strongly suggest that all production contexts should
>> be versioned, frozen, and cryptographically hashed. There is a general
>> mitigation for your concern. :)
>>
>> To be clear, this issue is well known in the JSON-LD ecosystem and that
>> ecosystem has thrived (deployed on tens of millions of domains) in spite
>> of the danger. That community has learned how to manage constantly
>> evolving vocabularies (schema.org), and how to lock vocabularies
>> down (VCs).
>>
>> There are solutions to the problem you outline, cryptographically
>> hashing URLs is one thing we explored, but that bloats the size of the
>> CBOR-LD bytes. Like any technology, CBOR-LD is a series of difficult
>> design trade-offs.
>>
>> Just like we made the conscious decision in JSON-LD to be able to
>> reference external JSON-LD Context files (which people insisted was
>> madness and unworkable when we did it... and still do), we make the same
>> conscious decision now (because it worked out pretty well for JSON-LD,
>> and it's not clear how doing the same thing in CBOR-LD would be any
>> different).
>>
>> If we wanted to eliminate the risk you highlighted, we wouldn't be able
>> to solve the most pressing use cases.
>>
>> -- manu
>>
>> --
>> Manu Sporny - https://www.linkedin.com/in/manusporny/
>> Founder/CEO - Digital Bazaar, Inc.
>> blog: Veres One Decentralized Identifier Blockchain Launches
>> https://tinyurl.com/veres-one-launches
>
> --
> *ORIE STEELE*
> Chief Technical Officer
> www.transmute.industries
>
> <https://www.transmute.industries>
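
For readers following the dictionary-versioning concern raised in the quoted thread (Slide 11), here is a minimal sketch of the idea of carrying a dictionary version alongside the compressed data. It is not taken from the CBOR-LD spec or any implementation: the term names, the integer values, the "v"/"d" payload keys, and the encode/decode helpers are all hypothetical, and plain Python dicts stand in for real CBOR encoding.

```python
# Hypothetical term dictionaries keyed by context version. The terms and
# integer values are invented for illustration; they come from no real context.
DICTIONARIES = {
    "1.1": {"issuer": 1, "issuanceDate": 2, "credentialSubject": 3},
    # Version 1.2 keeps every existing entry and appends new ones at the end.
    "1.2": {"issuer": 1, "issuanceDate": 2, "credentialSubject": 3,
            "evidence": 100},
}


def encode(doc, version):
    """Replace string keys with integers and record the dictionary version."""
    table = DICTIONARIES[version]
    return {"v": version, "d": {table[key]: value for key, value in doc.items()}}


def decode(payload, known_versions):
    """Restore string keys, refusing payloads whose dictionary is unknown."""
    version = payload["v"]
    if version not in known_versions:
        raise ValueError(f"unknown dictionary version: {version}")
    reverse = {number: term for term, number in DICTIONARIES[version].items()}
    return {reverse[key]: value for key, value in payload["d"].items()}
```

Under these assumptions, a decoder that only knows version 1.1 fails with an explicit "unknown dictionary version" error instead of meeting an unexplained 100 in the data. The cost is a few extra bytes per payload, which is the same size trade-off discussed in the thread.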
Received on Tuesday, 28 July 2020 10:08:09 UTC