GRDDL for BigData or CSVW for Avro?

Hi all,

   I have been working a bit with big data stacks and RDF recently.
The Big Data crowd likes to use binary formats such as Apache
Avro [2]. These completely separate the schema from the data,
encoding the data in a purely binary format which would be
incomprehensible without the schema (for Avro the schema is written in JSON).

What seems to be missing is a way to mark up the schema [2] the way
CSVW [1] does for tables, by allowing one to specify what the URIs of the
relations or classes are, or how to construct a URI from the data, so
that it would be easy to tie it to the linked data cloud.
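
To make the idea a bit more concrete, here is a rough sketch of what I have
in mind, not an existing tool. The "aboutUrl" and "propertyUrl" attributes
are borrowed from the CSVW vocabulary and are purely an assumption here: they
are not part of Avro, though as far as I can tell Avro tolerates extra
attributes in a schema as metadata. The Person schema, the example.org URIs
and the helper names are all made up for illustration, and the literal
escaping is only an approximation of N-Triples.

```python
import json
from urllib.parse import quote

# Hypothetical annotated Avro schema. "aboutUrl" and "propertyUrl" are
# borrowed from CSVW; they are NOT defined by Avro itself.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Person",
  "aboutUrl": "https://example.org/people/{id}",
  "fields": [
    {"name": "id",   "type": "string"},
    {"name": "name", "type": "string",
     "propertyUrl": "http://xmlns.com/foaf/0.1/name"},
    {"name": "age",  "type": "int",
     "propertyUrl": "http://xmlns.com/foaf/0.1/age"}
  ]
}
""")

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def expand(template, record):
    """Fill {field} placeholders in a URI template with percent-encoded values."""
    uri = template
    for key, value in record.items():
        uri = uri.replace("{" + key + "}", quote(str(value), safe=""))
    return uri


def to_ntriples(schema, record):
    """Turn one decoded record into a list of N-Triples-like lines."""
    subject = expand(schema["aboutUrl"], record)
    lines = []
    for field in schema["fields"]:
        prop = field.get("propertyUrl")
        if prop is None or field["name"] not in record:
            continue  # fields without an annotation produce no triple
        value = record[field["name"]]
        if isinstance(value, int) and not isinstance(value, bool):
            literal = f'"{value}"^^<{XSD_INTEGER}>'
        else:
            # json.dumps gives a quoted, escaped string; a rough
            # approximation of N-Triples literal escaping.
            literal = json.dumps(str(value), ensure_ascii=False)
        lines.append(f"<{subject}> <{prop}> {literal} .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples(SCHEMA, {"id": "42", "name": "Alice", "age": 30}):
        print(line)
```

The record passed in is assumed to be already decoded into a plain dict by
whatever Avro library one uses; the sketch only covers the mapping from
annotated schema to triples.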

The advantage of doing this for the Big Data crowd would be that it
would let engineers find the definitions of the data they are using,
and give them some logical infrastructure to find established
consequences of the relations. It could also allow one to automate
the construction of Avro files from the data, I guess…

I looked around on the web but could not find anything clearly
going in that direction.

Henry Story

[1] https://twitter.com/bblfish/status/1531932840086077441
[2] https://avro.apache.org/docs/current/



https://co-operating.systems
WhatsApp, Signal, Tel: +33 6 38 32 69 84
Twitter: @bblfish

Received on Thursday, 2 June 2022 18:45:07 UTC