Re: files or database for solid pods [Was: Towards Solid Lite] from Aron Homberg on 2023-11-01 (public-solid@w3.org from November 2023)

From: Aron Homberg <info@aron-homberg.de>
Date: Wed, 1 Nov 2023 06:51:07 +0100 (CET)
To: Melvin Carvalho <melvincarvalho@gmail.com>, Kingsley Idehen <kidehen@openlinksw.com>
Cc: public-solid@w3.org
Message-ID: <1019880248.48257.1698817867780@office.mailbox.org>
Dear Community,

after following this conversation for a few days, here are a few opinions and thoughts from my side.

=> On the filesystem vs. database debate:

Following the conversation here, I understand that one initial idea was to have the spec also define "how a server manages its data". I would strongly recommend leaving this whole topic out of the spec, as I consider it an implementation detail.
 
How data is optimally loaded / saved / processed internally in "the most simple/optimal way" depends a lot on factors that are only remotely, if at all, generalizable.

I'd suggest to clarify that:
(a) "What protocol servers speak"
(b) "How the data sent/received is shaped"
....would be defined by the Solid Lite spec as they have a huge impact on inter-operability.

But:
(c) "How the data sent/received while speaking is stored"
....would not, because trying to answer the question "What is the easiest way to implement the storage layer of a Solid Lite server?" would always result in as many answers as there are developers, use-cases and points in time.

Of course, the spec could suggest "filesystem" as a possibly easy solution for developers to implement the storage layer, and it can be a great answer for many use-cases. But to give you a simple example: it would complicate things for the way I plan to implement a Solid Lite server.

In modern "serverless" environments, you don't even mount a filesystem anymore. You work in-memory and store with any kind of SaaS-based storage layer. Many people deploy to hosters like Vercel, Netlify, Cloudflare etc. these days. If you host on the edge, you probably use some simple key-value storage API or a NoSQL database API. For large files, you use a storage bucket API connected to a CDN.
 
If Solid Lite were opinionated about how a server implements its storage layer, everyone trying to build a "serverless" solution would have to come up with a filesystem abstraction layer to comply with the spec.
 
And this is only one concern. What if the future lies in some sort of federated cloud storage that we can't even imagine today? How could implementation details like that ever be well-tested?
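
To illustrate the point about keeping storage out of the spec, here is a minimal sketch of a storage contract that a server could code against. The names `DocumentStore`, `InMemoryStore` and `save_note` are hypothetical and not part of any Solid or Solid Lite draft; the in-memory class merely stands in for a key-value or NoSQL API.

```python
from typing import Dict, Iterator, Optional, Protocol


class DocumentStore(Protocol):
    """Hypothetical storage contract; names and methods are assumptions."""

    def get(self, path: str) -> Optional[bytes]: ...
    def put(self, path: str, body: bytes) -> None: ...
    def delete(self, path: str) -> bool: ...
    def keys(self, prefix: str) -> Iterator[str]: ...


class InMemoryStore:
    """Stand-in for a key-value or NoSQL API on a serverless platform."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, path: str) -> Optional[bytes]:
        return self._data.get(path)

    def put(self, path: str, body: bytes) -> None:
        self._data[path] = body

    def delete(self, path: str) -> bool:
        # True if something was actually removed.
        return self._data.pop(path, None) is not None

    def keys(self, prefix: str) -> Iterator[str]:
        return iter(sorted(k for k in self._data if k.startswith(prefix)))


def save_note(store: DocumentStore, path: str, text: str) -> None:
    # Protocol-level code depends only on the contract, never on a filesystem.
    store.put(path, text.encode("utf-8"))


store = InMemoryStore()
save_note(store, "/notes/hello.json", '{"text": "hi"}')
```

A filesystem-backed or bucket-backed class with the same four methods could be swapped in without touching `save_note`, which is the sense in which storage stays an implementation detail.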

=> On a) "What protocol servers speak"

I'd vote for "LDP without RDF", having only LDP-BC (BasicContainer) support.

With that we already essentially have a "simple filesystem/container storage interface for the web" and it's easy to implement.
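
As a rough sketch of what such a container interface could look like without RDF: paths ending in a slash are treated as containers, and a container lists its direct members. The JSON shape with the keys `id` and `contains` is an assumption made for this example only; LDP itself describes containment in RDF.

```python
import json
from typing import Dict, List


def list_container(docs: Dict[str, bytes], container: str) -> List[str]:
    """Return the direct members of a container path such as "/notes/"."""
    if not container.endswith("/"):
        raise ValueError("container paths end with a slash in this sketch")
    members = set()
    for path in docs:
        if path.startswith(container) and path != container:
            rest = path[len(container):]
            head, sep, _ = rest.partition("/")
            # A nested path contributes only its first segment,
            # which is then itself a child container.
            members.add(container + head + sep)
    return sorted(members)


def container_as_json(docs: Dict[str, bytes], container: str) -> str:
    # Hypothetical JSON representation of a BasicContainer listing.
    return json.dumps({"id": container, "contains": list_container(docs, container)})


docs = {
    "/notes/a.json": b"{}",
    "/notes/sub/b.json": b"{}",
    "/other.json": b"{}",
}
listing = json.loads(container_as_json(docs, "/notes/"))
```

Here `listing["contains"]` holds `/notes/a.json` and the child container `/notes/sub/`, but not the nested `/notes/sub/b.json`.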

=> On b) "How the data sent/received is shaped"

Concerning classes of errors and fear of "bug potential"... the elephant in the room is probably "data shape incompatibilities". When schemas are not validated and different servers support different data shapes, things will get wild quickly in federation. This is why I think data compatibility in Solid Lite should be solved by enforcing schemas and validation, even if it adds a bit of complexity.

Now I'm probably again spreading an unpopular opinion, but the Shape Trees that I've seen in the Solid spec and the associated Shape Expression Language are both extremely unpopular and also complex. If they were part of the Solid Lite spec, people would have to build their own parsers and validators. That probably doesn't make much sense if we're going for simplicity.
 
Maybe we can come up with something like "Schema Validation Lite": a simplified schema validation that ensures data inter-operability but builds on solutions closer to industry standards? Even if it means shipping only "partial validation", like 1-level instead of n-level, it would probably be very helpful for basic checks and balances on data inter-operability.
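
To make "1-level instead of n-level" a bit more concrete, here is a minimal sketch of a one-level check. The schema format (field name mapped to expected type and a required flag) and the names `NOTE_SCHEMA` and `validate_one_level` are made up for this example and are not taken from any existing schema standard.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical one-level schema: field name -> (expected type, required?)
NOTE_SCHEMA: Dict[str, Tuple[type, bool]] = {
    "title": (str, True),
    "body": (str, True),
    "tags": (list, False),
}


def validate_one_level(doc: Any, schema: Dict[str, Tuple[type, bool]]) -> List[str]:
    """Check top-level fields only; nested values are not inspected."""
    if not isinstance(doc, dict):
        return ["document must be a JSON object"]
    errors = []
    for field, (expected, required) in schema.items():
        if field not in doc:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(doc[field], expected):
            errors.append(f"field {field} must be of type {expected.__name__}")
    return errors
```

The trade-off is visible in `tags`: the check confirms that it is a list, but says nothing about what the list contains. That is the part an n-level validator would add.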

=> On Authentication:

I think WebID and Solid-OIDC are simple enough to be implemented without much hassle, also because here the spec is close to industry standards.
 
But can we safely assume that Solid Lite would primarily be used "as a server running one pod for a single owner"?
 
Because in this case we can leave out complex user and certificate management. There would only need to be the typical flow for a token-based WebID login to authenticate and authorize an owner, but all things "create user", "reset password", etc. would not be part of Solid Lite, making it much easier and less work to implement. A simple CLI tool with a GUI on top would make managing this both simpler and more secure.
 
Also, this would reduce the future-hassle of "How to address malicious use of my Solid Lite pod server when exposed publicly to the internet?"
 
The solution for "Can I have many pods?" would simply be: We make it so easy to deploy your own server that this limitation isn't an issue. Aka. "Deploy your own server with one click" (in the cloud). And that probably also eliminates all classes of issues related to "I don't truly own my own data" (or the domains associated with the server).
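
The single-owner idea can be sketched as a short-lived token check against one configured WebID. This is explicitly NOT Solid-OIDC: it assumes a shared-secret HS256 token built with the standard library purely for illustration, and the function names and the `webid` claim are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64url(hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())


def issue_token(secret: bytes, webid: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a short-lived token for the owner (assumed HS256, illustration only)."""
    now = time.time() if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url(json.dumps({"webid": webid, "exp": int(now) + ttl_seconds}).encode("utf-8"))
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify_owner_token(secret: bytes, token: str, owner_webid: str, now: Optional[float] = None) -> bool:
    """Accept the token only if it is intact, unexpired and names the owner."""
    now = time.time() if now is None else now
    try:
        header, payload, signature = token.split(".")
        expected = _sign(secret, header + "." + payload)
        if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
            return False
        claims = json.loads(_b64url_decode(payload))
    except (ValueError, UnicodeError):
        return False
    if not isinstance(claims, dict):
        return False
    return claims.get("webid") == owner_webid and claims.get("exp", 0) > now
```

With only one owner there is no user table to look up: the server compares the token's WebID with a single configured value, which is what removes the "create user" and "reset password" flows.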

=> On Authorization:

I'd vote for owner can read+write, all others ("anonymous") can read.
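
A minimal sketch of that rule, assuming the agent's WebID has already been authenticated (or is `None` for anonymous requests). The function name and the set of read methods are assumptions for this example.

```python
from typing import Optional

# Methods treated as "read" in this sketch.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def is_allowed(method: str, agent_webid: Optional[str], owner_webid: str) -> bool:
    """Owner can read and write; everyone else, including anonymous, can read."""
    if method.upper() in READ_METHODS:
        return True
    return agent_webid is not None and agent_webid == owner_webid
```

For example, `is_allowed("GET", None, owner)` is true for any visitor, while `is_allowed("PUT", None, owner)` is false.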

=> On Serialization:

To keep things simple, can we please simply speak JSON as a MUST in Solid Lite? (and maybe RDF and others as a SHOULD, not enforcing it) 

I think RDF as a data serialization format is one reason for people to not look deeper into Solid and its ideas, as it is a rather unpopular choice outside of the Solid community. This might be another unpopular opinion now. But in reality, you'll also find fewer libraries with good support for RDF/Turtle/etc. compared to JSON. If we want people to have an easy time implementing a server, having RDF as a MUST definitely adds a lot of complexity for most developers.

Serialization is just a stateless transformation process that can always happen in the client that speaks to the Solid Lite server.
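
As a sketch of such a client-side transformation: a flat JSON document fetched from the server is turned into N-Triples-style lines using a key-to-predicate mapping supplied by the client. The function name and the mapping are assumptions; only plain string fields are handled, and the literal escaping simply reuses JSON string escaping, which is not a conformance claim.

```python
import json
from typing import Dict, List


def json_to_ntriples(subject: str, doc: Dict[str, object], predicates: Dict[str, str]) -> List[str]:
    """Stateless mapping from a flat JSON object to N-Triples-style lines."""
    lines = []
    for key, value in doc.items():
        predicate = predicates.get(key)
        if predicate is None or not isinstance(value, str):
            continue  # This sketch skips unmapped keys and non-string values.
        # json.dumps escapes quotes, backslashes and control characters.
        literal = json.dumps(value, ensure_ascii=False)
        lines.append(f"<{subject}> <{predicate}> {literal} .")
    return lines


triples = json_to_ntriples(
    "https://alice.example/notes/1",
    {"title": "Hello", "count": 3},
    {"title": "http://purl.org/dc/terms/title"},
)
```

Because the function keeps no state and needs nothing from the server besides the JSON body, it can live entirely in the client.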

=> Outlining Solid Lite server features:

In essence, to implement such a Solid Lite server, there's not so much to implement:

- HTTP routing, setting up TLS, and WebID, Solid-OIDC / .well-known and auth flow endpoints (identification, short-lived token creation for the owner, validated in middleware)
- a simple CLI (maybe a GUI on top) for the pod owner's identity management
- generalized middleware implementation that is used in each endpoint route handler, to:
  - handle auth via JWT token validation (authentication)
  - handle role-based permission management "read" or "read-write" (authorization)
- a few (generic) HTTP endpoint routes, LDP without RDF (topic a) for CRUD document tasks:
   - JSON document serialization/deserialization
   - invoke simple JSON schema validation (topic b) and associated error handling
   - actual CRUD operations for/on documents via some storage layer (topic c)
   - storage-associated operational error handling (not found, etc.)

....and that should be more or less it?
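
The order of the steps listed above can be sketched as one generic handler. Everything here is an assumption for illustration: the function name, the status codes chosen, the plain dict used as storage, and the `agent` argument, which stands for a WebID that the authentication middleware has already verified (`None` for anonymous).

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, str]


def handle_request(
    method: str,
    path: str,
    body: Optional[str],
    agent: Optional[str],
    owner: str,
    docs: Dict[str, str],
    validate: Callable[[object], List[str]],
) -> Response:
    # 1. Authorization: anyone may read, only the owner may write.
    if method != "GET":
        if agent is None:
            return 401, "authentication required"
        if agent != owner:
            return 403, "only the owner may write"
    # 2. CRUD with storage-related error handling.
    if method == "GET":
        if path not in docs:
            return 404, "not found"
        return 200, docs[path]
    if method == "DELETE":
        if docs.pop(path, None) is None:
            return 404, "not found"
        return 204, ""
    if method == "PUT":
        # 3. Deserialization, then schema validation, then the write.
        try:
            doc = json.loads(body or "")
        except json.JSONDecodeError:
            return 400, "body is not valid JSON"
        errors = validate(doc)
        if errors:
            return 422, "; ".join(errors)
        created = path not in docs
        docs[path] = json.dumps(doc)
        return (201 if created else 200), ""
    return 405, "method not allowed"
```

The `validate` callable and the `docs` mapping are injected, so the schema check (topic b) and the storage layer (topic c) can be replaced independently of the routing logic.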

I'm looking forward to any feedback on my thoughts.

@Jeff Great initiative, https://github.com/solid-contrib/practitioners/ looks promising. Thank you for this. I'll try to participate.
@Melvin Thank you for the draft spec and GitHub repo, I'll also copy my input to the GitHub issues there.
@Jon I just discovered the onboarding initiative https://github.com/solid-contrib/getting-started through Jeff's repo. Looks promising too.

Have a good day and best,
Aron

---
Aron Homberg
Web & AI Technology Expert
 
📅 Schedule a Meeting https://calendar.app.google/TPodNLT2KToKVZ1c9
📱 +49 170 5474455
📧 aron.homberg@fluctura.com
🌐 https://www.fluctura.com/


Precision Engineering in AI-Driven SaaS Solutions & PWAs

> On 01.11.2023 09:23 WITA, Melvin Carvalho <melvincarvalho@gmail.com> wrote:
>
> On Wed, 1 Nov 2023 at 2:02, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> 
> > On 10/31/23 5:54 PM, Nicolas Chauvat wrote:
> > > On Tue, Oct 31, 2023 at 03:41:40PM -0400, Kingsley Idehen wrote:
> > >> That's exactly how we use Solid atop our Virtuoso platform (which is a DBMS,
> > >> WebDAV file server, Middleware combo). Basically, you can work using
> > >> filesystem interaction patterns while the underlying data is accessible via a
> > >> number of interfaces and returned in a variety of negotiated formats.
> > > We tried to do the same a few months back, based on our own semweb
> > > framework <https://www.cubicweb.org> but we got lost in what seemed
> > > like a maze of specs in which we did not manage to trace a clear path
> > > to reach our goal within the time we had available.
> > >
> > > My guess is that other people would be tempted to give a shot at their
> > > own implementation with their preferred toolset and that there is
> > > value in having a clear reading path for implementers in that set of
> > > documentation / recommendation / test suite.
> > >
> > > Is that a problem that Solid-Lite would address in some way?
> > >
> > 
> > I certainly sense that's the goal of a Solid-Lite i.e., a simpler spec
> > that enables broad implementation.
> > 
>  
> Broad implementation is of course a goal, facilitated by an easy set of requirements.
>  
> Looking at it in reverse, an easy, bug-free server implementation will inform the elements that are selected, at least at first, for solid lite, from the bigger Solid spec
>  
> There is a caveat that timbl has always said that http:// and file:// spaces are part of the web.  I'm sure this is part of the original vision.  The web space and file space working together in a read-write way.
>  
> I take a photo on my phone, it goes into my shared folder, and my friends get a notification.  This seems quite fundamental?
>  
> So if this is not mandatory, it should be among the early optional features.  I'm not 100% convinced it's not mandatory, maybe Tim will say one day.  But it's fine to compromise on this issue.
>  
> 
> > --
> > Regards,
> > 
> > Kingsley Idehen
> > Founder & CEO
> > OpenLink Software
> > Home Page: http://www.openlinksw.com
> > Community Support: https://community.openlinksw.com
> > Weblogs (Blogs):
> > Company Blog: https://medium.com/openlink-software-blog
> > Virtuoso Blog: https://medium.com/virtuoso-blog
> > Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
> > 
> > Personal Weblogs (Blogs):
> > Medium Blog: https://medium.com/@kidehen
> > Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
> >                http://kidehen.blogspot.com
> > 
> > Profile Pages:
> > Pinterest: https://www.pinterest.com/kidehen/
> > Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
> > Twitter: https://twitter.com/kidehen
> > Google+: https://plus.google.com/+KingsleyIdehen/about
> > LinkedIn: http://www.linkedin.com/in/kidehen
> > 
> > Web Identities (WebID):
> > Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
> >          : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
> > 
> > 
> > 
>
Received on Wednesday, 1 November 2023 07:02:15 UTC