Social data /syntax/ vs Social data /vocabulary/

Spurred by a conversation in [1].

Our WG charter says that one of our deliverables is
Social Data Syntax
A JSON-based syntax to allow the transfer of social information, such as status updates, across differing social systems. One input to this deliverable is ActivityStreams 2.0.
Now, there is an open question: should we be defining a syntax or a vocabulary*?

The difference is that a syntax is purely a transport format, whilst a vocabulary is a data model. In particular, it should be possible to usefully place data expressed in a vocabulary into a database and have each named object stand on its own.
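
To make "each named object stands on its own" concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the store_named_objects helper, the "ref" key and the urn:example identifiers are all hypothetical and come from neither specification. The property names in the sample document follow AS1 usage.

```python
def store_named_objects(doc, db):
    """Store every dict that carries an "id" in db, keyed by that id.

    Nested named objects are replaced by a small reference of the form
    {"ref": <id>} (a hypothetical convention, not part of any spec).
    Returns the reference for a named object, or the value otherwise.
    """
    if isinstance(doc, list):
        return [store_named_objects(item, db) for item in doc]
    if not isinstance(doc, dict):
        return doc
    # Flatten children first so nested named objects become references.
    flat = {key: store_named_objects(value, db) for key, value in doc.items()}
    if "id" in flat:
        db[flat["id"]] = flat
        return {"ref": flat["id"]}
    return flat


# Sample activity; the identifiers are made up for the example.
activity = {
    "id": "urn:example:activity:1",
    "verb": "post",
    "actor": {
        "id": "urn:example:person:owen",
        "objectType": "person",
        "displayName": "Owen",
    },
    "object": {
        "id": "urn:example:note:1",
        "objectType": "note",
        "content": "Hello",
    },
}

db = {}
store_named_objects(activity, db)
```

After this runs, db holds three independent records (the activity, the person and the note), and the activity only refers to the other two by id. No knowledge of the individual types was needed to split the document up, which is the property I am asking of a vocabulary.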

ActivityStreams 1, intentionally or not, defines a vocabulary; social protocols based upon it tend to use it both as a transport format and as the model for their internal database.

ActivityStreams 2, per the current specification, defines a syntactic model. It does not make sense to store ActivityStreams 2 objects in a database as discrete objects - they only make sense in context. Meaningfully storing said data involves manually decomposing them into some internal representation (which may involve detailed knowledge of all of the types involved).

My opinion on this is that we should define a vocabulary. I say this especially as someone interested in the upper layers of the stack we are chartered to build - that is, the social API and federation protocols. I have a proposal[2] I’d like to bring to the committee in the future, based upon experience and existing practice with ActivityStreams 1, which covers both in a small and compact specification, but this depends upon ActivityStreams 2 being able to fulfil the role of a data model.

The trade-off here is that we make the AS2 specification slightly more complex - the current spec abstracts nearly everything away as an “Object”. We would probably need to bring back something like the “Media link” concept from AS1 (I prefer the term Media Source, to more clearly explain the intent).

But I feel it would be worth it - this simplifies the data model for everyone interacting with the protocol, and makes it useful as a data model. It would make the data much easier to rationalise, and help clarify what data “stands alone” vs being an integral part of some other object.

* Technically I suppose a syntax is a subset of a vocabulary. The question is whether we should define a syntax which is also a vocabulary, or just a syntax.
[1] https://github.com/jasnell/w3c-socialwg-activitystreams/issues/11#issuecomment-53518263
[2] My current, very early working draft can be found here: http://oshepherd.github.io/activitypump/ActivityPump.html

- Owen
[sorry for the delay in sending this - I’ve been busy]

Received on Monday, 8 September 2014 23:31:56 UTC