- Q: How is data stored?
- A: Plain text only, preferably S-expressions, maybe canonical ones (which, strictly speaking, are binary).
- Reason: Foster simplicity in long-term storage. We shall need all kinds of formats, like Atom/RFC4287, WebFinger/RFC7033, ActivityPub. Some are XML-based, some JSON.
- So we have all kinds of ugly formats that we have to write and in some cases read. But the only source of truth is the S-expressions.
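
A minimal sketch of what writing a post as a plain-text S-expression could look like. The field names (`post`, `id`, `title`) and the example URL are made up for illustration, and this is the human-readable form, not the length-prefixed canonical one.

```python
def to_sexp(value):
    """Serialize nested lists/tuples of strings into a plain-text S-expression."""
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_sexp(v) for v in value) + ")"
    text = str(value)
    # Bare atoms stay unquoted; whitespace, parens, quotes or backslashes force quoting.
    if text and not any(c.isspace() or c in '()"\\' for c in text):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Hypothetical post; the field names are assumptions, not a fixed schema.
post = ["post", ["id", "https://example.com/o/p/3"], ["title", "Hello world"]]
print(to_sexp(post))
# (post (id https://example.com/o/p/3) (title "Hello world"))
```

Atom, ActivityPub JSON and the like would then be derived from such records, never the other way round.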
- # Requirements
- - human readable storage
- - i18n
- - quick text search
- - map well to Atom RFC 4287
- - map well to https://www.w3.org/TR/activitystreams-vocabulary/#dfn-note
- - cheap append, no modify, expensive delete (re-group entries), no tombstones (leave no trace)
- - scale to 100k posts, 10k followers
- - make undesired state unrepresentable
- - human usable, terse ids, e.g. minutes since epoch in human base32 or https://opam.ocaml.org/packages/base32
- - moderate size
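
The terse id requirement could be sketched like this: minutes since the Unix epoch, written in base32. The Crockford alphabet is an assumption here (the notes only say "human base32"), and `encode_id`/`decode_id` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet (assumed): no I, L, O, U, and in ascending ASCII order.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_id(moment: datetime) -> str:
    """Encode a point in time as minutes since epoch in base32."""
    minutes = int(moment.timestamp()) // 60
    if minutes < 0:
        raise ValueError("moments before 1970 are not supported")
    digits = ""
    while True:
        minutes, rest = divmod(minutes, 32)
        digits = ALPHABET[rest] + digits
        if minutes == 0:
            return digits


def decode_id(text: str) -> datetime:
    """Inverse of encode_id; returns the start of the encoded minute in UTC."""
    minutes = 0
    for ch in text:
        minutes = minutes * 32 + ALPHABET.index(ch)
    return datetime.fromtimestamp(minutes * 60, tz=timezone.utc)


moment = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(encode_id(moment), decode_id(encode_id(moment)).isoformat())
```

Because the alphabet is in ascending ASCII order, ids of equal length sort as text in the same order as the moments they encode.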
- ## Approach
- - default page size is 50
- - the most recent page has no number, the 2nd-most recent has the highest number, the oldest is page zero
- - the most recent page has 50 posts, the 2nd-most recent has 1-49, older ones have 50 each
- - on post deletion that page and all more recent ones have to be rewritten, archives, too
- - modifying a post creates a new one with reference to the ancestor, deletion optional
- - there may be additional archive chunks with a size of 5k (100k posts = 20 archive chunks)
- - each page has a cdb each for url->id and id->pos
- - there is no global id or url lookup (ask each page/archive instead)
- - there is a global counter however for tags
- - the same page/archive apply for tags (/o/t/<tagname>/) and days (/o/d/<dayrfc3339>/)
- - urls ARE unique as changes delete the old post (thus ids 404)
- - ids have a total order
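
The page arithmetic above, sketched under the assumption that the unnumbered page always holds the newest posts and the numbered pages fill up from page zero. The function names are hypothetical, and `pages_to_rewrite` follows the deletion rule literally ("that page and all more recent ones"); whether the newest page is refilled from the numbered ones is left open.

```python
PAGE_SIZE = 50
ARCHIVE_SIZE = 5000


def page_layout(total, page_size=PAGE_SIZE):
    """Return (sizes of numbered pages, oldest first; size of the unnumbered newest page)."""
    recent = min(total, page_size)
    full, rest = divmod(total - recent, page_size)
    numbered = [page_size] * full + ([rest] if rest else [])
    return numbered, recent


def pages_to_rewrite(index, total, page_size=PAGE_SIZE):
    """Pages touched when deleting the post at chronological position `index` (0 = oldest)."""
    numbered, _ = page_layout(total, page_size)
    in_numbered = index < sum(numbered)
    first = index // page_size if in_numbered else len(numbered)
    return list(range(first, len(numbered))) + ["recent"]


def archive_chunks(total, chunk_size=ARCHIVE_SIZE):
    """Number of archive chunks needed, rounding up."""
    return -(-total // chunk_size)


print(page_layout(120))           # ([50, 20], 50)
print(pages_to_rewrite(10, 120))  # [0, 1, 'recent']
print(archive_chunks(100_000))    # 20
```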
- ## Single entry is
- - public
- - id (url) mandatory
- - updated date mandatory
- - published date optional
- - title mandatory, language optional
- - text content (incl. language) optional
- - enclosure (=attachment) optional
- - additional titles and content (one per language)?
- - a Link https://www.w3.org/TR/activitystreams-vocabulary/#dfn-link or Note https://www.w3.org/TR/activitystreams-vocabulary/#dfn-note
- - 'activity/note' (=create), reply, 'announce' (=boost), like, dislike
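
One way to picture the entry, with mandatory fields as required constructor arguments so an entry without id, updated date or title cannot be built. All type and field names are assumptions for illustration; additional per-language titles are left out since the notes leave them open.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Kind(Enum):
    NOTE = "note"          # activity/note (= create)
    REPLY = "reply"
    ANNOUNCE = "announce"  # = boost
    LIKE = "like"
    DISLIKE = "dislike"


@dataclass(frozen=True)
class Text:
    value: str
    language: Optional[str] = None  # e.g. "en"; optional


@dataclass(frozen=True)
class Entry:
    id: str                                # url, mandatory
    updated: datetime                      # mandatory
    title: Text                            # mandatory, language optional
    kind: Kind = Kind.NOTE
    published: Optional[datetime] = None   # optional
    content: Optional[Text] = None         # optional, carries its own language
    enclosure: Optional[str] = None        # attachment url, optional

    def __post_init__(self):
        # Assumption: an empty id or title counts as missing.
        if not self.id:
            raise ValueError("id (url) is mandatory")
        if not self.title.value:
            raise ValueError("title is mandatory")


entry = Entry(
    id="https://example.com/o/p/3",
    updated=datetime(2023, 1, 1, tzinfo=timezone.utc),
    title=Text("Hello world", "en"),
)
print(entry.kind, entry.published)
```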
- ## Actions on posts
- - create+note (add,reply)
- - delete
- - announce+link (boost) = reply without content
- - like/unlike
- ## Actions on other actors
- - Block https://www.w3.org/TR/activitystreams-vocabulary/#dfn-block
- - Undo Block https://www.w3.org/TR/activitystreams-vocabulary/#dfn-undo
- ## Flow on add
- - write single post to a single file
- - add global store length to index file (possibly cdb uri->pos), append post to global store
- - if 2nd-most recent page has 50 entries, start a new, empty one
- - move oldest single post to 2nd-most recent page
- - add a marker char to a page.cnt file
- - refresh dependent Atom & ActivityPub
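
The flow above, simulated in memory as a rough sketch. The `Store` class and its attribute names are hypothetical, and the single-post file, the page.cnt marker and the feed refresh are left out. The numbered page is opened only when a post actually moves, which is one reading of the two page steps.

```python
PAGE_SIZE = 50


class Store:
    """In-memory stand-in for the global store, its index and the pages."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.store = bytearray()  # append-only global store
        self.index = []           # byte offset of each post in the global store
        self.recent = []          # unnumbered, most recent page
        self.numbered = []        # numbered pages, page zero is the oldest

    def add(self, post: str) -> None:
        # Record the store length first, then append the post.
        self.index.append(len(self.store))
        self.store += post.encode("utf-8") + b"\n"
        self.recent.append(post)
        if len(self.recent) > self.page_size:
            # Start a new numbered page if there is none or the last one is full.
            if not self.numbered or len(self.numbered[-1]) == self.page_size:
                self.numbered.append([])
            # Move the oldest post of the recent page to the 2nd-most recent page.
            self.numbered[-1].append(self.recent.pop(0))


s = Store()
for i in range(120):
    s.add(f"post {i}")
print([len(p) for p in s.numbered], len(s.recent))  # [50, 20] 50
```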