00003_--_arch_design.txt 2.7 KB

  1. Q: How is data stored?
  2. A: Plain text only, preferably S-expressions, maybe canonical ones (which, strictly speaking, are binary).
  3. Reason: Foster simplicity in long-term storage. We will need all kinds of formats, such as Atom (RFC 4287), WebFinger (RFC 7033) and ActivityPub. Some are XML-based, some JSON.
  4. So we have all kinds of ugly formats that we have to write and in some cases read. But the only source of truth is exclusively S-expressions.
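A minimal sketch of what writing one entry as a plain-text S-expression could look like. The field names (`entry`, `id`, `updated`, `title`), the example URL and the quoting rule (bare atoms where possible, quoted strings otherwise) are assumptions for illustration; the notes do not fix a concrete layout.

```python
import re

# Atoms made only of these characters are written bare; everything else is quoted.
# This quoting rule is an assumption, not something the notes specify.
_BARE = re.compile(r"[A-Za-z0-9_.:/+-]+")

def atom(s: str) -> str:
    """Render one string as a bare atom or a quoted, escaped string."""
    if _BARE.fullmatch(s):
        return s
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexp(value) -> str:
    """Render nested lists of strings/ints as a one-line S-expression."""
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_sexp(v) for v in value) + ")"
    return atom(str(value))

# Hypothetical entry; the URL and field names are placeholders.
post = ["entry",
        ["id", "https://example.com/o/p/abc"],
        ["updated", "2024-01-02T03:04:05Z"],
        ["title", "Hello, world"]]
line = to_sexp(post)
```

One entry per line keeps appends cheap and lets ordinary text tools search the store.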
  5. # Requirements
  6. - human readable storage
  7. - i18n
  8. - quick text search
  9. - map well to Atom RFC 4287
  10. - map well to https://www.w3.org/TR/activitystreams-vocabulary/#dfn-note
  11. - cheap append, no modify, expensive delete (re-group entries), no tombstones (leave no trace)
  12. - scale to 100k posts, 10k followers
  13. - make undesired state unrepresentable
  14. - human usable, terse ids, e.g. minutes since epoch in human base32 or https://opam.ocaml.org/packages/base32
  15. - moderate size
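The "terse ids" requirement could be met roughly like this. The sketch encodes minutes since the Unix epoch with a Crockford-style alphabet (no I, L, O, U); that alphabet is an assumption, since the notes only say "human base32". The function names are hypothetical.

```python
# Crockford-style base32 alphabet: an assumption, the notes only say "human base32".
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode_b32(n: int) -> str:
    """Encode a non-negative integer, most significant digit first."""
    if n < 0:
        raise ValueError("id numbers must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 32)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def decode_b32(s: str) -> int:
    """Inverse of encode_b32; accepts lower-case input."""
    n = 0
    for ch in s.upper():
        n = n * 32 + ALPHABET.index(ch)
    return n

def minute_id(epoch_seconds: float) -> str:
    """Terse id for a point in time: whole minutes since the epoch, base32."""
    return encode_b32(int(epoch_seconds // 60))
```

Because the alphabet is in ascending ASCII order, ids of equal length sort as strings in the same order as the minutes they encode. Ids of different lengths would need zero-padding to keep that property.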
  16. ## Approach
  17. - default page size is 50
  18. - the most recent page has no number, the 2nd-most recent has the highest number, the oldest is page zero
  19. - the most recent page has 50 posts, the 2nd-most recent has 1-49, older pages have 50 each
  20. - on post deletion that page and all more recent ones have to be rewritten, archives, too
  21. - modifying a post creates a new one with reference to the ancestor, deletion optional
  22. - there may be additional archive chunks with a size of 5k (100k posts = 20 archive chunks)
  23. - each page has two cdbs: one for url->id and one for id->pos
  24. - there is no global id or url lookup (ask each page/archive instead)
  25. - there is, however, a global counter for tags
  26. - the same page/archive scheme applies to tags (/o/t/<tagname>/) and days (/o/d/<dayrfc3339>/)
  27. - urls ARE unique as changes delete the old post (thus ids 404)
  28. - ids have a total order
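The page numbering above can be worked through with a little arithmetic. The sketch below assumes posts are indexed from 0 (oldest) and that numbered pages fill from the oldest post upward; the function names are hypothetical.

```python
PAGE_SIZE = 50       # default page size from the notes
ARCHIVE_SIZE = 5000  # archive chunk size from the notes

def page_of(index: int, total: int, page_size: int = PAGE_SIZE):
    """Page holding post `index` (0 = oldest) out of `total` posts.

    Returns None for the unnumbered most recent page, else the page number.
    """
    if not 0 <= index < total:
        raise IndexError(index)
    if index >= total - page_size:
        return None  # among the newest `page_size` posts
    return index // page_size

def numbered_page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Number of numbered pages, i.e. all pages except the most recent one."""
    older = max(0, total - page_size)
    return -(-older // page_size)  # ceiling division

def archive_chunk_count(total: int, archive_size: int = ARCHIVE_SIZE) -> int:
    """Number of archive chunks needed for `total` posts."""
    return -(-total // archive_size)
```

With 120 posts, for example, posts 70 to 119 sit on the unnumbered page, posts 0 to 49 on page 0 and posts 50 to 69 on page 1.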
  29. ## Single entry is
  30. - public
  31. - id (url) mandatory
  32. - updated date mandatory
  33. - published date optional
  34. - title mandatory, language optional
  35. - text content (incl. language) optional
  36. - enclosure (=attachment) optional
  37. - additional titles and content (one per language)?
  38. - a Link https://www.w3.org/TR/activitystreams-vocabulary/#dfn-link or Note https://www.w3.org/TR/activitystreams-vocabulary/#dfn-note
  39. - 'activity/note' (=create), reply, 'announce' (=boost), like, dislike
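One way to read "make undesired state unrepresentable" for a single entry is an immutable record that rejects missing mandatory fields at construction. This is only a sketch: the class and field names follow Atom loosely and are assumptions, as is storing dates as RFC 3339 strings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Text:
    """A piece of text with an optional language tag."""
    value: str
    lang: Optional[str] = None

@dataclass(frozen=True)
class Entry:
    id: str                           # URL, mandatory
    updated: str                      # mandatory; RFC 3339 string is an assumption
    title: Text                       # mandatory, language optional
    published: Optional[str] = None   # optional
    content: Optional[Text] = None    # optional, incl. language
    enclosure: Optional[str] = None   # optional attachment URL

    def __post_init__(self):
        # Reject entries whose mandatory fields are empty.
        if not self.id:
            raise ValueError("id (url) is mandatory")
        if not self.updated:
            raise ValueError("updated date is mandatory")
        if not self.title.value:
            raise ValueError("title is mandatory")

# Hypothetical usage; the URL is a placeholder.
entry = Entry(id="https://example.com/o/p/1",
              updated="2024-01-02T03:04:05Z",
              title=Text("Hello", lang="en"))
```

Since modifying a post creates a new one, a frozen record fits: there is no in-place update to support.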
  40. ## Actions on posts
  41. - create+note (add,reply)
  42. - delete
  43. - announce+link (boost) = reply without content
  44. - like/unlike
  45. ## Actions on other actors
  46. - Block https://www.w3.org/TR/activitystreams-vocabulary/#dfn-block
  47. - Undo Block https://www.w3.org/TR/activitystreams-vocabulary/#dfn-undo
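The actions on posts and on actors listed above map onto a handful of ActivityStreams activity types. The sketch below builds the corresponding JSON-like dictionaries. Treating "unlike" as an `Undo` wrapping a `Like` is an assumption not stated in the notes, and the helper names and URLs are hypothetical.

```python
from enum import Enum

class Action(Enum):
    CREATE = "Create"      # add, reply
    DELETE = "Delete"
    ANNOUNCE = "Announce"  # boost
    LIKE = "Like"
    BLOCK = "Block"
    UNDO = "Undo"          # undo block; assumed here for unlike as well

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

def activity(action: Action, actor: str, obj) -> dict:
    """Build a minimal activity; `obj` is a URL or a nested object."""
    return {"@context": AS_CONTEXT, "type": action.value,
            "actor": actor, "object": obj}

def undo(inner: dict) -> dict:
    """Wrap an earlier activity in an Undo by the same actor."""
    nested = {k: v for k, v in inner.items() if k != "@context"}
    return activity(Action.UNDO, inner["actor"], nested)

# Hypothetical actors and URLs.
block = activity(Action.BLOCK, "https://example.com/actor",
                 "https://example.org/spammer")
unblock = undo(block)
```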
  48. ## Flow on add
  49. - write single post to a single file,
  50. - add global store length to index file (possibly a cdb uri->pos), append post to global store,
  51. - if 2nd-most recent page has 50 entries, start a new, empty one
  52. - move oldest single post to 2nd-most recent page
  53. - add a marker char to a page.cnt file
  54. - refresh dependent atom & activitypub
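The add flow above can be sketched in memory as follows. This is an illustration under assumptions, not the actual implementation: files are replaced by Python lists and a bytearray, the class name `Store` is hypothetical, and the page.cnt marker and the Atom/ActivityPub refresh are left out.

```python
PAGE_SIZE = 50

class Store:
    def __init__(self):
        self.blob = bytearray()  # global append-only store
        self.index = []          # start offset of each post in the blob
        self.recent = []         # unnumbered most recent page
        self.pages = []          # numbered pages, pages[0] is the oldest

    def add(self, post: bytes) -> None:
        # Record the current store length, then append the post.
        self.index.append(len(self.blob))
        self.blob += post
        self.recent.append(post)
        if len(self.recent) <= PAGE_SIZE:
            return
        # If the 2nd-most recent page is full (or missing), start a new one.
        if not self.pages or len(self.pages[-1]) == PAGE_SIZE:
            self.pages.append([])
        # Move the oldest post of the most recent page down one page.
        self.pages[-1].append(self.recent.pop(0))

store = Store()
for i in range(120):
    store.add(b"p%d\n" % i)
```

After 120 adds the most recent page holds the newest 50 posts, page 0 holds 50 and page 1 holds the remaining 20. Older pages are never touched again, which keeps appends cheap.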