- TODO:
- - Make another test program that will test all operations carefully, one by
- one, always only using the operations that have already been tested before
- (supposing these are working correctly). This should help with making
- bugless frontends.
- - add profiling (randomly sample lines with debug info)
- - COMUN PHASE 2: after self hosted implementation start working on new version
- incorporating proposed changes, mainly make the bytecode suck less!
- - Maybe for phase 2: simplify the file include system? As related to the issue
- of writing the "include" preprocessor that takes all files and resolves the
- includes into one big file -- now it either requires a lot of RAM or an
- inelegant solution of feeding the files in many passes. Would be ideal if
- including could be solved just by appending all files together (now it can't
- be done because some pointers may require to be defined before including
- some file).
- ============================================================================
- - COMUN SHELL: part of the future PD computer, now it could be simply written
- in C just to define the interface, features:
- - running programs (no multitask, just like DOS)
- - some basic file operations, support for temporary RAM files?
- - basic stuff like getting system time, its name, specs, shutdown, restart
- etc. (IRC suggested: command to increment any pointer by stack top of
- type env. 0)
- - scriptability ofc (or maybe rather not), possibility to execute a script
- from file?
- - the NEW IDEA:
- comun shell will be the shell, the OS and the I/O library, all in one
- The shell is a program that reads input (text commands) and
- "does things", like switch mode (no screen, text screen, graphics
- screen, ...), play sound, write out something, run a program,
- list files, communicate over network, get capabilities, ...
- In user shell input simply comes from keyboard. As a library the program
- simply does the same by sending input characters to the shell.
- Maybe there could be a global buffer that will hold latest N characters
- output by previous program => could be used for non-multitasking
- pipelining.
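The "global buffer holding the latest N characters" idea could be sketched as a small ring buffer. This is only an illustration: the names (ToyOutBuffer, toyBufferPut, toyBufferRead) and the size of 8 are made up here and are not part of any comun shell interface.

```c
#include <stddef.h>

/* Hypothetical size N of the buffer, chosen small for demonstration. */
#define TOY_BUFFER_SIZE 8

/* Ring buffer keeping the latest TOY_BUFFER_SIZE output characters. */
typedef struct
{
  char data[TOY_BUFFER_SIZE];
  size_t next;  /* index at which the next character will be stored */
  size_t count; /* number of valid characters, at most TOY_BUFFER_SIZE */
} ToyOutBuffer;

/* Called for every character a program outputs; overwrites the oldest
   character once the buffer is full. */
void toyBufferPut(ToyOutBuffer *b, char c)
{
  b->data[b->next] = c;
  b->next = (b->next + 1) % TOY_BUFFER_SIZE;

  if (b->count < TOY_BUFFER_SIZE)
    b->count++;
}

/* Copies the stored characters, oldest first, to out (which must hold
   TOY_BUFFER_SIZE + 1 chars), zero terminates it and returns the count.
   The next program would read its "piped" input this way. */
size_t toyBufferRead(const ToyOutBuffer *b, char *out)
{
  size_t start = (b->next + TOY_BUFFER_SIZE - b->count) % TOY_BUFFER_SIZE;

  for (size_t i = 0; i < b->count; ++i)
    out[i] = b->data[(start + i) % TOY_BUFFER_SIZE];

  out[b->count] = 0;
  return b->count;
}
```

With N this small only the tail of the previous output survives; a real shell would probably pick N based on available RAM.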
- ============================================================================
- - BC: comparison instrs like greater could prolly be dropped, they can be
- replaced by "swap, less comparison"
- - possible new syntax for pointers:
- - $ptr1:ptr2 instead of $ptr1>ptr2
- - $ptr1=ptr2, $ptr1>ptr2, $ptr1<ptr2 for pointer comparison?
- - Make some kind of source code simplifier that pre-optimizes a plain text
- comun source code? E.g. just regex-style detects functions that are just
- constants and replaces them in the source etc. It could allow compilation
- of larger source code on weaker computers.
- - for comun implementation MAKE A BETTER STRING PSEUDOHASH FUNCTION, this one
- fails too much, mostly at strings that are 1 or 2 characters longer than
- the max string length, possible improvement: first store n LAST (not first)
- chars of the identifier, as the beginnings of identifiers are often the
- same ("CMN_getCurrent..." etc.), TEST this on some huge dataset of
- identifiers
- - uxn interpreter/compiler
- - consider: allowing more interactivity between type envs., e.g. adding a
- 32 bit value to a pointer in type env. 8
- - maybe logical xor would better be ||! than |!!? the latter looks like double
- negation or something
- - regexp library
- - proposed operator: $:, pop1 X, N, sets Nth value below stack top to X
- (dual operator to $)
- - markdown (simplified) library
- - preserve also label names?
- - port comun onto other langs by writing the comun bytecode translator to the
- language itself in the language itself, i.e. for example python program
- that translates comun BC to python
- - example program: arrays (with minilib)
- - make a tiny toy IDE using SAF
- - test out popping vs non-popping variants of various commands (e.g. pointer
- ones), see if errors occur where they should
- - add warning on ops that do nothing? (e.g. $<1) maybe add a function that
- says instr. does nothing, can also be used for optim.
- - program: raycasting
- - directive to hint on inlining a function (making it kind of a "macro"?)
- - directive hinting on minimum stack size
- - more optimizations
- possible bytecode optimizations:
- - remove instructions that do nothing (like no-pop transfer to same env)
- - remove NOPs while recomputing addresses <--- DONE
- - sequences like "MUC 1, DIC 1" can just be eliminated <--- KINDA DONE (but we can detect more sequences)
- - merge multiple pointer increments or pops to a single increment by a constant
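A rough sketch of how the optimizations listed above might look as a single peephole pass. It works on a made-up toy instruction set (ToyInstr and the TOY_* opcodes are hypothetical, not the real comun bytecode encoding), assumes all pointer increments target the same pointer, and ignores jump targets and address recomputation, which a real pass would have to respect.

```c
#include <stddef.h>

/* Toy opcodes, hypothetical stand-ins for NOP, MUC, DIC and a pointer
   increment by constant; TOY_OTHER is any instruction we don't touch. */
typedef enum
{
  TOY_NOP,
  TOY_MUL_CONST,
  TOY_DIV_CONST,
  TOY_PTR_ADD_CONST,
  TOY_OTHER
} ToyOp;

typedef struct
{
  ToyOp op;
  int arg;
} ToyInstr;

/* Says whether an instruction has no effect (NOP, multiplication or
   division by 1, pointer shift by 0). */
static int toyDoesNothing(ToyInstr i)
{
  return i.op == TOY_NOP ||
    ((i.op == TOY_MUL_CONST || i.op == TOY_DIV_CONST) && i.arg == 1) ||
    (i.op == TOY_PTR_ADD_CONST && i.arg == 0);
}

/* Compacts the code in place: drops instructions that do nothing and
   merges neighboring pointer increments. Returns the new length. */
size_t toyPeephole(ToyInstr *code, size_t len)
{
  size_t out = 0;

  for (size_t in = 0; in < len; ++in)
  {
    ToyInstr i = code[in];

    if (toyDoesNothing(i))
      continue;

    if (i.op == TOY_PTR_ADD_CONST && out > 0 &&
      code[out - 1].op == TOY_PTR_ADD_CONST)
    {
      code[out - 1].arg += i.arg; /* merge with previous increment */

      if (code[out - 1].arg == 0)
        out--; /* increments cancelled out, drop them completely */

      continue;
    }

    code[out++] = i;
  }

  return out;
}
```

The toyDoesNothing function is the same kind of check that the "warning on ops that do nothing" item suggests sharing between warnings and optimization.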
- TODO/CONSIDER IN FUTURE VERSIONS:
- - SUGGESTION for comun next version (suggested by two people now): allow
- adding values to pointers from different TEs (i.e. for example shift TE8
- ptr by the stack top value from TE0) -- in BC this could be done by just
- adding IC to PAX instruction that would say from which TE the value should
- be taken. Just make a nice comun syntax for it.
- - program that translates brainfuck to comun
- - block comments? could be between ## and ##
- - since bit shifts were added, change optimization of mult by pow. of 2 to
- a bit shift!
- - ADD ACTUAL BIT SHIFT OPERATIONS AND INSTRUCTIONS. Yes, they are needed,
- because shifts by variable size can't be easily encoded.
- - run on Pokitto etc., test even Arduboy
- - make SAF programs with comun
- - remove unused functions and labels
- - add bit shift operations?!?! no new instructions would be needed (they
- translate to MUC/DIC). Normal mul/div are hard to convert to shifts. OR
- maybe find a way to optimize e.g. CON 2, MUX to other two instructions with
- MUC -- this could be done by making CON popping, i.e. normally CON would be
- used more like CON', and CON' 2, MUX could be translated to MUC 2, CON 2
- - <-- command to read string from input? Though this isn't nearly as simple
- as -->, maybe just don't do this.
- - switch statement in future version? would make faster code, but compiler
- can do something akin to a switch during if optimization (this would need a new
- instruction in bytecode). Switch should be just an extension of if branch,
- maybe like this:
- condition ?
- # else
- ;
- # for condition = 2
- ;
- # for condition = 1
- ;
- # for condition = 0
- .
- As for bytecode: there could be an instruction "jump by" given step, then
- a multibranch if would just be this instruction followed by constant jumps
- to individual labels, like:
- <JUMP BY>
- <JUMP TO CASE 0 ADDR>
- <JUMP TO CASE 1 ADDR>
- ...
- - maybe remove logical functions (i.e. ||, &&, ...), they don't serve much
- purpose (no short circuit as in C)
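The "jump by" idea from the switch item above could be sketched with a toy interpreter like the one below. Everything here is hypothetical (the T_* opcodes, ToyInstr, toyRun); in particular it is only an assumption of this sketch that out-of-range conditions fall to the last table entry, which plays the role of the else branch.

```c
#include <stddef.h>

/* Toy opcodes, NOT real comun instructions. */
typedef enum { T_JUMP_BY, T_JUMP, T_SET, T_END } ToyOp;

typedef struct
{
  ToyOp op;
  int arg;
} ToyInstr;

/* Multibranch laid out as <JUMP BY> followed by a table of constant
   jumps, one per case, the last one being the else branch. */
static const ToyInstr toySwitchDemo[] = {
  {T_JUMP_BY, 4}, /*  0: table has 4 entries (3 cases + else) */
  {T_JUMP, 5},    /*  1: jump to case 0 */
  {T_JUMP, 7},    /*  2: jump to case 1 */
  {T_JUMP, 9},    /*  3: jump to case 2 */
  {T_JUMP, 11},   /*  4: jump to else */
  {T_SET, 100},   /*  5: case 0 body */
  {T_JUMP, 12},   /*  6 */
  {T_SET, 101},   /*  7: case 1 body */
  {T_JUMP, 12},   /*  8 */
  {T_SET, 102},   /*  9: case 2 body */
  {T_JUMP, 12},   /* 10 */
  {T_SET, 999},   /* 11: else body */
  {T_END, 0}      /* 12 */
};

/* Runs the code; condition stands for the value JUMP BY would take from
   the stack. Returns the last value set by T_SET (-1 if none). */
int toyRun(const ToyInstr *code, size_t len, int condition)
{
  int result = -1;
  size_t pc = 0;

  while (pc < len)
  {
    ToyInstr i = code[pc];

    switch (i.op)
    {
      case T_JUMP_BY:
        /* step into the table, clamping to the last (else) entry */
        pc += 1 + (size_t) ((condition >= 0 && condition < i.arg - 1) ?
          condition : i.arg - 1);
        break;

      case T_JUMP: pc = (size_t) i.arg; break;
      case T_SET: result = i.arg; pc++; break;
      case T_END: return result;
    }
  }

  return result;
}
```

Selecting a branch costs one table step plus one constant jump no matter how many cases there are, which is where the "faster code" would come from compared to a chain of ifs.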
- DONE:
- - self hosted: in TE 0 pushing -1 and +xffffffff creates the same BC! By
- current BC spec the latter should probably push half and then ADC the rest,
- OR specify that CON always pushes UNSIGNED value and implement -1 as CON 0;
- SUC 1.
- - BUG: -O3 makes comunpress stuck in an infinite loop? doesn't even print
- anything
- - IN SELF HOSTED IMPLEMENTATION: possibly split the impl. into multiple files
- (general, interpreter, compiler, optimized, ...) so that we can make a
- MINIMAL COMPILER that is able to compile itself even on low-RAM devices
- (a full comun including interpreter and everything may eventually be over
- 30 kb which might not fit e.g. on Pokitto).
- - programs: bytebeat
- - BUG: imagelib testing program transpiled to comun doesn't work!
- - INCLUDE ISSUE: if two libraries included at the same level both include the
- same library, both will be included and cause name collisions! Includes
- should probably always behave as "include once" (is there any scenario in
- which we'd want to include the same lib twice? not even preprocessor
- templates need this as preproc can simply generate the same code twice in
- a loop) -- prolly change this in spec and implementation.
- - in comun to comun transpile maybe try to detect what could be a string
- literal (DONE) and translate it so, plus try to detect what could be a "-->"
- command and translate it so
- - create a comun mini library in example programs
- - make the C transpiled output nicer (there are weird literal formats etc.)
- - add compilation to comun (i.e. when loading bytecode, we can turn it back
- to comun)
- - fix/improve the vim highlighter, KINDA
- - try to make preprocessing stage 1 code smaller (reduce unnecessary spaces
- etc.)
- - add new possible value to DES that would indicate start of string literal?
- ^ rather not now, there are not many free DES values left, plus there would
- likely have to be two values (string start and end), plus we would again
- make everything more complex... string literals can be guessed just from the
- instructions alone anyway
- - specify minimum stack size?
- - option for non-minimized preprocessor output (can be useful for debugging
- or just generating human readable sources)
- - raised error param in IC
- - test program for gotos
- - implement CLI arguments
- - test: very big program
- - BUG: goto test with -O3 shits itself
- - expand big program
- - function for basic sanity check of bytecode
- - hash collisions happen, e.g. SAF_loop and SAF_COLOR_YELLOW <-- FIXED
- - make a uber test, a shell script that tries all the test programs with
- different optimization levels etc.
- - change findExternalFunc to just findFunc and allow also searching for defined
- functions + make a function in compiler for calling such func
- - Consider 64 bit support? Currently only 32 bit is supported due to using
- uint32_t e.g. in interpreterGetXY etc.
- - bash script that takes comun program and makes a syntax-highlighted HTML
- - BEFORE RELEASE: try to make small executable, currently smallest one seems
- to be produced by gcc -Os, also try to compress it with gzexe, AND make
- a statically linked executable and see how much that one takes
- - add beautify and minify options (can just use the tokenizer maybe), maybe
- create a Formatter "class" that does this automatically, can be used in
- preprocessor to minimize the underlying code so that the resulting 1st
- stage preprocessing bytecode is smaller
- - add measure option (-m, -M ?) that runs the program and writes how many
- steps it took, how many symbols were needed to store, the highest address
- touched in every type env, bytecode size (before and after optim) etc.
- - rename "variables" to "pointers" in source code
- - focus on safety between unsigned <-> signed conversions, simple cast is
- probably not super portable
- - Check out the casts from int64_t * to uint64_t * -- prolly not OK, fix.
- ^ dunno maybe it's actually OK
- - make doxyfile and test
- - unify names in the comun.h library
- - minicomun.h: extremely simple minicomun pure text interpretation
- - change interpreter to incorporate the separate 0 type environment!
- - function to estimate the memory needed for type envs and pointers from
- bytecode, use in interpreterInit
- - create syntax highlighter for vim
- - gotos!
- - somehow handle reporting correct error position in code with
- includes/preprocess. (with includes push the pos on stack, with prep. prolly
- can't easily do this, maybe just don't report pos.)
- - inline functions whose bytecode is same or shorter than the shortest call
- of that func :) but this shouldn't be default because it drops the info
- about func (e.g. bad for transpile), make an option like -O2
- - interpreter doesn't have all instructions implemented (those that never
- get generated now)
- - program "$3" segfaults (should return interpretation error)
- - need to also add string output instruction? dunno how to make it with normal
- instructions if its non-popping <-- nope, changed specs of string output
- - possibly add halt (whole program) and return (from function) commands? halt
- could use the END instruction -- the last END instruction would have to have
- IC = 0, otherwise 1. <-- done with jump
- - if no iofunction provided, ignore it (if interpreter->iofunction == 0 ...)
- - add option which enables special external function that cause interpreter
- to do various things, e.g. print debug info? something like a small
- built-in library. Not sure if this is a good idea tho. <- RATHER NOT
- - Test all the sign stuff (especially pushing negative literal) on 64 bit CPU!
- - in a sequence of NOPs (and DESs etc.) add jump instead of the first one to skip all the NOPs, easy to do <- BAD IDEA because not well formed
- - Specify that the minimum size of type env 0 should be e.g. 16 bits?
- Otherwise many programs can't simply be thought of as portable because env
- 0 may in theory be just 2 or something.
- - C transpile: throw error if goto jumps out of a function
- - add CLI option to run with debug (-d?)
- - maybe separate comun.c and different frontends, i.e. frontend_c.h,
- frontend_py.h etc.
- - bytecode optimization
- - try if everything works if we increase pseudohash size
- - add goto test to main comun test!
- - maybe add pointer comparison, like $ptr1=ptr2, is useful for stopping
- pointers (maybe returns 0 on equality, 1 if ptr1 > ptr2 else 2?)
- - test the !. command in general tests!
- - maybe remove the jump offset instrs? they're not really used
- - add pointer comparison ($p1=p2) to tests!
- - with runtime errors report the number of steps of interpreter
- - Throw error (NOT SUPPORTED) when trying to push literal outside 32bit range
- (can't be done because internally we use int32_t)
- - add convenience function to comun.h that just takes comun string and
- interprets it
- - self hosted comun: add stack trace to error reports.
- - Add debug info to bytecode! Maybe like this: make a new DES type; when the
- constant in this is let's say 0, this marks start of a new line in source
- code (i.e. no need to record actual line numbers in the instr, its enough
- to just count number of these markers since BC start). However an issue is
- how to handle different source code files. (now done in BC spec)
- - WTF, it seems compiler/interpreter don't take into account sign extension
- with signed ops, also if we fix this C transpiler has to be fixed (the
- constToC func) to deal with this!
- WIDER PROBLEM: All the things with storing consts with taking into account
- sign extension is a mess, we'd have to store all constants with 0 at the
- beginning because we don't know by what operation (sign or unsig) they'll
- be used. We could just say fuck it and only store unsigned consts, but what
- if we e.g. need to store const. -1 in type env. 0 in which we don't know
- number bit width?!?!?!?! Possible solution:
- Just store unsigned bits and only at maximum as many as needed by given
- type env. (easily known from instruction), AND for type env. zero just
- suppose bit width 32 -- this won't allow for storing some values (e.g. those
- outside 32 bit range, or those above range of signed 32 int for signed ops),
- but will probably mostly work. I.e. even if int has 64 bits on some
- platform, -1 will be stored as 0xffffffff in BC and CMN_instrGetConstSigned
- will correctly return -1.
- Also mention this in limitations in README.
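The "store unsigned bits, interpret by bit width" solution described above could be sketched like this. The function toyConstSigned is made up for illustration and is not the actual CMN_instrGetConstSigned; it only shows how stored bits such as 0xffffffff can be read back as -1 regardless of how wide int is on the platform.

```c
#include <stdint.h>

/* Interprets the lowest width bits (1 to 32) of the stored unsigned
   constant as a two's complement number. For type env. 0 the caller
   would pass width 32, as the note above supposes. */
int64_t toyConstSigned(uint32_t bits, unsigned width)
{
  uint64_t value = bits;

  if (width < 32)
    value &= (UINT64_C(1) << width) - 1; /* keep only the env's bits */

  uint64_t signBit = UINT64_C(1) << (width - 1);

  /* if the sign bit is set, subtract 2^width to get the negative value */
  return (value & signBit) ?
    (int64_t) value - (int64_t) (signBit << 1) : (int64_t) value;
}
```

All arithmetic is done in 64 bits so that neither the mask nor the subtraction can overflow for widths up to 32.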