comun, the best programming language

comun

The best programming language, for the benefit of all living beings.

This repository contains the specification of, supplemental materials for, and an implementation of comun, a minimalist and idealist KISS/suckless public domain language made wholly with the selfless mindset of just creating purely good, non-consumerist, maximally helpful and harmless technology without any self-interest.

Current state: comun has become self-hosted and bootstraps itself, though it is still a work in progress; the old C implementation is still around as an alternative and substitute. Keep in mind that EVERYTHING is a work in progress. It's not super tidy yet, but it's already quite usable.

Want to program in comun? There is a tutorial (docs/tutorial.md)!

Don't forget to check out the docs directory with the specification, frequently asked questions etc.

There is already a tiny game written with comun and SAF: Flying Ball.

Fun note: I just noticed brainfuck can be trivially translated to comun just by renaming commands, which proves comun is indeed Turing complete :)

about the language

Brainfuck is minimal and elegant but practically unusable, C is extremely usable but bloated (500 page specs?) with unnecessary stuff (even the super minimal TCC has over 20 KLOC and took an enormous effort to write)... and is legally owned by ISO. We need something in between (and free; both legally and practically). Joke languages like C++ and Rust are an insult to intelligence and won't be mentioned beyond this sentence.

We don't need another "modern" language because in the age when everything is getting worse "modern" means "as of yet the worst" -- we need a new iteration of an old language, additionally made with selfless attitude and a wider vision of how technology should ideally be, i.e. simple and helpful.

The language needs to be usable but also simple, as simplicity is necessary for true freedom: it means many implementations can be made on many platforms by many people, so no one controls/owns the language. It has to respect old and weak devices (such as embedded computers) and devices that would likely be made in an ideal society (simple computers), and must of course have no dependencies on cancerous "modern" technology -- it must be free from it not only physically, but also in philosophy and mentality. Ideally it must not depend on anything at all.

The philosophy and features of the language are summed up here:

completely free, public domain: Not only is the language implementation free by a license, as is standard nowadays (though many still try to keep trademarks), it is completely public domain without any legal restrictions, it is suckless (guaranteeing practical freedom rather than just freedom on paper) AND the language specification is also completely public domain (unlike e.g. with C). It's pretty depressing that nowadays it's even POSSIBLE to somehow "own" something about a language (its description, name, implementation or mere ideas used in the implementation) -- such concepts must be strictly shit on.
specification that fits on a sheet of paper (well, using both sides and a lot of text squeezing), you can hold the whole language easily in your head -- there are about 50 built-in commands, most of which are typical operators such as +, -, < etc.
KISS/suckless, no bloat/bullshit: No package manager, no standard library, no floating point (say no to the float rabbit hole!), no Unicode (sorry, no pregnant men in comments), no IDEs, no legal conditions, no trademarks, no OOP, no generics, no furry mascots, no COCs, no memory safety, no handholding, no AI, no toxic woke discord communities etc.
great balance of minimalism vs usability: The language is pretty minimal but not "pseudominimal" (i.e. minimalist only on the outside, like most other languages nowadays) -- if desired, its interpreter can be implemented in a few hundred lines of pure C without any libraries, but it still tries to be usable for writing real programs (even if it may take more effort than with "modern" bloat languages in which you basically write an AI prompt and get a program). Programs written in simple languages are inherently superior in the end, and worth the effort; there is no counter argument that isn't justified by technology consumerism.
imperative, no bullshit paradigms: Computers are inherently imperative and therefore the imperative paradigm is the closest to hardware: the most natural, most predictable, and most easily implemented and mapped to hardware, while also being the closest to natural human thinking. In terms of universal applicability there is nothing better, there is nothing to improve or invent; forcing arbitrary silver bullet paradigms is just bullshit soydev fashion. Higher abstractions are to be implemented as libraries or languages above this one.
stack-based: Stacks are based so we base our language on stacks. The language is similar to FORTH, expression evaluation happens on stack, as well as passing arguments to functions. This is also how most CPUs compute, so it nicely maps to assembly.
reverse Polish notation: Simple, no complex parsers, grammars, no bracket symbols or operator precedence needed (yes, there are downsides like not being able to tell how many parameters a function takes from an expression, but it's no big deal).
optional pointers: Global pointers can serve as the variables or arrays of traditional languages, or be used to create and manage multiple stacks. With pointers, memory can be statically allocated. There is no dynamic allocation, that's bloat.
minimal, intuitive and consistent syntax: The goal was to try to use as few symbols and rules as possible, no complex grammar; there are no semicolons, expression brackets, English words etc. Intuitive symbols are used instead (e.g. ? instead of if etc.). Consistent rules are used, e.g. directives start with ~, pointer commands with $, appending ' to a command makes it non-popping etc.
simple 1-2-3 symbol math-like commands, no English: This is firstly more elegant and secondly makes source code shorter (which may matter on devices with limited memory and small displays). Also it will make golfers happy.
low level but structured and portable: Traditional control structures exist; a goto command can optionally be supported, e.g. to allow translating assembly to comun.
minimum abstraction: Only as little abstraction as is needed for portability should be present. High abstraction hard wired into the language is poison, as any higher abstraction's scope is limited and it causes problems outside this scope. The only generally applicable abstraction is portable imperative commands, as every computer works this way. Any higher abstraction is to be offered by a library, not by the language itself.
functions, recursion: must have
different width data types (optional): If you care about different width types, you may use separate type environments (one for native type, one for 8bit integers, 32bit integers etc.). This allows utilizing memory well on devices with low amount of RAM if needed. Two's complement is guaranteed, values are considered unsigned but there exist signed operations (e.g. signed divide or signed compare, like in assembly languages).
avoiding issues with platform specifics (int width, pointer size, endianness, ...): Where these specifics would cause an issue, there is a mechanism to abstract them away (for example pointers exist in a table separate from the main memory so that their size doesn't matter to the program). Different width integers are in separate memories so byte sex doesn't affect the program etc.
comfy features that don't cost much: e.g. string literals, hex and bin literals, comments, a bit of syntax sugar like --> for printing strings, ...
file includes: Simple "copy-paste include" of files allows for creating libraries.
Unix philosophy/do one thing well: Be a programming language, not a platform, IDE, package manager, virtual machine or anything like that. Implement true modularity and reusability (not just the kind that serves internal project organization, but the kind that truly encourages wild hacking by others).
no paying for unused features, non-basic features are optional: The design is such that if you e.g. don't want to use different data types, you simply do nothing and the language uses the default one (platform's native type). If you don't need pointers, they're not there, if you don't need functions, they're not there, if you don't need signed operations, they're not there, if you don't use preprocessor, it's not there etc. This is great because even partial language implementations can still accept many programs that don't make use of advanced features.
avoid boilerplate: Why require some weird copy-paste magic code in simple programs when it doesn't have to be there?
indentation doesn't matter: Some languages like Python make indentation express program structure. This is bad because there are many cases when you want to compile unindented or badly indented code, e.g. minified code, one liners or code in which you use wrong indentation to mark temporary debugging code.
non-commercial, no aim for profit, goal is purely higher good: The language is created as a part of LRS (less retarded software/society) which aims for creating purely good (free, simple, future proof, hackable, repairable, hardware nondiscriminating, ...), selfless technology that's supposed to help all living beings without exploiting them.
preprocessor using the same language: Preprocessor is present (though optional) because it is important e.g. for compile-time precomputations (important type of optimization), templated code or writing portable code that decides which platform-specific libraries to include or what constants to set before compilation etc. However it is specified in a very simple way (and is simple to implement), without needing a separate language: with preprocessing one simply writes a comun program that outputs a source code of the final comun program.

examples/overview

A simple "hello" program in comun may look like this:

```
0 "hello :)" -->
```

Here are a few basic functions written in comun; notice how beautifully simple the code is (please forgive the lack of syntax highlighting; there is a syntax-highlighted example in other/syntax_highlight_example.html):

```
max: <' ? >< . ^ .      # takes maximum of two values

max3: max max .         # takes maximum of three values

# recursive factorial
factR:
  ?'
    $0 -- factR *
  ;
    ^ 1
  .
.

# iterative factorial
factI:
  $0 --

  @'
    >< $1 * ><
    --
  .
  ^
.

# power, raises A to B
pow:
  1 ><

  @'
    >< $2 * ><
    --
  .

  ^ >< ^
.

# 8bit sine function approximation with quadratic curve
sin8:
  $0 256 % 64 / # quadrant
  >< 64 %

  $1 2 % 1 = ?
    63 >< -
  .

  63 - $0 * 32 /

  >< 1 <= ?
    255 >< -
  .
.

# converts number to zero-terminated string
numToStr:
  0 ><
  @@
    $0 10 % "0" + ><
    10 /

    $0 0 = ?
      !@
    .
  .
  ^
.

# converts signed number to zero-terminated string
numToStrS:
  $0 0 <<
  ?
    -1 >< - 1 +
    numToStr
    "-"
  ;
    numToStr
  .
.

# prints number
printNum: numToStr --> .

# prints number as signed
printNumS: numToStrS --> .
```

For more insight see the specification, tutorial and example programs.

repository structure

The following documents some of the important files and the structure of the repository. Each directory will potentially have its own README with further information.

src: source code of comun
- *.cmn: self-hosted implementation, working but still WORK IN PROGRESS
  - < 5000 LOC
- backends: compiler backends for various platforms/languages
  - c.c: C backend, < 500 LOC
  - python.py: Python backend, < 400 LOC
- src_c_old: comun implementation in C99 (until the self-hosted one is finished)
  - comun.h:
    - full comun implementation as a library
    - < 4000 LOC
    - single header library, KISS/suckless
    - pure C99
    - no dependencies, not even the standard library (except for the tiny and trivially replaceable stdint.h)
    - no dynamic allocation (malloc, ...)
    - very efficient, works even on EXTREMELY weak embedded devices, so far tested on: x86 PC, Pokitto (ARM, 36 KB RAM, ran the SAF comun game), Arduboy (8 bit CPU with only 2.5 KB RAM, ran minitest.h)
    - supported type environments: 0 (native type), 8bit, 16bit and 32bit
    - simple implementation-specific bytecode: 16 bit instructions (8 bit opcodes), assembly-like but with meta-information that also allows easy transpiling to higher level structured languages
    - simple optional optimizations of the bytecode: inlining, removing unused functions, replacing operations with more efficient ones etc.
  - comun.c:
    - standalone compiler/transpiler/interpreter/debugger/tool using comun.h
    - so far can compile to C, comun bytecode and comun (e.g. for decompiling bytecode)
    - transpiled code is itself KISS: it's a single file that doesn't use any libraries which aren't strictly needed
    - < 60 KB binary (achieved with gcc -Os)
  - minicomun.h:
    - comun subset, even simpler than full comun, can be used e.g. as a super tiny, extremely simple mini scripting language
    - tiny single header pure C99 implementation
    - < 1000 LOC
    - < 20 KB binary (standalone version, achieved with tcc)
    - compared to full comun: no goto, no preprocessor, no file includes, only type environment 0, no pointers except for built-in 1 letter pointers that can only be used as variables
    - extremely KISS, interprets the source code string directly without any transformations (no bytecode, syntax trees etc.), uses no dynamic allocation etc.
    - single header library which can itself also be compiled into a standalone interpreter (just compile the header itself with -DMCM_STANDALONE=1)
    - zero dependencies, not even the standard library (only stdio is required for the standalone version, NOT needed for the pure library)
docs: documentation and documents related to the language
- specification.md: language specification, < 5000 words (wc -w)
- bytecode.md: implementation-specific bytecode documentation
- tutorial.md: language tutorial for beginners
programs/*: various programs in comun, serving as examples and tests, including some basic libraries
other/*: other files, including e.g. vim syntax highlighter and precompiled minicompiler for bootstrapping

limitations

There are still limitations, disadvantages, TODOs etc. These include:

No uber features like memory safety, floats, a huge standard library etc. -- this is actually a feature and should be pretty clear: the language is low level and simple.
At the moment there is no switch statement (it may be added in the future). Multibranching therefore has to be done with if statements which firstly doesn't look awesome (many endifs at the end) and secondly is slower (compared to proper O(1) switch with address jump table) -- smart compilers (e.g. when you transpile to C) may still optimize such an if sequence to address jump table, but comun itself doesn't currently do this.
The current C implementation limits the maximum token length -- this is for simplicity and also because of input flexibility: we need to keep the token in memory to analyze it but don't want to use dynamic allocation, so we have a statically preallocated fixed-size buffer for it. The implementation is also made to accept any input stream of characters, not just a string in memory, so it can't just keep a pointer to the token in the input string; it has to copy it somewhere. In practice this only limits you in one way: you can't use very long string literals, but this is easily solved: just split them.
No function pointers -- function pointers aren't necessary though they can be useful and fast in many situations. But they come with a lot of extra complexity.
The current C implementation isn't 100% correct and robust -- again, for simplicity the current implementation doesn't handle 100% of cases: e.g. some things such as the maximum token length are limited, or, as a primitive hash table is used for identifiers, a name collision may occur -- though unlikely, this could prevent compilation of a theoretically valid program (still, names up to a certain length are guaranteed not to collide, and you can always increase the size of the hash if you encounter issues). If you try hard enough you can probably segfault the current implementation by doing some really crazy stuff like trying to interpret some manually hexedited bytecode -- such things haven't been tested and they're sometimes not even checked. There may be more things like this. Keep in mind the current C implementation is for now considered a temporary solution to allow us to write comun programs until we make a beautiful self-hosted implementation.
The current C interpreter isn't super efficient: it's written to be simple rather than extremely fast or memory efficient (it could be made much faster; the self-hosted implementation will aim to improve this).
When enabled, the preprocessor's RAM usage grows with source code size, i.e. there is no O(1) preprocessor. This is because the preprocessor generates a program that prints the final source code, and for a bigger source code this program will be bigger -- a more complicated preprocessor (such as that in C) doesn't have to suffer from this, but is more complicated to make. Higher RAM usage isn't an issue on PCs or mainstream mobiles at all, but may play a role on very limited embedded devices. It is however recommended and expected that the preprocessor won't be heavily used (like it is e.g. in C) and that only pretty complex programs will resort to it. Most things can be done without it (note that e.g. file includes are separate from the preprocessor and don't trigger its use, i.e. you can happily include files without worrying about burdening your program too much).
Preprocessor is Turing complete -- while this may be also seen as a feature, it allows the preprocessor to e.g. get stuck in an infinite loop.
No block comments at the moment. So far there are only line comments (though they can be ended before end of line). This may be a small annoyance e.g. for debugging where you need to quickly disable big parts of code. This can now be achieved in other ways, e.g. with blind if statement, preprocessor, deleting the code or mass line-commenting the block with the help of your editor. But block comments would be nice, they'll be considered in the future.
Reverse Polish notation can be harder to read, firstly because we're not used to it and secondly because it lacks some information and allows errors to sneak in, e.g. sometimes you can't tell how many arguments a function takes. But it's still OK, completely usable and very cheap to implement. I've also found it makes me think better about every expression I write, so that in the end I write nicer code.
Separate type environments may sometimes be cache unfriendly. This is because they're typically far away from each other in RAM, and if you e.g. make something like a struct with different data types in different type environments, this may kill CPU cache performance. However this only applies in specific cases like the one mentioned; it can be prevented and could also be addressed by a different implementation (interleaved type environments?). It also probably won't ever be noticeable on "big" computers, and small (embedded) computers often don't even have any CPU cache, so they don't care.

"rights"

NO RIGHTS RESERVED

fuck copyright and capitalism

I, Miloslav Číž (aka drummyfish), have created everything in this repository (except for the text in the LICENSE file) myself (with occasional external suggestions and feedback of other people) from scratch and my intent is to waive all my exclusive "intellectual property" rights so that my work can be used absolutely by anyone for any purpose without any conditions whatsoever.

Everything in this repository is released under CC0 1.0 (public domain, https://creativecommons.org/publicdomain/zero/1.0/) + an extra waiver of all other IP rights (including patents and trademarks, just in case CC0 isn't enough to ensure complete public domain). Some files in this repository have a waiver notice embedded in them (to better ensure they carry the information about their legal status even when taken out of this repository) while some do not, however the waiver mentioned here applies to ALL files in this repository, no matter whether they bear the notice themselves or not.

The additional waiver of all IP rights follows:

The intent of this waiver is to ensure that this work will never be encumbered by any exclusive intellectual property rights and will always be in the public domain world-wide, i.e. not putting any restrictions on its use.

Each contributor to this work agrees that they waive any exclusive rights, including but not limited to copyright, patents, trademark, trade dress, industrial design, plant varieties and trade secrets, to any and all ideas, concepts, processes, discoveries, improvements and inventions conceived, discovered, made, designed, researched or developed by the contributor either solely or jointly with others, which relate to this work or result from this work. Should any waiver of such right be judged legally invalid or ineffective under applicable law, the contributor hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to this right.
