
Comun Tutorial

by drummyfish, released under CC0 1.0, public domain

This is a tutorial that teaches programming in the comun language. It only teaches the pure language, not e.g. the details of the implementation-specific bytecode or how to integrate comun into your programs -- such information will be provided elsewhere.

Who is this for? To complete programming newcomers this tutorial may be too difficult to digest; we rely on some basic programming knowledge here and the language itself is low level and therefore not so easy to handle by beginners. To read this tutorial you should at least already be able to write very basic programs in an easy language such as Python and know basic terms such as algorithm, interpreter, data type, stack, function, array etc. If you already know about advanced topics such as pointers and can program in low level languages such as C, this tutorial should pose no difficulties.

Besides this tutorial also consider reading the specification (which is very short, to an experienced programmer it can serve as a condensed tutorial and a cheatsheet), provided example programs and other documents in the documentation directory of this repository.

What is comun and what is it good for? Comun is a programming language focused on simplicity and freedom, it is trying to take a different path than the mainstream languages such as Python or JavaScript. It's been wholly developed by a single guy with the goal of creating a basic language for an alternative, simpler computer technology than that which is currently plaguing the world. You can use comun in many ways, even if you don't share the worldview it is based on, for example you may utilize it as a simple scripting language in your game -- it's much simpler to incorporate into a program than for example Lua, a language that's itself considered very simple and easy to integrate. Comun will run even on extremely weak computers (we might even say calculators) with just kilobytes of RAM, it can be ported almost anywhere (this means NOT just comun programs, but the comun compiler itself). You may also use comun to write regular programs, the advantage you'll gain is freedom; you free yourself from the highly corrupted, bloated, proprietary and overcomplicated world of mainstream technology that's "protected" by copyrights and trademarks, that's ever changing, breaking, dropping compatibility and fueling consumerism. Programs written in comun are meant to last and hopefully be run along with other freedom-focused technology such as small embedded personal computers without operating systems. If you write your program in comun, it's very likely to survive far into the future and run on basically anything we might call a computer, simply because it's pretty easy to create a comun compiler/interpreter. But beware, if you're looking for features of a "mainstream" language, such as great safety, a huge standard library or rapid development, comun is not for you.

A quick word about comun in case you know a bit about programming already: comun is very low level and similar to FORTH, it is imperative, stack-based and uses postfix notation. It can be both compiled and interpreted (the current implementation even has its own simple bytecode). It offers the option to use different width integer types and pointers, and even has an optional preprocessor. But all in all it tries to keep it simple and avoid what we call "bloat", so it doesn't sport advanced features such as floating point or user defined data types.

Setting It Up, First Program

We suppose you are using a Unix-like system (e.g. GNU/Linux) and know the basics of working with its command line. Other systems support the language as well, but you may have to make an extra effort to mimic the commands.

Firstly download the whole repository with the language, e.g.:

git clone https://codeberg.org/drummyfish/comun.git

Now compile the C implementation of the comun compiler with your favorite C compiler (here we use gcc, but you may also use clang, tcc etc.):

cd comun
gcc -O3 -o comun src/src_c_old/comun.c

To test whether everything is working, try to interpret some of the provided example programs, e.g.:

./comun programs/mandelbrot.cmn

You should see a picture of the Mandelbrot set drawn in terminal.

NOTE: If you are more experienced, you may check out various options of the compiler with ./comun -h -- for example you may compile comun programs to C as ./comun -C program.cmn.

Now let's try to write a first "hello" program. Open a plain text editor, create a file named first.cmn in the directory we're currently in and write the following in the file:

# our first program!

0 "hello :)" -->

Now you should be able to run the program as:

./comun first.cmn

And see:

hello :)

What the source code actually means will be explained later. For the more experienced this is a summary: # our first program! is a comment, 0 pushes the value 0 on stack (which will serve as a string terminator), "hello :)" pushes the ASCII values of the string in reverse order on the stack and --> is a built-in command that outputs a zero-terminated string from the stack.

Basics (Stack, Operators, Commands, Postfix Notation, ...)

The basic concept in comun is that of a stack. You should be familiar with the term, but a quick summary is this: stack is an abstract data type which we can imagine for example as a gun magazine, except that instead of bullets we have numbers. We may perform two main operations with the stack: push (insert a number) and pop (take one number from the top of the stack). I.e. stack behaves as so called LIFO (last in, first out). Stack top refers to the value that's been pushed last and is therefore at the very top of the stack.

Comun internally sees computer memory as a stack. At the beginning there is nothing on the stack, we may visualize this as follows:

memory: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        ^
        stack top

Now in a comun program whenever we write a "number", we are telling the program to push that number to the stack. Consider e.g. the following comun program:

1 2 3 4

This is a series of commands that say: push 1, push 2, push 3, push 4. If you run the program, comun executes the commands and its internal stack will look like this:

memory: 0 1 2 3 4 0 0 0 0 0 0 0 0 0
                ^
                stack top

The beauty of stack is in that it can be used to make common computations very simply and elegantly. For example to add two numbers we can simply use the + operator which says "pop 2 numbers and push their sum". Let's try to add this operator to our comun program:

1 2 3 4 +

If we run this, when comun encounters the + operator, it pops two top-most values (here 3 and 4), adds them (to get 3 + 4 = 7) and then pushes this sum on the stack. So at the end the stack will look like this:

memory: 0 1 2 7 4 0 0 0 0 0 0 0 0 0
              ^  
              stack top

Now notice one thing that may seem confusing: the value 4 stays in the memory after the value 7 -- this is a behavior that will be important for advanced manipulation of memory later. For now just notice it is so and ignore it -- for our purposes we may simply imagine that the values above stack top simply don't exist.

Let's try to perform more operations now:

1 2 3 4 + * + 5 /

The * operator will again make comun pop two values (2 and 7), multiply them (2 * 7 = 14) and push the product. Then there is another addition, then we push value 5 and perform division. Let's see the execution of the whole program step-by-step:

step 0  memory: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                ^
step 1  memory: 0 1 0 0 0 0 0 0 0 0 0 0 0 0
                  ^
step 2  memory: 0 1 2 0 0 0 0 0 0 0 0 0 0 0
                    ^
step 3  memory: 0 1 2 3 0 0 0 0 0 0 0 0 0 0
                      ^
step 4  memory: 0 1 2 3 4 0 0 0 0 0 0 0 0 0
                        ^
step 5  memory: 0 1 2 7 4 0 0 0 0 0 0 0 0 0
                      ^
step 6  memory: 0 1 14 7 4 0 0 0 0 0 0 0 0 0
                    ^
step 7  memory: 0 15 14 7 4 0 0 0 0 0 0 0 0 0
                  ^
step 8  memory: 0 15 5 7 4 0 0 0 0 0 0 0 0 0
                     ^
step 9  memory: 0 3 5 7 4 0 0 0 0 0 0 0 0 0
                  ^
                      

We can see the final value we have on stack top now: 3. Now notice that what we've written as our comun program could be traditionally written as a mathematical expression with the use of brackets:

comun:              1 2 3 4 + * + 5 / 
normal arithmetic:  (1 + (2 * (3 + 4))) / 5

Here "normal arithmetic" means the common notation we are used to using in everyday life -- it is called infix notation because we put operators (such as + and /) between the operands (numbers) -- here we sometimes need to use brackets to make it clear which operations to perform first. In comun we rather use so called postfix notation (or reverse Polish notation) which puts the operator after the operands -- this is more convenient as it corresponds exactly to how we perform operations on the stack (the operands and operators are written in the exact order in which they're executed) and it also turns out this notation doesn't need any brackets!
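
To make the correspondence between postfix notation and stack operations concrete, here is a small Python sketch that evaluates such expressions. It is only a model for illustration, not how comun itself is implemented: the function name eval_postfix is made up, and it handles only non-negative numbers and the five arithmetic operators.

```python
def eval_postfix(source):
    """Evaluate a tiny comun-like postfix expression and return the final stack."""
    stack = []
    operators = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,     # integer division, as in the text
        "%": lambda a, b: a % b,
    }
    for token in source.split():
        if token in operators:
            b = stack.pop()           # the operand pushed last is popped first
            a = stack.pop()
            stack.append(operators[token](a, b))
        else:
            stack.append(int(token))  # a number simply pushes itself
    return stack

print(eval_postfix("1 2 3 4 + * + 5 /"))  # prints [3]
```

Tokens are processed strictly left to right and no brackets are ever needed, which is exactly the property described above.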

Now let's make our program write out the final computed value so that you can experiment with these simple calculations. We will append code that computes the quotient and remainder of the final value when divided by 10, then converts them to ASCII characters (adds them to the ASCII value of character "0") and prints these characters with the -> command. Note that at this point you don't have to fully understand this code, just append it and let it work, the details will be explained later. Also note that this only writes results lower than 100 correctly. With that said, try to run the following program:

1 2 3 4 + * + 5 /
$0 10 / "0" + -> 10 % "0" + -> # this prints the final value

You should see:

03

Of course comun has more operators and operations you can use, here are just some of them (for a complete list see the language specification):

  • +, -, * (multiplication), / (integer division), % (remainder after integer division)
  • ++ (increment, i.e. add 1), -- (decrement)
  • <, >, <=, >=, =, !=: comparisons, two values are popped and either 1 or 0 is pushed based on the result of the comparison (e.g. 1 2 < pushes 1 because 1 is smaller than 2)
  • ><: swaps two values at the stack top
  • ^: pop, performs a single pop of a value
  • && (logical AND), || (logical OR), & (bitwise AND), | (bitwise OR), ! (bitwise NOT)
  • ??: conditional (ternary operator), pops 3 values, pushes second popped value if third popped value is non-zero, otherwise pushes the first popped value
  • $0, $1, $2, $3, ... $9: pushes Nth value below stack top, i.e. for example $0 * squares the value at the stack top (it is duplicated and then multiplied with itself)
  • ->: pops 1 value and outputs (prints) it as ASCII character
  • <-: reads 1 value from the input (user) and pushes its ASCII value on the stack
  • -->: string output, keeps popping values and printing them to output until it reaches the value 0 (if this value is not found, your program may crash!)

Appending ' to a command makes it non-popping, so for example +' pushes a sum of the two top-most values on the stack, without popping those two values. This is useful sometimes, but it doesn't work with some special commands (e.g. --> or user functions).
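
The difference between a popping and a non-popping command can be modeled in Python with a list used as a stack. This is just a sketch of the behavior described above; the function add and its popping parameter are invented for the illustration.

```python
def add(stack, popping=True):
    """Model of + (popping) and +' (non-popping) on a Python list used as a stack."""
    a, b = stack[-2], stack[-1]       # the two top-most values
    if popping:
        del stack[-2:]                # + removes its operands
    stack.append(a + b)               # both variants push the sum
    return stack

print(add([1, 2, 3, 4]))                  # prints [1, 2, 7]
print(add([1, 2, 3, 4], popping=False))   # prints [1, 2, 3, 4, 7]
```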

Furthermore for convenience we may specify numbers (numeric literals) in other bases, e.g. instead of 44 we may write the value as hexadecimal +x2c or binary +b101100.

There are also string literals which are a convenient feature for pushing ASCII values of text characters in reverse order, as this is a common operation when we want to work with text. For example the following program:

"hi all"

does exactly the same thing as a program:

108 108 97 32 105 104

NOTE: There exist no escape sequences for special characters in strings; if you need such a value in your string, you have to push it manually as a numeric literal.
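
Why string literals push their characters in reverse order becomes clear once we model them together with the string output command. The following Python sketch does that; push_string and print_string are hypothetical helper names that stand in for a string literal and for --> respectively.

```python
def push_string(stack, text):
    """Push the ASCII codes of text in reverse order, like a comun string literal."""
    stack.extend(ord(ch) for ch in reversed(text))
    return stack

def print_string(stack):
    """Model of -->: pop and collect characters until a 0 is popped."""
    out = []
    while True:
        value = stack.pop()
        if value == 0:
            break
        out.append(chr(value))
    return "".join(out)

print(push_string([], "hi all"))                    # prints [108, 108, 97, 32, 105, 104]
print(print_string(push_string([0], "hello :)")))   # prints hello :)
```

Because the last pushed value is popped first, pushing the text backwards means it comes out in the right order when printed.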

Now recall the first program we've seen:

# our first program!

0 "hello :)" -->

Now we may understand it better:

  • # our first program! is a comment for humans and has no meaning in the program, anything that starts with # is ignored by comun until the end of line or another # character.
  • 0 pushes the value 0, this will serve as a stopper for the operation that will print our text string.
  • "hello :)" just pushes the individual ASCII values of the characters in this string, which will be here for the command that prints strings to the output.
  • --> is the built-in string output command that will print the zero-terminated string we've just pushed.

Finally here are a few exercises:

  1. Modify the program we used to show expressions and operators to evaluate an expression we'd normally write as (8 + 6) / (2 * (4 - 3)). Hint: the result is 7.
  2. Write a program that reads two ASCII characters from the input (with the <- command) and then writes out a character that lies in the middle between them (e.g. given input AE the output will be C). Hint: the middle character is the average of the input characters.

And here are the solutions:

  1. Change the comun expression to 8 6 + 2 4 3 - * /.
  2. <- <- + 2 / ->

Control Structures (Branches, Loops)

In any non-trivial program you will need to use branches ("ifs") and loops ("for", "while", ...).

A branch in comun starts with ? and ends with . and can have an optional else branch (separated by ;), like this:

?
  # commands
.

?
  # if-commands
;
  # else-commands
.

It works very similarly to traditional languages: when comun encounters the branch, it pops one value and if it is non-zero (i.e. true), the commands inside the branch are executed, otherwise they're skipped (if there is an else-branch, then that is executed).

For loops, there are basically just two types: a while loop, starting with @, and an infinite loop, starting with @@. Loop again ends with the . character, i.e.:

@ # while loop
  # commands to repeat
.

@@ # infinite loop
  # commands to repeat
.

Any loop can be exited immediately with the break command !@.

The infinite loop, unsurprisingly, just keeps repeating its commands forever (usually it is manually exited with the break command). The while loop behaves similarly to a branch; when encountered, one value is popped and if it's non-zero, the loop is entered; at the end of such loop the program jumps to just before the loop to perform the same pop and check again.

Furthermore control structures can again be made non-popping, just as commands -- i.e. a branch/loop starting with ?'/@' performs its check without popping the stack value.

Here is a simple example showing the use of a branch (NOTE: equality comparison in comun is just =, NOT == as in some other languages):

0 "Hello, did you have a good day? (y/n) " -->

<- # read the answer

"y" = ?        # is the answer equal to "y"?
  0 10 "That's great!" -->
;
  0 10 "Sorry to hear that." -->
.

Now let's see another example, this time combining both branching and a loop:

10                   # iteration counter, perform 10 iterations 

@'                   # while loop counter is non-zero
  $0 2 %             # duplicate stack top, take remainder after division by 2

  ?                  # if this remainder is non-zero (i.e. 1)
    0 10 "odd" -->   # NOTE: 10 here is ASCII value of a newline
  ;                  # otherwise the number is even
    0 10 "even" -->
  .

  --                 # decrement loop counter
.
^                    # pop the loop counter (no longer needed)

When run, you should see:

even
odd
even
odd
even
odd
even
odd
even
odd

Notice how @' has ' appended so that the loop doesn't pop the iteration counter as we'll need the loop counter in following iterations (we will only manually pop it with ^ after the loop ends). On the other hand the branch ? is popping because it checks a value which we won't need anymore.
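
For readers who find it easier to follow in a familiar language, here is a rough Python mirror of the same loop with the stack made explicit. It is a sketch only: the function name odd_even is made up and the printing is replaced by collecting lines.

```python
def odd_even(count):
    """Mirror the countdown loop with an explicit stack; returns the printed lines."""
    stack = [count]                   # 10: the iteration counter
    lines = []
    while stack[-1] != 0:             # @' checks the top without popping it
        stack.append(stack[-1] % 2)   # $0 2 %: duplicate the counter, take remainder
        if stack.pop() != 0:          # ? pops the remainder it tests
            lines.append("odd")
        else:
            lines.append("even")
        stack[-1] -= 1                # --: decrement the counter in place
    stack.pop()                       # ^: drop the finished counter
    assert stack == []                # the stack is back to where it started
    return lines

print("\n".join(odd_even(10)))
```

The final assertion shows why the ^ after the loop matters: without it the used-up counter would stay on the stack.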

Functions

Functions in comun are very similar to functions (also procedures or subroutines) in other languages. A function is basically a subprogram or a user-defined command, a named set of commands that can be invoked as a whole from different places in the code. The invocation of a function is called a function call. Functions are important so that we can divide a big program into smaller parts and also so that we don't have to repeatedly write the same code in multiple places; suppose we e.g. need to solve a quadratic equation in many places throughout our program -- it would be very wrong to keep writing the same code for solving a quadratic equation in each such place; a function allows us to write such code only once and then refer to it from multiple places.

A function is defined ("created") by writing its name (which may only consist of alphanumeric characters or the _ character, but mustn't start with a digit) with : at the end, then the function's commands follow and the . character terminates the function definition. A function call (invocation) is performed simply by writing its name (without the : at the end) anywhere a normal command could be used.

Functions can be defined anywhere but not inside another function. However a function can call itself, i.e. recursion is possible. A function can also call functions that will be defined later in the source code (i.e. for C programmers: forward declarations aren't needed).

Here is a simple example that defines a function named square which computes the square of a number (i.e. x * x), then calls the function with an input number 3 and prints the result (9):

square: # function, pops 1 value, pushes its square
  $0    # duplicate stack top
  *     # multiply it with itself
.

3 square
"0" + ->

We see that unlike in traditional languages there are no parameters or return values indicated in the definition of the function -- in comun parameters and return values are passed to and from functions manually using the stack. A function is simply a named piece of code. Similarly when we call the function we don't pass arguments to it in a special way like in other languages (where such a function call might e.g. look like square(3)) -- again, we pass the arguments by pushing them on the stack.
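
The following Python sketch imitates this calling convention: the function takes no parameters and returns nothing, it only works with a shared stack. The global list named stack is an assumption of the model, not something comun exposes under that name.

```python
stack = []

def square():
    """Model of the comun function: pops 1 value, pushes its square."""
    stack.append(stack[-1])           # $0: duplicate stack top
    b = stack.pop()                   # *: pop two values...
    a = stack.pop()
    stack.append(a * b)               # ...and push their product

stack.append(3)                       # 3
square()                              # square
print(chr(stack.pop() + ord("0")))    # "0" + ->   prints 9
```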

You should also know there exists a "return" command, !., which we can use to immediately exit the function we're currently in -- it's pretty convenient many times.

Functions can also be used to implement constants -- those coming from the C language may be used to defining constants with the preprocessor, for example as #define PI 3.1415. Though in comun the preprocessor can be used in the same way, it's usually easier to just do it with a function (for those worried about performance: the compiler will likely optimize this so there will be no penalty). Here is a simple example showing how we define a constant SQUARE_SIDE_LEN that's later used in multiple places in the code:

SQUARE_SIDE_LEN: 10 .    # function used as a constant

# draw a square with ASCII characters:
SQUARE_SIDE_LEN @'
  SQUARE_SIDE_LEN @'
    "##" -> ->
    --
  .
  ^

  10 ->

  --
.
^

A very useful function is one for printing out numbers. The following is a primitive version of such function that works for numbers up to 999 by simply printing the number of hundreds, tens and units in the number:

printNum999: # pops 1 value (< 1000) and prints it
  $0 100 / 10 % "0" + ->
  $0 10 / 10 % "0" + ->
  $0 10 % "0" + ->
  ^
.
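
The digit arithmetic used by printNum999 may be easier to check in Python first. This sketch (print_num_999 is a hypothetical name) returns the three characters instead of printing them one by one:

```python
def print_num_999(n):
    """Return the three characters printNum999 would output for 0 <= n < 1000."""
    zero = ord("0")
    hundreds = n // 100 % 10          # $0 100 / 10 %
    tens = n // 10 % 10               # $0 10 / 10 %
    units = n % 10                    # $0 10 %
    return "".join(chr(zero + d) for d in (hundreds, tens, units))

print(print_num_999(42))   # prints 042
```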

As an exercise try to use a loop to write a function which works for numbers of any magnitude and which doesn't print any unnecessary leading zeros. This function will appear later in this tutorial so that you can check how it compares to your version :)

Pointers

Pointers in comun allow us to store and jump between memory addresses, which can make many things easier and also allows us to do somewhat more advanced things. If the advanced pointer stuff in this section seems too complicated for you, you can skip it for now, but remember to get back later :)

We can use pointers e.g. in a way in which we use global variables in other languages -- this saves us many headaches in complex situations when we'd otherwise have to juggle many variables on the stack.

Pointers also allow us to create multiple stacks. So far we've operated with only one stack, but it may sometimes be much easier to e.g. have one "main" stack and then a number of different stacks, e.g. for functions, so that whenever we call a function we know that it won't mess with our main stack. We may also use such a separate stack as an array in which we want to store some larger data so that they don't stand in our way when we are performing computation on the main stack.

What is a pointer anyway? Basically it's something that has a name and stores a memory address. In fact we've already seen one pointer; it's the special stack top pointer whose name is simply 0 (zero) -- the command $0 which we've known as "duplicate stack top" is actually a pointer command (pointer commands start with $) which pushes the value at the address where the given pointer (in this case 0) points. I.e. the pointer 0 always points to the stack top of the main stack, and $0 says "take the value from memory where pointer 0 points and push it". Besides the special pointer 0 there are also special pointers 1, 2, ... 9 which point that many places below stack top, so e.g. $2 pushes the value stored two addresses below the stack top address.

Pointers are stored in a memory separate from the memory in which we make our computations, the so called pointer table. This is for several reasons, among them safety (we can't just accidentally overwrite addresses in our pointers) and the possibility that addresses stored in pointers may in theory be higher than what would fit into one memory cell in the "main" memory.

However now we need to talk about the main thing -- creating our own so called user defined pointers, or pointers to which we can give our own names and with which we can manipulate. We can create our own pointer by writing ~ and then its name (for user pointer names the rules are the same as rules for function names), so e.g. ~ourPointer creates a pointer named ourPointer which will by default point to 1 free memory cell that's reserved just for this one pointer (unless we specifically do something nasty, we don't have to be afraid that the value at this address will be overwritten by someone else).

NOTE: the declaration of a pointer starts with ~ (and not $) because it is not really a run time command but rather a compile time directive. Also keep in mind that pointer definition is always global, there is no sense of a local scope as it might be in other languages, so it may be best to just declare all your pointers at the top of your program source code.

If we want to write some value to the memory cell which ourPointer points to, we can simply do it as $:ourPointer -- this will pop one value from the main stack and write that value to the memory where ourPointer points to. To get a value back from a memory cell where ourPointer points to, we simply do it as $ourPointer (notice the syntax is the same as for the $0 command).

Here is a simple demonstration program of using pointers as variables:

~ourPointer

1 2 3 4       # push some values on the main stack
$:ourPointer  # write the last one to our pointer "variable"
5 6 7 8       # push more values on the main stack
$ourPointer   # get back the value we saved in our pointer "variable"

When we run the program, after it finishes the internal state of our program may look like this:

memory addresses    0  1  2  3  4  5  6  7  8  9 ...
memory cells        4  0  1  2  3  5  6  7  8  4 ...
                    ^                          ^
                    ourPointer                 stack top

pointer table:
  0 (stack top) = 9
  ourPointer    = 0

Firstly we can see that the pointer table is separate from the main memory -- the addresses stored in the table for individual pointers correspond with the arrows that point to the main memory at the top. Stack top ended up pointing to address 9 at which the final value 4 is stored. This value was pushed by the last command $ourPointer which took it from the address 0 which is where ourPointer points. The value 4 was written to address 0 by the command $:ourPointer that took it from the stack top at the time the command was performed.
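
The state shown above can be reproduced with a small Python model of main memory plus a separate pointer table. The starting addresses (ourPointer at 0, stack top at 1) are assumptions chosen to match the picture; a real implementation is free to lay memory out differently.

```python
memory = [0] * 16                     # main memory cells
pointers = {"0": 1, "ourPointer": 0}  # pointer table, kept apart from memory

def push(value):
    pointers["0"] += 1                # stack top moves one address up
    memory[pointers["0"]] = value

def pop():
    value = memory[pointers["0"]]
    pointers["0"] -= 1                # the old value stays in memory
    return value

for v in (1, 2, 3, 4):
    push(v)                                # 1 2 3 4
memory[pointers["ourPointer"]] = pop()     # $:ourPointer
for v in (5, 6, 7, 8):
    push(v)                                # 5 6 7 8
push(memory[pointers["ourPointer"]])       # $ourPointer

print(memory[:10])   # prints [4, 0, 1, 2, 3, 5, 6, 7, 8, 4]
print(pointers)      # prints {'0': 9, 'ourPointer': 0}
```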

Now let us move to a bit more advanced use of pointers.

When declaring a pointer, we can specify the number of free (unoccupied) memory cells we want to get -- by default this is 1, i.e. above we only created a pointer and it automatically pointed to one free cell. But if we need to store more values, we can create something that can be seen as an array in a way which in languages such as C we would call static allocation. This we do at the time we are declaring a pointer as ~pointerName:N where N is the size (counted in cells) of the block of memory at the start of which pointerName will point. In the C language this would be equivalent to something like int myArray[N];.

NOTE: The requested number of cells may even be 0. This will create a pointer that doesn't point to any free cell, i.e. it will point to some random cell which we shouldn't touch. Why create such pointers? It may be just for example a pointer which we use to temporarily save other pointer addresses to.

Next we should take a look at more pointer commands that are necessary for advanced pointer manipulation. Let's just sum up the most common commands we will typically use with pointers (note that with a few exceptions the name p may also be a name of the special pointers 0, 1, 2, ...):

  • ~p: declare pointer p with size 1
  • ~p:N: declare pointer p with size N
  • $p: push the value stored in cell to which pointer p points
  • $:p: pop 1 value and store it to the cell to which pointer p points
  • $>p: make pointer p point one address "up" (increment the address stored in the pointer)
  • $<p: make pointer p point one address "down"
  • $p>q: make pointer q point to the same cell as pointer p (copy address stored in p to q)

Here is an example that prints given text string in reverse, showcasing what we've been talking about:

~array:100       # array of 100 cells
~stack2:0        # pointer we'll use for our second stack
~tmp:0           # temporary storage for old stack top address

printReversed:
  $array>stack2  # set stack2 pointer to start of the array
  $>stack2       # this is necessary to not pop at address 0
  0 $:stack2     # move string terminating 0 to stack2

  @'             # for all chars of the string on main stack
    $>stack2     # shift second stack one address up
    $:stack2     # move this char to the second stack
  .
  ^              # pop the string terminating 0

  $0>tmp         # store current stack top to tmp
  $stack2>0      # set current stack top to stack2
  -->            # print what's there
  $tmp>0         # restore the old stack top
.

0 "star dog" printReversed

The program should print god rats. Let's analyze what it's doing a bit. ~array:100 creates a pointer which will point to a block of 100 free memory cells -- we may see this contiguous block of memory as an array and we'll use it as a second stack to reverse a string on the main stack. In the function printReversed we first set the stack2 pointer to the beginning of the array and shift this pointer one address up -- this is because if stack2 theoretically pointed to address 0, the command --> at the end of the function would try to do a pop of the terminating 0 at the lowest possible address, which according to the comun specification can't be done (popping a value when at address 0 would get us to illegal address -1). The rest should be clear from the comments in the code: we use a loop to pop individual characters from the main stack to the second stack, then we temporarily move the main stack pointer to the second stack, print the string there, and move back to the main stack.
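
The idea of reversing text by moving it between two stacks can be tried out in Python as well. In this sketch two plain lists stand in for the main stack and for stack2, so the pointer juggling ($0>tmp, $stack2>0, $tmp>0) disappears; print_reversed is a hypothetical name and the function returns the text instead of printing it.

```python
def print_reversed(main):
    """Move a zero-terminated string from the main stack to a second stack,
    then read it back from there; returns the text that would be printed."""
    stack2 = [0]                      # 0 $:stack2: terminator for the second stack
    while main[-1] != 0:              # @': test the top without popping
        stack2.append(main.pop())     # $>stack2 $:stack2
    main.pop()                        # ^: drop the original terminator
    out = []
    while (value := stack2.pop()) != 0:   # -->: pop and print until 0
        out.append(chr(value))
    return "".join(out)

main = [0] + [ord(ch) for ch in reversed("star dog")]   # 0 "star dog"
print(print_reversed(main))   # prints god rats
```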

Hopefully this wasn't all too difficult -- if so, don't worry, pointers are hard. Give it a bit of time, go through the examples again and try to experiment yourself, it should all make sense in the end.

Data Types/Environments

Comun has a notion of different width unsigned integer data types, but by default this is all hidden. All integers are implicitly unsigned and in two's complement representation, but some operations may consider integers signed -- for example / represents unsigned division while // means signed division (this is similar to how assembly languages work, there are distinct instructions for signed operations). Note that most operations, such as +, - or * don't have to distinguish between signed and unsigned integers, there is only one version for them.

Suppose now that we are on a 32 bit computer -- this means that by default integers in comun have 32 bits and can store values from 0 to 4294967295 (2^32 - 1). If you push value -1 on the stack, comun actually pushes the value 4294967295 (or 11111111111111111111111111111111 in binary), which in two's complement representation means -1. The value stored at stack top will be the same, 4294967295, it is up to us whether we, programmers, see it as signed (-1) or unsigned (4294967295). We may for example perform the operation -1 -1 + just as well as 4294967295 4294967295 +, as this gives exactly the same result on a 32 bit computer thanks to the wrap around behavior, the result will be 4294967294, or -2 if we see this value as signed. The distinction between signed and unsigned values is important to comun only in some situations, e.g. in case of division or comparison. For example -1 1 < will result in value 0 because the < operator is UNSIGNED comparison and therefore it sees the operation as 4294967295 < 1 which is false; on the other hand if we use the signed comparison as -1 1 <<, we get 1 as the result because the operator really interprets the first value as negative one.
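
The arithmetic in the previous paragraph can be verified with a few lines of Python. Python integers don't wrap around by themselves, so this sketch masks values to 32 bits by hand; the names BITS, MASK, to_unsigned and to_signed are made up for the illustration.

```python
BITS = 32
MASK = (1 << BITS) - 1                # 4294967295 for 32 bits

def to_unsigned(value):
    """Wrap a Python integer into an unsigned BITS-wide cell."""
    return value & MASK

def to_signed(cell):
    """Read an unsigned cell as a two's complement signed number."""
    return cell - (1 << BITS) if cell >> (BITS - 1) else cell

a = to_unsigned(-1)                   # what pushing -1 stores
total = to_unsigned(a + a)            # -1 -1 +  with wrap-around
print(a, total, to_signed(total))     # prints 4294967295 4294967294 -2
print(int(a < 1))                           # unsigned <    prints 0
print(int(to_signed(a) < to_signed(1)))     # signed <<     prints 1
```

The stored bits are the same in both comparisons; only the interpretation applied by the operator differs.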

In comun there are no composite data types -- comun is very low level and is only able to distinguish between different width integers. Anything more complex such as "structs", arrays or even objects has to be implemented by the programmer (note that arrays can however be implemented pretty easily with pointers).

As mentioned, by default your program works with the platform's native integer type. It is only when you decide to care about data types more that you can tell comun to use different types.

Data types in comun are implemented by so called type environments. A type environment is basically its own isolated "world" with its own memory and pointers (i.e. each environment has also its own stack top pointer, but note that e.g. functions don't belong to different environments as they simply name parts of the source code). Different type environments are distinguished by the width of their integer data type, i.e. the integer that's stored in that environment's memory cells. I.e. type environment 8 stores 8 bit integers in its memory cells, type environment 16 stores 16 bit integers in its memory cells etc. Type environment 0 is a special environment whose integer width is equal to the platform's native integer (but not smaller than 16) and which is the default type environment.

The command to change type environments is ~N where N is the number of type environment, so e.g. ~8 switches to type environment 8 and ~0 switches to the native type environment. Keep in mind that this is not a run time command but a compile time one! Such a command in the source code just tells the compiler that from that point onward in the source code operations such as +, -, * and / will operate in the specified type environment.

You can transfer values between type environments with the >N command where N is the environment to transfer the value to; the command pops 1 value in current type environment and writes it to the stack top of type environment N (watch out, it doesn't push the value, just overwrites stack top!). This allows e.g. for computing some expression in a type environment with high range of values (to prevent overflows) and then getting the final value back to the native environment.
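
The detail that a transfer overwrites the target stack top rather than pushing can be shown with a toy Python model. Everything here is an assumption made for the sketch: the class Env, the function transfer, the 32 bit width chosen for environment 0, the single starting cell on each stack, and the masking to the target width.

```python
class Env:
    """One type environment: its own stack of cells that are `bits` wide."""
    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.stack = [0]              # start with one cell so there is a stack top

    def push(self, value):
        self.stack.append(value & self.mask)   # values wrap to the cell width

def transfer(src, dst):
    """Model of >N: pop from the current environment, overwrite dst's stack top."""
    dst.stack[-1] = src.stack.pop() & dst.mask   # masking here is an assumption

env0, env8 = Env(32), Env(8)          # assume a 32 bit native type for env. 0
env8.push(-1)                         # ~8  -1
transfer(env8, env0)                  # >0
print(env0.stack, env8.stack)         # prints [255] [0]
```

Note that the stack of environment 0 has the same length before and after the transfer, only its top cell has changed.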

NOTE: a comun implementation doesn't have to support all possible type environments (only type 0 is required), it may e.g. only include the common ones (such as 0, 8, 16 and 32).

An example will demonstrate these concepts best:

printNum:
  0 ><
  @@
    $0 10 % "0" + ><
    10 /

    $0 0 = ?
      !@
    .
  .
  ^

  -->
.

# by default we're in type env. 0 (native type)

0 "highest value in type environment 0: " -->
-1       # this will wrap around to the highest integer value
printNum
10 ->    # newline

~8       # switch to env. 8
0 "highest value in type environment 8: " -->
-1
>0       # transfer the value to env. 0 for printing

~0       # back to env. 0
$0 printNum 10 ->

~16      # switch to env. 16

0 "highest value in type environment 16: " -->
-1
>0

~0       # back to env. 0
$0 printNum 10 ->

This may print e.g.:

highest value in type environment 0: 4294967295
highest value in type environment 8: 255
highest value in type environment 16: 65535

What is this good for? Sometimes you need control over the data type width, for example if you perform calculations in which you know values can get very high and might overflow, you may want to do that with 32 bit integer types (with maximum value 4294967295) because the default, native data type (environment 0) may in theory be only 16 bits wide (allowing a maximum value of 65535). On the other hand if you want to store large data in the form of a long array of bytes, you may want to do that in type environment 8 because there one byte fits exactly into one memory cell -- this will save a lot of RAM, which can be important e.g. on limited embedded devices.
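The arithmetic behind these limits can be checked with a few lines of Python. Again this is only an illustration outside of comun: the helper wrap is a made up name and it just models unsigned wrap-around for a given cell width.

```python
def wrap(value, bits):
    """Reduce value to an unsigned integer of the given width (wrap-around)."""
    return value % (1 << bits)


# -1 wraps around to the highest value of each width
for bits in (8, 16, 32):
    print(bits, wrap(-1, bits))

# the same sum overflows in 16 bits but fits comfortably in 32 bits
total16 = wrap(40000 + 40000, 16)
total32 = wrap(40000 + 40000, 32)
print(total16, total32)   # prints: 14464 80000
```

The 16 bit result is wrong by exactly 65536, which is the kind of silent error you avoid by doing the computation in a wider type environment.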

More Details: File Includes, I/O, Program Arguments, ...

Now we'll mention some useful features we didn't get to yet.

One such feature is a file include that allows for creating libraries. File includes make it possible to include source code from another file in your program. Thanks to this you may split your big project into multiple files or create libraries of reusable commonly used functions and then just include these libraries in many different programs. A file include is done with the command ~"filename" where filename is the path to the file to be included -- imagine this as a simple text copy-paste, whatever is in the included file will be pasted in the place where the include command appears. Note that the file include command in comun is not considered part of preprocessing (it can be used even with preprocessor disabled).
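The "copy-paste" view of includes can be sketched in Python like this. It is only a model of the idea, not the compiler's actual code: the function expand_includes, the file name mylib.cmn and the tiny library function in it are hypothetical, files are faked with a dictionary, and things like includes inside included files or missing files are not handled.

```python
import re


def expand_includes(source, files):
    """Replace every ~"name" in source with the text stored under files[name]."""
    return re.sub(r'~"([^"]*)"', lambda match: files[match.group(1)], source)


# hypothetical library file with one small function
files = {"mylib.cmn": "double: 2 * .\n"}

main = '~"mylib.cmn"\n5 double\n'
expanded = expand_includes(main, files)

print(expanded)
```

After the expansion the text of the library sits exactly where the include command used to be, followed by the rest of the main program.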

Input/output: "vanilla" comun only has one input and one output stream of values; these values are understood to be ASCII values of text characters, so if you output value 65, A will appear on the output (usually on screen, but remember that on a Unix system your program's output may also be redirected e.g. to a text file). Similarly if you ask for input, you will get ASCII values of the text characters that are passed to your program (usually what the user types on the keyboard). The built-in commands for input and output of a single value are <- and ->, respectively. There is also a command for checking whether input is still unfinished: <?. The command pushes 0 if all input values have been read (which applies mostly to reading from files) and is equivalent to checking for the so called EOF (end of file) value in the C language. Finally there is a convenience function for printing out zero-terminated text strings which we've already seen: -->.
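If it helps, the three I/O commands can be pictured with the following Python model. It is a sketch under assumptions, not comun: the class Streams and its method names are invented here, and the model returns 1 from has_input while input remains, whereas the text above only guarantees that <? gives 0 once everything has been read.

```python
class Streams:
    """Toy model of comun's single input and single output stream."""

    def __init__(self, input_text):
        self.pending = [ord(c) for c in input_text]   # ASCII values still to be read
        self.printed = ""                             # what has appeared on the output

    def has_input(self):        # models <? (assumed to give 1 while input remains)
        return 1 if self.pending else 0

    def read(self):             # models <-
        return self.pending.pop(0)

    def write(self, value):     # models ->
        self.printed += chr(value)


io = Streams("hi")
io.write(65)                    # value 65 appears as the character A
io.write(10)                    # value 10 is a newline
while io.has_input():           # copy the input to the output until it runs out
    io.write(io.read())

print(repr(io.printed))         # prints: 'A\nhi'
```

The loop at the end is the typical pattern: keep asking whether input is unfinished and only read a value if it is.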

Program arguments: your program may receive arguments from the operating system when it starts, these arguments are typically the command-line flags with which your program is being executed (in C you may know this as argc and argv). Before the program starts, these arguments are pushed on the stack as zero-terminated strings (argv) and after them the number of these arguments is pushed (argc).

Preprocessor

Preprocessor is an advanced feature of the language, it can automatically modify the source code before it is processed, however in comun it's not supposed to be used as often as is common e.g. in C. It allows for implementation of things such as templates, macros, conditional inclusion of libraries, selection of features to be compiled into the program etc. You very likely won't need to use preprocessor in most of your normal programs, but it may help in making some more complex ones.

As said, unlike in the C language where preprocessor plays a crucial role, in the current official implementation of comun preprocessor isn't enabled by default, it has to be enabled with the -p flag. Preprocessor is not expected to be used as heavily in comun because it brings more complexity, makes compilation more difficult and hardware demanding, and is usually easy to avoid anyway: note that things such as file includes and definition of global constants can be done without preprocessor (file includes are not part of preprocessor and constants can be implemented with functions). So consider only using preprocessor if you really have a good reason for it.

Comun preprocessor uses the same language -- comun -- for its work, i.e. unlike in C, preprocessor here doesn't use a special language. In comun a program to be preprocessed is basically a comun program that prints the source code of the final comun program to be compiled. There is a little bit of syntax sugar to make this all comfortable: preprocessor code is separated from the underlying code with [ and ] brackets, i.e. everything between such brackets is considered to be code that belongs to preprocessor.

NOTE: for technical reasons [ and ] characters cannot be used for any other purpose in comun program, they cannot appear even in comments or strings.

When source code is to be preprocessed, the preprocessor basically sees everything outside the [ and ] brackets as a string to be printed out, and everything inside [ and ] brackets as code to be executed. By executing the code in this way one gets the final code that will be passed to the compiler.
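This way of looking at the source can be sketched with a short Python function that cuts text into "print this" and "execute this" pieces. It is only a model of the first step and not the real preprocessor: the name split_segments and the labels "print" and "code" are made up, and the function doesn't execute anything. Thanks to the rule that [ and ] can't appear anywhere else in a program, a plain scan over characters is enough.

```python
def split_segments(source):
    """Cut source into ("print", text) and ("code", text) pieces at [ and ]."""
    segments = []
    kind, text = "print", ""          # we start outside of any brackets
    for ch in source:
        if ch == "[" or ch == "]":
            if text:
                segments.append((kind, text))
            kind = "code" if ch == "[" else "print"
            text = ""
        else:
            text += ch
    if text:
        segments.append((kind, text))
    return segments


template = '0 [ $polite ? ] "polite" [;] "plain" [.] -->'
for kind, text in split_segments(template):
    print(kind, repr(text))
```

Every piece marked "print" would be output as it is, while the pieces marked "code" (here a branch on the value of polite) decide which of the printed pieces actually make it into the final program.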

Let us see an example program:

[
  ~polite
  0 $:polite  # switch between 0 and 1 to change politeness
]

0
[ $polite ? ]
  "Good sir, would you kindly enter a character please? "
[;]
  "enter some character: "
[.]
-->

<-

0
[ $polite ? ]
  "The excellent character you entered was: "
[;]
  "you entered: "
[.]
-->

->
10 ->

If we compile this program, the preprocessor looks at this program and sees something like this:

~polite
0 $:polite

# print: 0

$polite ?
  # print: "Good sir, would you kindly enter a character please? "
;
  # print: "enter some character: "
.

# print: -->
# print: <-
# print: 0

$polite ?
  # print: "The excellent character you entered was: "
;
  # print: "you entered: "
.

# print: -->
# print: ->
# print: 10 ->

Note that for readability here we put in comments what should really be true comun print commands. If you want, you can actually see the true output of preprocessor with special flags passed to comun compiler, see comun -h for details. Preprocessor executes the code and so creates the preprocessed code:

0
"enter some character: "
-->

<-
0
"you entered: "
-->

->
10 ->

This code is then passed to the compiler and compiled. A run of the program may therefore look like this:

enter some character: a
you entered: a

Now if we want to change the politeness of our program, we can simply change the line in the original program from:

0 $:polite  # switch between 0 and 1 to change politeness 

to

1 $:polite  # switch between 0 and 1 to change politeness 

This will lead to preprocessor passing a different code to the compiler. It may look like this:

0
"Good sir, would you kindly enter a character please? "
-->

<-
0
"The excellent character you entered was: "
-->

->
10 ->

And the run of the program may look like this:

Good sir, would you kindly enter a character please? a
The excellent character you entered was: a

Now it should be at least a little obvious that preprocessor can make it possible to shape the source code by just changing a single value somewhere at the beginning of the code. Of course preprocessor may use all the power of the comun language, for example loops and functions, but here should also come a warning that this may be dangerous and lead to loss of readability, so always consider using less rather than more. All in all, you should only start using preprocessor once you get very comfortable with the language.