UPDATE: obsolete, made my language now (comun)
This is a draft of an idea for the ultimate, ideal programming language.
Let's start with C, a language that's pretty suckless already.
What's seems bad about C:
The basic goals of the language are:
The implied rules are:
@
, !
, :
etc.).The priorities are not:
Perhaps use brackets for both if branching and while cycle, like in Brainfuck. E.g.:
a == a // pushes 1 on stack
( // pops stack, if 1, continues on, else jumps after amatching endbracket
c == c // pushes on stack
) // pops stack, if 1, jumps back to matching start bracket, else continues
notation comparison:
name | example | FAVSRIMGC (pros) |
---|---|---|
C | 10 * sin(PI - f(x,y)) + 1 |
222120002 = 11 (3) |
lisp | (+ (* 10 (sin (- PI (f x y)))) 1) |
022011122 = 11 (5) |
postfix plain | 10 PI x y f - sin * 1 + |
000212222 = 11 (9) |
prefix plain | + * 10 sin - PI f x y 1 |
000211222 = 10 (8) |
postfix upper. vars | 10 PI X Y f - sin * 1 + |
002212220 = 11 (9) |
postfix prefixed vars | 10 $PI $x $y f - sin * 1 + |
102112122 = 12 (7) |
prefix terminators | + * 10 sin - PI; f x; y;;;;; 1; |
111001122 = 9 (4) |
pros (suckless attributes, the number in the bracket, are SRIMG):
myfunc a b c: # comment goes either until \n or another #
+(a -(b c))
@ < (i 10)
print(i)
i: -(i 1)
;
(& (>(x 10) <(x 15))) ?
(print x)
~
(print "no")
;
(@
(> i 10)
(myfunc a b c)
(myfunc d e f)
)
(?
(= x 50)
(print "sasasa")
~
(print "eeeee")
)
THIS SEEMS GOOD:
if (cond right after "(") :
(
cond ?
...
)
while (cond right before ")"):
(
...
cond ?
)
TODO: optional program parameters at the start of the file? such as extensions
TODO: lazy evaluation? likely not, but if and a loop have to be exceptions, because we don't want both branches of if be executed.
TODO: short condition evaluation? (e.g. a != 0 & b / a)
TODO: types? arrays? structs?
Every command is a function (returns a value). This function can however have side effects and doesn't have to be a pure function (for same input always return same output).
Function returns the value returned by its last statement
IDEAS:
Let's call the project A. The project doesn't just define one language, but a whole family and hierarchy of languages.
A specifies multiple languages at different levels.
This has many advantages. we can e.g. fork and derive multiple versions of a language from a common ancestor language, drop-in another ancestor language for an existing language, we can reuse existing parsers, have data-description languages use the same syntax as the programming language itself, which can in turn allow the programming language to easily parse programs written in the programming language itself... etc.
Languages at this level are called A0, followed by 2 parameters:
E.g. A0() or A0[].
Simple language for capturing nested bracket expressions.
A0-alphabet is the alphabet of the language, and is identical to the 7bit ACII. (The language is case-sensitive.) A white character is any character with decimal value less than 33 or equal to 127.
A0-name is a sequence of characters that aren't white, A0-(, A0-), "
and #
.
A0-string is a sequence of characters that aren't "
, starting with "
and ending with "
.
A0-comment is a sequence not inside A0-string, of characters that aren't #
and \n
, starting with #
and ending with either #
or \n
.
A0-atom is either A0-name or A0-string.
A0-sequence is a sequence composed of 0 to many A0-atoms and/or A0-brackets, called elements, separated by one or more white characters, possibly left and/or right padded with white characters.
A0-bracket is a A0-sequence with A0-( in front of it and A0-) after it, possibly left and/or right padded with white characters. An element of a bracket means an element of its sequence.
Given A0 language is any A0-sequence.
Example of A0() language sentence:
hello # this is a comment
(
(() 323 - XYZ)
123
(
(abc def ghi)
)
)
Each of these languages is a subset of some A0 language. Its name starts with A1 and is followed by these parameters:
F
or f
(supporting or not supporting functions, respectively)J
or j
(supporting or not supporting jumps, respectively)E.g. A1()Fg. Languages at this level capture control flow of an algorithm (ignoring other details such as operators or data types).
A1-sequence is A0-sequence of which no element is ?
, @
, '
, ~
, ^
and ``` (ASCII value 96).
A1-branch is A0-bracket whose first element is ?
, which is followed by an element called A1-condition, which is followed by A1-sequence called A1-then, which may or may not be followed by ~
, which may or may not be followed by A1-sequence called A1-else. This represents the if-then and if-then-else structures.
A1-loop is A0-bracket whose first element is @
, which is followed by an element called A1-condition, which is followed by A1-sequence.
A1-function is A0-bracket whose first element is '
, which is followed by an element, which is followed by A1-sequence.
A1-label is A0-bracket with two elements, first one being ``` (ASCII value 96) and second one being A0-name.
A1-jump is A0-bracket with two elements, first one being ^
and second one being A0-name.
A1-call is A0-bracket whose sequence is A1-sequence and which has at least one element. The sequence starting with second element up until the end is called A1-arguments. This represents a function call (either user defined or otherwise available, e.g. as built-in or via library).
Given A1 language is a A1-sequence. If the language is parametrized with f
, no A1-function structure is allowed to appear. If the language is parametrized with j
, no A1-jump structure is allowed to appear.
Example of A1()FJ sentence:
(' func1 # my function
(` mylabel)
(? (= x 1)
(somefunc a b c)
(func2)
)
)
(' func2
(^ mylabel)
(+ 3 2)
("asasa" abcd) (3) (c)
)
(func 2)
Each of these languages is a subset of some of A1 language, and their name starts with A2 (followed by possible parameters). These languages define additional syntactic and semantic elements (macros, data types, operators, ...) needed for a complete programming language.
An SDT (simple data type) defines a set of integers. Any SDT is described by two pieces of as SDT(w,x):
The set of numbers defined by SDT(w,x) contains 2^w numbers, and these are all integers between 0 and 2^w - 1 (including the bounds) if x = u, or -2^w / 2 to 2^w / 2 - 1 (including the bounds) if x = s.
Representation of a value of given SDT is not defined, i.e. it is NOT guaranteed that the value will be encoded as two's complement or have specific endianity.
Sometimes we may need to compare SDTs, so let's define a linear ordering: SDTa(wa,xa) is greater than or equal to SDTb(wb,xb) if and only if
Values can be converted between SDTs, which is called casting. The conversion is as follows:
TODO
A CDT (complex data type) is a finite sequence of SDTs: SDT1, SDT2, ... SDTn. A complex value is a specific value a CDT can take, it is a sequence of values of SDT1, ... SDTn.
TODO: pointer type
TODO: size of data type representation?
Operators are basic mathematical and logical operations (performed on values called operands) that often correspond to single instructions in machine code. Operators behave like built-in functions, i.e. they are used in the same way as function calls, but their internal implementation may be (and usually should be, for performance sake) special, i.e. an addition operator should, if possible, translate to a single addition instruction as opposed to a call instruction.
The general rule of all operators is that the operation is performed in the space of (and limited by) the greatest SDT of the operands. The general algorithm of operation is as follows:
The operator call therefore has SDT identical to the greatest SDT of its operands.
For example when multiplying a value x of SDT(16,u) with value y of SDT(8,s), y is casted to SDT(16,u), then 16-bit unsigned multiplication is performed yielding a 16-but unsigned result (with possible overflow), which is then converted to the greatest SDT of the language and returned.
The language operators are:
TODO: bitwises
These are final programming languages of which each is a subset of some A2 language.
This is a simple interpreted language meant as a helper language to be embedded in another language. It should be very easy to implement. Features:
This is an interpreted language that can be embedded in programs and allow their scripting. This language is also easier to implement than the compiled language. Features:
This is the more complex language meant to be compiled to binary forms or to other compiled languages such as C. Features: