MyLang Overview

Kevin Atkinson
kevina at cs utah edu

1 Intro

MyLang will be a systems programming language designed primarily to replace C and C++, but it will also be powerful enough to replace Ada, Fortran, Java, and C#. The name MyLang is a placeholder until a better name is decided on.

1.1 The Need

C is an old language with plenty of limitations and flaws; however, it is still used by a large number of people. C++ was designed as an improvement on C, but it is rather ugly and few programmers fully understand its rules, so it is not used by as many people as it could be. System programmers, for example, often avoid it because they are afraid it may do things, such as allocate dynamic memory, without them realizing it. MyLang aims to be a much better tool for system programming than C while still giving the programmer complete control of what is going on when needed.

Macros are looked down upon by many language designers, to the point that most new languages do not provide them in any form. This is largely because the only macro system many programmers are familiar with is the C preprocessor, which is extremely limited and error prone. Yet even with these limitations, macros are heavily used by C programmers, since a macro is often the easiest way to get a job done. Higher-level language constructs avoid the need for macros in many cases but don't eliminate it. The fact of the matter is that macros are a very powerful tool that should not be overlooked. MyLang will have a powerful macro system that avoids the many pitfalls of the C preprocessor.

Many new languages focus on safety above all else. As a result, they can never truly replace C, since they can never be as efficient as C, in both space and speed, in all cases. For a language to truly replace C, it should provide safety, but when speed or space is important the programmer should be allowed to do unsafe things. MyLang will be such a language.

Most languages are designed around a particular programming paradigm. Some languages, such as Java, force everything into this model, even when it is ill-suited to the problem. Not only will MyLang support multiple programming paradigms, it will allow new ones to be created.

1.2 Philosophy

The major aim is to be a practical language that allows the programmer to get the job done with a minimal amount of effort. However, the language should also be flexible and expressive, and should not force the programmer to think about a problem in an unnatural way. If the current language constructs are ill-suited to the problem at hand, the language should allow new ones to be defined.

The language should be safe by default but it should still allow a programmer to do something unsafe provided that they know what they are doing. It should not be possible to do something unsafe unintentionally.

MyLang will not be designed around a particular paradigm, but will support as many paradigms as possible, and quite possibly allow new ones to be invented.

1.3 On Language Design

MyLang will incorporate as many useful features as possible. Features will not be rejected because:

It has the potential of being abused (for example goto)
It can be useful, but a good programmer will always find another way

Features may be rejected if:

It can easily be misused
It can easily lead to unreadable code

Features will most likely be rejected if:

It can easily do something unexpected
It is too difficult to implement

MyLang will take features of other languages and generalize and simplify them. Rather than just providing a feature, it will provide the framework for implementing it. For example:

Operator overloading is nice, but why not allow new operators to be defined?
What is const? It is a type modifier that restricts how an object can be used, so why not allow new modifiers?
Virtual functions are nice, but why not simply provide access to the raw tools and let the user define how they are used? For example, why can't one store constants specific to a type in the vtable?

Take a HUGE step back and reexamine what programmers are really looking for.

1.4 On Safety

MyLang should be safe by default, but it should still allow a programmer to do something that has the potential of being unsafe, provided that they know what they are doing and are sure it is safe. It should not be possible to do something unsafe unintentionally. For example, all arrays should have bounds checks by default, but a programmer should be able to disable the checks. With garbage collection it should not be necessary to free memory manually, but a programmer should still be allowed to do so. Code that does unsafe things will be labeled as ``unsafe''. However, a programmer, after carefully reviewing the code in question, should be able to declare it as safe. Thus an individual component can do unsafe things but still be considered safe to use.

However, the language should NOT be designed so that it is necessary to do unsafe things, unless there is no way around it. One example of this is interfacing with low-level hardware. Other than when it is strictly necessary, the main reason to do unsafe things is performance. For example, bounds checking can be a serious bottleneck in an inner loop. In simple cases the compiler may be able to eliminate the checks, but there will always be cases where the programmer knows an access is safe but the compiler cannot prove it, because the programmer has more domain knowledge than the compiler.

Very few languages have attempted this middle ground. C# does to some extent with C-style pointers in unsafe code, but the support is not very well developed.
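As a rough C++ analog of ``safe by default, unsafe on request'', the sketch below checks every index through the ordinary subscript operator and offers a separate, deliberately named accessor that skips the check. The names `CheckedArray`, `unchecked_at` and `sum_unchecked` are hypothetical illustrations, not part of any MyLang design.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch: bounds checked by default, with an explicit opt-out.
template <typename T, std::size_t N>
struct CheckedArray {
    std::array<T, N> data{};

    // Safe by default: every access through [] is checked.
    T& operator[](std::size_t i) {
        if (i >= N) throw std::out_of_range("index out of bounds");
        return data[i];
    }

    // Explicit, easy-to-grep opt-out for code the programmer has reviewed.
    // The assert only fires in debug builds; release builds do no check.
    T& unchecked_at(std::size_t i) {
        assert(i < N);
        return data[i];
    }
};

// An inner loop where the programmer knows every index is in range,
// even if a compiler could not prove it in a more complex case.
template <std::size_t N>
long sum_unchecked(CheckedArray<int, N>& a) {
    long total = 0;
    for (std::size_t i = 0; i < N; ++i) total += a.unchecked_at(i);
    return total;
}
```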

1.5 Overview

MyLang will be designed around Compile Time Functions (CTF). This will keep the core language (i.e. what the compiler has to be able to handle) as simple as possible. CTF will be used to define almost all of the user-visible language constructs, and will allow users to design their own language constructs.

Due to the minimal nature of the core language, when I talk about MyLang I will be referring to features of the core language, the constructs provided by CTF, and the standard libraries.

MyLang will have a simple C like syntax and an advanced type system. Type inference will be used on local variables. MyLang will be statically typed by default but dynamic typing will be available when desired. Garbage collection will also be available but it does not have to be used as many simple programs simply don't need it.

1.6 Key Design Points

Other things which will be kept in mind when designing MyLang include:

Almost all programs written in MyLang can be either compiled or interpreted. However, the language will be designed to allow a compiled program to run as fast as, if not faster than, the equivalent C or C++ program. MyLang will very likely provide a bytecode representation, but it will be based on an efficient register transfer language rather than a stack-based machine.
The programmer can have precise control of what is going on when needed, to at least the same level that C and C++ give. But this control will only be needed when it is requested. For example structure members will be allowed to be reordered so the programmer does not have to think about the best way to pack the data, however it will also be possible to control the layout of the structure when needed.
Aim to be ABI compatible with C and C++ if at all possible.
Should incorporate all the core features of at least the following languages: C, C++, Java, Fortran, and maybe Ada, Objective C, and Basic.
Via trivial syntactic changes the new language should be able to compile all of the languages listed above (however ABI compatibility is another story)
Should also incorporate features from other languages when practical, including but not limited to: Lisp, Perl, ML, Haskell, Mathematica, Smalltalk, Ruby, Eiffel.

1.7 Paper Organization

The rest of this paper will detail various aspects of MyLang. MyLang is a work in progress and this paper is by no means meant to be a complete specification of the language. It will focus on key points which I think are important. Some sections simply mention key features that I would like to see in MyLang without any additional information. The last section, section 9, is a disorganized list of notes that don't fit anywhere else.

2 Basic Language

2.1 Top Level Grammar

The top level grammar is extremely simple:

: <start> := <expl>
: <expl>  := <exp> [SEP [<exp>]]
: <exp>   := (TOKEN|<group>)*
: <group> := O <expl> C      (O and C pair must match)
: SEP     := ;
: O/C     := ( ), [ ], { }
: TOKEN   := anything else

and that's it. All other language constructs are defined as specialized expressions.

But the grammar alone is rather suggestive. For one thing, everything is an expression, and separate expressions are separated by a ';'. It also suggests that expressions are grouped by an O/C pair rather than by begin and end keywords. Putting these two ideas together suggests a syntax such as:

: if ( a == b ) {
:   do something;
: } else {
:   do something else;
: };
: c = b;
: while ( c != d ) {
:   do some stuff;
: };

Notice the ';' after each block. At first glance the ';' seems unnecessary, but without it a parser using the grammar above will not know whether the c = b after the } is part of the if expression or not. From context it is obvious that it is not, but without knowing the meaning of if it is not so obvious. The idea is to avoid having to know any context in order to separate expressions. This allows a large degree of freedom in how an expression can be defined; in fact, it allows users to define their own expressions.
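The context-free splitting described above can be sketched in C++. This assumes the source has already been tokenized into strings and that the only group delimiters are (), [] and {}; `split_top_level` is a hypothetical name. A ';' only ends an expression at nesting depth zero, so without the ';' after the closing } the following c = b is absorbed into the if expression.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Split a token stream into top-level expressions. A ';' ends an expression
// only when it is not nested inside a (), [] or {} group; groups must match.
std::vector<Tokens> split_top_level(const Tokens& toks) {
    std::vector<Tokens> exprs;
    Tokens current;
    std::string open_stack;  // expected closing delimiters, innermost last
    for (const std::string& t : toks) {
        if (t == "(" || t == "[" || t == "{") {
            open_stack.push_back(t == "(" ? ')' : t == "[" ? ']' : '}');
        } else if (t == ")" || t == "]" || t == "}") {
            if (open_stack.empty() || open_stack.back() != t[0])
                throw std::runtime_error("mismatched group delimiter");
            open_stack.pop_back();
        } else if (t == ";" && open_stack.empty()) {
            if (!current.empty()) exprs.push_back(current);
            current.clear();
            continue;  // top-level separators belong to no expression
        }
        current.push_back(t);
    }
    if (!open_stack.empty()) throw std::runtime_error("unclosed group");
    if (!current.empty()) exprs.push_back(current);
    return exprs;
}
```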

2.2 Expressions

The core language will be as simple as possible; everything else is defined as specialized expressions. Moreover, builtin expressions will generally not be used by the end user. Instead, macros will transform ``standard expressions'', provided by the default library, into builtin ones.

Examples of builtin expressions

if/then/else, perhaps defined as ``$IF (condition, if true, otherwise)''
infinite loop (control flow with if and break statements)
goto
functions

Since these constructs will not be used directly, trivial macros will transform things like ``if (...) {...} else {...}'' into the builtin construct. More advanced expressions will be defined via macros, such as:

OR, AND, NOT
non infinite loops

2.3 Compile Time Functions

Compile Time Functions (CTF) will be an integral part of the language. In short, CTF are functions that are executed at compile time rather than run time. They will be very similar to Lisp macros, except that they are even more powerful and not always textual expansions. CTF will generally be referred to as macros throughout this text even though they are not always expansions. For more info on CTF see section 5.1.

2.4 Standard Language

There are no predefined statements in my new language meant for the end user. Instead a set of standard statements and expressions will be provided in the default library. In order for a program to be considered MyLang it must use the default libraries.

By default everything is case insensitive, unlike most other languages. However, certain identifiers can be made case sensitive when it is desirable to use case to distinguish between two identifiers.

2.4.1 Basic Constructs

The standard control flow statements will be provided such as if/then/else, while, switch and the syntax will be very similar to those of C++ except for an extra semicolon at the end (see 2.1 for why this is necessary).

Variables are prefixed with ``var'', functions with ``fun'', and truly const variables with ``const''. Types now come after the variable name. Examples of converting from C++ to MyLang syntax:

: int x                    => var x : int
: struct S {int x; int y}  => type S {x : int; y : int}
: typedef S Ss             => type Ss = S
: int f(int x)             => fun f (x : int) : int

A const is like a variable except that it is more of a binding than a variable: its value cannot be changed. It is similar to a const variable in C++ except that its address cannot be taken. Read-only variables can also be defined, which are more like const variables in C++. The syntax for a const is the same as for var except that const is used:

: const x : int

Functions and consts can be defined in any order and cannot be redefined. Furthermore, a const can only be defined in terms of other consts or ``pure'' functions, that is, functions whose output depends only on their input and which do not modify any non-local memory.

Types will use ML style syntax:

: list<int>            => int list
: list<list<int> >     => int list list
: list<pair<int,int> > => (int, int) pair list

Possible syntax for a new type:

: 'type' name [ '(' <parms> ')' ] ( '{' ??? '}' | '=' <type> ';' )

Like ML and most other functional languages, MyLang will have a tuple type. Tuples are a special kind of struct whose members are numbered rather than named:

: (:int, :int) => type {: int; :int}

There will be two types of pointers, ones which only point to an object, and ones in which pointer arithmetic will be allowed. The syntax will be something like:

: int * x -> var x : int ptr

There is no "->" operator. The dot ('.') operator is always used to access members of an object, whether it is a pointer or the actual object.

Arrays will also be provided; however, the array subscript operator can be multi-dimensional, with the subscripts treated as a tuple:

: [1, 3] => [(1,3)]

There is no comma operator; use {} instead. Unlike in C++, blocks can be treated as expressions: the last expression evaluated in the block is its value:

: (x = 20, y = 30, x * y) => {x = 20; y = 30; x * y}
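Standard C++ has no block expressions (GCC offers ``statement expressions'' as a non-standard extension); the closest portable analog of the block above is an immediately invoked lambda. `block_value` is a hypothetical wrapper used only for illustration.

```cpp
#include <cassert>

// Portable C++ stand-in for the MyLang block {x = 20; y = 30; x * y}.
int block_value() {
    int x, y;
    int r = [&] {
        x = 20;
        y = 30;
        return x * y;  // the last expression becomes the block's value
    }();               // invoked immediately, like evaluating a block
    return r;
}
```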

Goto is still allowed. Labels, however, are local to the innermost block, like variables, as opposed to C++, where they are local to a function.

Labels can also be used for blocks so that they can be used to break out of multiple loops at once by breaking to the label.

A better switch syntax will be provided. It will at least allow ranges and avoid the need for break.

Possibly provide Perl-style ifs, such as ``x = 20 if ....''.

2.4.2 Possible Operators

2.4.2.1 Special Operators:

: =   assignment
: .   member access

2.4.2.2 Arithmetic Operators:

: + - / * %(mod) (exp)

``int / int'' returns a rational, not an integer, though the result can be truncated to an integer. This way 1/2 yields one half rather than 0.
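A minimal C++ sketch of the proposed division semantics, assuming a rational is represented as a normalized numerator/denominator pair; `Rational`, `divide` and `to_int_truncated` are hypothetical names. It requires C++17 for `std::gcd`.

```cpp
#include <cassert>
#include <numeric>
#include <stdexcept>

// Hypothetical sketch of int / int producing an exact rational.
struct Rational {
    long num, den;  // invariant: den > 0 and gcd(num, den) == 1
};

Rational divide(long a, long b) {
    if (b == 0) throw std::domain_error("division by zero");
    if (b < 0) { a = -a; b = -b; }   // keep the sign in the numerator
    long g = std::gcd(a, b);         // non-negative, and > 0 since b > 0
    return Rational{a / g, b / g};
}

// Explicit truncation back to an integer (C++ '/' truncates toward zero).
long to_int_truncated(Rational r) { return r.num / r.den; }
```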

2.4.2.3 Comparison:

: == /= < > <= >=

2.4.2.4 Boolean:

: (not) & | ->(implies)

These are short-circuit operators with Perl-like semantics.

2.4.2.5 Bitwise:

: .(not) .&(and) .|(or) .(xor) .<(left shift) .>(right shift)

2.4.3 Assignment

A very common programming mistake is using = instead of ==. We tend to think of the two as being the same, but in fact they are two very different operators. Some languages use := for assignment; however, assignment is used more often than comparison in most programs, so it makes more sense to change the comparison operator than the assignment operator. It is also possible to allow assignment only in certain places and comparison only in others, so that the same operator can be used for both, but this can limit the expressiveness of the language.

One solution I thought of is to only allow assignment at the beginning of a statement. Thus ``while ((x = next()) != 0) ...'' would become ``while ({x = next(); x != 0}) ...'' (as {} are now treated as expressions). However, that raises the question of what a statement is and what makes a statement different from an expression. Another solution is to make the return type of the assignment operator void, so that ``if (x = 5) ...'' is not valid, but that would also prevent ``x = y = 5'', which is sometimes useful. There is no easy answer, and I am not sure how I will handle it.

3 Type System

MyLang will have a powerful, flexible, and precise type system.

3.1 Everything is an Object

In MyLang everything is an object with a specific type. Variables are special objects that can be assigned to, functions are objects that can be called, etc. Objects can have sub-objects, which are generally accessed via the dot operator, but not always, as the dot operator for an object can be defined to do anything.

3.2 Type Inference

Types for local variables generally will not need to be specified, instead the type is implied using simple and easy to understand inference rules like ML.

: var i = 10 + 20;      // i is an int
: var x = 10.20;        // x is a double
: var y = 10; y += x;   // y is a double
: var foo = new Foo;    // foo is a Foo

3.3 Basic Types

3.3.1 Integers

There will be one int type. However the range, size, and overflow behaviour can be modified to provide other integer types.

I am considering making the default range for an int only 24 bits (i.e. [-2^{24}+1, 2^{24}-1]) to leave 8 bits for the compiler to use for whatever it wants. Values outside this range will be undefined. I am not sure if this is necessary or even a good idea. Unsigned types may likewise be limited to the positive values that a signed int of the same size can hold, so that comparisons between unsigned and signed values are always safe.

A basic integer can be modified through type attributes. These attributes are: size (in bits), signed or unsigned, range, and overflow mode, which is one of undefined, modular (wrap-around), or saturated.

Typedefs will be provided, such as:

: byte  = 8 bits, unsigned, modular
: short = 16 bits, signed, modular
: u8    = byte
: i8    = 8 bits, signed, modular
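The two defined overflow modes can be sketched in C++ for a signed 8-bit integer: compute the exact sum in `int`, then either reduce it modulo 2^8 or clamp it to the representable range. The function names are hypothetical; the final wrap-around cast relies on two's complement conversion, which C++20 guarantees and mainstream compilers already provided before that.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Modular mode: wrap around, like the i8 typedef above.
std::int8_t add_modular(std::int8_t a, std::int8_t b) {
    int sum = a + b;  // exact: int is wide enough for any two int8 values
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(sum));  // mod 2^8
}

// Saturated mode: clamp to [-128, 127] instead of wrapping.
std::int8_t add_saturated(std::int8_t a, std::int8_t b) {
    int sum = a + b;
    constexpr int lo = std::numeric_limits<std::int8_t>::min();
    constexpr int hi = std::numeric_limits<std::int8_t>::max();
    if (sum > hi) return static_cast<std::int8_t>(hi);
    if (sum < lo) return static_cast<std::int8_t>(lo);
    return static_cast<std::int8_t>(sum);
}
```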

3.3.2 Characters

Characters are not integers but an enumeration.

3.3.3 Strings

Strings are an array of characters but will be much more powerful than C strings and an integral part of the language.

3.3.4 Raw Memory

A raw memory type is designed for dealing with blocks of raw memory. It is something like ``void *'' with the extension of allowing pointer arithmetic. This type has special type conversion rules similar to ``void *'' but acts more like an ``unsigned char *''.

Two different types can not alias each other unless they both are also aliasing a raw memory type.

3.4 Type Modifiers

Types in MyLang can be modified in several ways.

Types can be restricted in how they can be used. For example, the const modifier can be used to make a type read only, and for integer values the range can be restricted. An implicit conversion between differently restricted types is allowed only when it can never violate the restrictions of the target type. For example, a non-const object can be converted to a const object but not vice versa, and an int with a range of [1,10] can be converted to an int with a range of [1,200] but not vice versa.

The behaviour of a type can also be modified. For example for integers the overflow mode can be changed from undefined to either wrap-around or saturated.

3.5 Classes are just Objects

Since everything is an object, there is nothing special about a class or a struct; they are just objects with specific sub-objects.

3.6 Fancy Enumerations

MyLang will provide an enumeration type that is much more powerful than those provided in C or C++. Enumerations will:

support reverse lookup: given a string, return the enum value
give each enum constant an attribute that returns its string
allow the string value to be specified; it does not have to be the same as the constant's name
provide min and max methods on the enum type
have a complete ordering, so that it is possible to iterate over all values
support attributes other than just the string
be able to inherit from each other
optionally allow an enum type to be extended with more values than the ones initially specified
have a numeric value associated with each constant, although in practice this value will rarely need to be used
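Several of the listed enum features can be emulated in C++17 with a table of (value, string) pairs stored in declaration order. This is only a sketch: `Color`, `kColors`, `from_string` and the other helpers are hypothetical, and the linear-scan reverse lookup is just one possible strategy that a user-specified lookup could replace.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

enum class Color { Red, Green, Blue };

struct ColorInfo { Color value; std::string_view name; };

// Declaration order defines the complete ordering used for iteration.
// The string need not match the identifier ("light blue" for Blue).
constexpr std::array<ColorInfo, 3> kColors{{
    {Color::Red, "red"}, {Color::Green, "green"}, {Color::Blue, "light blue"}}};

// Attribute of a constant: its string.
constexpr std::string_view to_string(Color c) {
    for (const auto& info : kColors) if (info.value == c) return info.name;
    return {};
}

// Reverse lookup: string -> enum value (linear scan over the table).
constexpr std::optional<Color> from_string(std::string_view s) {
    for (const auto& info : kColors) if (info.name == s) return info.value;
    return std::nullopt;
}

// min and max of the enum type, by declaration order.
constexpr Color color_min() { return kColors.front().value; }
constexpr Color color_max() { return kColors.back().value; }
```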

The implementation aspects of a enum can also be customized:

can specify the underlying type for an enum, which does not necessarily have to be an integer
can specify how the reverse lookup is performed

Extended enum types will also be provided, much like the data types found in functional programming languages:

: type Maybe = Nothing | Some 'a

For an extended enum, the user can specify how it is packed by providing functions to pack and unpack the structure (which includes recognizing which variant it is). This can be useful if the layout needs to match some ABI. If layout functions for an extended enum are not provided, the compiler will generate them using macros.

Basic pattern matching on extended enum types may also be possible.
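In C++17 the extended enum `Maybe` can be approximated with `std::variant`, and `std::visit` gives a limited form of pattern matching. Note that here the standard library, not the user, decides how the alternatives are packed, which is exactly what MyLang's pack/unpack functions are meant to expose. `Nothing`, `Some` and `describe` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// type Maybe = Nothing | Some 'a
struct Nothing {};
template <typename T> struct Some { T value; };
template <typename T> using Maybe = std::variant<Nothing, Some<T>>;

// "Pattern match" on the active alternative.
template <typename T>
std::string describe(const Maybe<T>& m) {
    return std::visit([](const auto& alt) -> std::string {
        using Alt = std::decay_t<decltype(alt)>;
        if constexpr (std::is_same_v<Alt, Nothing>) return "Nothing";
        else return "Some " + std::to_string(alt.value);
    }, m);
}
```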

3.7 Inner Objects

Each sub-object of an object can be either static, inner, or static inner (the default being static inner). An inner object knows about its parent: it maintains a pointer back to its parent, much like Java inner classes.

A static "inner" class does not maintain a pointer back to its parent; instead, the parent is provided automatically. It is an error to call a static inner class method without giving the compiler a way to figure out the outer object in the expression. For example:

: type A = {
:   var x : int;
:   type B = {
:     var y : int;
:     fun ff : void;
:   };
:   var b : B;
:   fun f : void;
: };
:
: Legal:
:   var a : A; a.b.ff();
:   fun A.f() = {b.ff();};
:   var a : A; var b : B ptr wide; b = a.b; b.ff();
:
: Illegal:
:   var a : A; var b : B ptr; b = a.b; b.ff();

It is possible to precisely control how an object or sub-object behaves. For example, an object may act like a variable but not have any storage associated with it. Functions are used to provide an actual value for the object or to allow it to be assigned to.

3.8 Opening Types

Any type can be opened, which means that all of its sub-objects are directly accessible. For example, if the object ``X'' is opened, then instead of using ``X.foo'' you can just use ``foo''. Inside class members, C++ ``opens'' the class so that the class members can be accessed without using this. It is the same basic idea but a lot more flexible. Given two objects X and Y, if Y is a sub-object of X, then X can open Y so that Y's sub-objects can be directly accessed from X. For example, instead of using ``X.Y.foo'' you can just use ``X.foo''. When a class is inherited in C++, the members of the parent class are ``opened'' so that they can be accessed directly via the child class.

3.9 Constructor / Destructor

For any object a:

default constructor
copy constructor
assignment
destructor

can be defined. Unlike their counterparts in C++, these are not code blocks but rather precise instructions on how to make or destroy a copy. The compiler can optimize away any of them, and may also use tricks to avoid making unnecessary copies. For example:

: var huge_object;
: ...
: ...
: huge_object = f(...);

might end up calling huge_object's destructor and then letting f write directly into huge_object, making the call as efficient as if huge_object were passed by reference. Thus these special methods may be called at unpredictable places; therefore they should only do what they are designated to do and nothing else.

Other examples of things the compiler is allowed to do:

If the compiler can determine that a variable was never written to, then a copy constructor may be used in an assignment instead of the assignment method, or perhaps only the default constructor, as in the case of huge_object above.
If a compiler can determine a object is never used past a certain point the object may go out of scope early.

However you ARE allowed to specify how an object may be manipulated:

can only be constructed explicitly
can not be copied
can only be copied at time of construction

Other methods can be used when behavior similar to C++ behavior is needed.

3.10 Type Conversion

It should be possible to precisely define how types can be converted. For example, given a rule saying an int can convert to a double, the resulting double may then be converted again. C++, by contrast, allows at most one user-defined conversion in an implicit conversion sequence. These rules should be specified in ``src -> dest'' form, not in the form of type conversion operators or single-parameter constructors as in C++, although certain constructors may implicitly add type conversion rules.
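One way a compiler could apply rules written in ``src -> dest'' form is to treat them as a directed graph and search for the shortest chain of conversions, as in this hypothetical C++17 sketch (`Rules` and `conversion_path` are illustrative names). A real design would also have to detect ambiguous chains of equal length and decide whether the search is bounded; neither is handled here.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Each rule is one "src -> dest" edge.
using Rules = std::multimap<std::string, std::string>;

// Breadth-first search: returns the shortest chain from -> ... -> to,
// or an empty vector if no chain of conversions exists.
std::vector<std::string> conversion_path(const Rules& rules,
                                         const std::string& from,
                                         const std::string& to) {
    std::map<std::string, std::string> parent{{from, from}};
    std::queue<std::string> work;
    work.push(from);
    while (!work.empty()) {
        std::string cur = work.front();
        work.pop();
        if (cur == to) {  // rebuild the chain by walking parents back
            std::vector<std::string> path{cur};
            while (cur != from) { cur = parent[cur]; path.insert(path.begin(), cur); }
            return path;
        }
        auto [lo, hi] = rules.equal_range(cur);
        for (auto it = lo; it != hi; ++it)
            if (!parent.count(it->second)) {  // first visit is the shortest
                parent[it->second] = cur;
                work.push(it->second);
            }
    }
    return {};
}
```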

3.11 Common Framework for RTTI

Extended enums (see 3.6), virtual functions, boxed types, etc. are really all the same concept: "run time type identification". They should all be merged into a unified concept with only syntactic sugar separating them. Basically, they specify which overloaded function to use. RTTI is very similar to providing function pointers (or more powerful closures) for each operation performed on the type, so that should be tied into the same framework as well. Some of the elements of this common framework include:

Be able to control the layout of classes even when virtual functions are involved. The user may provide the following functions for the base class:

: type Base = {
:   register_vtable(VTable)
:   get_vtable() => VTable
:   get_typeid() => typeid<Base>
: };
: fun eq(typeid<Base>, typeid<Base>) : bool
: fun lt(typeid<Base>, typeid<Base>) : bool   (partial ordering on base class relation)

These functions are only used when needed, so not all of them have to be defined. It is a compile-time error to compare typeids of different base classes.
Be able to manipulate the virtual table. Such as accessing the function pointers or assigning a virtual method to a function pointer.
Be able to specify boolean constraints on what methods need to be implemented. For example:

append(Item); append(Item array, size);

Both can be implemented in terms of each other, but one or the other must be implemented. In C++, if each is implemented in terms of the other, this will cause infinite recursion. But this can be checked at compile time, so the constraint should be something like:

append(Item) | append(Item array, size)

3.12 Overloading Constants

Constants can be overloaded based on type, for example:

: (PI * float)  => float
: (PI * double) => double

This also allows the same constant name to appear in more than one type:

: type Bool  = True | False;
: type Bool2 = True | False | Undef;

3.13 Low Level Objects

As stated previously everything is an object. To the compiler there is no real distinction between basic types such as integers and aggregated types such as arrays, and structs, or more high level types such as classes.

Every object type has the following members:

name: The fully qualified name of the object
size: The size of the object in bytes
align: The required alignment of the object
data: The sub-objects of the object. It is an array of the tuple (name, type, offset). Multiple sub-objects can have the same offset which effectively allows the C union type to be created
allocatable: A boolean variable which is true if the object can be allocated by the user. This does not prevent this object from being a sub-object of another object.
static_data: A pointer to static data that is common for all objects of the same type.
dot: A very powerful macro that gives objects most of their power. It is called whenever the dot operator is used. Its prototype will be something like ``macro(obj_info, obj, id)'', where id is the identifier after the dot; for example, for ``s.bla'' id would be ``bla''.
destroy: A macro that is called whenever the object is destroyed. The destroy methods of the sub-objects are not called automatically.

Low level objects are created via special macros and are copied via copying conversion macros.

Sub-objects can be accessed via the builtin ``element(info, obj, id/num)'' and info on a sub object can be accessed via ``element_info(info, id/num)''.

All higher level structures are created from the basic low level objects via macros. MyLang will not provide native support for anything but the low level object, including inheritance as there is no need to. For example a C structure can be created something like:

: var s : int = 0, d : (name,type) array, o = new_raw_object();
: for i = 0 to d.size {
:   s = round_align(s, d[i].type.align);   // round the offset up to the member's alignment
:   o.data[i].name   = d[i].name;
:   o.data[i].type   = d[i].type;
:   o.data[i].offset = s;
:   s += d[i].type.size;
: }
: o.dot = macro(obj_info, obj, id) {element(obj_info, obj, id)};

With a little more code, more advanced types can be created.
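The layout loop above can be made concrete in C++. This sketch assumes, like the pseudocode, that each member is placed at the next offset that is a multiple of its alignment, and it additionally rounds the total size up to the largest member alignment so arrays of the struct stay aligned. On mainstream ABIs this reproduces the layout the C++ compiler picks for the equivalent struct. `Member`, `Layout` and `layout_struct` are hypothetical names; `round_align` mirrors the helper assumed by the pseudocode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Member { std::size_t size, align; };
struct Layout { std::vector<std::size_t> offsets; std::size_t size, align; };

// Round offset up to the next multiple of align (align >= 1).
std::size_t round_align(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

Layout layout_struct(const std::vector<Member>& members) {
    Layout out{{}, 0, 1};
    std::size_t s = 0;
    for (const Member& m : members) {
        s = round_align(s, m.align);       // padding before the member
        out.offsets.push_back(s);
        s += m.size;
        if (m.align > out.align) out.align = m.align;
    }
    out.size = round_align(s, out.align);  // trailing padding
    return out;
}
```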

The dot operator allows unlimited freedom in how members are accessed; there is no reason that it can only be used to access sub-objects. For example, to implement simple non-virtual inheritance or anonymous structures, it is necessary to also access a sub-object's sub-objects via the dot operator:

: macro(obj_info, obj, id) {
:   element(obj_info, obj, id)
:   | element(inherit_obj_info, element(obj_info, obj, inherit), id)
: }

The dot operator can also be used for adding methods to an object by returning a function instead of a sub-object, among other things.

Since it is possible to get information on an object's sub-objects, it is possible to write generic code that performs an operation, such as printing, on all of its sub-objects. This is impossible to do generically in C++.

4 Functions

4.1 Syntax

The basic syntax for a function will be something like:

: fun name(var, var, var, var ...) [[=>] (var, ... [; var, ...])] [ : return_type] [=] { code }

The second tuple is the return type: the values before the ';' cannot be ignored by the caller, while the ones after it can. The return values are named so that they can be referred to in the function. Returned objects are not copied; they are allocated directly on the stack of the calling function, which makes it okay to return huge values.

4.2 Parameters

Parameters to functions are passed by:

Value (either copy variable or pass by truly const reference, which is determined in a platform specific manner)
Reference
Const Reference
Copy

Passing by value frees the programmer from having to worry about whether a particular parameter should be passed by copy or by const reference, since the compiler will make that decision.
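How the compiler might choose between copying and passing by const reference can be sketched as a C++17 type trait. The rule used here (trivially copyable and no larger than two pointers means copy) is an assumed heuristic; the text says only that the choice is platform specific. `in_param_t`, `twice` and `length` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Small, trivially copyable types are copied; everything else is passed
// by const reference. The two-pointer threshold is an assumed heuristic.
template <typename T>
using in_param_t = std::conditional_t<
    std::is_trivially_copyable_v<T> && sizeof(T) <= 2 * sizeof(void*),
    T, const T&>;

static_assert(std::is_same_v<in_param_t<int>, int>);
static_assert(std::is_same_v<in_param_t<std::string>, const std::string&>);

int twice(in_param_t<int> x) { return 2 * x; }                          // by copy
std::size_t length(in_param_t<std::string> s) { return s.size(); }      // by const ref
```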

4.3 Pre/Post conditions and Attributes

Functions can have pre and post conditions which can be checked at run time. A compiler can also use these conditions to optimize better.

Attributes are a special form of pre/post conditions. For example, a "pure" function is one that takes parameters only by value, modifies no external state, and returns a value. Its output should depend only on its input or on global variables; if global variables are used, the function is annotated with the global variables it uses. A pure function can only call other "pure" functions; this will be enforced by the compiler's type checking system.

4.4 Implicit Parameters

: fun f0(implicit x, y)
: fun f1(implicit x, y) { f0(y); }
: fun f2() { implicit var x; f0(20); }

4.5 Type Safe Variable Length Parameters

: fun f(str : string, ... {ints : int vector; fun add(i) {ints.push(i);}}) {
:   foreach (ints) ...
: }

4.6 Named Parameters

: fun f(x, y = 10, z = 100)
: f(x = 20, z = 30);
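C++20 designated initializers give a close, though not identical, analog of named parameters with defaults: the arguments are bundled into a struct whose members carry the default values. `FArgs` is a hypothetical name.

```cpp
#include <cassert>

// Parameter bundle: members with initializers act as defaulted parameters.
struct FArgs {
    int x;
    int y = 10;
    int z = 100;
};

int f(FArgs a) { return a.x + a.y + a.z; }
```

A call such as `f({.x = 20, .z = 30})` mirrors `f(x = 20, z = 30)`. Unlike the MyLang call, C++ requires designators to appear in declaration order, and omitting `x` is not a compile error: it is value-initialized to 0.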

4.7 Option Parameters

: fun f(x, option a, option b)
: f(10, :a)   // enable a

4.8 Overloading

As in C++, functions can be overloaded based on their parameters, but unlike in C++ they can also be overloaded based on the return type. To avoid extreme confusion, functions that are overloaded by return type should essentially do the same thing, perhaps returning the result in a slightly different form.

When no exact match is found for a function call, the compiler may do one of three things:

Convert the parameters or return type
Generate a new function based on a template
dynamic??
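C++ cannot overload on return type directly, but a common workaround has a similar effect: return a proxy object whose conversion operators select the result from the type it is being converted to. The sketch (`ParseResult`, `parse`) is hypothetical, and both conversions essentially do the same thing, as the guideline above asks.

```cpp
#include <cassert>
#include <string>

// Proxy whose conversion operators act as "overloads on return type".
struct ParseResult {
    std::string text;
    operator int() const { return std::stoi(text); }
    operator double() const { return std::stod(text); }
};

ParseResult parse(const std::string& s) { return ParseResult{s}; }
```

Overload resolution picks the conversion operator that exactly matches the target type, so `int i = parse("42");` calls `operator int` and `double d = parse("2.5");` calls `operator double`.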

4.9 Nested Functions

Two types of nested functions:

Nested
Closures

Nested functions will behave like gcc nested functions and may even be implemented as such. A nested function reference will become invalid once it goes out of scope.

Closures will make a copy of any variables needed from the local environment and will not go out of scope.

: closure name(var, ... ; var, var) = ...

The variables before the ';' come from the local environment; a copy of each is made when the closure is created. The ones after the ';' are the normal function parameters. Functionally, a closure is equivalent to (in C++):

: struct name {
:   var1; var2; ...
:   name(var1_, var2_ ...)            // constructor
:     : var1(var1_), var2(var2_) ... {}
:   operator() (var, ...)             // normal function
: };
: fun ... {var1, var2; return name(var1, var2);}

but a lot more convenient.
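The struct above can be written out as real C++ for a hypothetical closure `closure scale(factor ; x) = x * factor`. The hand-written functor and the lambda with capture by copy below behave the same way; the lambda is C++'s own shorthand for the same pattern. `Scale`, `make_scale` and `make_scale_lambda` are illustrative names.

```cpp
#include <cassert>

// Hand-written closure: captured variables become members.
struct Scale {
    int factor;                                         // copy of the captured local
    explicit Scale(int f) : factor(f) {}                // copies the environment
    int operator()(int x) const { return x * factor; }  // the normal parameters
};

Scale make_scale(int factor) { return Scale(factor); }

// The same closure as a lambda capturing factor by copy.
auto make_scale_lambda(int factor) {
    return [factor](int x) { return x * factor; };
}
```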

A closure may be optimized as a nested function if the compiler can be sure that a reference to the function will not be used when the closure goes out of scope.

Closures can also be created by partially calling a function. For example: ``return fun(x,,y)'' will return a closure which takes a single parameter.
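Partial application like ``fun(x,,y)'' can be emulated in C++ with a lambda that captures the supplied arguments and takes the omitted one as its only parameter. `f3` and `partial_f3` are hypothetical; `std::bind(f3, 1, std::placeholders::_1, 3)` from <functional> would produce an equivalent callable.

```cpp
#include <cassert>

int f3(int x, int mid, int y) { return x * 100 + mid * 10 + y; }

// Fix the first and third arguments, leave the middle one open,
// and return a closure that takes a single parameter.
auto partial_f3(int x, int y) {
    return [x, y](int mid) { return f3(x, mid, y); };
}
```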

4.10 Misc Notes

It should be possible to prevent certain functions from calling other functions:

: block A { f1; f2; f3; }
: block B { f4; f5; f6; }

B should not be allowed to call A but A should be allowed to call B, and other blocks can only call A.

4.11 Optimization

4.11.1 Specialization

Be able to provide multiple versions of the same function. With certain parameters one version will be called; with others, a different version. This will allow specialization to optimize a function for a common set of parameters. Perhaps the compiler can decide when to do this if it can determine that it will be beneficial.
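A minimal sketch of specialization, assuming an integer power function whose most common call squares its argument (the function names and the choice of exponent 2 as the common case are hypothetical): the entry point routes that case to a loop-free version and everything else to the general algorithm.

```cpp
#include <cassert>

// General version: integer power by repeated squaring.
long long ipow_general(long long base, unsigned exp) {
    long long result = 1;
    for (;;) {
        if (exp & 1u) result *= base;
        exp >>= 1u;
        if (exp == 0) break;
        base *= base;  // only square when another bit remains
    }
    return result;
}

// Entry point: the assumed common case (exp == 2) gets a loop-free
// version; every other exponent falls back to the general one.
long long ipow(long long base, unsigned exp) {
    if (exp == 2) return base * base;
    return ipow_general(base, exp);
}
```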

4.11.2 Fast Function Calls

Allow the calling convention of a function to be changed, when it is not an external one, in order to make function calls cheap. For example, if only one function calls another one but does so multiple times, the calling convention of the callee could be changed to avoid unnecessary overhead, such as pushing parameters on the stack or shuffling registers around.

5 Compile Time Expressions

5.1 Compile Time Functions

Compile time functions (CTF) are functions that are executed at compile time rather than run time. They are similar to preprocessor macros (i.e., ``#define'' in C and C++) but are a lot more powerful and less error prone. Unlike preprocessor macros, CTFs are more than simple expansions. CTFs are written in MyLang itself and thus have the complete language at their disposal, rather than a limited set of operators as preprocessor macros do. Unlike preprocessor macros, CTFs also obey namespace rules, so they can be defined locally without polluting the global namespace.
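The closest existing C++ feature is a constexpr function: ordinary C++ code that the compiler can execute during compilation. It is far less powerful than what is proposed here (it cannot generate code or match patterns), but it shows the basic idea of running the language itself at compile time. ``factorial'' and ``lookup_table'' are illustrative names.

```cpp
#include <cassert>

// Ordinary C++ that the compiler may run during compilation when the
// argument is a constant (C++14 rules allow loops in constexpr).
constexpr unsigned long long factorial(unsigned n) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) result *= i;
    return result;
}

// Forced compile-time evaluation: a wrong result is a compile error.
static_assert(factorial(5) == 120, "factorial evaluated at compile time");

// A compile-time result used where a constant is required (array bound).
int lookup_table[factorial(4)];
```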

CTFs will be triggered based on pattern matching. This will allow new language constructs to be defined with CTFs.

5.1.1 Types

5.1.1.1 Basic Macros

The basic MyLang macros will be ones that expand to a list of tokens or a string to be used in place of the function call. However, unlike C macros, these will be written in MyLang. As with all CTFs, they will obey namespace rules. In addition, they will not be able to expand to just anything: they must evaluate to a valid grammatical element or a list of such elements.

For example:

: "a * b"           OK
: "* b"             NOT
: "var x; var y;"   OK
: "var x; var"      NOT
: "var"             MOST LIKELY NOT
: "a = 10; b = 20"  OK
: "=10; b="         NOT
: "} else {"        NOT

These macros will also have access to special functions to create new structures and the like. See section 3.13 (Low Level Objects) for an example of these special functions.

This type of CTF essentially offers the same power as Lisp macros. However, since MyLang will have syntactic closures (as in Scheme), the problem of unintentional variable capture will be avoided.

5.1.1.2 Simple Expansions

Another type of macro is one that simply expands into another set of tokens or a string. Like the basic macros, these can only expand into valid grammatical elements. An expansion can be recursive, thus avoiding the need for any sort of loop. On the surface these macros are similar to preprocessor macros, but since they obey namespace rules and can only expand into valid grammatical elements, they are a lot safer. They are also more elegant, as they avoid the need to generate code explicitly.

Unfortunately, they are not as powerful as the basic macros and thus do not avoid the need for them. Perhaps basic macros and pattern expansion macros can somehow be combined into one. For example, a macro could by default be a pattern expansion but also allow a special syntax to be used when code is needed.

5.1.1.3 Compile Time Functions

A more advanced type of macro that MyLang will support will directly manipulate the compile-time environment via API calls. For complex tasks this type of CTF may be cleaner and less error prone. However, supporting them means that a stable API must be developed.

For example, a CTF that creates a new object type might look something like:

: ctfun new_object_type(...) {
:   var obj = new_obj_type();
:   ...
:   add_object_type(obj);
: };

And one that prints a structure out might look like:

: ctfun print(obj) {
:   foreach sobj (sub_objects(obj)) {
:     add_code(print(sobj));
:   };
: };

5.1.1.4 Pseudo Compile Time Functions

An even more advanced type of macro will be one that manipulates the compile-time environment and is executed as if it were executed at run time. These CTFs can contain code that depends on both the compile-time and run-time environments. This type of CTF will be the most natural to write because the user does not have to worry about the separation of the compile-time and run-time environments.

For example, one that prints a structure might look like this (notice the absence of the add_code function):

: ctfun print(obj) {
:   foreach sobj (sub_objects(obj)) {
:     print(sobj);
:   };
: };

Unfortunately, these will be the most difficult to implement, in particular because MyLang is meant to be compiled rather than interpreted. If implemented, there will have to be some restrictions on what these functions can do. For example, it will be difficult to support expressions that depend on both the run-time and compile-time state. If the expressions also modify the compile-time state, then they will be virtually impossible to support.

5.1.2 When Used Like a Function

CTFs can also become ``ordinary'' functions. For example, consider the macro ``OR(x,y)'', which returns true if x or y is true but only evaluates y if x is false. When called directly, the macro will be used, but when it needs to be treated as an ordinary function, the function ``OR_f(x,y) {OR(x,y)}'' will be used. Naturally, this new function loses its special ability to avoid evaluating y, but it can now be passed to functions expecting another function as a parameter. Of course, this trick will not work for all macros. If such a macro is used as a function, the compiler will report an error.
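The trade-off can be seen in C++, where the built-in || plays the role of the OR macro and OR_f wraps it as an ordinary function. A counter records how often the right-hand operand is evaluated; ``rhs'', ``as_function'', ``count_lazy'', and ``count_strict'' are illustrative names.

```cpp
#include <cassert>

// Counts how many times the right-hand operand has been evaluated.
static int evaluations = 0;
bool rhs() { ++evaluations; return true; }

// Function form of OR: both arguments are evaluated before the call,
// so the short-circuit behaviour of the built-in || is lost.
bool OR_f(bool x, bool y) { return x || y; }

// Unlike ||, OR_f can be stored and passed around as a function.
bool (*as_function)(bool, bool) = OR_f;

// Report how many times rhs() ran for each form when x is already true.
int count_lazy() {
    evaluations = 0;
    (void)(true || rhs());    // built-in ||: rhs() is skipped
    return evaluations;
}
int count_strict() {
    evaluations = 0;
    (void)OR_f(true, rhs());  // function call: rhs() runs first
    return evaluations;
}
```

Here count_lazy() returns 0 and count_strict() returns 1.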

5.1.3 Parameters

Since CTFs are executed at compile time rather than run time, the parameters they take will be slightly different. In particular, the type of an expression will generally not be known unless it is a constant. Thus, most expressions will be passed in as strings, and the result of an expression is not known by the function. However, it is also possible to pass in compile-time constants that the CTF can use; the most common will be an integer constant, though other types are possible. CTFs may also take compile-time objects so that they can get information about, and manipulate, the compile-time environment. Thus, the types of parameters a CTF can take will be something like:

Unevaluated Expression
Evaluated Expression
Number
String
Identifier
Compile Time Object

5.2 Other Compile Time Expressions

MyLang will also have general support for compile-time expressions which are not necessarily functions. Preprocessor ``#if'' directives are a good example of this. Of course, they will be written in MyLang, but will have special syntax to indicate that the expression is to be evaluated at compile time rather than execution time. If I can figure out how to implement pseudo-CTFs, then this distinction might not even be necessary.
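C++17's ``if constexpr'' is an existing example of a compile-time conditional written in the language's own syntax rather than in a preprocessor. In the sketch below (``describe'' is a hypothetical name), the branch not taken is discarded during template instantiation, so each branch only has to compile for the types that actually reach it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The condition is evaluated at compile time; for int the else branch
// (std::string(v)) is never instantiated, and for std::string the
// first branch (std::to_string(v)) is never instantiated.
template <typename T>
std::string describe(const T& v) {
    if constexpr (std::is_integral_v<T>) {
        return "int:" + std::to_string(v);
    } else {
        return "str:" + std::string(v);
    }
}
```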

6 Exceptions

Exceptions are very useful and will definitely be included, but they have problems. MyLang will try to avoid these problems where possible.

One major problem with exceptions is that they can sometimes mask errors. Suppose some function throws an unexpected exception. Its caller is not expecting it, so it lets the exception propagate further, even though the caller is not supposed to throw that exception (this is usually not specified explicitly, as is generally the case with C++, where exception specifications were an afterthought). The exception keeps propagating until it reaches a function that is expected to throw that exception "itself", but not to receive it from any function it calls. It then gets handled as though it came from that function, when it shouldn't be.

Having to handle every exception is very annoying, especially when something unexpected happens (for example, the program's working directory was deleted) and the best course of action is to abort because there is nothing useful the program can do. (Well, it may attempt to quit cleanly, but I account for that by having a special class of exceptions for the truly unexpected.)

MyLang will likely have two classes of exceptions: Exceptions and Errors.

Functions must specify which exceptions they will throw. If a function attempts to throw an exception not specified, then the exception will turn into an error. Errors can be handled via a try/catch block OR via registered functions, much like signal handlers. In fact, POSIX signals will generate exceptions which, if not caught, will turn into "errors". (The name is not the best; how about "unexpected"?)

This is a compromise between Java, which requires every checked exception to be caught or declared, and C++, where exception specifications were an afterthought.
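The rule that an undeclared exception turns into an error can be approximated in C++ with a wrapper. This is a sketch under the assumption of a single declared exception type; ``UnexpectedError'', ``call_declared'', and ``which_escapes'' are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for MyLang's "Error" class: what an exception
// that a function did not declare is converted into.
struct UnexpectedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls f. An exception of the declared type passes through unchanged;
// any other exception is replaced by an UnexpectedError.
template <typename Declared, typename F>
auto call_declared(F f) -> decltype(f()) {
    try {
        return f();
    } catch (const Declared&) {
        throw;  // declared: rethrow the original exception object
    } catch (...) {
        throw UnexpectedError("undeclared exception escaped");
    }
}

// Reports which kind of exception escaped call_declared, for illustration.
template <typename Declared, typename F>
std::string which_escapes(F f) {
    try {
        call_declared<Declared>(f);
    } catch (const UnexpectedError&) {
        return "error";
    } catch (const Declared&) {
        return "declared";
    }
    return "none";
}
```

With std::out_of_range as the declared type, throwing std::out_of_range reports "declared", while throwing an int reports "error".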

Certain types of exceptions can also return and/or be thrown asynchronously. This will allow exceptions to handle signals elegantly.

Exceptions can also be turned off for a region of code. If any function in that region throws an uncaught exception, it will either cause an abort or be postponed, depending on the nature of the exception.

If possible, exceptions will be implemented so that the exception object is NEVER copied. Exceptions can also refer to local objects on the stack, provided the local objects used are marked so that the compiler knows not to call their destructors or overwrite them. This will allow exceptions to be implemented extremely efficiently. In fact, they may even be more efficient than returning an error code.

7 Header Files

No user written header files. That job is up to the compiler.

Very precise dependencies will greatly cut back on unnecessary recompiles. A dependency header file is written for each object file. It will describe precisely which symbols the object file uses and how. For example, given the struct A {int x; int y; int z;}, if object file X only refers to z, then the dependency info will say that X depends on symbol z in struct A, which it expects to be an int at offset 8. If an object file only creates a new A but never uses it, then the dependency info will say that X creates a new A, which it expects to be of size 12 and to have only trivial constructors.
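The facts in this example can be checked in C++. The sketch assumes a 4-byte int and no padding, which is typical but not guaranteed; under that assumption z sits at offset 8, A occupies 12 bytes, and A's default constructor is trivial. The constant names are illustrative, and no particular dependency file format is implied.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The struct from the dependency example.
struct A { int x; int y; int z; };

// Facts a dependency record for A might capture (names are illustrative).
// The values 8 and 12 hold only if int is 4 bytes and no padding is added.
constexpr std::size_t kOffsetOfZ = offsetof(A, z);
constexpr std::size_t kSizeOfA = sizeof(A);
constexpr bool kTrivialCtor = std::is_trivially_default_constructible<A>::value;
```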

C++ (and to some extent C) programmers spend a good deal of effort designing their interfaces to minimize recompiles. Sometimes C++ programmers will go to great lengths to avoid exposing the implementation, so that the entire project does not need to be recompiled because of the addition of a private helper function. This is because C++ requires far too much information in the header file due to the class syntax. (See C++ FAQ Lite.)

By automatically generating header files and emitting very precise dependency information the programmer will never have to worry about this anymore. Programmers can design the interface without worrying about what will go in the header files...

8 Standard Library

The standard library should at least provide:

efficient constant-size ALLOCATOR which can take special hints to try to provide good spatial locality
generic HASH TABLE
BALANCED TREE for efficient implementation of sorted structures
efficient VECTOR which provides contiguous memory and can grow on demand
efficient LARGE VECTOR which provides contiguous memory and can grow on demand *without* having to reallocate memory

Provide efficient algorithms that work on:

SEQUENTIAL LISTS
LINKED LISTS
RANDOM ACCESS LISTS

Such as

SORT
BINARY SEARCH

Also provide high-level structures where the user does not need to worry about how they are implemented. For example, a simple dictionary structure that supports lookup and insertion of objects based on a key: with a small number of elements it is implemented as a linked list or a vector, for larger sizes as a hash table, and for really large sizes as a B-tree on disk.
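A minimal sketch of such an adaptive dictionary, with hypothetical names throughout: entries live in a vector of pairs while the dictionary is small and move into a hash table once a threshold is crossed. The threshold of 8 is an arbitrary assumption, and the on-disk B-tree tier is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Dictionary that switches representation as it grows.
class AdaptiveDict {
public:
    void insert(const std::string& key, int value) {
        if (!small_mode_) { map_[key] = value; return; }
        for (auto& kv : small_) {
            if (kv.first == key) { kv.second = value; return; }
        }
        small_.emplace_back(key, value);
        if (small_.size() > kThreshold) {  // switch to the hash table
            map_.insert(small_.begin(), small_.end());
            small_.clear();
            small_mode_ = false;
        }
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const int* lookup(const std::string& key) const {
        if (small_mode_) {
            for (const auto& kv : small_)
                if (kv.first == key) return &kv.second;
            return nullptr;
        }
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    bool uses_hash_table() const { return !small_mode_; }

private:
    static constexpr std::size_t kThreshold = 8;  // assumed cut-over point
    bool small_mode_ = true;
    std::vector<std::pair<std::string, int>> small_;
    std::unordered_map<std::string, int> map_;
};

// Builds a dictionary holding keys "k0".."k{n-1}" mapped to 0..n-1.
AdaptiveDict make_filled(int n) {
    AdaptiveDict d;
    for (int i = 0; i < n; ++i) d.insert("k" + std::to_string(i), i);
    return d;
}
```

The caller sees the same insert and lookup interface whichever representation is in use, which is the point of such a high-level structure.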

9 Miscellaneous Notes

9.1 Vector Types

Provide vector types, which are mini arrays. Their main purpose is to make it easy to take advantage of the vector instructions of the hardware. An array of X can be implicitly converted to a vector of X.

Either the user specifies the vector type, or the compiler decides the best size based on the current instruction set.

9.2 Layout Rules

May also support layout rules where { } and ; are placed implicitly, like Haskell. Layout rules WILL be context sensitive out of necessity.

9.3 Scope

Perhaps make it possible to force a variable declared in a scope into a lower (enclosing) scope:

: { int0 x; int y = 20; x = y * 20; }
: // is equivalent to
: int x; { int y = 20; x = y * 20; }

The compiler is allowed to rearrange the storage of the variables on the stack in order to allow variables of a lower scope to appear anywhere in the statement. Of course, there will be restrictions. This might not be possible, but it is worth pursuing.

It also might not be a good idea in the first place due to readability problems.

Also allow for a special grouping syntax where ALL variables are in the lower scope. This is useful for loops, etc. This is a must.

Syntax maybe: (exp; exp; ...)

9.4 Swiss Army Knife Loop

Possibly provide a very powerful looping construct which has 7 parts: init, preinc, pretest, body, posttest, postinc, final. It would be implemented something like:

: {
:   init;
: _loop:
:   preinc;
:   if (!pretest) goto _exit;
:   body;
:   if (!posttest) goto _exit;
: _next:
:   postinc;
:   goto _loop;
: _exit:
:   final;
: }

The question is what should the syntax for such a beast be?
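Whatever the final syntax, the semantics of the seven-part loop can be pinned down as a C++ function template that takes each part as a callable. ``swiss_loop'' and ``demo'' are hypothetical names; the control flow follows the goto sketch above, with break standing in for the jump to _exit.

```cpp
#include <cassert>
#include <vector>

// Runs the seven parts in the order of the goto sketch.
template <class Init, class Pre, class PreTest, class Body,
          class PostTest, class Post, class Final>
void swiss_loop(Init init, Pre preinc, PreTest pretest, Body body,
                PostTest posttest, Post postinc, Final final_) {
    init();
    for (;;) {
        preinc();
        if (!pretest()) break;   // goto _exit
        body();
        if (!posttest()) break;  // goto _exit
        postinc();               // _next
    }
    final_();                    // _exit
}

// Example: count i = 0, 1, 2, ... while i < 5, but stop after the body
// once i reaches 3 (post-test); final appends -1.
std::vector<int> demo() {
    std::vector<int> out;
    int i = 0;
    swiss_loop([&] { i = -1; },             // init
               [&] { ++i; },                // preinc
               [&] { return i < 5; },       // pretest
               [&] { out.push_back(i); },   // body
               [&] { return i < 3; },       // posttest
               [] {},                       // postinc (unused here)
               [&] { out.push_back(-1); }); // final
    return out;
}
```

Here demo() yields 0, 1, 2, 3 and then -1: the post-test ends the loop after the body runs with i equal to 3, and final still executes.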

9.5 Laziness

Should I implement some sort of laziness?

Lazy lists would be useful, much like the iterator concept.

9.6 C++ Deficiencies to Avoid

It should not be necessary to provide both of these:

: struct ConstRow {
:   const byte * data; int pitch;
:   ConstRow() {}
:   ConstRow(byte * d, int p) : data(d), pitch(p) {}
: };
: struct MutableRow {
:   byte * data; int pitch;
:   MutableRow() {}
:   MutableRow(ConstRow o) : data(const_cast<byte *>(o.data)), pitch(o.pitch) {}
:   MutableRow(byte * d, int p) : data(d), pitch(p) {}
: };
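Within C++ itself, the duplication can be reduced (though the language still does not make it unnecessary) with a template parameterized on the element type. In this sketch the names ``Row'' and the ``byte'' alias are assumptions, and the converting constructor is restricted to the safe direction, mutable to const.

```cpp
#include <cassert>
#include <type_traits>

using byte = unsigned char;  // assumed alias for the sketch

// One template in place of two structs: Row<const byte> is the
// read-only row, Row<byte> the mutable one.
template <class B>
struct Row {
    B* data = nullptr;
    int pitch = 0;
    Row() = default;
    Row(B* d, int p) : data(d), pitch(p) {}
    // Converts Row<U> to Row<B> only when U* converts to B*,
    // i.e. Row<byte> -> Row<const byte> but not the reverse.
    template <class U,
              class = std::enable_if_t<std::is_convertible<U*, B*>::value>>
    Row(const Row<U>& o) : data(o.data), pitch(o.pitch) {}
};

using ConstRow = Row<const byte>;
using MutableRow = Row<byte>;
```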

Somehow prevent this: let O be some object that has resources that need to be freed.

: struct A { O a; };
: struct B : public A { O b; };
: void f(A * a) { delete a; }
: f(new B);

The delete in f will call the destructor for A and NOT B, because A's destructor is not virtual (strictly, this is undefined behavior in C++). This leads to a leak that is very hard to track down unless you are already familiar with this type of mistake.
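In current C++ the usual fix is to declare the base-class destructor virtual, which the language does not require and compilers do not always warn about. The sketch below counts live O objects (the counter ``live'' and the helper ``live_after_delete'' are illustrative additions) to show that, with a virtual destructor, deleting a B through an A* destroys both members.

```cpp
#include <cassert>

// Counts live O objects so that a leak would be observable.
static int live = 0;
struct O {
    O() { ++live; }
    ~O() { --live; }
};

// The fix: a virtual destructor in the base class, so deleting a B
// through an A* runs ~B (destroying b) and then ~A (destroying a).
struct A { O a; virtual ~A() = default; };
struct B : public A { O b; };

void f(A* a) { delete a; }

// Creates a B, deletes it through f, and reports how many O objects remain.
int live_after_delete() {
    f(new B);
    return live;
}
```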

9.7 Table Oriented Programming

Include elements of Table Oriented Programming: http://www.geocities.com/tablizer/

Kevin Atkinson 2004-11-22