Newspeak


In the end the whole notion of goodness and badness will be covered by only six words -- in reality, only one word. Don't you see the beauty of that, Winston?
Nineteen Eighty-Four, George Orwell.



Newspeak is a simplified programming language, well-suited for the purpose of static analysis.
C2Newspeak compiles C programs into Newspeak. C2Newspeak is distributed under the LGPL.

Distribution

C2Newspeak v. 1.3 source code is available: newspeak-1.3.tgz.
Previous C2Newspeak versions: newspeak-1.2.tgz, C2Newspeak-1.1.tgz, C2Newspeak-1.0.tgz, C2Newspeak-0.9.tgz.

Requirements

C2Newspeak is written in Objective Caml.

Documentation

Development version

The latest version of the C2Newspeak source code can also be retrieved from this Mercurial repository: https://hg.penjili.org/c2newspeak-ref. Mercurial is a distributed source management tool, available at http://www.selenic.com/mercurial/wiki/.

Bug reports

The code can be browsed here, and tickets can be submitted there to report bugs, comments, or missing features.

Examples

Legend

Here are a few examples of compilation from C to Newspeak. In each example, the C code is shown first, followed by the corresponding Newspeak code.

Types

Integer types are normalized according to their size and sign. Their size, which is architecture dependent, is made explicit.
C code:

    int i1;
    unsigned int i2;
    char i3;
    unsigned char i4;

Newspeak code:

    int4 i1;
    uint4 i2;
    int1 i3;
    uint1 i4;
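As a rough illustration of this normalization, the following C sketch derives a Newspeak-style type name from a size and a sign. The helper `newspeak_int_name` and the macro `NEWSPEAK_NAME_OF` are hypothetical names introduced here for illustration; they are not part of C2Newspeak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "int<size>" or "uint<size>" into buf,
   mimicking how Newspeak names integer types by size and sign. */
static void newspeak_int_name(char *buf, size_t n, size_t size, int is_signed)
{
    assert(buf != NULL && n > 0);
    snprintf(buf, n, "%s%zu", is_signed ? "int" : "uint", size);
}

/* Hypothetical macro: the size comes from sizeof, and a type is signed
   exactly when -1 converted to it stays negative.
   buf must be a char array, since sizeof (buf) is used as its capacity. */
#define NEWSPEAK_NAME_OF(buf, T) \
    newspeak_int_name((buf), sizeof (buf), sizeof(T), ((T)-1 < (T)0))
```

On a platform where int is 4 bytes wide, `NEWSPEAK_NAME_OF(buf, int)` would yield `int4`. The result for plain `char` depends on whether the platform treats `char` as signed.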

Casts (and unions) in C allow programmers to manipulate sequences of bytes with any type. Consequently, Newspeak distinguishes only two types of pointers: data and function pointers.
C code:

    int *p1;
    unsigned int *p2;
    int (*p3)[10];
    struct { int x; } *p4;
    int (*fp)(int);

Newspeak code:

    ptr p1;
    ptr p2;
    ptr p3;
    ptr p4;
    fptr fp;

Newspeak composite data structures are arrays and regions. A region is a sequence of bytes in which given offsets are declared to hold values of a given type. Regions can encode both C structures and unions while making their architecture-dependent parameters explicit: field offsets, padding, and overall type size.
C code:

    int t[10];
    struct {
      int x; char y; char* z;
    } s;
    union {
      int x; char y; char* z;
    } u;

    int t1[10][20];
    int t2[10][20][30];

    struct {
      int x; struct { char z; } y;
    } s1;
    struct {
      int x[10];
      struct { char z[10]; } y[10];
    } s2;
    struct {
      int z;
      union { int x; char y; } t;
    } s3;

Newspeak code:

    int4[10] t;
    {
      int4 0; int1 4; ptr 8;
    }12 s;
    {
      int4 0; int1 0; ptr 0;
    }4 u;

    int4[20][10] t1;
    int4[30][20][10] t2;

    {
      int4 0; { int1 0; }1 4;
    }8 s1;
    {
      int4[10] 0;
      { int1[10] 0; }10[10] 40;
    }140 s2;
    {
      int4 0;
      { int4 0; int1 0; }4 4;
    }8 s3;
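The offsets and sizes that a region makes explicit can be observed from C itself with `offsetof` and `sizeof`. The sketch below prints a region-like description of the first structure and union above for whatever platform it is compiled on. The function names are hypothetical, and the output only imitates the notation shown in the examples; it is not produced by C2Newspeak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct s { int x; char y; char *z; };
union  u { int x; char y; char *z; };

/* Prints struct s as "{ <type> <offset>; ... }<size>" using this
   platform's layout, and returns the overall size (padding included).
   Plain char is printed as a signed 1-byte integer, an assumption that
   does not hold on every platform. */
static size_t describe_struct_s(void)
{
    printf("{ int%zu %zu; int1 %zu; ptr %zu; }%zu s;\n",
           sizeof(int), offsetof(struct s, x),
           offsetof(struct s, y),
           offsetof(struct s, z), sizeof(struct s));
    return sizeof(struct s);
}

/* In a union every member starts at offset 0, so the region is at
   least as large as its largest member. */
static size_t describe_union_u(void)
{
    assert(offsetof(union u, x) == 0 && offsetof(union u, y) == 0
           && offsetof(union u, z) == 0);
    printf("{ int%zu 0; int1 0; ptr 0; }%zu u;\n",
           sizeof(int), sizeof(union u));
    return sizeof(union u);
}
```

On a target with 4-byte integers and 4-byte pointers, the printed offsets and sizes would be expected to match the example above; on a typical 64-bit target the pointer offset and the overall sizes differ, which is exactly the architecture dependence that regions spell out.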

Variables

Global variables are designated by their name. Variables are pushed on a stack, and local variables are accessed by their offset from the top of the stack.
C code:

    int x;
    void main() {
      int y;
      int z;
      x = y;
      x = z;
    }

Newspeak code:

    int4 x = 0;
    main() {
      int4 y;
      int4 z;
      x =(int4) 1-_int4;
      x =(int4) 0-_int4;
    }
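The numbering of local variables in the example above (`y` is `1-`, `z` is `0-`) is consistent with counting from the most recently declared variable. A minimal sketch of that rule, assuming it holds in general; `offset_from_top` is a hypothetical helper, not a C2Newspeak function:

```c
#include <assert.h>

/* Hypothetical helper: among live_count live locals, the one declared
   at position decl_index (0-based, in declaration order) sits at
   offset live_count - 1 - decl_index from the top of the stack. */
static int offset_from_top(int decl_index, int live_count)
{
    assert(0 <= decl_index && decl_index < live_count);
    return live_count - 1 - decl_index;
}
```

With two live locals, `offset_from_top(0, 2)` gives 1 for `y` and `offset_from_top(1, 2)` gives 0 for `z`.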

Left values and expressions

Fields and array elements are accessed by shifting the address of the structure or array by some offset. For array element accesses, the belongs operator checks that the index is within bounds.
C code:

    struct {
      int a; int b;
    } x;
    int t[10];
    int i;

    x.b = t[i];

Newspeak code:

    {
      int4 0; int4 4;
    }8 x;
    int4[10] t;
    int4 i;

    2- + 4 =(int4) 1- + (belongs[0,9] (0-_int4) * 4)_int4;
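The bounds check and offset computation in this example can be sketched in C as follows. The functions `belongs` and `element_offset` are hypothetical stand-ins written for illustration; in particular, returning -1 on failure is a choice of this sketch, not Newspeak semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of belongs[lo,hi]: reports whether v lies in [lo, hi]. */
static bool belongs(long lo, long hi, long v)
{
    return lo <= v && v <= hi;
}

/* Byte offset of element i in an array of len elements, each
   elem_size bytes wide; returns -1 when i is out of bounds. */
static long element_offset(long i, long len, long elem_size)
{
    assert(len > 0 && elem_size > 0);
    if (!belongs(0, len - 1, i))
        return -1;
    return i * elem_size;
}
```

For the array `t` above, `element_offset(i, 10, 4)` corresponds to the `belongs[0,9] (i) * 4` part of the Newspeak expression.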
Integer operations are decomposed into an exact operation followed by a coercion back to the expected range of the result.
C code:

    int x, y, z;

    x = y + z;
    x = y * z;

Newspeak code:

    int4 x; int4 y; int4 z;

    2- =(int4) coerce[-2147483648,2147483647] (1-_int4 + 0-_int4);
    2- =(int4) coerce[-2147483648,2147483647] (1-_int4 * 0-_int4);
The coerce operator is also used for casts between integers of different sizes or signedness.
Pointer creations are annotated with the size of the buffer they designate, which makes it possible to check for invalid pointer operations.
C code:

    int* x;
    int t[100];

    x = &t[3];
    x = x + 5;
    *x = 3;

Newspeak code:

    ptr x;
    int4[100] t;

    1- =(ptr) (&_400(0-) + (3 * 4));
    1- =(ptr) (1-_ptr + (5 * 4));
    [1-_ptr]4 =(int4) 3;
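One way to picture why the buffer size is useful is a pointer that carries its buffer's size along with a byte offset, so that each access can be checked. The following C sketch is an assumption about how such a check might look, not a description of how any Newspeak-based analyzer is implemented; `struct checked_ptr` and its functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked pointer: buffer base, buffer size in bytes,
   and current byte offset into the buffer. */
struct checked_ptr {
    unsigned char *base;
    size_t size;
    long offset;
};

/* Counterpart of &_size(buf): a pointer to the start of the buffer. */
static struct checked_ptr address_of(void *buf, size_t size)
{
    struct checked_ptr p = { (unsigned char *)buf, size, 0 };
    assert(buf != NULL);
    return p;
}

/* Pointer arithmetic only moves the offset; nothing is checked yet. */
static struct checked_ptr shift(struct checked_ptr p, long bytes)
{
    p.offset += bytes;
    return p;
}

/* An access of `width` bytes is valid when it lies inside the buffer. */
static bool can_access(struct checked_ptr p, size_t width)
{
    return p.offset >= 0 && (size_t)p.offset + width <= p.size;
}
```

With `int t[100]`, shifting `address_of(t, sizeof t)` by 3 and then 5 elements leaves a pointer for which `can_access(p, sizeof(int))` holds, while shifting to element 100 does not.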
Casts between integers and pointers are forbidden.
C code:

    int* p;
    int x;
    x = p;

Result:

    Fatal error: translate cast: Invalid cast 'int *' -> 'int ' in '(int )p'
Such casts are accepted when the castor option is set.
C code:

    int* p;
    int x;
    x = p;

Newspeak code:

    ptr p;
    int4 x;
    0- =(int4) (int4) 1-_ptr;

Commands

Conditionals are translated into alternative choice commands à la Dijkstra, in which each branch starts with an assertion of its guard.
C code:

    int x;
    if (x < 10) {
      x++;
    }

Newspeak code:

    int4 x;
    choose {
      --> assert((10 > 0-_int4));
          0- =(int4) coerce[-2147483648,2147483647] (0-_int4 + 1);
      --> assert((0-_int4 >= 10));
    }
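The alternative choice command can be read as follows: each branch is attempted, and a branch whose leading assertion fails is simply not a possible execution. The C sketch below models that reading for the example above. The function names are hypothetical, and the sketch relies on the two guards being complementary, as they are when they come from an `if` condition and its negation.

```c
#include <assert.h>
#include <stdbool.h>

/* First alternative: guard (10 > x), then the increment.
   Returns false when the guard fails, i.e. the branch is infeasible. */
static bool take_then(int *x)
{
    if (!(10 > *x))
        return false;
    *x = *x + 1;
    return true;
}

/* Second alternative: guard (x >= 10), with an empty body. */
static bool take_else(const int *x)
{
    return *x >= 10;
}

/* Runs both alternatives on a copy of the state and keeps the
   feasible one; exactly one of the two guards holds. */
static int run_choose(int x)
{
    int then_x = x;
    bool then_ok = take_then(&then_x);
    bool else_ok = take_else(&x);
    assert(then_ok != else_ok);
    return then_ok ? then_x : x;
}
```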
Function return statements are replaced by jumps and labels.
C code:

    int main() {
      int x;
      if (x < 10) {
        return 1;
      }
      return 0;
    }

Newspeak code:

    main() {
      int4 x;
      choose {
        --> assert((10 > 0-_int4));
            1- =(int4) 1;
            goto lbl1;
        --> assert((0-_int4 >= 10));
      }
      1- =(int4) 0;
      lbl1:
    }
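The same shape can be written directly in C, with a single result slot and a single exit label. In this sketch the function takes `x` as a parameter so that its behavior is defined; `classify` and `result` are names chosen here for illustration.

```c
#include <assert.h>

/* One result slot and one exit label, mirroring the Newspeak code:
   every return becomes an assignment followed by a jump. */
static int classify(int x)
{
    int result;              /* plays the role of the return-value slot */

    if (10 > x) {
        assert(10 > x);      /* guard of the first alternative */
        result = 1;
        goto lbl1;
    }
    assert(x >= 10);         /* guard of the second alternative */
    result = 0;
lbl1:
    return result;
}
```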
Loops are built from a combination of the alternative command, jumps, and the infinite loop.
C code:

    int x;
    x = 0;
    while (x < 10) {
      x++;
    }

Newspeak code:

    int4 x;
    0- =(int4) 0;
    forever do {
      choose {
        --> assert((10 > 0-_int4));
        --> assert((0-_int4 >= 10));
            goto lbl2;
      }
      0- =(int4) coerce[-2147483648,2147483647] (0-_int4 + 1);
    }
    lbl2:
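For comparison, here is the same loop rewritten in C using only an infinite loop, a two-way choice and a jump, following the structure of the Newspeak code above. The function name `count_up` is introduced for this sketch.

```c
#include <assert.h>

/* while (x < 10) x++; expressed with an infinite loop, a choice
   between two guarded branches, and a jump out of the loop. */
static int count_up(void)
{
    int x = 0;

    for (;;) {                  /* forever do */
        if (10 > x) {
            /* first alternative: the guard holds, fall through */
        } else {
            assert(x >= 10);    /* guard of the second alternative */
            goto lbl2;
        }
        x = x + 1;
    }
lbl2:
    return x;
}
```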
Function calls take no arguments and resemble assembly calls: Newspeak uses its stack to pass parameters and return values.
C code:

    int f(int a, int b) {
      return a + b;
    }

    void main() {
      int x, y, z;
      z = f(x, y);
    }

Newspeak code:

    f() {
      2- =(int4) coerce[-2147483648,2147483647]
        (1-_int4 + 0-_int4);
    }

    main() {
      int4 x; int4 y; int4 z;
      int4 value_of_f;
      {
        int4 a;
        int4 b;
        1- =(int4) 5-_int4;
        0- =(int4) 4-_int4;
        f();
      }
      1- =(int4) 0-_int4;
    }
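The calling convention in this example can be simulated in C with an explicit stack, where `at(k)` plays the role of Newspeak's `k-`. This is a sketch written to follow the example line by line: the stack representation and all names (`stack`, `push`, `pop`, `at`, `call_f`) are assumptions of the sketch, and the coercion of the sum is left out for brevity.

```c
#include <assert.h>

#define STACK_MAX 16
static int stack[STACK_MAX];
static int top = 0;                  /* number of live slots */

static void push(void) { assert(top < STACK_MAX); stack[top++] = 0; }
static void pop(void)  { assert(top > 0); top--; }

/* Slot at offset k from the top of the stack, like Newspeak's k-. */
static int *at(int k)
{
    assert(0 <= k && k < top);
    return &stack[top - 1 - k];
}

/* f() takes no arguments: b is 0-, a is 1-, the return slot is 2-. */
static void f(void) { *at(2) = *at(1) + *at(0); }

/* Follows main() from the example, with x and y given initial values. */
static int call_f(int x_val, int y_val)
{
    int z_val;

    push(); push(); push();          /* x, y, z */
    *at(2) = x_val;
    *at(1) = y_val;
    push();                          /* value_of_f */
    push(); push();                  /* a, b */
    *at(1) = *at(5);                 /* a = x */
    *at(0) = *at(4);                 /* b = y */
    f();
    pop(); pop();                    /* leave the argument block */
    *at(1) = *at(0);                 /* z = value_of_f */
    pop();                           /* value_of_f */
    z_val = *at(0);                  /* z is now on top */
    pop(); pop(); pop();             /* z, y, x */
    return z_val;
}
```

Calling `call_f(2, 3)` walks through the same sequence of stack offsets as the Newspeak `main()` and returns the value stored in `z`.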
There is much more! Feel free to experiment and let us know your thoughts.