In the end the whole notion of goodness and badness will be covered by
only six words -- in reality, only one word. Don't you see the beauty
of that, Winston?
Nineteen eighty-four, George Orwell.
Newspeak is a
simplified programming language, well-suited for the purpose of static
analysis.
C2Newspeak compiles
C programs into Newspeak. C2Newspeak is distributed under the LGPL.
Distribution
C2Newspeak v. 1.3 source code is available:
newspeak-1.3.tgz.
Previous C2Newspeak versions:
newspeak-1.2.tgz,
C2Newspeak-1.1.tgz,
C2Newspeak-1.0.tgz,
C2Newspeak-0.9.tgz.
Requirements
C2Newspeak is written in
Objective Caml.
Documentation
Development version
The latest version of the C2Newspeak source code
can also be retrieved from this Mercurial
repository:
https://hg.penjili.org/c2newspeak-ref.
Mercurial is a
distributed source management tool, which can be found at
http://www.selenic.com/mercurial/wiki/.
Bug reports
The code can be browsed
here,
and tickets can be submitted
there
to report bugs, comments, or missing features.
Examples
Legend
Here are a few compilation examples from C to Newspeak. In each example, the
C code is shown first, followed by the corresponding Newspeak code:
Types
Integer types are normalized according to their size and sign. Their size,
which is architecture dependent, is made explicit.
C:
    int i1;
    unsigned int i2;
    char i3;
    unsigned char i4;
Newspeak:
    int4 i1;
    uint4 i2;
    int1 i3;
    uint1 i4;
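To see which sizes apply on a given machine, the normalization can be mimicked with a few lines of C. This is only an illustrative sketch: `newspeak_int_name` and `show_integer_types` are hypothetical helpers, not part of C2Newspeak, and the output depends on the host compiler and architecture.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a Newspeak-style integer type name such as
   "int4" or "uint1" into buf, from a size in bytes and a signedness flag. */
int newspeak_int_name(char *buf, size_t buf_len, size_t size_bytes, int is_signed) {
    return snprintf(buf, buf_len, "%sint%zu", is_signed ? "" : "u", size_bytes);
}

/* Prints the normalized names of the four C types used in the example above,
   as they come out on the host compiling this file. */
void show_integer_types(void) {
    char name[16];

    newspeak_int_name(name, sizeof name, sizeof(int), 1);
    printf("int           -> %s\n", name);

    newspeak_int_name(name, sizeof name, sizeof(unsigned int), 0);
    printf("unsigned int  -> %s\n", name);

    /* Whether plain char is signed is implementation-defined in C. */
    newspeak_int_name(name, sizeof name, sizeof(char), CHAR_MIN < 0);
    printf("char          -> %s\n", name);

    newspeak_int_name(name, sizeof name, sizeof(unsigned char), 0);
    printf("unsigned char -> %s\n", name);
}
```

Calling `show_integer_types()` from a `main` function prints one line per type. On a platform where `int` is 4 bytes and `char` is signed, the names should match those in the example.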
Casts (and unions) in C allow programmers to manipulate sequences of bytes
with any type. Consequently, Newspeak distinguishes only two
types of pointers: data and function pointers.
C:
    int *p1;
    unsigned int *p2;
    int (*p3)[10];
    struct { int x; } *p4;
    int (*fp)(int);
Newspeak:
    ptr p1;
    ptr p2;
    ptr p3;
    ptr p4;
    fptr fp;
Newspeak composite data structures are arrays and regions. A region is a
sequence of bytes in which selected offsets are declared to hold values
of a given type. Regions can encode both C structures and unions,
while making their architecture-dependent parameters explicit: namely,
field offsets, padding and the overall size of the type.
C:
    int t[10];
    struct {
      int x; char y; char* z;
    } s;
    union {
      int x; char y; char* z;
    } u;
    int t1[10][20];
    int t2[10][20][30];
    struct {
      int x; struct { char z; } y;
    } s1;
    struct {
      int x[10];
      struct { char z[10]; } y[10];
    } s2;
    struct {
      int z;
      union { int x; char y; } t;
    } s3;
Newspeak:
    int4[10] t;
    {
      int4 0; int1 4; ptr 8;
    }12 s;
    {
      int4 0; int1 0; ptr 0;
    }4 u;
    int4[20][10] t1;
    int4[30][20][10] t2;
    {
      int4 0; { int1 0; }1 4;
    }8 s1;
    {
      int4[10] 0;
      { int1[10] 0; }10[10] 40;
    }140 s2;
    {
      int4 0;
      { int4 0; int1 0; }4 4;
    }8 s3;
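The offsets and sizes that a region makes explicit can be inspected on any C compiler with `offsetof` and `sizeof`. The sketch below is only an illustration on the host compiler, not C2Newspeak output. The type names `struct s` and `union u` and the function `print_region_layouts` are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Named versions of the first structure and union from the example above. */
struct s { int x; char y; char *z; };
union  u { int x; char y; char *z; };

/* Prints each field's byte offset and the overall size of both types,
   i.e. the architecture-dependent numbers a region spells out. */
void print_region_layouts(void) {
    printf("struct s: x@%zu y@%zu z@%zu size=%zu\n",
           offsetof(struct s, x), offsetof(struct s, y),
           offsetof(struct s, z), sizeof(struct s));
    printf("union u:  x@%zu y@%zu z@%zu size=%zu\n",
           offsetof(union u, x), offsetof(union u, y),
           offsetof(union u, z), sizeof(union u));
}
```

The Newspeak listing above corresponds to a target with 4-byte `int` and 4-byte pointers. On a host with 8-byte pointers the numbers printed for `z` and for the total sizes will differ, which is exactly the kind of dependency the region notation exposes.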
Variables
Global variables are designated by their name. Local variables are pushed on
a stack and accessed by their offset from the top of the stack: 0- is the most
recently declared variable still in scope, 1- the one declared before it, and
so on.
C:
    int x;
    void main() {
      int y;
      int z;
      x = y;
      x = z;
    }
Newspeak:
    int4 x = 0;
    main() {
      int4 y;
      int4 z;
      x =(int4) 1-_int4;
      x =(int4) 0-_int4;
    }
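The numbering seen in the example (y becomes 1-, z becomes 0-) can be expressed as a one-line computation. This is a sketch inferred from the examples on this page. `stack_offset` is a hypothetical name, and the model assumes each local occupies exactly one stack slot.

```c
/* Hypothetical helper: stack offset of the local declared at position
   decl_index (0 = first declared) when n_locals locals are in scope.
   The most recently declared local is at offset 0. */
int stack_offset(int n_locals, int decl_index) {
    return n_locals - 1 - decl_index;
}
```

With two locals in scope, `stack_offset(2, 0)` gives 1 for y and `stack_offset(2, 1)` gives 0 for z, matching the Newspeak code above.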
Left values and expressions
Fields and array elements are accessed by shifting the structure or array
address by some offset.
In the case of array element access, the
belongs operator makes it possible to check that
the index is within bounds.
C:
    struct {
      int a; int b;
    } x;
    int t[10];
    int i;
    x.b = t[i];
Newspeak:
    {
      int4 0; int4 4;
    }8 x;
    int4[10] t;
    int4 i;
    2- + 4 =(int4)
      1- + (belongs[0,9] (0-_int4) * 4)_int4;
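One way to read the right-hand side above is as a range check followed by an offset computation. The following C sketch models that reading. The functions `belongs` and `element_offset` are illustrative names only, and reporting a violation through a boolean return value is an assumption of this sketch, not a description of how Newspeak or an analyzer treats it.

```c
#include <stdbool.h>

/* Hypothetical model of belongs[lo,hi]: succeeds and yields v unchanged
   when lo <= v <= hi, fails otherwise. */
bool belongs(long lo, long hi, long v, long *out) {
    if (v < lo || v > hi) {
        return false;
    }
    *out = v;
    return true;
}

/* Byte offset of t[i] for int t[10] with 4-byte elements, mirroring
   (belongs[0,9] (i) * 4) from the example. */
bool element_offset(long i, long *offset) {
    long checked;
    if (!belongs(0, 9, i, &checked)) {
        return false;   /* index out of bounds */
    }
    *offset = checked * 4;
    return true;
}
```

For instance, `element_offset(3, &off)` succeeds with `off` set to 12, while an index of 10 or -1 is rejected.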
Integer operations are decomposed into an exact operation followed
by a coercion back to the result's expected range.
C:
    int x, y, z;
    x = y + z;
    x = y * z;
Newspeak:
    int4 x; int4 y; int4 z;
    2- =(int4) coerce[-2147483648,2147483647] (1-_int4 + 0-_int4);
    2- =(int4) coerce[-2147483648,2147483647] (1-_int4 * 0-_int4);
The
coerce operator is also used for
casts between integer types of different size or sign.
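The split between an exact operation and a coercion can be imitated in C by computing in a wider type and then checking the range. This is a minimal sketch under stated assumptions: `coerce` and `add_int4` are hypothetical names, and signalling an out-of-range result through a boolean is a choice made for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of coerce[lo,hi]: accepts v only if it lies in range. */
bool coerce(int64_t lo, int64_t hi, int64_t v, int64_t *out) {
    if (v < lo || v > hi) {
        return false;
    }
    *out = v;
    return true;
}

/* x = y + z on 4-byte signed integers: the sum is computed exactly in
   64 bits, then coerced back to the int4 range. */
bool add_int4(int32_t y, int32_t z, int32_t *x) {
    int64_t exact = (int64_t)y + (int64_t)z;   /* cannot overflow */
    int64_t in_range;
    if (!coerce(INT32_MIN, INT32_MAX, exact, &in_range)) {
        return false;   /* the C addition would have overflowed */
    }
    *x = (int32_t)in_range;
    return true;
}
```

Because the exact sum is kept separate from the range check, an overflow shows up as a failed coercion instead of silently wrapping.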
Pointer creations are annotated with the size of the buffer
they designate, so that invalid pointer operations can be detected.
C:
    int* x;
    int t[100];
    x = &t[3];
    x = x + 5;
    *x = 3;
Newspeak:
    ptr x;
    int4[100] t;
    1- =(ptr) (&_400(0-) + (3 * 4));
    1- =(ptr) (1-_ptr + (5 * 4));
    [1-_ptr]4 =(int4) 3;
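To illustrate why recording the buffer size at pointer creation is useful, here is a small C model of a pointer that carries its buffer size and current byte offset. It is a sketch, not how C2Newspeak or any analyzer represents pointers. `struct sized_ptr`, `ptr_add` and `access_ok` are invented for this example, and negative offsets are left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer model: size of the designated buffer in bytes,
   plus the current byte offset into it. */
struct sized_ptr {
    size_t buffer_size;
    size_t offset;
};

/* Pointer arithmetic only moves the offset; the buffer size is kept. */
struct sized_ptr ptr_add(struct sized_ptr p, size_t bytes) {
    p.offset += bytes;
    return p;
}

/* An access of access_size bytes is valid if it fits inside the buffer. */
bool access_ok(struct sized_ptr p, size_t access_size) {
    return p.offset <= p.buffer_size
        && access_size <= p.buffer_size - p.offset;
}
```

Replaying the example: starting from `{400, 0}`, adding `3 * 4` and then `5 * 4` bytes gives offset 32, and a 4-byte write there passes `access_ok`. A pointer moved past byte 396 would fail the same check.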
Casts between integers and pointers are forbidden.
C:
    int* p;
    int x;
    x = p;
C2Newspeak output:
    Fatal error: translate cast: Invalid cast 'int *' -> 'int ' in '(int )p'
When the castor option is set, the cast is accepted instead:
C:
    int* p;
    int x;
    x = p;
Newspeak:
    ptr p;
    int4 x;
    0- =(int4) (int4) 1-_ptr;
Commands
Conditionals are translated into alternative choice commands, in the style of Dijkstra's guarded commands.
C:
    int x;
    if (x < 10) {
      x++;
    }
Newspeak:
    int4 x;
    choose {
      --> assert((10 > 0-_int4));
          0- =(int4) coerce[-2147483648,2147483647] (0-_int4 + 1);
      --> assert((0-_int4 >= 10));
    }
Function return statements are replaced by jumps and labels.
C:
    int main() {
      int x;
      if (x < 10) {
        return 1;
      }
      return 0;
    }
Newspeak:
    main() {
      int4 x;
      choose {
        --> assert((10 > 0-_int4));
            1- =(int4) 1;
            goto lbl1;
        --> assert((0-_int4 >= 10));
      }
      1- =(int4) 0;
      lbl1:
    }
Loops are built from a combination of the alternative command, jumps and the
infinite loop.
C:
    int x;
    x = 0;
    while (x < 10) {
      x++;
    }
Newspeak:
    int4 x;
    0- =(int4) 0;
    forever do {
      choose {
        --> assert((10 > 0-_int4));
        --> assert((0-_int4 >= 10));
            goto lbl2;
      }
      0- =(int4) coerce[-2147483648,2147483647] (0-_int4 + 1);
    }
    lbl2:
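The same shape can be written back in C to make the control flow easier to follow. This is a hand-written sketch, not something C2Newspeak produces. It reads the two guarded branches as an if/else, which is only valid here because the two guards are mutually exclusive. The function name `count_to_ten` is invented, and the coercion on the increment is left out.

```c
/* The while loop from the example, restructured as an infinite loop,
   a two-way choice and a jump out of the loop. */
int count_to_ten(void) {
    int x = 0;
    for (;;) {                  /* forever do */
        if (x < 10) {           /* --> assert((10 > x)) */
            /* guard holds: continue with the loop body */
        } else {                /* --> assert((x >= 10)) */
            goto lbl2;          /* leave the loop */
        }
        x++;                    /* loop body */
    }
lbl2:
    return x;
}
```

The loop condition appears once in each branch, once as stated and once negated, so the exit test is explicit on both paths.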
Function calls have no arguments and look like assembly calls. Newspeak takes
advantage of its stack to transmit parameters.
C:
    int f(int a, int b) {
      return a + b;
    }
    void main() {
      int x, y, z;
      z = f(x, y);
    }
Newspeak:
    f() {
      2- =(int4) coerce[-2147483648,2147483647]
                 (1-_int4 + 0-_int4);
    }
    main() {
      int4 x; int4 y; int4 z;
      int4 value_of_f;
      {
        int4 a;
        int4 b;
        1- =(int4) 5-_int4;
        0- =(int4) 4-_int4;
        f();
      }
      1- =(int4) 0-_int4;
    }
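The calling convention in this example can be simulated in C with an explicit array used as the stack. The sketch below follows the slot numbering of the Newspeak code line by line. All names (`stack_mem`, `stack_top`, `slot`, `call_f`) are hypothetical, each variable is assumed to take one slot, and the coercion on the addition is left out.

```c
#define STACK_MAX 16

int stack_mem[STACK_MAX];
int stack_top = 0;              /* number of slots currently in use */

/* slot(k) models the Newspeak left value "k-": k slots below the top. */
int *slot(int k) {
    return &stack_mem[stack_top - 1 - k];
}

/* f has no C-level parameters: it finds a (1-), b (0-) and the slot for
   its return value (2-) on the stack. */
void f(void) {
    *slot(2) = *slot(1) + *slot(0);
}

/* Replays main() from the example for given initial values of x and y. */
int call_f(int x, int y) {
    int z;
    stack_top = 0;
    stack_mem[stack_top++] = x;     /* int4 x */
    stack_mem[stack_top++] = y;     /* int4 y */
    stack_mem[stack_top++] = 0;     /* int4 z */
    stack_mem[stack_top++] = 0;     /* int4 value_of_f */
    stack_top += 2;                 /* int4 a; int4 b */
    *slot(1) = *slot(5);            /* 1- = 5-  (a = x) */
    *slot(0) = *slot(4);            /* 0- = 4-  (b = y) */
    f();
    stack_top -= 2;                 /* end of block: pop a and b */
    *slot(1) = *slot(0);            /* 1- = 0-  (z = value_of_f) */
    z = *slot(1);
    stack_top = 0;
    return z;
}
```

For example, `call_f(2, 3)` returns 5. Note how the offsets of x and y change from 5- and 4- inside the block to different values once a and b are popped, which is why the last assignment can refer to z as 1-.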
There is much more! Feel free to experiment and let us know your thoughts.