sx language specification
1. Lexical Structure
Comments
Line comments start with // and extend to end of line.
// this is a comment
Identifiers
- Lowercase or mixed-case for variables, functions: x, compute, main
- UPPER_SNAKE_CASE for constants: SOME_INT, SOME_STR
- PascalCase for types: Foo
Literals
| Kind | Examples | Type |
|---|---|---|
| Integer | 0, 42, 0xFF, 0b1010 | s32 |
| Float | 0.3, 0.9 | f32 |
| String | "Hello", "z: {z}" | string |
| Boolean | true, false | bool |
| Enum | .variant1 | inferred from context |
| Undefined | --- | context-dependent |
Keywords
if, else, then, while, break, continue, true, false, enum, struct, union, case, return, defer, xx, and, or
Operators
| Operator | Meaning |
|---|---|
| + | addition |
| - | subtraction / negation |
| * | multiplication |
| / | division |
| == | equality |
| != | inequality |
| < | less than |
| > | greater than |
| <= | less or equal |
| >= | greater or equal |
| and | logical AND (short-circuit) |
| or | logical OR (short-circuit) |
| += | add-assign |
| -= | sub-assign |
| *= | mul-assign |
| /= | div-assign |
Delimiters and Punctuation
| Token | Meaning |
|---|---|
| :: | constant binding / definition |
| := | variable binding (mutable, inferred) |
| : | type annotation |
| = | assignment (in typed var decl) |
| ; | statement terminator |
| , | separator |
| . | field access / enum literal prefix |
| -> | return type annotation |
| => | lambda arrow |
| $ | generic type parameter introduction |
| --- | undefined value |
| () | grouping / params |
| {} | blocks / bodies |
2. Type System
Primitive Types
- s1..s64: signed integers (1 to 64 bits). s32 is the default for integer literals.
- u1..u64: unsigned integers (1 to 64 bits).
- f32: 32-bit floating point
- f64: 64-bit floating point
- bool: boolean (true/false)
- string: string of characters
- Any: type-erased value, represented as { i32, i64 } (type tag + payload). Used for variadic arguments and runtime type dispatch.
- Type: compile-time type value. At runtime, represented as an i32 type tag (same tag space as Any).
Enum Types
User-defined sum types with named variants.
Foo :: enum {
variant1;
variant2;
}
Variants are referenced with dot-prefix syntax: .variant1
Struct Types
User-defined product types with named fields.
Vec4 :: struct {
x, y, z, w: f32;
}
Fields are declared as name1, name2: type; (comma-separated names sharing a type, semicolon-terminated).
Field Defaults
Fields may have default values. Fields without an explicit default have a zero-value default. --- marks a field as explicitly undefined.
Foo :: struct {
a : u2; // default is 0
b : u8 = 42; // default is 42
c : u8 = ---; // default is undefined
}
Struct Literals
// Positional (with type annotation — type inferred from annotation)
v1 : Vec4 = .{ 1, 2, 3, 0 };
// Positional (with type prefix)
v2 := Vec4.{ 4, 1, 1, 3 };
// Named fields (any order)
v3 := Vec4.{ w=0, x=2, y=3, z=4 };
// Mixed named + shorthand (bare identifier = field name matches variable name)
z := 5.0;
w := 6.0;
v4 := Vec4.{ y=3, x=9, w, z };
Field Access and Assignment
v1.x // read field x of struct v1
v1.x = 3.0; // assign to field x of struct v1
Struct Interpolation
Struct values in string interpolation print as TypeName{field:value, ...}:
print("{}", v1); // Vec4{x:1.0, y:2.0, z:3.0, w:0.0}
Union Types (Tagged Unions)
Sum types where each variant can carry typed data or be void. Internally represented as { i32, [max_payload_size x i8] }.
Declaration
Shape :: union {
circle: f32; // typed variant
rect: s32; // typed variant
none; // void variant
}
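The { i32, [max_payload_size x i8] } representation can be illustrated with a small Python sketch that derives the tag table and payload size for the Shape declaration above. The size table, the helper name union_layout, and the tag numbering (declaration order, starting at 0) are assumptions made for illustration; the specification only states the overall shape of the layout.

```python
# Byte sizes of the payload types used below (assumed: f32 and s32 are 4 bytes).
SIZE_OF = {"f32": 4, "s32": 4, None: 0}  # None marks a void variant

def union_layout(variants):
    """Return (tags, max_payload_size) for a list of (name, payload_type) pairs."""
    # Assumption: tags are assigned in declaration order, starting at 0.
    tags = {name: index for index, (name, _) in enumerate(variants)}
    max_payload = max(SIZE_OF[payload] for _, payload in variants)
    return tags, max_payload

shape = [("circle", "f32"), ("rect", "s32"), ("none", None)]
tags, payload_bytes = union_layout(shape)
print(tags)           # {'circle': 0, 'rect': 1, 'none': 2}
print(payload_bytes)  # 4, so Shape would be modeled as { i32, [4 x i8] }
```

Under these assumptions every Shape value occupies the same storage regardless of the active variant, which is why reading the wrong payload is undefined rather than a type error.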
Construction
s : Shape = .circle(3.14); // inferred from context
s = .none; // void variant (enum literal syntax)
s = Shape.rect(42); // explicit prefix
Payload Access
r := s.circle; // load payload as f32 (undefined behavior if wrong variant active)
Pattern Matching
if s == {
case .circle: print("circle\n");
case .rect: print("rect\n");
case .none: print("none\n");
}
Union Interpolation
Union values in string interpolation print as <TypeName tag=N>:
print("{}", s); // <Shape tag=0>
Array Types
Fixed-size arrays with element type and length.
buffer : [5]f32 = .[0, 2, 3.5, 4, 0];
val := buffer[2]; // 3.5
buffer.len // 5 (compile-time constant, s32)
Arrays can also be constructed programmatically with the Array builtin:
MyArr :: Array(5, s32); // equivalent to [5]s32
Slice Types
A slice []T is a fat pointer {ptr, i32} referencing a contiguous sequence of T elements. Same runtime layout as string.
// Arrays implicitly coerce to slices at call sites
arr : [5]s32 = .[3, 1, 4, 1, 5];
sortSlice(arr); // [5]s32 → []s32 coercion
// Slice operations
items[i] // read element at index
items[i] = val; // write element at index
items.len // length (s32)
items.ptr // raw pointer
Slices support generic type parameters: []$T introduces type parameter T inferred from the element type of the argument (array or slice).
Subslicing
Arrays, slices, and strings support subslice syntax to create zero-copy views:
arr : [5]s32 = .[3, 1, 4, 1, 5];
sub := arr[1..4]; // []s32 → [1, 4, 1]
head := arr[..3]; // []s32 → [3, 1, 4]
tail := arr[2..]; // []s32 → [4, 1, 5]
msg := "hello world";
word := msg[6..11]; // string → "world"
- expr[start..end]: elements from start (inclusive) to end (exclusive)
- expr[start..]: elements from start to the end
- expr[..end]: elements from the beginning to end (exclusive)
- Result type: []T for arrays/slices, string for strings
- No memory allocation: the result points into the original backing storage
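The half-open bounds and the zero-copy property can be modeled in Python with a memoryview over an array, which behaves like a view into shared storage. This is only an analogy for the semantics listed above, not a description of how the sx compiler implements subslices; the variable names are illustrative.

```python
from array import array

backing = array("i", [3, 1, 4, 1, 5])   # stands in for arr : [5]s32
view = memoryview(backing)              # zero-copy view over the same storage

sub = view[1:4]    # like arr[1..4]: start inclusive, end exclusive
head = view[:3]    # like arr[..3]
tail = view[2:]    # like arr[2..]

print(sub.tolist(), head.tolist(), tail.tolist())
# [1, 4, 1] [3, 1, 4] [4, 1, 5]

backing[2] = 9       # write through the original storage
print(sub.tolist())  # [1, 9, 1]: the view observes the change, so nothing was copied
```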
Pointer Types
| Syntax | Meaning | .len | [i] |
|---|---|---|---|
| *T | pointer to one T | no | no |
| [*]T | many-pointer (buffer) | no | yes |
| *[N]T | pointer to array of N T | yes | yes |
| *[]T | pointer to slice | yes | yes |
Address-of: &x returns a pointer to the variable.
v := Vec2.{ 1.0, 2.0 };
ptr := &v; // *Vec2
Dereference: p.* loads the value through the pointer.
copy := ptr.*; // Vec2
Auto-deref: p.field is sugar for p.*.field.
set_x :: (p: *Vec2, val: f32) {
p.x = val; // auto-deref: p.*.x = val
}
set_x(&v, 99.0);
Null: All pointer types are nullable. null is the null pointer literal.
np : *Vec2 = null;
Many-pointer: [*]T supports indexing for buffers of unknown size.
arr : [5]s32 = .[10, 20, 30, 40, 50];
mp : [*]s32 = &arr[0]; // *s32 → [*]s32 implicit
val := mp[2]; // 30
Implicit conversions:
- *T → [*]T (pointer to element → many-pointer)
- null (*void) → any *T
Vector Types (SIMD)
LLVM SIMD vectors, parameterized by length and element type.
v := vec3(1, 3, 2); // Vector(3, f32)
Arithmetic: Element-wise +, -, *, / on vectors of same dimensions.
add := v1 + v2; // element-wise addition
Scalar broadcast: Scalar operands are broadcast to match the vector.
scaled := v * 2.0; // [2.0, 6.0, 4.0]
Negation: Unary - negates each element.
neg := -v; // [-1.0, -3.0, -2.0]
Element access: .x, .y, .z, .w (aliases .r, .g, .b, .a) extract single components.
v.x // first element
v.z // third element
Index access: v[i] extracts by index.
v[0] // first element
Built-in sqrt: Calls LLVM llvm.sqrt.f32/.f64 intrinsic.
s := sqrt(9.0); // 3.0
Function Types
Expressed as (param_types) -> return_type.
A function with no return type annotation returns void.
// type is (s32) -> s32
compute :: (x: s32) -> s32 { x * x; }
// type is () -> void
main :: () { }
Type Aliases
A name bound to an existing type.
SOME_TYPE :: f64;
Generic Functions (Monomorphization)
Functions can be parameterized over types using $T syntax. The $ prefix introduces a type parameter; subsequent uses of the name reference it.
sum :: (a: $T, b: T) -> T {
return a + b;
}
- $T in a parameter type introduces type parameter T
- Bare T (without $) references the introduced type parameter
- At call sites, type arguments are inferred from actual argument types:
  sum(40, 2) // T = s32
  sum(1.5, 2.5) // T = f32
- Each unique set of concrete types produces a separate specialized function (monomorphization)
- Multiple type parameters are supported: (a: $T, b: $U) -> T
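Monomorphization can be pictured as a cache of specialized functions keyed by the concrete type bound to T. The Python sketch below models calls to sum under that view. The helpers sx_type_of and call_sum and the specializations dictionary are hypothetical, Python int and float stand in for the s32 and f32 literal defaults, and implicit conversions between argument types are not modeled.

```python
# Assumed mapping from Python values to sx literal default types (s32 / f32).
def sx_type_of(value):
    # bool is checked first because Python's bool is a subclass of int
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "s32"
    if isinstance(value, float):
        return "f32"
    raise TypeError(f"unsupported value: {value!r}")

specializations = {}  # (function name, concrete type of T) -> specialized callable

def call_sum(a, b):
    """Model of calling sum :: (a: $T, b: T) -> T with T inferred from `a`."""
    t = sx_type_of(a)            # $T is bound by the first argument
    if sx_type_of(b) != t:       # bare T must match the bound type
        raise TypeError(f"expected {t} for b, got {sx_type_of(b)}")
    key = ("sum", t)
    if key not in specializations:
        # First use with this type: create a new specialized instance.
        specializations[key] = lambda x, y: x + y
    return specializations[key](a, b)

print(call_sum(40, 2))          # 42
print(call_sum(1.5, 2.5))       # 4.0
print(call_sum(1, 2))           # 3 (reuses the s32 instance)
print(sorted(specializations))  # [('sum', 'f32'), ('sum', 's32')]
```

Three calls produce only two instances, one per distinct binding of T, which is the property the list above describes.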
Variadic Functions
Functions can accept a variable number of arguments using ..Type syntax:
print :: (fmt: string, args: ..Any) { ... }
- ..Any means zero or more arguments, each boxed into Any (type tag + payload)
- The variadic parameter must be the last parameter
- At call sites, variadic arguments are automatically boxed: print("x={}, y={}\n", x, y)
- Inside the function body, args is accessed as a slice-like sequence
Type Inference
- :: bindings infer type from the right-hand side
- := bindings infer type from the right-hand side
- Explicit annotation overrides inference: NAME : f64 : 0.9;
- Integer literals default to s32
- Float literals default to f32
- Enum literals (.variant) infer their enum type from context (expected type)
Type Conversions
Implicit (widening) — allowed without annotation:
- Integer to wider integer of same signedness (u8→u16, s8→s32)
- Unsigned to strictly wider signed (u8→s16)
- Any integer to any float (u8→f32, s32→f64)
- Float to wider float (f32→f64)
- Integer and float literals can convert to any numeric type implicitly
Explicit (narrowing) — requires xx prefix:
- Integer to narrower integer (s32→u8)
- Signed to unsigned (s32→u32)
- Float to narrower float (f64→f32)
- Float to any integer (f64→u16)
- Unsigned to signed of same or narrower width (u8→s8)
The xx prefix operator marks an expression for auto-conversion to the expected type from context (assignment, declaration, argument, return):
large: f64 = 5999.5;
x : u16 = xx large; // f64 → u16
d : u8 = #run xx resolve(5); // s32 → u8 at compile time
Using xx outside a typed context (where the target type is known) is a compile error.
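The widening and narrowing rules above can be summarized as a single predicate. The following Python sketch is a reading of those rules, not the compiler's actual check. The function names are hypothetical, identical source and target types are assumed to need no conversion, and the special case for literals is left out.

```python
import re

def parse(type_name):
    """Split a name such as 'u8' or 'f64' into (kind, bits)."""
    match = re.fullmatch(r"([suf])(\d+)", type_name)
    if match is None:
        raise ValueError(f"not a numeric type: {type_name}")
    return match.group(1), int(match.group(2))

def is_implicit(src, dst):
    """True if src -> dst is a widening conversion under the rules listed above."""
    (sk, sb), (dk, db) = parse(src), parse(dst)
    if (sk, sb) == (dk, db):
        return True                      # assumption: identity needs no conversion
    if sk == "f":
        return dk == "f" and db > sb     # float to wider float only
    if dk == "f":
        return True                      # any integer to any float
    if sk == dk:
        return db > sb                   # same signedness, wider
    if sk == "u" and dk == "s":
        return db > sb                   # unsigned to strictly wider signed
    return False                         # signed to unsigned always needs xx

def needs_xx(src, dst):
    """True if the conversion must be written with the xx prefix."""
    return not is_implicit(src, dst)

print(is_implicit("u8", "s16"))   # True
print(needs_xx("f64", "u16"))     # True
print(needs_xx("u8", "s8"))       # True
```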
3. Declarations
Constant Binding (immutable)
// inferred type
NAME :: value;
// explicit type
NAME : type : value;
The :: operator creates an immutable binding. The value is evaluated at compile time when possible.
Examples:
SOME_INT :: 0; // s32
SOME_STR :: "Hello"; // string
SOME_FLOAT :: 0.3; // f32
SOME_DOUBLE : f64 : 0.9; // f64 (explicit)
SOME_FUNC :: () => 42; // () -> s32
SOME_TYPE :: f64; // type alias
Variable Binding (mutable)
// inferred type
name := value;
// explicit type
name : type = value;
// default-initialized (type required)
name : type;
// undefined (type required)
name : type = ---;
The := operator creates a mutable binding. The type is inferred unless explicitly annotated.
name : type; initializes using the type's defaults: zero for primitives, per-field defaults for structs (see Field Defaults).
name : type = ---; leaves the value undefined (uninitialized memory). Reading before writing is undefined behavior.
Examples:
x := 42; // s32, mutable
y := if true then 1 else 2; // s32, mutable, initialized from an if expression
z : Foo = .variant2; // Foo, mutable, explicit type
a : Foo; // Foo, default-initialized (a=0, b=42, c=undef)
b : Foo = ---; // Foo, entirely undefined
Function Definition
name :: (params) -> return_type {
body
}
- Parameters: name: type separated by commas
- Return type: -> type (omit for void)
- Body: block of statements; last expression is the implicit return value
- No return keyword needed (last expression = return value)
Examples:
compute :: (x: s32) -> s32 {
x * x;
}
main :: () {
// void return, no -> annotation
}
// Bare-block shorthand (equivalent to no-arg void function):
main :: {
// same as main :: () { ... }
}
Enum Definition
Name :: enum {
variant1;
variant2;
}
Defines a new enum type with the given variants. Each variant is terminated by a semicolon.
4. Expressions
Everything in sx is expression-oriented where possible.
Operator Precedence
| Prec | Operators | Notes |
|---|---|---|
| 6 (highest) | *, / | multiplication, division |
| 5 | +, - | addition, subtraction |
| 4 | <, <=, >, >=, ==, != | comparisons (chainable) |
| 2 | and | logical AND (short-circuit) |
| 1 (lowest) | or | logical OR (short-circuit) |
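To show how the precedence levels in the table group an expression, here is a small precedence-climbing parser in Python that returns a fully parenthesized string. It is a sketch under assumptions: all binary operators are treated as left-associative, unary operators are omitted, and chained comparisons are not given their special meaning. The function names are illustrative and do not come from the sx compiler.

```python
import re

# Binding strengths copied from the table; higher binds tighter.
PRECEDENCE = {
    "*": 6, "/": 6,
    "+": 5, "-": 5,
    "<": 4, "<=": 4, ">": 4, ">=": 4, "==": 4, "!=": 4,
    "and": 2,
    "or": 1,
}

TOKEN = re.compile(r"\s*(<=|>=|==|!=|[-+*/<>()]|\w+)")

def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens, min_prec=1):
    """Precedence climbing: returns a fully parenthesized string."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Left-associative: the right side may only contain tighter operators.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = f"({left} {op} {right})"
    return left

def parse_primary(tokens):
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens)
        tokens.pop(0)  # consume ")"
        return inner
    return token

def group(source):
    return parse(tokenize(source))

print(group("a + b * c"))              # (a + (b * c))
print(group("a or b and c"))           # (a or (b and c))
print(group("x + 2 == y * 3 and ok"))  # (((x + 2) == (y * 3)) and ok)
```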
Arithmetic
Standard infix: +, -, *, / with the usual precedence (* and / bind tighter than + and -).
x * x
x + 2
Chained Comparisons
Comparison operators can be chained. Each operand is evaluated exactly once.
0 <= x <= 100 // equivalent to: 0 <= x and x <= 100
1000 > x >= -100 // equivalent to: 1000 > x and x >= -100
a == b == c // equivalent to: a == b and b == c
Mixed operators are allowed: a < b <= c > d means a < b and b <= c and c > d.
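The expansion into pairwise comparisons joined by and, with each operand evaluated once, can be modeled in Python as follows. The chain helper is hypothetical. It evaluates every operand up front, which satisfies "exactly once", but whether sx skips later operands after a failed comparison is not stated above and is not modeled here.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def chain(operands, ops):
    """Evaluate a chained comparison.

    operands: zero-argument callables, one per operand
    ops:      operator symbols, one fewer than operands
    """
    values = [thunk() for thunk in operands]   # each operand evaluated once
    return all(OPS[op](left, right)
               for op, left, right in zip(ops, values, values[1:]))

calls = []
def x():
    calls.append("x")   # record every evaluation of x
    return 42

print(chain([lambda: 0, x, lambda: 100], ["<=", "<="]))   # True  (0 <= x <= 100)
print(calls)                                              # ['x'] (x evaluated once)
print(chain([lambda: 1, lambda: 2, lambda: 2, lambda: 5],
            ["<", "<=", ">"]))                            # False (1 < 2 <= 2 > 5)
```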
Logical Operators
and and or are short-circuit boolean operators. The right operand is not evaluated if the left operand determines the result.
if 0 <= x <= 100 and 0 <= y <= 100 {
print("contained");
}
If Expression (inline form)
if condition then consequent else alternate
Both branches are single expressions. The whole form produces a value.
x := if true then 1 else 2;
The else branch is optional. Without it, the form is a statement (no value):
if i == 2 then continue;
if done then break;
if err then return;
If Expression (block form)
if condition {
stmts
} else {
stmts
}
Each branch is a block. The last expression in each block is the branch's value. Can be used inline within other expressions:
y := x + if false {
7;
} else {
12;
};
Pattern Matching
if subject == {
case pattern: body
case pattern: body
else: body // optional default arm
}
Matches subject against each case. Patterns can be:
- Enum literals: .variant matches a specific enum variant.
- Integer/bool literals: 42, true match a specific value.
- Type categories: struct, enum, union match all types in that category (used with type_of values).
break exits a case arm without producing a value. The optional else: arm matches when no case pattern matches.
if z == {
case .variant1: break;
case .variant2:
print("z: {z}");
else:
print("unknown");
}
Type Category Matching
When switching on a Type value (from type_of), category keywords match all registered types of that category:
type := type_of(val);
if type == {
case int: result = int_to_string(xx val);
case struct: result = struct_to_string(cast(type) val);
case enum: result = enum_to_string(cast(type) val);
}
Available categories: int, float, bool, string, struct, enum, union.
Inside a category arm, cast(type) val performs runtime generic dispatch: the compiler generates a switch over all types in the category, monomorphizing the callee for each concrete type.
While Loop
while condition {
body
}
Repeats body as long as condition is true. break; exits the loop. continue; skips to the next iteration.
i := 0;
while i < 10 {
i += 1;
if i == 5 { continue; }
if i == 8 { break; }
print("{i}\n");
}
For Loop
for iterable {
// `it` is the current element
// `it_index` is the current index (s32)
print("{it}\n");
}
Iterates over arrays and slices. The loop body has two implicit variables:
- it: the current element value
- it_index: the current index (s32, starting at 0)
break; exits the loop. continue; skips to the next iteration.
arr : [5]s32 = .[1, 2, 3, 4, 5];
for arr {
if it_index == 2 { continue; }
print("{it}\n");
}
Lambda
(params) => expr
(params) -> return_type => expr
Anonymous function. Produces a function value. Supports the same parameter features as named functions: $ generic type params, .. variadic params, and optional return type annotation.
SOME_FUNC :: () => 42; // () -> s32
double :: (x: $T) -> T => x + x; // generic lambda with return type
Function Call
callee(args)
compute(6)
print("hello")
Field Access
object.field
Used for module access (std.print) and struct member access.
Enum Literal
.variant_name
The enum type is inferred from context (expected type from declaration or parameter).
5. Statements
Statements are terminated by ;.
- Declaration: name :: value; / name := value;
- Assignment: name = value; / name += value; (and other compound assignments). Also supports field targets: obj.field = value;
- Expression statement: expr; evaluates the expression (last in a block = return value)
- Return: return expr; returns from the enclosing function with the given value. return; returns void.
- Break: break; exits a match arm or the enclosing while or for loop
- Continue: continue; skips to the next iteration of the enclosing while or for loop
- Defer: defer expr; defers execution of expr until the enclosing block exits (LIFO order)
6. Blocks, Scoping, and Implicit Returns
A block { ... } contains zero or more statements. The last expression in a block is its value (implicit return).
In function bodies, the last expression becomes the return value:
compute :: (x: s32) -> s32 {
x * x; // this is returned
}
Scope Blocks
Bare blocks can be used as statements to introduce a new lexical scope. Variables declared inside a scope block are local to that block. No trailing ; is required.
main :: {
x := 42;
{
x := 6; // shadows outer x
print("inner: {x}"); // prints 6
}
print("outer: {x}"); // prints 42
}
Variable Shadowing
A variable declaration (name :=) inside an inner scope shadows any variable with the same name from outer scopes. The outer variable is restored when the inner scope exits.
Defer
defer expr; schedules expr to execute when the enclosing scope block exits. Multiple defers in the same scope execute in reverse order (LIFO).
{
defer print("second");
defer print("first");
}
// prints: first, then second
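The LIFO ordering has a close analogue in Python's contextlib.ExitStack, which also runs registered callbacks in reverse order when its block exits. The sketch below mirrors the example above, with an extra "body" entry (an addition for illustration) to show that deferred work runs only after the rest of the block.

```python
from contextlib import ExitStack

output = []

with ExitStack() as scope:                   # stands in for the enclosing { ... } block
    scope.callback(output.append, "second")  # defer print("second");
    scope.callback(output.append, "first")   # defer print("first");
    output.append("body")                    # ordinary statement in the block

print(output)  # ['body', 'first', 'second']
```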
7. Built-in Functions
Built-in functions are declared in std.sx with the #builtin suffix, which tells the compiler to generate the implementation internally rather than looking for a function body.
I/O
- write(str: string) -> void: write a string to standard output
- print(fmt: string, args: ..Any): formatted print. Parses {} placeholders in the format string and substitutes arguments. When all argument types are statically known, the compiler specializes the call at compile time (no Any boxing).
Math
- sqrt(x: $T) -> T: square root (maps to LLVM intrinsic)
Memory
- alloc(size: s32) -> string: allocate size bytes of memory, returned as a string slice
- size_of($T: Type) -> s32: size of type T in bytes
Type Introspection
- type_of(val: $T) -> Type: returns the runtime type tag of a value
- type_name($T: Type) -> string: returns the name of type T as a string (e.g., "Point")
- field_count($T: Type) -> s32: returns the number of fields (struct), variants (enum), or elements (vector) in type T
- field_name($T: Type, idx: s32) -> string: returns the name of the idx-th field (struct) or variant (enum) of type T
- field_value(s: $T, idx: s32) -> Any: returns the idx-th field (struct) or element (vector) of s, boxed as Any
Type Conversion
- cast(Type) expr: prefix operator that converts expr to Type. Examples: cast(s32) 3.14, cast(f64) n. When Type is a runtime Type value inside a type-category match arm, the compiler generates a dispatch switch over all types in the category, monomorphizing the callee for each concrete type.
Vectors
- Vector($N: int, $T: Type) -> Type: returns an LLVM vector type of N elements of type T
8. Compile-time Evaluation
#run Directive
#run expr evaluates expr at compile time using lazy JIT execution. It can appear in two contexts:
Compile-time constants — bind a compile-time value to a name:
compute :: (x: s32) -> s32 { x * x; }
x :: #run compute(5); // x = 25, evaluated at compile time
Comptime globals are resolved lazily: the JIT executes only when the value is first referenced during code generation. Chained dependencies are resolved automatically.
Side effects — execute code at compile time for its side effects:
#run print("compiling...");
#insert Directive
#insert expr; evaluates expr at compile time to obtain a string, then parses and compiles that string as inline code at the insertion point.
generate :: () -> string {
return "print(\"hello from the other side\");";
}
main :: () {
#insert #run generate();
// equivalent to: print("hello from the other side");
}
The inserted string must contain valid sx statements (including semicolons). The statements are parsed and compiled in the same scope as the #insert site.
9. Modules / Imports
#import Directive
The #import directive brings declarations from another .sx file into the current file. Paths are resolved relative to the importing file's directory.
Flat import — splices all declarations from the imported file into the current scope:
#import "modules/std/math.sx";
Namespaced import — wraps all declarations under a namespace name:
std :: #import "modules/std.sx";
Namespaced declarations are accessed with dot notation:
std.print("hello");
Import Resolution
- Imports are resolved after parsing and before code generation.
- Paths are relative to the directory of the file containing the #import.
- Nested imports are supported (imported files may themselves contain #import).
- Circular imports are detected and silently skipped (each file is imported at most once).
- Generic functions in namespaced imports are supported (e.g., std.mul(5, 2) where mul is generic).
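The resolution rules above (relative paths, nested imports, each file at most once) amount to a depth-first walk with a visited set. The Python sketch below runs that walk over a hypothetical in-memory file table. The file contents, the function name, and the pre-order visiting order are assumptions, since the specification does not say in which order imported files are processed.

```python
import posixpath

# Hypothetical in-memory file system: path -> list of imported paths.
FILES = {
    "main.sx": ["modules/std.sx", "modules/std/math.sx"],
    "modules/std.sx": ["std/math.sx"],
    "modules/std/math.sx": ["../std.sx"],   # circular: points back at std.sx
}

def resolve_imports(entry):
    """Return files in the order they are first imported, each at most once."""
    order, seen = [], set()

    def visit(path):
        if path in seen:
            return                      # already imported (or circular): skip
        seen.add(path)
        order.append(path)
        base = posixpath.dirname(path)  # imports are relative to the importing file
        for target in FILES[path]:
            visit(posixpath.normpath(posixpath.join(base, target)))

    visit(entry)
    return order

print(resolve_imports("main.sx"))
# ['main.sx', 'modules/std.sx', 'modules/std/math.sx']
```

The circular reference from math.sx back to std.sx is skipped silently because std.sx is already in the visited set, matching the behavior described in the list.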
Intra-module References
Functions within a namespaced import can call each other without the namespace prefix. When generating code for a namespaced module, unresolved function names are automatically tried with the namespace prefix.
Example
// modules/std/math.sx
mul :: (base: $T, exp: T) -> T { base * exp; }
// modules/std.sx
print :: (str: string) -> void #builtin;
// main.sx
std :: #import "modules/std.sx";
#import "modules/std/math.sx";
main :: () -> s32 {
std.print("hello there");
mul(5, 2);
}
10. Program Structure
A program is a sequence of top-level declarations and #import directives. Execution begins at main.
main :: () {
// entry point
}
main takes no arguments and returns void. The process exit code is 0 unless otherwise specified.
11. Grammar (informal)
program = top_level*
top_level = decl | import_decl
import_decl = '#import' STRING ';'
| IDENT '::' '#import' STRING ';'
decl = const_decl | var_decl | fn_decl | enum_decl | struct_decl
const_decl = IDENT '::' expr ';'
| IDENT ':' type ':' expr ';'
var_decl = IDENT ':=' expr ';'
| IDENT ':' type '=' expr ';'
| IDENT ':' type ';'
fn_decl = IDENT '::' '(' params? ')' ('->' type)? block
| IDENT '::' block
enum_decl = IDENT '::' 'enum' '{' (IDENT ';')* '}'
struct_decl = IDENT '::' 'struct' '{' field_group* '}'
field_group = IDENT (',' IDENT)* ':' type ('=' expr)? ';'
params = param (',' param)*
param = IDENT ':' type
block = '{' stmt* '}'
stmt = decl | assignment ';' | return_stmt | defer_stmt | insert_stmt
| break_stmt | continue_stmt | expr ';'
return_stmt = 'return' expr? ';'
break_stmt = 'break' ';'
continue_stmt = 'continue' ';'
defer_stmt = 'defer' expr ';'
insert_stmt = '#insert' expr ';'
assignment = lvalue ('=' | '+=' | '-=' | '*=' | '/=') expr
lvalue = IDENT | postfix '.' IDENT
expr = if_expr | match_expr | while_expr | for_expr | lambda | binary
while_expr = 'while' expr block
for_expr = 'for' expr block
binary = unary (binop unary)*
unary = ('-' | '!' | 'xx' | 'cast' '(' type ')') postfix
| postfix
postfix = primary ('(' args? ')' | '.' IDENT | '.{' field_init_list '}')*
primary = INT | HEX_INT | BIN_INT | FLOAT | STRING | BOOL | IDENT | '---'
| '.' IDENT | '.' '{' field_init_list '}'
| '(' expr ')' | block | '#run' expr
field_init_list = field_init (',' field_init)*
field_init = IDENT '=' expr | IDENT | expr
if_expr = 'if' expr 'then' expr ('else' expr)?
| 'if' expr block ('else' block)?
match_expr = 'if' expr '==' '{' case_arm* else_arm? '}'
case_arm = 'case' pattern ':' (stmt* | 'break' ';')
else_arm = 'else' ':' stmt*
pattern = '.' IDENT | INT | BOOL | IDENT
lambda = '(' params? ')' ('->' type)? '=>' expr
args = expr (',' expr)*
type = '$' IDENT | 's32' | 'f32' | 'f64' | 'bool' | 'string'
| 'Any' | 'Type' | '..' type | '[' expr ']' type | IDENT
12. Open Questions
These are inferred gaps — things not shown in the readme that need decisions:
- return: Both return expr; and implicit return (last expression) are supported.
- Else in match: Is there a default/else arm in pattern matching?
- Nested functions: Can functions be defined inside other functions?
- Mutability of params: Are function parameters immutable by default?
- Array/list types: Not shown, deferred.
- Struct types: Implemented. Named struct types with positional/named/shorthand literals.
- Imports/modules: The #import directive supports flat and namespaced imports (see Section 9).
- Operator overloading: Not shown, presumably no.
- Semicolons: Required on all statements? What about the last expression in a block?
- Top-level expressions: Are bare expressions allowed at the top level or only declarations?