sx/specs.md
2026-02-10 22:47:43 +02:00


sx language specification

1. Lexical Structure

Comments

Line comments start with // and extend to end of line.

// this is a comment

Identifiers

  • Lowercase or mixed-case for variables, functions: x, compute, main
  • UPPER_SNAKE_CASE for constants: SOME_INT, SOME_STR
  • PascalCase for types: Foo

Literals

Kind | Examples | Type
Integer | 0, 42, 0xFF, 0b1010 | s32
Float | 0.3, 0.9 | f32
String | "Hello", "z: {z}" | string
Boolean | true, false | bool
Enum | .variant1 | inferred from context
Null | null | any pointer type
Undefined | --- | context-dependent

Keywords

if, else, then, while, for, break, continue, true, false, null, enum, struct, union, case, return, defer, xx, and, or

Operators

Operator | Meaning
+ | addition
- | subtraction / negation
* | multiplication
/ | division
== | equality
!= | inequality
< | less than
> | greater than
<= | less or equal
>= | greater or equal
and | logical AND (short-circuit)
or | logical OR (short-circuit)
+= | add-assign
-= | sub-assign
*= | mul-assign
/= | div-assign
& | address-of
.* | dereference (postfix)

Delimiters and Punctuation

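Token | Meaning
:: | constant binding / definition
:= | variable binding (mutable, inferred)
: | type annotation
= | assignment; initializer in a typed variable declaration
; | statement terminator
, | separator
. | field access / enum literal prefix / struct (.{ }) and array (.[ ]) literal prefix
-> | return type annotation
=> | lambda arrow
$ | generic type parameter introduction
--- | undefined value
.. | subslice range / variadic parameter type
# | directive prefix (#run, #insert, #import, #builtin)
() | grouping / params
[] | array and slice types, indexing, subslicing
{} | blocks / bodies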

2. Type System

Primitive Types

  • s1..s64 — signed integers (1 to 64 bits). s32 is the default for integer literals.
  • u1..u64 — unsigned integers (1 to 64 bits).
  • f32 — 32-bit floating point
  • f64 — 64-bit floating point
  • bool — boolean (true / false)
  • string: sequence of characters; at runtime a fat pointer {ptr, i32} (the same layout as a slice)
  • Any — type-erased value, represented as { i32, i64 } (type tag + payload). Used for variadic arguments and runtime type dispatch.
  • Type — compile-time type value. At runtime, represented as an i32 type tag (same tag space as Any).
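
The value ranges of the s1..s64 and u1..u64 families follow directly from the bit width. The Python sketch below computes them; it assumes signed types use two's complement, which the spec does not state explicitly, and the helper name int_range is hypothetical.

```python
def int_range(kind: str, bits: int) -> tuple[int, int]:
    """Inclusive (min, max) of an sN or uN type; sN assumed two's complement."""
    if not 1 <= bits <= 64:
        raise ValueError("integer widths range from 1 to 64 bits")
    if kind == "s":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if kind == "u":
        return 0, (1 << bits) - 1
    raise ValueError(f"unknown integer kind: {kind!r}")


print(int_range("s", 32))  # (-2147483648, 2147483647)
print(int_range("u", 2))   # (0, 3)
```

For example, a u2 field (as in the Field Defaults example) holds 0 through 3.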

Enum Types

User-defined sum types with named variants.

Foo :: enum {
  variant1;
  variant2;
}

Variants are referenced with dot-prefix syntax: .variant1

Struct Types

User-defined product types with named fields.

Vec4 :: struct {
  x, y, z, w: f32;
}

Fields are declared as name1, name2: type; (comma-separated names sharing a type, semicolon-terminated).

Field Defaults

Fields may have default values. Fields without an explicit default have a zero-value default. --- marks a field as explicitly undefined.

Foo :: struct {
  a : u2;          // default is 0
  b : u8 = 42;     // default is 42
  c : u8 = ---;    // default is undefined
}

Struct Literals

// Positional (with type annotation — type inferred from annotation)
v1 : Vec4 = .{ 1, 2, 3, 0 };

// Positional (with type prefix)
v2 := Vec4.{ 4, 1, 1, 3 };

// Named fields (any order)
v3 := Vec4.{ w=0, x=2, y=3, z=4 };

// Mixed named + shorthand (bare identifier = field name matches variable name)
z := 5.0;
w := 6.0;
v4 := Vec4.{ y=3, x=9, w, z };
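
One way to picture how these literal forms map onto the declared field order is the Python sketch below. The function resolve_struct_literal and its tuple encoding of initializers are hypothetical; falling back to declared defaults (then 0) for omitted fields is an assumption borrowed from Field Defaults, not something the literal rules above specify.

```python
def resolve_struct_literal(fields, inits, env, defaults=None):
    """Return field values in declaration order for a struct literal.

    fields:   declared field names, e.g. ["x", "y", "z", "w"] for Vec4
    inits:    ("pos", value) | ("named", name, value) | ("short", name)
    env:      visible variables, used by shorthand initializers
    defaults: declared field defaults; omitted fields fall back to these, then 0
    """
    values = {}
    next_pos = 0
    for init in inits:
        if init[0] == "pos":          # positional: next field in declaration order
            if next_pos >= len(fields):
                raise ValueError("too many positional initializers")
            name, value = fields[next_pos], init[1]
            next_pos += 1
        elif init[0] == "named":      # e.g. w=0
            name, value = init[1], init[2]
        else:                         # shorthand: field name == variable name
            name, value = init[1], env[init[1]]
        if name not in fields:
            raise KeyError(f"no field named {name!r}")
        if name in values:
            raise ValueError(f"field {name!r} initialized twice")
        values[name] = value
    defaults = defaults or {}
    return [values[f] if f in values else defaults.get(f, 0) for f in fields]


VEC4 = ["x", "y", "z", "w"]
# v4 := Vec4.{ y=3, x=9, w, z };  with z := 5.0 and w := 6.0
print(resolve_struct_literal(VEC4, [("named", "y", 3), ("named", "x", 9),
                                    ("short", "w"), ("short", "z")],
                             {"z": 5.0, "w": 6.0}))  # [9, 3, 5.0, 6.0]
```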

Field Access and Assignment

v1.x        // read field x of struct v1
v1.x = 3.0; // assign to field x of struct v1

Struct Interpolation

Struct values in string interpolation print as TypeName{field:value, ...}:

print("{}", v1);  // Vec4{x:1.0, y:2.0, z:3.0, w:0.0}

Union Types (Tagged Unions)

Sum types where each variant can carry typed data or be void. Internally represented as { i32, [max_payload_size x i8] }.

Declaration

Shape :: union {
    circle: f32;    // typed variant
    rect: s32;      // typed variant
    none;           // void variant
}

Construction

s : Shape = .circle(3.14);      // inferred from context
s = .none;                       // void variant (enum literal syntax)
s = Shape.rect(42);              // explicit prefix

Payload Access

r := s.circle;   // load payload as f32 (undefined behavior if wrong variant active)
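
The { i32, [max_payload_size x i8] } layout can be modeled with Python's struct module, as sketched below. This is an illustration only: it assumes tags are numbered in declaration order starting at 0 and that payloads are stored little-endian, neither of which the spec states. The helpers construct and load_payload are hypothetical.

```python
import struct

# Hypothetical model of:  Shape :: union { circle: f32; rect: s32; none; }
# Each variant maps to a struct-module format for its payload (None = void).
VARIANTS = {"circle": "<f", "rect": "<i", "none": None}
# Assumption: tags are assigned in declaration order, starting at 0.
TAGS = {name: i for i, name in enumerate(VARIANTS)}
# [max_payload_size x i8]: large enough for the biggest payload.
PAYLOAD_SIZE = max(struct.calcsize(fmt) for fmt in VARIANTS.values() if fmt)


def construct(variant, value=None):
    """Build the (i32 tag, payload bytes) pair for .variant(value)."""
    payload = bytearray(PAYLOAD_SIZE)
    fmt = VARIANTS[variant]
    if fmt is not None:
        struct.pack_into(fmt, payload, 0, value)
    return TAGS[variant], bytes(payload)


def load_payload(shape, variant):
    """Mirror of s.variant: reinterpret the bytes without checking the tag."""
    _tag, payload = shape
    return struct.unpack_from(VARIANTS[variant], payload, 0)[0]


s = construct("rect", 42)
print(s[0], load_payload(s, "rect"))  # 1 42
```

Reading a payload through the wrong variant simply reinterprets the bytes here, which is one concrete face of the undefined behavior noted above.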

Pattern Matching

if s == {
    case .circle: print("circle\n");
    case .rect: print("rect\n");
    case .none: print("none\n");
}

Union Interpolation

Union values in string interpolation print as <TypeName tag=N>:

print("{}", s);  // <Shape tag=0>

Array Types

Fixed-size arrays with element type and length.

buffer : [5]f32 = .[0, 2, 3.5, 4, 0];
val := buffer[2];  // 3.5
buffer.len         // 5 (compile-time constant, s32)

Arrays can also be constructed programmatically with the Array builtin:

MyArr :: Array(5, s32);   // equivalent to [5]s32

Slice Types

A slice []T is a fat pointer {ptr, i32} referencing a contiguous sequence of T elements. Same runtime layout as string.

// Arrays implicitly coerce to slices at call sites
arr : [5]s32 = .[3, 1, 4, 1, 5];
sortSlice(arr);   // [5]s32 → []s32 coercion

// Slice operations
items[i]           // read element at index
items[i] = val;    // write element at index
items.len          // length (s32)
items.ptr          // raw pointer

Slices support generic type parameters: []$T introduces type parameter T inferred from the element type of the argument (array or slice).

Subslicing

Arrays, slices, and strings support subslice syntax to create zero-copy views:

arr : [5]s32 = .[3, 1, 4, 1, 5];
sub := arr[1..4];    // []s32 → [1, 4, 1]
head := arr[..3];    // []s32 → [3, 1, 4]
tail := arr[2..];    // []s32 → [4, 1, 5]

msg := "hello world";
word := msg[6..11];  // string → "world"

  • expr[start..end]: elements from start (inclusive) to end (exclusive)
  • expr[start..]: elements from start through the last element
  • expr[..end]: elements from the first element up to end (exclusive)
  • Result type: []T for arrays and slices, string for strings
  • No memory is allocated: the result points into the original backing storage
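
Python's memoryview has the same zero-copy, half-open semantics, so it can illustrate what a subslice is; start..end corresponds to start:end. This is an analogy for the semantics, not how sx implements slices. Bytes are used for the string case because Python str slicing copies.

```python
from array import array

# arr : [5]s32 = .[3, 1, 4, 1, 5];  ("i" is a C int, 32 bits on common platforms)
arr = array("i", [3, 1, 4, 1, 5])
view = memoryview(arr)

sub = view[1:4]    # arr[1..4]
head = view[:3]    # arr[..3]
tail = view[2:]    # arr[2..]
print(sub.tolist(), head.tolist(), tail.tolist())  # [1, 4, 1] [3, 1, 4] [4, 1, 5]

sub[0] = 99        # writing through the view changes the backing array
print(arr[1])      # 99

msg = b"hello world"
print(memoryview(msg)[6:11].tobytes())  # b'world'
```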

Pointer Types

Syntax | Meaning | .len | [i]
*T | pointer to one T | no | no
[*]T | many-pointer (buffer) | no | yes
*[N]T | pointer to array of N T | yes | yes
*[]T | pointer to slice | yes | yes

Address-of: &x returns a pointer to the variable.

v := Vec2.{ 1.0, 2.0 };
ptr := &v;             // *Vec2

Dereference: p.* loads the value through the pointer.

copy := ptr.*;          // Vec2

Auto-deref: p.field is sugar for p.*.field.

set_x :: (p: *Vec2, val: f32) {
    p.x = val;          // auto-deref: p.*.x = val
}
set_x(&v, 99.0);

Null: All pointer types are nullable. null is the null pointer literal.

np : *Vec2 = null;

Many-pointer: [*]T supports indexing for buffers of unknown size.

arr : [5]s32 = .[10, 20, 30, 40, 50];
mp : [*]s32 = &arr[0];   // *s32 → [*]s32 implicit
val := mp[2];             // 30

Implicit conversions:

  • *T → [*]T (pointer to element → many-pointer)
  • null (*void) → any *T

Vector Types (SIMD)

LLVM SIMD vectors, parameterized by length and element type.

v := vec3(1, 3, 2);  // Vector(3, f32)

Arithmetic: Element-wise +, -, *, / on vectors of the same length.

add := v1 + v2;     // element-wise addition

Scalar broadcast: Scalar operands are broadcast to match the vector.

scaled := v * 2.0;  // [2.0, 6.0, 4.0]

Negation: Unary - negates each element.

neg := -v;           // [-1.0, -3.0, -2.0]

Element access: .x, .y, .z, .w (aliases .r, .g, .b, .a) extract single components.

v.x     // first element
v.z     // third element

Index access: v[i] extracts by index.

v[0]    // first element

Built-in sqrt: Lowers to the LLVM llvm.sqrt intrinsic (llvm.sqrt.f32 / llvm.sqrt.f64).

s := sqrt(9.0);     // 3.0
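
The element-wise, broadcast, negation and component-access rules above can be modeled in a few lines of Python. The Vec class is a hypothetical stand-in for the semantics only (it does not use LLVM vectors), and rejecting operands of different lengths is an assumption based on the same-length requirement.

```python
class Vec:
    """Fixed-length vector with element-wise ops and scalar broadcast."""

    _COMPONENTS = {"x": 0, "y": 1, "z": 2, "w": 3, "r": 0, "g": 1, "b": 2, "a": 3}

    def __init__(self, *items):
        self.items = tuple(items)

    def _elementwise(self, other, op):
        if isinstance(other, Vec):
            if len(other.items) != len(self.items):
                raise ValueError("vector lengths differ")
            return Vec(*(op(a, b) for a, b in zip(self.items, other.items)))
        return Vec(*(op(a, other) for a in self.items))  # broadcast the scalar

    def __add__(self, other):
        return self._elementwise(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._elementwise(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._elementwise(other, lambda a, b: a * b)

    def __truediv__(self, other):
        return self._elementwise(other, lambda a, b: a / b)

    def __neg__(self):
        return Vec(*(-a for a in self.items))

    def __getitem__(self, i):
        return self.items[i]

    def __getattr__(self, name):
        # .x/.y/.z/.w and the .r/.g/.b/.a aliases
        if name in Vec._COMPONENTS:
            return self.items[Vec._COMPONENTS[name]]
        raise AttributeError(name)


v = Vec(1.0, 3.0, 2.0)        # v := vec3(1, 3, 2);
print((v * 2.0).items)        # (2.0, 6.0, 4.0)
print((-v).items, v.z, v[0])  # (-1.0, -3.0, -2.0) 2.0 1.0
```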

Function Types

Expressed as (param_types) -> return_type. A function with no return type annotation returns void.

// type is (s32) -> s32
compute :: (x: s32) -> s32 { x * x; }

// type is () -> void
main :: () { }

Type Aliases

A name bound to an existing type.

SOME_TYPE :: f64;

Generic Functions (Monomorphization)

Functions can be parameterized over types using $T syntax. The $ prefix introduces a type parameter; subsequent uses of the name reference it.

sum :: (a: $T, b: T) -> T {
    return a + b;
}
  • $T in a parameter type introduces type parameter T
  • Bare T (without $) references the introduced type parameter
  • At call sites, type arguments are inferred from actual argument types:
    sum(40, 2)       // T = s32
    sum(1.5, 2.5)    // T = f32
  • Each unique set of concrete types produces a separate specialized function (monomorphization)
  • Multiple type parameters are supported: (a: $T, b: $U) -> T
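
Monomorphization amounts to caching one specialized function per distinct set of concrete types. The Python sketch below models that for sum; the names sx_type, call_generic and make_sum are hypothetical, and the "all arguments must agree on T" check fits only this single-parameter signature (it does not model literal conversion).

```python
def sx_type(value):
    """Default sx type of a literal (hypothetical helper)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "s32"
    if isinstance(value, float):
        return "f32"
    raise TypeError(f"no sx type for {value!r}")


specializations = {}  # (function name, concrete types) -> specialized function


def call_generic(name, make_body, *args):
    """Infer T from the arguments, then reuse or create the specialization."""
    types = tuple(sx_type(a) for a in args)
    if len(set(types)) != 1:
        raise TypeError(f"{name}: arguments disagree on T: {types}")
    key = (name, types)
    if key not in specializations:
        specializations[key] = make_body(types[0])  # one copy per concrete T
    return specializations[key](*args)


def make_sum(T):
    # sum :: (a: $T, b: T) -> T { return a + b; }
    def sum_T(a, b):
        return a + b
    sum_T.__name__ = f"sum_{T}"
    return sum_T


print(call_generic("sum", make_sum, 40, 2))    # 42
print(call_generic("sum", make_sum, 1.5, 2.5)) # 4.0
print(sorted(f.__name__ for f in specializations.values()))  # ['sum_f32', 'sum_s32']
```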

Variadic Functions

Functions can accept a variable number of arguments using ..Type syntax:

print :: (fmt: string, args: ..Any) { ... }
  • ..Any means zero or more arguments, each boxed into Any (type tag + payload)
  • The variadic parameter must be the last parameter
  • At call sites, variadic arguments are automatically boxed: print("x={}, y={}\n", x, y)
  • Inside the function body, args is accessed as a slice-like sequence
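
The boxing step can be pictured as pairing each trailing argument with a type tag, as in the Python sketch below. The tag numbers, the AnyValue tuple and the variadic helper are all hypothetical: the real tag space is assigned by the compiler, and payload here holds the Python value rather than an i64 slot.

```python
from typing import NamedTuple

# Hypothetical tag numbers; the real tag space is compiler-assigned.
TYPE_TAGS = {bool: 1, int: 2, float: 3, str: 4}
TAG_NAMES = {1: "bool", 2: "s32", 3: "f32", 4: "string"}


class AnyValue(NamedTuple):
    tag: int         # the i32 type tag
    payload: object  # stands in for the i64 payload slot


def box(value):
    return AnyValue(TYPE_TAGS[type(value)], value)


def variadic(fmt, *raw_args):
    # The call site boxes every trailing argument; the body sees a sequence.
    args = [box(a) for a in raw_args]
    return [TAG_NAMES[a.tag] for a in args]


print(variadic("x={}, y={}\n", 3, 4.5))  # ['s32', 'f32']
```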

Type Inference

  • :: bindings infer type from the right-hand side
  • := bindings infer type from the right-hand side
  • Explicit annotation overrides inference: NAME : f64 : 0.9;
  • Integer literals default to s32
  • Float literals default to f32
  • Enum literals (.variant) infer their enum type from context (expected type)

Type Conversions

Implicit (widening) — allowed without annotation:

  • Integer to wider integer of same signedness (u8 → u16, s8 → s32)
  • Unsigned to strictly wider signed (u8 → s16)
  • Any integer to any float (u8 → f32, s32 → f64)
  • Float to wider float (f32 → f64)
  • Integer and float literals can convert to any numeric type implicitly

Explicit (narrowing) — requires xx prefix:

  • Integer to narrower integer (s32 → u8)
  • Signed to unsigned (s32 → u32)
  • Float to narrower float (f64 → f32)
  • Float to any integer (f64 → u16)
  • Unsigned to signed of same or narrower width (u8 → s8)
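
The implicit-conversion rules can be encoded as a single predicate; anything it rejects needs xx. The Python sketch below follows the two lists above; the names parse_numeric and is_implicit are hypothetical, and literals are modeled with a from_literal flag.

```python
import re


def parse_numeric(t):
    """Split a numeric type name like 'u8' or 'f64' into (kind, bits)."""
    m = re.fullmatch(r"([suf])(\d+)", t)
    if m is None:
        raise ValueError(f"not a numeric type: {t}")
    return m.group(1), int(m.group(2))


def is_implicit(src, dst, from_literal=False):
    """True if src converts to dst without xx, per the widening rules above."""
    (sk, sb), (dk, db) = parse_numeric(src), parse_numeric(dst)
    if src == dst or from_literal:
        return True                 # identity, or a literal adapting to context
    if sk == dk:
        return db > sb              # u8 -> u16, s8 -> s32, f32 -> f64
    if sk == "u" and dk == "s":
        return db > sb              # u8 -> s16 (strictly wider)
    if sk in "su" and dk == "f":
        return True                 # any integer -> any float
    return False                    # s -> u and float -> int need xx


for src, dst in [("u8", "u16"), ("u8", "s16"), ("u8", "s8"), ("s32", "u32"), ("f64", "f32")]:
    print(src, "->", dst, is_implicit(src, dst))
```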

The xx prefix operator marks an expression for auto-conversion to the expected type from context (assignment, declaration, argument, return):

large : f64 = 5999.5;
x : u16 = xx large;       // f64 → u16
d : u8 = #run xx resolve(5); // s32 → u8 at compile time

Using xx outside a typed context (where the target type is known) is a compile error.


3. Declarations

Constant Binding (immutable)

// inferred type
NAME :: value;

// explicit type
NAME : type : value;

The :: operator creates an immutable binding. The value is evaluated at compile time when possible.

Examples:

SOME_INT    :: 0;           // s32
SOME_STR    :: "Hello";     // string
SOME_FLOAT  :: 0.3;         // f32
SOME_DOUBLE : f64 : 0.9;   // f64 (explicit)
SOME_FUNC   :: () => 42;    // () -> s32
SOME_TYPE   :: f64;         // type alias

Variable Binding (mutable)

// inferred type
name := value;

// explicit type
name : type = value;

// default-initialized (type required)
name : type;

// undefined (type required)
name : type = ---;

The := operator creates a mutable binding. The type is inferred unless explicitly annotated.

name : type; initializes using the type's defaults: zero for primitives, per-field defaults for structs (see Field Defaults).

name : type = ---; leaves the value undefined (uninitialized memory). Reading before writing is undefined behavior.

Examples:

x := 42;                     // s32, mutable
y := if true then 1 else 2;  // s32, value of the taken branch (1)
z : Foo = .variant2;         // enum Foo, mutable, explicit type
a : Foo;                     // struct Foo (see Field Defaults), default-initialized (a=0, b=42, c=undef)
b : Foo = ---;               // struct Foo, entirely undefined

Function Definition

name :: (params) -> return_type {
  body
}
  • Parameters: name: type separated by commas
  • Return type: -> type (omit for void)
  • Body: block of statements; the last expression is the implicit return value
  • No return keyword is needed, though return expr; can still be used to exit early (see Statements)

Examples:

compute :: (x: s32) -> s32 {
  x * x;
}

main :: () {
  // void return, no -> annotation
}

// Bare-block shorthand (equivalent to no-arg void function):
main :: {
  // same as main :: () { ... }
}

Enum Definition

Name :: enum {
  variant1;
  variant2;
}

Defines a new enum type with the given variants. Trailing comma is allowed.


4. Expressions

sx is expression-oriented: where possible, constructs such as if and blocks produce values.

Operator Precedence

Prec | Operators | Notes
6 (highest) | *, / | multiplication, division
5 | +, - | addition, subtraction
4 | <, <=, >, >=, ==, != | comparisons (chainable)
2 | and | logical AND (short-circuit)
1 (lowest) | or | logical OR (short-circuit)
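
A precedence-climbing parser driven by this table might look like the Python sketch below. It is not the sx parser: the function parse, the tuple AST, and the ("chain", operands, ops) node for chained comparisons are hypothetical, and only unary minus is handled among the unary operators.

```python
import re

# Binding power from the precedence table (higher binds tighter).
PREC = {"*": 6, "/": 6, "+": 5, "-": 5,
        "<": 4, "<=": 4, ">": 4, ">=": 4, "==": 4, "!=": 4,
        "and": 2, "or": 1}
COMPARISONS = {"<", "<=", ">", ">=", "==", "!="}


def parse(src):
    """Parse a binary expression into nested tuples (precedence climbing)."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|<=|>=|==|!=|[-+*/<>()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok == "(":
            node = expr(0)
            advance()                  # consume ")"
            return node
        if tok == "-":
            return ("neg", primary())  # unary minus binds tighter than any binary op
        return int(tok) if tok.isdigit() else tok

    def expr(min_prec):
        left = primary()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = advance()
            if op in COMPARISONS:
                # a < b <= c  becomes  ("chain", [a, b, c], ["<", "<="])
                operands, ops = [left, expr(PREC[op] + 1)], [op]
                while peek() in COMPARISONS:
                    ops.append(advance())
                    operands.append(expr(PREC[op] + 1))
                left = ("chain", operands, ops)
            else:
                left = (op, left, expr(PREC[op] + 1))  # left-associative
        return left

    return expr(0)


print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
```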

Arithmetic

Standard infix: +, -, *, / with the usual precedence (* and / before + and -).

x * x
x + 2

Chained Comparisons

Comparison operators can be chained. Each operand is evaluated exactly once.

0 <= x <= 100          // equivalent to: 0 <= x and x <= 100
1000 > x >= -100       // equivalent to: 1000 > x and x >= -100
a == b == c            // equivalent to: a == b and b == c

Mixed operators are allowed: a < b <= c > d means a < b and b <= c and c > d.
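
The single-evaluation guarantee can be modeled by computing each operand once and reusing it as the left side of the next comparison, as in the Python sketch below. The helper eval_chain is hypothetical, and stopping at the first false comparison is an assumption taken from the and-equivalence; the spec only promises that no operand is evaluated twice.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def eval_chain(operands, ops):
    """Evaluate o0 op0 o1 op1 o2 ... where operands are zero-argument callables.

    Each middle operand is computed once and reused as the left side of the
    next comparison. Evaluation stops at the first false comparison, like the
    `and` chain it desugars to.
    """
    left = operands[0]()
    for op, operand in zip(ops, operands[1:]):
        right = operand()
        if not OPS[op](left, right):
            return False
        left = right
    return True


calls = []


def x():
    calls.append("x")   # records each evaluation of the middle operand
    return 50


print(eval_chain([lambda: 0, x, lambda: 100], ["<=", "<="]))  # True
print(calls)                                                  # ['x']
```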

Logical Operators

and and or are short-circuit boolean operators. The right operand is not evaluated if the left operand determines the result.

if 0 <= x <= 100 and 0 <= y <= 100 {
    print("contained");
}

If Expression (inline form)

if condition then consequent else alternate

Both branches are single expressions. The whole form produces a value.

x := if true then 1 else 2;

The else branch is optional. Without it, the form is a statement (no value):

if i == 2 then continue;
if done then break;
if err then return;

If Expression (block form)

if condition {
  stmts
} else {
  stmts
}

Each branch is a block. The last expression in each block is the branch's value. Can be used inline within other expressions:

y := x + if false {
  7;
} else {
  12;
};

Pattern Matching

if subject == {
  case pattern: body
  case pattern: body
  else: body          // optional default arm
}

Matches subject against each case. Patterns can be:

  • Enum literals: .variant — matches a specific enum variant.
  • Integer/bool literals: 42, true — matches a specific value.
  • Type categories: int, float, bool, string, struct, enum, union — matches all types in that category (used with type_of values).

break exits a case arm without producing a value. The optional else: arm matches when no case pattern matches.

if z == {
  case .variant1: break;
  case .variant2:
    print("z: {z}");
  else:
    print("unknown");
}

Type Category Matching

When switching on a Type value (from type_of), category keywords match all registered types of that category:

type := type_of(val);
if type == {
    case int: result = int_to_string(xx val);
    case struct: result = struct_to_string(cast(type) val);
    case enum: result = enum_to_string(cast(type) val);
}

Available categories: int, float, bool, string, struct, enum, union.

Inside a category arm, cast(type) val performs runtime generic dispatch: the compiler generates a switch over all types in the category, monomorphizing the callee for each concrete type.

While Loop

while condition {
  body
}

Repeats body as long as condition is true. break; exits the loop. continue; skips to the next iteration.

i := 0;
while i < 10 {
    i += 1;
    if i == 5 { continue; }
    if i == 8 { break; }
    print("{i}\n");
}

For Loop

for iterable {
  // `it` is the current element
  // `it_index` is the current index (s32)
  print("{it}\n");
}

Iterates over arrays and slices. The loop body has two implicit variables:

  • it — the current element value
  • it_index — the current index (s32, starting at 0)

break; exits the loop. continue; skips to the next iteration.

arr : [5]s32 = .[1, 2, 3, 4, 5];
for arr {
    if it_index == 2 { continue; }
    print("{it}\n");
}

Lambda

(params) => expr
(params) -> return_type => expr

Anonymous function. Produces a function value. Supports the same parameter features as named functions: $ generic type params, .. variadic params, and optional return type annotation.

SOME_FUNC :: () => 42;                    // () -> s32
double :: (x: $T) -> T => x + x;         // generic lambda with return type

Function Call

callee(args)
compute(6)
print("hello")

Field Access

object.field

Used for module access (std.print) and struct member access.

Enum Literal

.variant_name

The enum type is inferred from context (expected type from declaration or parameter).


5. Statements

Statements are terminated by ;.

  • Declaration: name :: value; / name := value;
  • Assignment: name = value; / name += value; (and other compound assignments). Also supports field targets: obj.field = value;
  • Expression statement: expr; — evaluates the expression (last in a block = return value)
  • Return: return expr; — returns from the enclosing function with the given value. return; returns void.
  • Break: break; exits a match arm, a while loop, or a for loop
  • Continue: continue; skips to the next iteration of a while or for loop
  • Defer: defer expr; — defers execution of expr until the enclosing block exits (LIFO order)

6. Blocks, Scoping, and Implicit Returns

A block { ... } contains zero or more statements. The last expression in a block is its value (implicit return).

In function bodies, the last expression becomes the return value:

compute :: (x: s32) -> s32 {
  x * x;   // this is returned
}

Scope Blocks

Bare blocks can be used as statements to introduce a new lexical scope. Variables declared inside a scope block are local to that block. No trailing ; is required.

main :: {
  x := 42;
  {
    x := 6;               // shadows outer x
    print("inner: {x}");  // prints 6
  }
  print("outer: {x}");    // prints 42
}

Variable Shadowing

A variable declaration (name :=) inside an inner scope shadows any variable with the same name from outer scopes. The outer variable is restored when the inner scope exits.
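
Shadowing falls out of a chain of scopes where lookups walk outward and the innermost declaration wins. The Python Scope class below is a hypothetical model of that rule, not the compiler's symbol table.

```python
class Scope:
    """One lexical scope; lookups walk outward through parent scopes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def declare(self, name, value):   # name := value
        self.vars[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise NameError(f"undeclared variable {name!r}")


outer = Scope()
outer.declare("x", 42)
inner = Scope(parent=outer)   # entering { ... }
inner.declare("x", 6)         # shadows outer x
print(inner.lookup("x"))      # 6
print(outer.lookup("x"))      # 42, once the inner scope is dropped
```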

Defer

defer expr; schedules expr to execute when the enclosing scope block exits. Multiple defers in the same scope execute in reverse order (LIFO).

{
  defer print("second");
  defer print("first");
}
// prints: first, then second
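
Python's contextlib.ExitStack runs registered callbacks in LIFO order when its with block exits, so it reproduces the ordering above. This is only an analogy for the defer semantics.

```python
from contextlib import ExitStack

log = []
with ExitStack() as block:                 # stands in for the enclosing { ... }
    block.callback(log.append, "second")   # defer print("second");
    block.callback(log.append, "first")    # defer print("first");
    log.append("body")                     # ordinary statements run first
print(log)  # ['body', 'first', 'second']
```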

7. Built-in Functions

Built-in functions are declared in std.sx with the #builtin suffix, which tells the compiler to generate the implementation internally rather than looking for a function body.

I/O

  • write(str: string) -> void — write a string to standard output
  • print(fmt: string, args: ..Any) — formatted print. Parses {} placeholders in the format string and substitutes arguments. When all argument types are statically known, the compiler specializes the call at compile time (no Any boxing).
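
Placeholder substitution for print can be sketched as splitting the format string on {} and interleaving the arguments. In the Python sketch below, format_positional is hypothetical, raising on a placeholder/argument count mismatch is an assumption, and str() stands in for sx's own value formatting.

```python
def format_positional(fmt: str, *args) -> str:
    """Substitute each {} placeholder in fmt with the next argument, in order."""
    pieces = fmt.split("{}")
    if len(pieces) - 1 != len(args):
        raise ValueError(f"{len(pieces) - 1} placeholders but {len(args)} arguments")
    out = [pieces[0]]
    for arg, tail in zip(args, pieces[1:]):
        out.append(str(arg))
        out.append(tail)
    return "".join(out)


print(format_positional("x={}, y={}\n", 3, 7), end="")  # x=3, y=7
```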

Math

  • sqrt(x: $T) -> T — square root (maps to LLVM intrinsic)

Memory

  • alloc(size: s32) -> string — allocate size bytes of memory, returned as a string slice
  • size_of($T: Type) -> s32 — size of type T in bytes

Type Introspection

  • type_of(val: $T) -> Type — returns the runtime type tag of a value
  • type_name($T: Type) -> string — returns the name of type T as a string (e.g., "Point")
  • field_count($T: Type) -> s32 — returns the number of fields (struct), variants (enum), or elements (vector) in type T
  • field_name($T: Type, idx: s32) -> string — returns the name of the idx-th field (struct) or variant (enum) of type T
  • field_value(s: $T, idx: s32) -> Any — returns the idx-th field (struct) or element (vector) of s, boxed as Any
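
These builtins are enough to render any struct generically. The Python sketch below mimics them with dataclasses to produce the TypeName{field:value, ...} format described under Struct Interpolation; the functions are hypothetical stand-ins, and unlike sx, field_value here returns the raw value rather than an Any box.

```python
from dataclasses import dataclass, fields


# Python stand-ins for the sx builtins (hypothetical, not the sx implementation).
def type_name(T):
    return T.__name__


def field_count(T):
    return len(fields(T))


def field_name(T, idx):
    return fields(T)[idx].name


def field_value(s, idx):
    return getattr(s, fields(s)[idx].name)   # sx would box this as Any


def struct_to_string(s):
    """Render a struct value as TypeName{field:value, ...}."""
    T = type(s)
    parts = [f"{field_name(T, i)}:{field_value(s, i)}" for i in range(field_count(T))]
    return f"{type_name(T)}{{{', '.join(parts)}}}"


@dataclass
class Vec4:
    x: float
    y: float
    z: float
    w: float


print(struct_to_string(Vec4(1.0, 2.0, 3.0, 0.0)))  # Vec4{x:1.0, y:2.0, z:3.0, w:0.0}
```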

Type Conversion

  • cast(Type) expr — prefix operator that converts expr to Type. Examples: cast(s32) 3.14, cast(f64) n. When Type is a runtime Type value inside a type-category match arm, the compiler generates a dispatch switch over all types in the category, monomorphizing the callee for each concrete type.

Vectors

  • Vector($N: int, $T: Type) -> Type — returns an LLVM vector type of N elements of type T

8. Compile-time Evaluation

#run Directive

#run expr evaluates expr at compile time using lazy JIT execution. It can appear in two contexts:

Compile-time constants — bind a compile-time value to a name:

compute :: (x: s32) -> s32 { x * x; }
x :: #run compute(5);   // x = 25, evaluated at compile time

Comptime globals are resolved lazily: the JIT executes only when the value is first referenced during code generation. Chained dependencies are resolved automatically.

Side effects — execute code at compile time for its side effects:

#run print("compiling...");
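
Lazy resolution of compile-time globals behaves like a memoized on-demand evaluator, as in the Python sketch below. The Comptime class is hypothetical and stands in for the JIT: a value is computed on first reference, dependencies resolve recursively, and never-referenced globals never run.

```python
class Comptime:
    """Lazily evaluated compile-time globals (a model of #run bindings)."""

    def __init__(self):
        self.thunks = {}       # name -> function(ct) computing the value
        self.values = {}       # memoized results
        self.evaluated = []    # order in which globals actually ran

    def define(self, name, thunk):
        self.thunks[name] = thunk

    def get(self, name):
        if name not in self.values:
            # First reference: run the thunk; dependencies resolve recursively.
            self.values[name] = self.thunks[name](self)
            self.evaluated.append(name)
        return self.values[name]


def compute(x):
    return x * x


ct = Comptime()
ct.define("x", lambda c: compute(5))       # x :: #run compute(5);
ct.define("y", lambda c: c.get("x") + 1)   # depends on x
ct.define("unused", lambda c: 1 // 0)      # never referenced, never runs
print(ct.get("y"), ct.evaluated)           # 26 ['x', 'y']
```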

#insert Directive

#insert expr; evaluates expr at compile time to obtain a string, then parses and compiles that string as inline code at the insertion point.

generate :: () -> string {
    return "print(\"hello from the other side\");";
}

main :: () {
    #insert #run generate();
    // equivalent to: print("hello from the other side");
}

The inserted string must contain valid sx statements (including semicolons). The statements are parsed and compiled in the same scope as the #insert site.


9. Modules / Imports

#import Directive

The #import directive brings declarations from another .sx file into the current file. Paths are resolved relative to the importing file's directory.

Flat import — splices all declarations from the imported file into the current scope:

#import "modules/std/math.sx";

Namespaced import — wraps all declarations under a namespace name:

std :: #import "modules/std.sx";

Namespaced declarations are accessed with dot notation:

std.print("hello");

Import Resolution

  • Imports are resolved after parsing and before code generation.
  • Paths are relative to the directory of the file containing the #import.
  • Nested imports are supported (imported files may themselves contain #import).
  • Circular imports are detected and silently skipped (each file is imported at most once).
  • Generic functions in namespaced imports are supported (e.g., std.mul(5, 2) where mul is generic).
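
The resolution rules above (paths relative to the importer, each file at most once, cycles skipped) amount to a depth-first walk with a visited set. The Python sketch below uses a hypothetical in-memory file tree; emitting dependencies before their importers is an ordering choice of this sketch, not something the spec specifies.

```python
import posixpath

# Hypothetical source tree: each file maps to the paths its #import lines name.
FILES = {
    "main.sx": ["lib/a.sx"],
    "lib/a.sx": ["b.sx"],
    "lib/b.sx": ["a.sx"],    # circular: points back at lib/a.sx
}


def resolve_imports(entry, files):
    """Return every reachable file once, dependencies before importers."""
    seen, order = set(), []

    def visit(path):
        if path in seen:
            return                      # already imported: also breaks cycles
        seen.add(path)
        base = posixpath.dirname(path)  # imports are relative to the importer
        for target in files[path]:
            visit(posixpath.normpath(posixpath.join(base, target)))
        order.append(path)

    visit(entry)
    return order


print(resolve_imports("main.sx", FILES))  # ['lib/b.sx', 'lib/a.sx', 'main.sx']
```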

Intra-module References

Functions within a namespaced import can call each other without the namespace prefix. When generating code for a namespaced module, unresolved function names are automatically tried with the namespace prefix.
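
The fallback can be sketched as trying the bare name first and then the namespace-qualified one. The function resolve_call and the symbol helper below are hypothetical illustrations of the rule.

```python
def resolve_call(name, namespace, symbols):
    """Resolve a bare function name used inside a namespaced module."""
    if name in symbols:
        return name
    if namespace is not None:
        qualified = f"{namespace}.{name}"    # retry with the module's prefix
        if qualified in symbols:
            return qualified
    raise NameError(f"unresolved function {name!r}")


SYMBOLS = {"main", "std.print", "std.helper"}
print(resolve_call("helper", "std", SYMBOLS))  # std.helper
print(resolve_call("main", "std", SYMBOLS))    # main
```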

Example

// modules/std/math.sx
mul :: (base: $T, exp: T) -> T { base * exp; }

// modules/std.sx
print :: (str: string) -> void #builtin;

// main.sx
std :: #import "modules/std.sx";
#import "modules/std/math.sx";

main :: () -> s32 {
    std.print("hello there");
    mul(5, 2);
}

10. Program Structure

A program is a sequence of top-level declarations and #import directives. Execution begins at main.

main :: () {
  // entry point
}

main takes no arguments and returns void. The process exit code is 0 unless otherwise specified.


11. Grammar (informal)

program         = top_level*
top_level       = decl | import_decl | '#run' expr ';'
import_decl     = '#import' STRING ';'
                | IDENT '::' '#import' STRING ';'
decl            = const_decl | var_decl | fn_decl | enum_decl | struct_decl | union_decl
const_decl      = IDENT '::' expr ';'
                | IDENT ':' type ':' expr ';'
var_decl        = IDENT ':=' expr ';'
                | IDENT ':' type '=' expr ';'
                | IDENT ':' type ';'
fn_decl         = IDENT '::' '(' params? ')' ('->' type)? block
                | IDENT '::' block
enum_decl       = IDENT '::' 'enum' '{' (IDENT ';')* '}'
struct_decl     = IDENT '::' 'struct' '{' field_group* '}'
union_decl      = IDENT '::' 'union' '{' (IDENT (':' type)? ';')* '}'
field_group     = IDENT (',' IDENT)* ':' type ('=' expr)? ';'
params          = param (',' param)*
param           = IDENT ':' type
block           = '{' stmt* '}'
stmt            = decl | assignment ';' | return_stmt | defer_stmt | insert_stmt
                | break_stmt | continue_stmt | block | expr ';'
return_stmt     = 'return' expr? ';'
break_stmt      = 'break' ';'
continue_stmt   = 'continue' ';'
defer_stmt      = 'defer' expr ';'
insert_stmt     = '#insert' expr ';'
assignment      = lvalue ('=' | '+=' | '-=' | '*=' | '/=') expr
lvalue          = IDENT | postfix '.' IDENT | postfix '[' expr ']'
expr            = if_expr | match_expr | while_expr | for_expr | lambda | binary
while_expr      = 'while' expr block
for_expr        = 'for' expr block
binary          = unary (binop unary)*
binop           = '*' | '/' | '+' | '-' | '<' | '<=' | '>' | '>=' | '==' | '!='
                | 'and' | 'or'
unary           = ('-' | '!' | '&' | 'xx' | 'cast' '(' type ')') postfix
                | postfix
postfix         = primary ( '(' args? ')' | '.' IDENT | '.' '*' | '.{' field_init_list '}'
                          | '[' expr ']' | '[' expr? '..' expr? ']' )*
primary         = INT | HEX_INT | BIN_INT | FLOAT | STRING | BOOL | IDENT | '---' | 'null'
                | '.' IDENT | '.' '{' field_init_list '}' | '.' '[' args? ']'
                | '(' expr ')' | block | '#run' expr
field_init_list = field_init (',' field_init)*
field_init      = IDENT '=' expr | IDENT | expr
if_expr         = 'if' expr 'then' expr ('else' expr)?
                | 'if' expr block ('else' block)?
match_expr      = 'if' expr '==' '{' case_arm* else_arm? '}'
case_arm        = 'case' pattern ':' (stmt* | 'break' ';')
else_arm        = 'else' ':' stmt*
pattern         = '.' IDENT | INT | BOOL | IDENT | 'struct' | 'enum' | 'union'
lambda          = '(' params? ')' ('->' type)? '=>' expr
args            = expr (',' expr)*
type            = '$' IDENT | prim_type | 'Any' | 'Type' | IDENT
                | '..' type | '[' expr ']' type | '[' ']' type
                | '*' type | '[' '*' ']' type
prim_type       = 's1' | 's2' | ... | 's64' | 'u1' | ... | 'u64'
                | 'f32' | 'f64' | 'bool' | 'string'

12. Open Questions

These are inferred gaps — things not shown in the readme that need decisions:

  • return: Both return expr; and implicit return (last expression) are supported.
  • Else in match: Resolved. Pattern matching supports an optional else: arm as the default (see Pattern Matching).
  • Nested functions: Can functions be defined inside other functions?
  • Mutability of params: Are function parameters immutable by default?
  • Array/list types: Fixed-size arrays, slices, and subslicing are implemented (see Array Types and Slice Types).
  • Struct types: Implemented — named struct types with positional/named/shorthand literals.
  • Imports/modules: #import directive supports flat and namespaced imports (see Section 9).
  • Operator overloading: Not shown — presumably no.
  • Semicolons: Required on all statements? What about the last expression in a block?
  • Top-level expressions: Are bare expressions allowed at the top level or only declarations?