sx/specs.md
2026-02-20 13:28:38 +02:00


# sx language specification
## 1. Lexical Structure
### Comments
Line comments start with `//` and extend to end of line.
```sx
// this is a comment
```
### Identifiers
- Lowercase or mixed-case for variables, functions: `x`, `compute`, `main`
- UPPER_SNAKE_CASE for constants: `SOME_INT`, `SOME_STR`
- PascalCase for types: `Foo`
### Literals
| Kind | Examples | Type |
|-----------|---------------------|---------|
| Integer | `0`, `42`, `0xFF`, `0b1010` | `s64` |
| Float | `0.3`, `0.9` | `f32` |
| String | `"Hello"`, `"z: {z}"` | `string` (may span multiple lines) |
| Heredoc String | `#string END`...`END` | `string` |
| Boolean | `true`, `false` | `bool` |
| Enum | `.variant1` | inferred from context |
| Undefined | `---` | context-dependent |
String literals support escape sequences (`\n`, `\t`, `\r`, `\\`, `\"`, `\0`) and may span multiple lines directly:
```sx
shader_src := "#version 330 core
void main() {
gl_Position = vec4(0.0);
}
";
```
**Heredoc strings** use `#string DELIMITER` syntax (inspired by Jai). Content is completely raw — no escape processing. The delimiter is any identifier. Content starts after the newline following the delimiter and ends when the delimiter appears at column 0 of a line.
```sx
vert_src := #string GLSL
#version 330 core
void main() {
gl_Position = vec4(aPos, 1.0);
}
GLSL;
```
### Keywords
`if`, `else`, `then`, `while`, `for`, `break`, `continue`, `true`, `false`, `enum`, `struct`, `union`, `case`, `return`, `defer`, `push`, `ufcs`, `in`, `xx`, `and`, `or`
> Note: `enum` is used for both payload-less and payload-bearing sum types (tagged unions). `union` is reserved for C-style untagged unions (memory overlays).
### Operators
| Operator | Meaning |
|----------|------------------|
| `+` | addition |
| `-` | subtraction / negation |
| `*` | multiplication |
| `/` | division |
| `%` | modulo |
| `==` | equality |
| `!=` | inequality |
| `<` | less than |
| `>` | greater than |
| `<=` | less or equal |
| `>=` | greater or equal |
| `&` | bitwise AND |
| `\|` | bitwise OR |
| `^` | bitwise XOR |
| `~` | bitwise NOT (unary) |
| `<<` | left shift |
| `>>` | right shift (arithmetic for signed, logical for unsigned) |
| `and` | logical AND (short-circuit) |
| `or` | logical OR (short-circuit) |
| `in` | membership test (tuples) |
| `\|>` | pipe (function application) |
| `+=` | add-assign |
| `-=` | sub-assign |
| `*=` | mul-assign |
| `/=` | div-assign |
| `&=` | bitwise AND assign |
| `\|=` | bitwise OR assign |
| `^=` | bitwise XOR assign |
| `<<=` | left shift assign |
| `>>=` | right shift assign |
### Delimiters and Punctuation
| Token | Meaning |
|--------|--------------------------------------|
| `::` | constant binding / definition |
| `:=` | variable binding (mutable, inferred) |
| `:` | type annotation |
| `=` | assignment (in typed var decl) |
| `;` | statement terminator |
| `,` | separator |
| `.` | field access / enum literal prefix |
| `->` | return type annotation |
| `=>` | lambda arrow |
| `$` | generic type parameter introduction |
| `---` | undefined value |
| `()` | grouping / params |
| `{}` | blocks / bodies |
---
## 2. Type System
### Primitive Types
- `s1`..`s64` — signed integers (1 to 64 bits). `s64` is the default for integer literals.
- `u1`..`u64` — unsigned integers (1 to 64 bits).
- `f32` — 32-bit floating point
- `f64` — 64-bit floating point
- `bool` — boolean (`true` / `false`)
- `string` — string of characters
- `Any` — type-erased value, represented as `{ i64, i64 }` (type tag + payload). Used for variadic arguments and runtime type dispatch.
- `Type` — compile-time type value. At runtime, represented as an `i64` type tag (same tag space as `Any`).
### Enum Types
User-defined sum types with named variants. Variants may optionally carry typed data (tagged unions). Internally, payload-less enums are represented as `i64` (variant index). Enums with payloads are represented as `{ i64, [max_payload_size x i8] }` (tag + data).
#### Declaration
```sx
// Payload-less enum
Color :: enum {
    red;
    green;
    blue;
}
// Enum with payloads (tagged union)
Shape :: enum {
    circle: f32; // typed variant
    rect: s32;   // typed variant
    none;        // void variant
}
```
Variants are referenced with dot-prefix syntax: `.variant1`
#### Construction
```sx
c := Color.red; // payload-less
s :Shape = .circle(3.14); // inferred from context
s = .none; // void variant
s = Shape.rect(42); // explicit prefix
```
#### Payload Access
```sx
r := s.circle; // load payload as f32 (undefined behavior if wrong variant active)
```
#### Pattern Matching
```sx
if s == {
    case .circle: print("circle\n");
    case .rect: print("rect\n");
    case .none: print("none\n");
}
```
#### Payload Capture
Match arms can capture the variant's payload into a local variable:
```sx
if s == {
    case .circle: (radius) { print("radius: {}\n", radius); }
    case .rect: (size) => print("size: {}\n", size);
}
```
The `(name)` after the colon binds the payload. Two forms:
- Block: `case .variant: (name) { body }`
- Short: `case .variant: (name) => expr;`
#### Enum Interpolation
Payload-less enums print as `.variant`. Enums with payloads print as `.variant(value)` or `<TypeName tag=N>`:
```sx
print("{}", s); // .circle(3.140000)
```
### Union Types (Untagged)
C-style untagged unions for zero-cost memory overlays (type punning). All fields share the same memory — no tag, no runtime overhead. The LLVM representation is `[max_field_size x i8]`.
#### Declaration
```sx
Overlay :: union {
    f: f32;
    i: s32;
}
```
All fields must have types (unlike enums, which may have void variants).
#### Anonymous Struct Fields (Member Promotion)
Anonymous `struct` fields inside a union have their members promoted to the union namespace:
```sx
Vec2 :: union {
    data: [2]f32;
    struct { x, y: f32; };
}
```
Access promoted members directly: `v.x`, `v.y` — these are zero-cost GEPs into the same underlying memory as `v.data[0]`, `v.data[1]`.
#### Initialization
Unions must be initialized with `---` (undefined) and then assigned per-field:
```sx
o :Overlay = ---;
o.f = 3.14;
print("{}\n", o.i); // reinterpret bits as s32
```
#### Restrictions
- Pattern matching (`if x == { case ... }`) is not supported on unions.
- Unions cannot be printed directly via `print("{}", union_val)` — access individual fields instead.
### Struct Types
User-defined product types with named fields.
```sx
Vec4 :: struct {
    x, y, z, w: f32;
}
```
Fields are declared as `name1, name2: type;` (comma-separated names sharing a type, semicolon-terminated).
#### Field Defaults
Fields may have default values. Fields without an explicit default have a zero-value default. `---` marks a field as explicitly undefined.
```sx
Foo :: struct {
    a : u2;       // default is 0
    b : u8 = 42;  // default is 42
    c : u8 = ---; // default is undefined
}
```
#### Struct Literals
```sx
// Positional (with type annotation — type inferred from annotation)
v1 : Vec4 = .{ 1, 2, 3, 0 };
// Positional (with type prefix)
v2 := Vec4.{ 4, 1, 1, 3 };
// Named fields (any order)
v3 := Vec4.{ w=0, x=2, y=3, z=4 };
// Mixed named + shorthand (bare identifier = field name matches variable name)
z := 5.0;
w := 6.0;
v4 := Vec4.{ y=3, x=9, w, z };
```
#### Field Access and Assignment
```sx
v1.x // read field x of struct v1
v1.x = 3.0; // assign to field x of struct v1
```
#### `#using` — Struct Composition
`#using StructName;` inside a struct declaration embeds all fields from `StructName` at that position. The embedded fields are accessed directly, as if declared inline.
```sx
UBase :: struct { x: s32; y: s32; }
UExt :: struct { #using UBase; z: s32; }
e := UExt.{ x = 1, y = 2, z = 3 };
print("{}\n", e.x); // 1
```
`#using` may appear at any field position (beginning, middle, end) and multiple `#using` entries are allowed:
```sx
UPos :: struct { px: s32; py: s32; }
UCol :: struct { r: s32; g: s32; }
USprite :: struct { #using UPos; #using UCol; scale: s32; }
s := USprite.{ px = 10, py = 20, r = 255, g = 128, scale = 1 };
```
The referenced struct must be declared before use. This is purely a compile-time field expansion — no runtime overhead.
#### Struct Interpolation
Struct values in string interpolation print as `TypeName{field:value, ...}`:
```sx
print("{}", v1); // Vec4{x:1.0, y:2.0, z:3.0, w:0.0}
```
### Tuple Types
Anonymous product types with optional field names. Tuples are first-class values — they can be stored in variables, passed to functions, and returned.
#### Construction
```sx
pair := (40, 2); // positional tuple: (s64, s64)
named := (x: 10, y: 20); // named tuple: (x: s64, y: s64)
single := (42,); // 1-tuple (trailing comma in value position)
undef : (s32, s32) = ---; // undefined tuple (uninitialized memory)
```
Note: In value position, `(expr)` without a comma is a grouping expression, not a tuple. Use `(expr,)` for a 1-tuple value.
#### Type Syntax
In type position, `(T)` is always a tuple type — no trailing comma needed. The `->` arrow disambiguates function types from tuple types:
```sx
(s64) // tuple type with one field
(s64, s64) // tuple type with two fields
(s64) -> s64 // function type: takes s64, returns s64
(s64, s64) -> s64 // function type: takes two s64, returns s64
```
#### Field Access
```sx
pair.0; // 40 — numeric index
pair.1; // 2
named.x; // 10 — named field
named.0; // 10 — numeric index also works on named tuples
```
#### As Return Type
```sx
swap :: (a: s64, b: s64) -> (s64, s64) { (b, a); }
wrap :: (x: s64) -> (s64) { (x,); }
s := swap(1, 2); // s.0 = 2, s.1 = 1
t := wrap(42); // t.0 = 42
```
#### Representation
Tuples are represented as anonymous LLVM struct types (same layout as named structs). A tuple `(s64, s64)` has LLVM type `{ i64, i64 }`.
#### Tuple Operators
**Equality and inequality** — element-wise comparison, both sides must have the same field count:
```sx
(1, 2) == (1, 2) // true
(1, 2) != (1, 3) // true
```
**Concatenation** (`+`) — creates a new tuple with fields from both sides:
```sx
c := (1, 2) + (3, 4); // c : (s64, s64, s64, s64)
c.0; // 1
c.3; // 4
```
**Repetition** (`*`) — repeats a tuple N times (N must be a compile-time integer literal):
```sx
r := (1, 2) * 3; // r : (s64, s64, s64, s64, s64, s64)
r.0; // 1
r.5; // 2
```
**Lexicographic comparison** (`<`, `<=`, `>`, `>=`) — compares element-by-element left to right:
```sx
(1, 2) < (1, 3) // true (first fields equal, 2 < 3)
(2, 0) > (1, 9) // true (2 > 1, rest ignored)
(1, 2) <= (1, 2) // true (all equal, <= allows tie)
```
**Membership** (`in`) — checks if a value exists in a tuple:
```sx
3 in (1, 2, 3) // true
5 in (1, 2, 3) // false
```
### Array Types
Fixed-size arrays with element type and length.
```sx
buffer : [5]f32 = .[0, 2, 3.5, 4, 0];
val := buffer[2]; // 3.5
buffer.len // 5 (compile-time constant, s64)
```
Arrays can also be constructed programmatically with the `Array` builtin:
```sx
MyArr :: Array(5, s32); // equivalent to [5]s32
```
### Slice Types
A slice `[]T` is a fat pointer `{ptr, i64}` referencing a contiguous sequence of `T` elements. Same runtime layout as `string`.
```sx
// Arrays implicitly coerce to slices at call sites
arr : [5]s32 = .[3, 1, 4, 1, 5];
sortSlice(arr); // [5]s32 → []s32 coercion
// Slice operations
items[i] // read element at index
items[i] = val; // write element at index
items.len // length (s64)
items.ptr // raw pointer
```
Slices support generic type parameters: `[]$T` introduces type parameter `T` inferred from the element type of the argument (array or slice).
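As a minimal sketch of the `[]$T` form, the hypothetical `first` function below lets the compiler infer `T` from the element type of its argument. The function name is illustrative only, and the sketch assumes the slice is non-empty.

```sx
// Hypothetical helper: returns the first element of any slice.
first :: (items: []$T) -> T {
    items[0];
}

ints : [3]s32 = .[7, 8, 9];
floats : [2]f32 = .[1.5, 2.5];
a := first(ints);     // [3]s32 coerces to []s32 at the call site, so T = s32
b := first(floats);   // T = f32, a separate specialization
```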
### Subslicing
Arrays, slices, and strings support subslice syntax to create zero-copy views:
```sx
arr : [5]s32 = .[3, 1, 4, 1, 5];
sub := arr[1..4]; // []s32 → [1, 4, 1]
head := arr[..3]; // []s32 → [3, 1, 4]
tail := arr[2..]; // []s32 → [4, 1, 5]
msg := "hello world";
word := msg[6..11]; // string → "world"
```
- `expr[start..end]` — elements from `start` (inclusive) to `end` (exclusive)
- `expr[start..]` — elements from `start` to end
- `expr[..end]` — elements from beginning to `end`
- Result type: `[]T` for arrays/slices, `string` for strings
- No memory allocation — the result points into the original backing storage
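Because a subslice is a view rather than a copy, a write through it should be visible in the original array. The sketch below assumes that element writes on a subslice behave like the slice writes shown under Slice Types; the variable names are illustrative.

```sx
arr : [5]s32 = .[3, 1, 4, 1, 5];
mid := arr[1..4];        // view of arr[1], arr[2], arr[3]
mid[0] = 9;              // writes into the storage shared with arr
print("{}\n", arr[1]);   // expected to print 9, since mid[0] aliases arr[1]
```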
### Pointer Types
| Syntax | Meaning | `.len` | `[i]` |
|--------|---------|--------|-------|
| `*T` | pointer to one T | no | no |
| `[*]T` | many-pointer (buffer) | no | yes |
| `*[N]T` | pointer to array of N T | yes | yes |
| `*[]T` | pointer to slice | yes | yes |
**Address-of**: `@x` returns a pointer to the variable.
```sx
v := Vec2.{ 1.0, 2.0 };
ptr := @v; // *Vec2
```
**Dereference**: `p.*` loads the value through the pointer.
```sx
copy := ptr.*; // Vec2
```
**Auto-deref**: `p.field` is sugar for `p.*.field`.
```sx
set_x :: (p: *Vec2, val: f32) {
    p.x = val; // auto-deref: p.*.x = val
}
set_x(@v, 99.0);
```
**Null**: All pointer types are nullable. `null` is the null pointer literal.
```sx
np : *Vec2 = null;
```
**Many-pointer**: `[*]T` supports indexing for buffers of unknown size.
```sx
arr : [5]s32 = .[10, 20, 30, 40, 50];
mp : [*]s32 = @arr[0]; // *s32 → [*]s32 implicit
val := mp[2]; // 30
```
**Implicit conversions**:
- `*T` → `[*]T` (pointer to element → many-pointer)
- `*[N]T` → `[*]T` (pointer to array → many-pointer)
- `[N]T` → `[*]T` at call sites (array decays to many-pointer)
- `[]T` → `[*]T` (slice decays to many-pointer, extracts `.ptr`)
- `T` → `*T` at call sites (implicit address-of)
- `null` (`*void`) → any `*T`
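Two of the call-site conversions above can be sketched as follows. `fill` is a hypothetical helper, `Vec2` is declared here as a plain struct for illustration, and the sketch assumes that indexed writes through a many-pointer are allowed in the same way as indexed reads.

```sx
// Hypothetical helper: writes `value` into the first `count` elements.
fill :: (buf: [*]s32, count: s64, value: s32) {
    i := 0;
    while i < count {
        buf[i] = value;
        i += 1;
    }
}

Vec2 :: struct { x, y: f32; }
set_x :: (p: *Vec2, val: f32) {
    p.x = val;
}

arr : [4]s32 = .[0, 0, 0, 0];
fill(arr, arr.len, 7);   // [4]s32 decays to [*]s32 at the call site

v := Vec2.{ 1.0, 2.0 };
set_x(v, 99.0);          // Vec2 → *Vec2: implicit address-of at the call site
```

The many-pointer carries no length, which is why `fill` takes `count` as a separate parameter.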
**Fat pointer layout**: `[:0]u8`, `string`, and `[]T` are `{ptr, i64}` structs. The raw pointer is always the first field at offset 0. This means `*[:0]u8` works as C's `char**` — a C function dereferences through the outer pointer and reads the raw `char*` from offset 0.
### Foreign Function Interface (C Interop)
To call C functions, declare a library constant with `#library` and bind functions with `#foreign`:
```sx
// Declare a named library constant
libc :: #library "c";
sdl :: #library "SDL3";
// Bind foreign functions — library ref is required
socket :: (domain: s32, type: s32, protocol: s32) -> s32 #foreign libc;
SDL_Init :: (flags: u32) -> bool #foreign sdl;
// Symbol renaming — optional second argument gives the C symbol name
write_fd :: (fd: s32, buf: [*]u8, count: u64) -> s64 #foreign libc "write";
```
- `#library "name"` must be assigned to a named constant. The library is passed to the linker (`-lname` on Unix, `name.lib` on Windows).
- `#foreign lib_ref` declares a function as external C. The library reference is mandatory.
- `#foreign lib_ref "c_symbol"` renames the binding: the sx function name differs from the C symbol. This avoids name collisions (e.g. POSIX `write` vs an sx builtin).
### C Interop Type Mapping
| C type | sx type | Notes |
|--------|---------|-------|
| `const char*` (input) | `[:0]u8` | compiler extracts `.ptr` at call site |
| `char*` (output buffer) | `[*]u8` | raw buffer, no length |
| `const char**` | `*[:0]u8` | address of `[:0]u8`; `.ptr` is at offset 0 |
| `int*` (single out) | `*s32` | |
| `unsigned*` (single out) | `*u32` | |
| `float*` (buffer) | `[*]f32` | |
| `void*` (generic) | `*void` | only for truly opaque/generic data |
### Vector Types (SIMD)
LLVM SIMD vectors, parameterized by length and element type.
```sx
v := vec3(1, 3, 2); // Vector(3, f32)
```
**Arithmetic**: Element-wise `+`, `-`, `*`, `/` on vectors of same dimensions.
```sx
add := v1 + v2; // element-wise addition
```
**Scalar broadcast**: Scalar operands are broadcast to match the vector.
```sx
scaled := v * 2.0; // [2.0, 6.0, 4.0]
```
**Negation**: Unary `-` negates each element.
```sx
neg := -v; // [-1.0, -3.0, -2.0]
```
**Element access**: `.x`, `.y`, `.z`, `.w` (aliases `.r`, `.g`, `.b`, `.a`) extract single components.
```sx
v.x // first element
v.z // third element
```
**Index access**: `v[i]` extracts by index.
```sx
v[0] // first element
```
**Built-in `sqrt`**: Calls LLVM `llvm.sqrt.f32`/`.f64` intrinsic.
```sx
s := sqrt(9.0); // 3.0
```
### Function Types
Expressed as `(param_types) -> return_type`.
A function with no return type annotation returns void.
```sx
// type is (s32) -> s32
compute :: (x: s32) -> s32 { x * x; }
// type is () -> void
main :: () { }
```
### Type Aliases
A name bound to an existing type.
```sx
SOME_TYPE :: f64;
```
### Generic Functions (Monomorphization)
Functions can be parameterized over types using `$T` syntax. The `$` prefix introduces a type parameter; subsequent uses of the name reference it.
```sx
sum :: (a: $T, b: T) -> T {
    return a + b;
}
```
- `$T` in a parameter type **introduces** type parameter `T`
- Bare `T` (without `$`) **references** the introduced type parameter
- At call sites, type arguments are **inferred** from actual argument types:
```sx
sum(40, 2) // T = s64
sum(1.5, 2.5) // T = f32
```
- Each unique set of concrete types produces a **separate specialized function** (monomorphization)
- Multiple type parameters are supported: `(a: $T, b: $U) -> T`
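A small sketch of two independent type parameters, using a hypothetical function `first_of`. Each parameter is inferred from its own argument, and each distinct pair of types should produce its own specialization.

```sx
// Hypothetical: returns the first argument and ignores the second.
first_of :: (a: $T, b: $U) -> T {
    a;
}

x := first_of(1, 2.5);        // T = s64, U = f32
y := first_of("hi", true);    // T = string, U = bool (separate specialization)
```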
### Variadic Functions
Functions can accept a variable number of arguments using `..Type` syntax:
```sx
print :: (fmt: string, args: ..Any) { ... }
```
- `..Any` means zero or more arguments, each boxed into `Any` (type tag + payload)
- The variadic parameter must be the last parameter
- At call sites, variadic arguments are automatically boxed: `print("x={}, y={}\n", x, y)`
- Inside the function body, `args` is accessed as a slice-like sequence
### Type Inference
- `::` bindings infer type from the right-hand side
- `:=` bindings infer type from the right-hand side
- Explicit annotation overrides inference: `NAME : f64 : 0.9;`
- Integer literals default to `s64`
- Float literals default to `f32`
- Enum literals (`.variant`) infer their enum type from context (expected type)
### Type Conversions
**Implicit (widening)** — allowed without annotation:
- Integer to wider integer of same signedness (`u8` → `u16`, `s8` → `s32`)
- Unsigned to strictly wider signed (`u8` → `s16`)
- Any integer to any float (`u8` → `f32`, `s32` → `f64`)
- Float to wider float (`f32` → `f64`)
- Integer and float literals can convert to any numeric type implicitly
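The implicit widening rules above can be sketched as follows. The variable names are illustrative; each assignment should compile without `xx` because the target type can represent every value of the source type (or, for integer to float, because the rule explicitly allows it).

```sx
small : u8 = 200;         // literal converts to u8 implicitly
wide : u16 = small;       // u8 → u16: same signedness, wider
signed : s16 = small;     // u8 → s16: unsigned to strictly wider signed
ratio : f32 = small;      // u8 → f32: any integer to any float
precise : f64 = ratio;    // f32 → f64: float to wider float
```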
**Explicit (narrowing)** — requires `xx` prefix:
- Integer to narrower integer (`s32` → `u8`)
- Signed to unsigned (`s32` → `u32`)
- Float to narrower float (`f64` → `f32`)
- Float to any integer (`f64` → `u16`)
- Unsigned to signed of same or narrower width (`u8` → `s8`)
The `xx` prefix operator marks an expression for auto-conversion to the expected type from context (assignment, declaration, argument, return):
```sx
large: f64 = 5999.5;
x : u16 = xx large; // f64 → u16
d : u8 = #run xx resolve(5); // s32 → u8 at compile time
```
Using `xx` where no target type is known from context is a compile error.
---
## 3. Declarations
### Constant Binding (immutable)
```sx
// inferred type
NAME :: value;
// explicit type
NAME : type : value;
```
The `::` operator creates an immutable binding. The value is evaluated at compile time when possible.
Examples:
```sx
SOME_INT :: 0; // s64
SOME_STR :: "Hello"; // string
SOME_FLOAT :: 0.3; // f32
SOME_DOUBLE : f64 : 0.9; // f64 (explicit)
SOME_FUNC :: () => 42; // () -> s64
SOME_TYPE :: f64; // type alias
```
### Variable Binding (mutable)
```sx
// inferred type
name := value;
// explicit type
name : type = value;
// default-initialized (type required)
name : type;
// undefined (type required)
name : type = ---;
```
The `:=` operator creates a mutable binding. The type is inferred unless explicitly annotated.
`name : type;` initializes using the type's defaults: zero for primitives, per-field defaults for structs (see Field Defaults).
`name : type = ---;` leaves the value undefined (uninitialized memory). Reading before writing is undefined behavior.
Examples:
```sx
x := 42; // s64, mutable
x := if true then 1 else 2;
z : Foo = .variant2; // Foo, mutable, explicit type
a : Foo; // Foo, default-initialized (a=0, b=42, c=undef)
b : Foo = ---; // Foo, entirely undefined
```
### Function Definition
```sx
name :: (params) -> return_type {
    body
}
```
- Parameters: `name: type` separated by commas
- Return type: `-> type` (omit for void)
- Body: block of statements; last expression is the implicit return value
- No `return` keyword needed (last expression = return value)
Examples:
```sx
compute :: (x: s32) -> s32 {
    x * x;
}
main :: () {
    // void return, no -> annotation
}
// Bare-block shorthand (equivalent to no-arg void function):
main :: {
    // same as main :: () { ... }
}
```
### Enum Definition
```sx
Name :: enum {
    variant1;
    variant2;
}
```
Defines a new enum type with the given variants. Trailing comma is allowed.
### Enum Backing Type
An optional backing type can be specified after the `enum` keyword (Jai-style):
```sx
Color :: enum u8 { red; green; blue; }
Status :: enum s16 { ok; error; timeout; }
```
Syntax: `Name :: enum [flags] [type] { ... }`
The backing type must be an integer type (`u8`, `u16`, `u32`, `s8`, `s16`, `s32`, `s64`, etc.). When omitted, the default is `s64`. This is useful for C interop (matching C enum sizes) and memory efficiency.
### Enum Layout Struct
For C interop with tagged unions (e.g. SDL_Event), a struct can be used as the backing type to specify the exact memory layout:
```sx
// Inline layout
SDL_Event :: enum struct { tag: u32; _: u32; payload: [30]u32; } {
    quit :: 0x100;
    key_down :: 0x300: SDL_KeyData;
    key_up :: 0x301: SDL_KeyData;
}
// Named layout
EventLayout :: struct { tag: u32; _: u32; payload: [30]u32; }
SDL_Event :: enum EventLayout {
    quit :: 0x100;
    key_down :: 0x300: SDL_KeyData;
}
```
The layout struct must have:
- A field named `tag` — integer type, the discriminant. Its type becomes the enum's backing type.
- A field named `payload` — array type, the variant data area. Its size determines the maximum payload capacity.
- Any other fields are treated as padding/reserved and positioned by the struct layout.
This gives explicit control over the memory layout instead of relying on automatic alignment. The total size equals the struct size. Without a layout struct, tagged enums use `{ tag, [max_payload_size x i8] }` with no padding.
### Enum Flags
```sx
Perms :: enum flags {
    read;    // 1
    write;   // 2
    execute; // 4
}
```
Flags can also specify a backing type:
```sx
SDL_InitFlags :: enum flags u32 {
    video :: 0x20;
    audio :: 0x10;
}
```
The `flags` modifier assigns auto power-of-2 values (1, 2, 4, 8, ...) instead of sequential indices (0, 1, 2, ...). Flags can be combined with `|` and tested with `&`:
```sx
p :Perms = .read | .write;
if p & .execute { ... }
print("{}\n", p); // .read | .write
```
Explicit values use `::` syntax (Jai-style):
```sx
WindowFlags :: enum flags {
    vsync :: 64;
    resizable :: 4;
    hidden :: 128;
}
```
Restrictions:
- Flags enum variants cannot have payloads
- `flags` is a contextual identifier, not a keyword
### Bitwise Operators
All bitwise operators work on integer types. `>>` is arithmetic (sign-extending) for signed types and logical (zero-filling) for unsigned types.
```sx
x := 0xFF & 0x0F; // 15 — AND
y := 1 | 2 | 4; // 7 — OR
z := 0xFF ^ 0x0F; // 240 — XOR
w := ~0; // -1 — NOT
a := 1 << 4; // 16 — left shift
b := 256 >> 4; // 16 — right shift
```
Compound assignment forms: `&=`, `|=`, `^=`, `<<=`, `>>=`.
```sx
x := 0xFF;
x &= 0x0F; // 15
x |= 0xF0; // 255
x ^= 0x0F; // 240
y := 1;
y <<= 8; // 256
y >>= 4; // 16
```
---
## 4. Expressions
Everything in `sx` is expression-oriented where possible.
### Operator Precedence
| Prec | Operators | Notes |
|------|-----------|-------|
| 9 (highest) | `*`, `/`, `%` | multiplication, division, modulo |
| 8 | `+`, `-` | addition, subtraction |
| 7 | `<<`, `>>` | shifts |
| 6 | `<`, `<=`, `>`, `>=`, `==`, `!=` | comparisons (chainable) |
| 5 | `&` | bitwise AND |
| 4 | `^` | bitwise XOR |
| 3 | `\|` | bitwise OR |
| 2 | `and` | logical AND (short-circuit) |
| 1 (lowest) | `or` | logical OR (short-circuit) |
### Arithmetic
Standard infix operators `+`, `-`, `*`, `/`, and `%`, with the usual precedence (`*`, `/`, and `%` bind tighter than `+` and `-`).
```sx
x * x
x + 2
```
### Chained Comparisons
Comparison operators can be chained. Each operand is evaluated exactly once.
```sx
0 <= x <= 100 // equivalent to: 0 <= x and x <= 100
1000 > x >= -100 // equivalent to: 1000 > x and x >= -100
a == b == c // equivalent to: a == b and b == c
```
Mixed operators are allowed: `a < b <= c > d` means `a < b and b <= c and c > d`.
### Logical Operators
`and` and `or` are short-circuit boolean operators. The right operand is not evaluated if the left operand determines the result.
```sx
if 0 <= x <= 100 and 0 <= y <= 100 {
    print("contained");
}
```
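Short-circuit evaluation becomes visible when the right operand has a side effect. In this sketch, `check` is a hypothetical helper that prints before returning its argument.

```sx
// Hypothetical helper with a visible side effect.
check :: (v: bool) -> bool {
    print("evaluated\n");
    v;
}

a := false and check(true);   // check is not called: the left side decides the result
b := true or check(false);    // check is not called
c := true and check(true);    // check is called and prints "evaluated"
```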
### If Expression (inline form)
```sx
if condition then consequent else alternate
```
Both branches are single expressions. The whole form produces a value.
```sx
x := if true then 1 else 2;
```
The `else` branch is optional. Without it, the form is a statement (no value):
```sx
if i == 2 then continue;
if done then break;
if err then return;
```
### If Expression (block form)
```sx
if condition {
    stmts
} else {
    stmts
}
```
Each branch is a block. The last expression in each block is the branch's value. Can be used inline within other expressions:
```sx
y := x + if false {
    7;
} else {
    12;
};
```
### Pattern Matching
```sx
if subject == {
    case pattern: body
    case pattern: body
    else: body // optional default arm
}
```
Matches `subject` against each `case`. Patterns can be:
- **Enum literals**: `.variant` — matches a specific enum variant.
- **Integer/bool literals**: `42`, `true` — matches a specific value.
- **Type categories**: `int`, `struct`, `enum`, and so on — matches all types in that category (used with `type_of` values). See Type Category Matching for the full list; `union` is not a matchable category.
`break` exits a case arm without producing a value. The optional `else:` arm matches when no `case` pattern matches.
```sx
if z == {
    case .variant1: break;
    case .variant2:
        print("z: {z}");
    else:
        print("unknown");
}
```
#### Type Category Matching
When switching on a `Type` value (from `type_of`), category keywords match all registered types of that category:
```sx
type := type_of(val);
if type == {
    case int: result = int_to_string(xx val);
    case struct: result = struct_to_string(cast(type) val);
    case enum: result = enum_to_string(cast(type) val);
}
```
Available categories: `int`, `float`, `bool`, `string`, `struct`, `enum`, `vector`, `array`, `slice`, `pointer`, `type`.
> Note: `case enum:` matches both payload-less enums and tagged enums (enums with payloads). C-style untagged unions are not registered with the Any type system and cannot be matched by category.
Inside a category arm, `cast(type) val` performs **runtime generic dispatch**: the compiler generates a switch over all types in the category, monomorphizing the callee for each concrete type.
### While Loop
```sx
while condition {
    body
}
```
Repeats `body` as long as `condition` is true. `break;` exits the loop. `continue;` skips to the next iteration.
```sx
i := 0;
while i < 10 {
    i += 1;
    if i == 5 { continue; }
    if i == 8 { break; }
    print("{i}\n");
}
```
### For Loop
```sx
for iterable: (elem) { } // element alias (no copy)
for iterable: (elem, ix) { } // element + index
for iterable: (_, ix) { } // index only
```
Iterates over arrays and slices. The capture clause after `:` binds loop variables:
- The first name is the element capture (non-reassignable alias into the array/slice)
- The optional second name is the index (s64, starting at 0, also non-reassignable)
- Use `_` to discard a capture
The element capture is a direct alias — reads and field writes go to the original array element. Direct reassignment of the capture (`elem = x`) is a compile error.
`break;` exits the loop. `continue;` skips to the next iteration.
```sx
arr : [5]s32 = .[1, 2, 3, 4, 5];
for arr: (val, ix) {
    if ix == 2 { continue; }
    print("{}\n", val);
}
```
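The aliasing behavior of the element capture can be sketched with an array of structs. `Point` and `pts` are illustrative names, and the sketch assumes that struct literals can appear inside an array literal and that `pts[1].x` combines indexing with field access.

```sx
Point :: struct { x: s32; y: s32; }
pts : [2]Point = .[Point.{ 1, 2 }, Point.{ 3, 4 }];
for pts: (p, ix) {
    p.x = 0;                  // field write goes to pts[ix].x, not to a copy
    // p = Point.{ 0, 0 };    // compile error: the capture cannot be reassigned
}
print("{}\n", pts[1].x);      // expected to print 0
```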
### Lambda
```sx
(params) => expr
(params) -> return_type => expr
```
Anonymous function. Produces a function value. Supports the same parameter features as named functions: `$` generic type params, `..` variadic params, and optional return type annotation.
```sx
SOME_FUNC :: () => 42; // () -> s64
double :: (x: $T) -> T => x + x; // generic lambda with return type
```
### Function Call
```sx
callee(args)
```
```sx
compute(6)
print("hello")
```
### UFCS (Uniform Function Call Syntax)
```sx
object.func(args) // equivalent to func(object, args)
```
When `object.func(args)` is encountered and `func` is not a field of `object`'s type, the compiler rewrites the call to `func(object, args)`. This enables method-like syntax without dedicated method declarations.
```sx
Point :: struct { x: s32; y: s32; }
point_sum :: (p: Point) -> s32 { p.x + p.y; }
p := Point.{3, 4};
print("{}\n", p.point_sum()); // calls point_sum(p) → 7
```
UFCS works with pointer receivers (auto-deref applies) and generic functions. If the field name exists as both a struct field and a free function, the struct field takes priority.
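A short sketch of both cases, using hypothetical names (`Counter`, `bump`, `twice`). The pointer receiver is passed through unchanged as the first argument, and the generic callee is specialized from the receiver's type.

```sx
Counter :: struct { n: s64; }
bump :: (c: *Counter) {
    c.n = c.n + 1;        // auto-deref through the pointer
}
twice :: (x: $T) -> T { x + x; }

counter := Counter.{ 0 };
ptr := @counter;
ptr.bump();               // rewritten to bump(ptr)
r := 21.twice();          // rewritten to twice(21), T = s64
```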
#### UFCS Aliases
The `ufcs` keyword creates a name alias for a function, decoupling the method name from the function name:
```sx
arena_alloc :: (arena: *Arena, size: s64) -> *void { ... }
alloc :: ufcs arena_alloc;
myArena.alloc(42); // calls arena_alloc(myArena, 42)
alloc(myArena, 42); // also works as a direct call
```
This avoids the naming redundancy of `myArena.arena_alloc(42)`.
#### Tuple UFCS Splatting
When a tuple is used as the receiver of a UFCS call, its elements are unpacked as leading arguments:
```sx
num_add :: (a: s64, b: s64) -> s64 { a + b; }
add :: ufcs num_add;
(40, 2).add(); // splats to num_add(40, 2) → 42
(40,).add(2); // partial: num_add(40, 2) → 42
40.add(2); // normal UFCS: num_add(40, 2) → 42
```
With more arguments:
```sx
compute :: (a: s64, b: s64, c: s64, d: s64) -> s64 { a + b * c - d; }
calc :: ufcs compute;
(1, 2, 3, 4).calc(); // full splat → compute(1, 2, 3, 4)
(1, 2).calc(3, 4); // partial splat → compute(1, 2, 3, 4)
1.calc(2, 3, 4); // normal UFCS → compute(1, 2, 3, 4)
```
### Pipe Operator
The pipe operator `|>` inserts the left-hand side as the first argument of the right-hand side call. It is desugared at parse time.
```sx
a |> f(b, c) // → f(a, b, c)
a |> f // → f(a)
a |> f(b) |> g(c) // → g(f(a, b), c)
```
The pipe is left-associative and binds more loosely than every other binary operator, including `or`, so expressions like `x + 1 |> f(2)` are parsed as `f(x + 1, 2)`.
This is especially useful with namespaced imports:
```sx
pkg :: #import "modules/math";
3 |> pkg.add(4) // → pkg.add(3, 4) → 7
3 |> pkg.add(4) |> pkg.mul(2) // → pkg.mul(pkg.add(3, 4), 2) → 14
```
### Field Access
```sx
object.field
```
Used for module access (`std.print`) and struct member access.
### Enum Literal
```sx
.variant_name
```
The enum type is inferred from context (expected type from declaration or parameter).
---
## 5. Statements
Statements are terminated by `;`.
- **Declaration**: `name :: value;` / `name := value;`
- **Assignment**: `name = value;` / `name += value;` (and other compound assignments). Also supports field targets: `obj.field = value;`
- **Multi-target assignment**: `a, b = b, a;` — all RHS values are evaluated before any stores, enabling swaps without temporaries. Target count must equal value count. Only plain `=` is supported (no compound operators). Each target must be a valid lvalue (variable, field, index, dereference).
- **Expression statement**: `expr;` — evaluates the expression (last in a block = return value)
- **Return**: `return expr;` — returns from the enclosing function with the given value. `return;` returns void.
- **Break**: `break;` — exits a match arm or the enclosing `while` or `for` loop
- **Continue**: `continue;` — skips to the next iteration of the enclosing `while` or `for` loop
- **Defer**: `defer expr;` — defers execution of `expr` until the enclosing block exits (LIFO order)
- **Push**: `push expr { body }` — scoped context override (see below)
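The multi-target assignment rule in the list above can be sketched as follows. The names are illustrative; the key point is that every right-hand value is read before any target is written.

```sx
a := 1;
b := 2;
a, b = b, a;              // swap without a temporary: a = 2, b = 1

Point :: struct { x: s32; y: s32; }
p := Point.{ 3, 4 };
p.x, p.y = p.y, p.x;      // field targets are valid lvalues: p.x = 4, p.y = 3
```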
### `push` Statement and Implicit `context`
The `push` statement temporarily overrides a global `context` variable for the duration of a block. The previous context is saved before the block and restored after it exits.
```sx
push Context.{ arena = @arena, data = xx @logger } {
handle(client); // inside here, `context` has the new value
}
// context is restored to its previous value here
```
**`Context` struct** — defined in `std.sx`:
```sx
Context :: struct {
arena: *Arena; // pointer to active arena allocator (or null)
data: *void; // opaque pointer for application-specific data
}
context : Context = ---; // global mutable variable
```
Inside the pushed block, any code (including called functions) can read `context.arena` and `context.data`. The standard library's `cstring()` function checks `context.arena` and uses it for allocation when available, falling back to `malloc()` otherwise.
`push` requires a global mutable variable named `context` to be in scope (provided by `std.sx`).
---
## 6. Blocks, Scoping, and Implicit Returns
A block `{ ... }` contains zero or more statements. The last expression in a block is its value (implicit return).
In function bodies, the last expression becomes the return value:
```sx
compute :: (x: s32) -> s32 {
x * x; // this is returned
}
```
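Because `if` is an expression, it can serve as the last expression of a body. The sketch below, with the hypothetical names `clamp_low` and `clamp_low_explicit`, shows the implicit form next to an equivalent explicit `return`:

```sx
// Implicit form: the value of the last expression is returned.
clamp_low :: (x: s32, lo: s32) -> s32 {
    if x < lo then lo else x;
}

// Explicit form of the same function.
clamp_low_explicit :: (x: s32, lo: s32) -> s32 {
    return if x < lo then lo else x;
}
```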
### Scope Blocks
Bare blocks can be used as statements to introduce a new lexical scope. Variables declared inside a scope block are local to that block. No trailing `;` is required.
```sx
main :: {
x := 42;
{
x := 6; // shadows outer x
print("inner: {x}"); // prints 6
}
print("outer: {x}"); // prints 42
}
```
### Variable Shadowing
A variable declaration (`name :=`) inside an inner scope shadows any variable with the same name from outer scopes. The outer variable is not modified and becomes visible again when the inner scope exits.
### Defer
`defer expr;` schedules `expr` to execute when the enclosing scope block exits. Multiple defers in the same scope execute in reverse order (LIFO).
```sx
{
defer print("second");
defer print("first");
}
// prints: first, then second
```
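Since a deferred expression is tied to its enclosing scope block, nested blocks run their defers independently. A sketch of how this plays out, following the rule stated above:

```sx
main :: () {
    defer print("main exit");       // runs last, when main's block exits
    {
        defer print("inner exit");  // runs when this inner block exits
        print("inner body");
    }
    print("main body");
}
// order: inner body, inner exit, main body, main exit
```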
---
## 7. Built-in Functions
Built-in functions are declared in `std.sx` with the `#builtin` suffix, which tells the compiler to generate the implementation internally rather than looking for a function body.
### I/O
- `out(str: string) -> void` — write a string to standard output
- `print(fmt: string, args: ..Any)` — formatted print. Parses `{}` placeholders in the format string and substitutes arguments. When all argument types are statically known, the compiler specializes the call at compile time (no `Any` boxing).
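A brief sketch of both I/O built-ins. The variable `name` is hypothetical; the calls follow the signatures listed above.

```sx
main :: () {
    out("plain text\n");                      // writes the string as-is
    name := "sx";
    print("{} has {} letters\n", name, 2);    // each {} is replaced by the next argument
    // Both argument types are statically known here, so per the description
    // above this call is a candidate for compile-time specialization.
}
```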
### Math
- `sqrt(x: $T) -> T` — square root (maps to LLVM intrinsic)
- `sin(x: $T) -> T` — sine (maps to LLVM intrinsic)
- `cos(x: $T) -> T` — cosine (maps to LLVM intrinsic)
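For illustration, a minimal use of `sqrt` on float values; the variable names are hypothetical.

```sx
main :: () {
    a := 3.0;
    b := 4.0;
    h := sqrt(a * a + b * b);   // 5.0 for a 3-4-5 triangle
    print("hypotenuse: {h}");
}
```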
### Memory
- `malloc(size: s64) -> *void` — allocate `size` bytes of heap memory
- `free(ptr: *void) -> void` — free previously allocated memory
- `memcpy(dst: *void, src: *void, size: s64) -> *void` — copy `size` bytes from `src` to `dst`
- `memset(dst: *void, val: s64, size: s64) -> void` — fill `size` bytes at `dst` with `val`
- `size_of($T: Type) -> s64` — size of type `T` in bytes
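The memory built-ins combine naturally with `defer`. The following is a sketch only; `Pair` is a hypothetical struct introduced so that `size_of` has a type to measure.

```sx
Pair :: struct { a, b: s64; }   // hypothetical type for illustration

main :: () {
    n := size_of(Pair);
    src := malloc(n);
    dst := malloc(n);
    defer free(src);
    defer free(dst);         // deferred last, so it runs first
    memset(src, 0, n);       // zero the source buffer
    memcpy(dst, src, n);     // copy n bytes from src to dst
}
```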
### Type Introspection
- `type_of(val: $T) -> Type` — returns the runtime type tag of a value
- `type_name($T: Type) -> string` — returns the name of type `T` as a string (e.g., `"Point"`)
- `field_count($T: Type) -> s64` — returns the number of fields (struct), variants (enum), or elements (vector) in type `T`
- `field_name($T: Type, idx: s64) -> string` — returns the name of the `idx`-th field (struct) or variant (enum) of type `T`
- `field_value(s: $T, idx: s64) -> Any` — returns the `idx`-th field (struct) or element (vector) of `s`, boxed as `Any`
- `field_value_int($T: Type, idx: s64) -> s64` — returns the integer value of the `idx`-th enum variant
- `field_index($T: Type, val: T) -> s64` — returns the sequential variant index for an explicit enum value (reverse of `field_value_int`). Returns `-1` if no variant matches.
- `is_flags($T: Type) -> bool` — returns `true` if `T` is a flags enum (declared with `#flags`)
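A small sketch of the struct-related introspection calls. `Point` is a hypothetical struct, and the example assumes field indices start at 0, which the list above does not state explicitly.

```sx
Point :: struct { x, y: f32; }   // hypothetical type for illustration

main :: () {
    print("{} has {} fields\n", type_name(Point), field_count(Point));   // Point, 2
    print("first field: {}\n", field_name(Point, 0));                    // x, assuming 0-based indices
}
```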
### Type Conversion
- `cast(Type) expr` — prefix operator that converts `expr` to `Type`. Examples: `cast(s32) 3.14`, `cast(f64) n`. When `Type` is a runtime `Type` value inside a type-category match arm, the compiler generates a dispatch switch over all types in the category, monomorphizing the callee for each concrete type.
### Vectors
- `Vector($N: int, $T: Type) -> Type` — returns an LLVM vector type of `N` elements of type `T`
---
## 8. Compile-time Evaluation
### `#run` Directive
`#run expr` evaluates `expr` at compile time using lazy JIT execution. It can appear in two contexts:
**Compile-time constants** — bind a compile-time value to a name:
```sx
compute :: (x: s32) -> s32 { x * x; }
x :: #run compute(5); // x = 25, evaluated at compile time
```
Comptime globals are resolved lazily: the JIT executes only when the value is first referenced during code generation. Chained dependencies are resolved automatically.
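A sketch of a chained dependency between two comptime globals, using the hypothetical names `square`, `a`, and `b`:

```sx
square :: (x: s32) -> s32 { x * x; }

a :: #run square(3);   // 9
b :: #run square(a);   // 81, depends on `a`

main :: () {
    print("b: {b}");   // referencing `b` triggers its evaluation, which in turn resolves `a`
}
```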
**Side effects** — execute code at compile time for its side effects:
```sx
#run print("compiling...");
```
### `#insert` Directive
`#insert expr;` evaluates `expr` at compile time to obtain a string, then parses and compiles that string as inline code at the insertion point.
```sx
generate :: () -> string {
return "print(\"hello from the other side\");";
}
main :: () {
#insert #run generate();
// equivalent to: print("hello from the other side");
}
```
The inserted string must contain valid `sx` statements (including semicolons). The statements are parsed and compiled in the same scope as the `#insert` site. Variables created by one `#insert` are visible to subsequent `#insert` directives in the same function.
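To illustrate visibility across directives, here is a sketch in which a second `#insert` uses a variable introduced by the first. The generator names `declare` and `bump` are hypothetical.

```sx
declare :: () -> string { return "total := 40;"; }
bump :: () -> string { return "total += 2;"; }

main :: () {
    #insert #run declare();   // introduces `total` in main's scope
    #insert #run bump();      // sees `total` from the previous #insert; total is now 42
}
```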
### Comptime Call Evaluation
When a `::` constant binding is initialized with a function call and all arguments are comptime-known (literals or other `::` constants), the compiler attempts to evaluate the entire call at compile time using the bytecode VM. If evaluation succeeds, the result is baked into the binary as a static constant with zero runtime overhead.
```sx
body :: "<html><body><h1>Hello</h1></body></html>";
response :: format("HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}", body.len, body);
// response is a static string constant — no runtime allocation
```
This works for any function, not just `format`. The mechanism is general: the VM compiles the function body (including `#insert` directives, variadic `..Any` args, and calls to other functions) and executes it entirely at compile time. If the VM encounters something it cannot evaluate (e.g., foreign function calls or unsupported operations), the compiler silently falls back to runtime code generation for that call.
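The same rule applied to a user-defined function might look like the sketch below. `area`, `WIDTH`, `HEIGHT`, and `AREA` are hypothetical names.

```sx
area :: (w: s64, h: s64) -> s64 { w * h; }

WIDTH :: 16;
HEIGHT :: 9;
AREA :: area(WIDTH, HEIGHT);   // all arguments are `::` constants, so the call
                               // is a candidate for compile-time evaluation (144)
```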
---
## 9. Modules / Imports
### `#import` Directive
The `#import` directive brings declarations from another `.sx` file or directory into the current file. Paths are resolved relative to the importing file's directory.
**Flat import** — splices all declarations from the imported file into the current scope:
```sx
#import "modules/std/math.sx";
```
**Namespaced import** — wraps all declarations under a namespace name:
```sx
std :: #import "modules/std.sx";
```
**Directory import** — when the path refers to a directory, all `.sx` files in that directory are aggregated into a single module:
```sx
pkg :: #import "modules/testpkg"; // namespaced — all .sx files merged under pkg
#import "modules/testpkg"; // flat — all declarations spliced into scope
```
Directory imports scan only the top level of the specified directory (non-recursive). Files are processed in alphabetical order for deterministic builds. Files within the directory may `#import` each other or external files.
Namespaced declarations are accessed with dot notation:
```sx
std.print("hello");
```
### Import Resolution
- Imports are resolved after parsing and before code generation.
- Paths are relative to the directory of the file containing the `#import`.
- If the path resolves to a file, it is imported directly. If it resolves to a directory, all `.sx` files in that directory are aggregated.
- Nested imports are supported (imported files may themselves contain `#import`).
- Circular imports are detected and silently skipped (each file is imported at most once).
- Generic functions in namespaced imports are supported (e.g., `std.mul(5, 2)` where `mul` is generic).
### Intra-module References
Functions within a namespaced import can call each other without the namespace prefix. When generating code for a namespaced module, unresolved function names are automatically tried with the namespace prefix.
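A sketch of what this allows, assuming a hypothetical module file `modules/geometry.sx`:

```sx
// modules/geometry.sx (hypothetical)
square :: (x: s64) -> s64 { x * x; }
sum_squares :: (a: s64, b: s64) -> s64 {
    square(a) + square(b);   // no `geo.` prefix needed inside the module
}

// main.sx
geo :: #import "modules/geometry.sx";
main :: () {
    r := geo.sum_squares(3, 4);   // 25; callers outside the module use the prefix
}
```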
### Example
```sx
// modules/std/math.sx
mul :: (base: $T, exp: T) -> T { base * exp; }
// modules/std.sx
out :: (str: string) -> void #builtin;
// main.sx
std :: #import "modules/std.sx";
#import "modules/std/math.sx";
main :: () -> s32 {
std.out("hello there");
mul(5, 2);
}
```
---
## 10. Program Structure
A program is a sequence of top-level declarations and `#import` directives. Execution begins at `main`.
```sx
main :: () {
// entry point
}
```
`main` takes no arguments and returns void. The process exit code is 0 unless otherwise specified.
---
## 11. Grammar (informal)
```
program = top_level*
top_level = decl | import_decl
import_decl = '#import' STRING ';'
| IDENT '::' '#import' STRING ';'
decl = const_decl | var_decl | fn_decl | enum_decl | struct_decl
const_decl = IDENT '::' expr ';'
| IDENT ':' type ':' expr ';'
var_decl = IDENT ':=' expr ';'
| IDENT ':' type '=' expr ';'
| IDENT ':' type ';'
fn_decl = IDENT '::' '(' params? ')' ('->' type)? block
| IDENT '::' block
enum_decl = IDENT '::' 'enum' '{' (IDENT ';')* '}'
struct_decl = IDENT '::' 'struct' '{' struct_member* '}'
struct_member = field_group | '#using' IDENT ';'
field_group = IDENT (',' IDENT)* ':' type ('=' expr)? ';'
params = param (',' param)*
param = IDENT ':' type
block = '{' stmt* '}'
stmt = decl | assignment ';' | multi_assign ';' | return_stmt | defer_stmt | insert_stmt
| push_stmt | break_stmt | continue_stmt | expr ';'
return_stmt = 'return' expr? ';'
break_stmt = 'break' ';'
continue_stmt = 'continue' ';'
defer_stmt = 'defer' expr ';'
insert_stmt = '#insert' expr ';'
push_stmt = 'push' expr block
assignment = lvalue ('=' | '+=' | '-=' | '*=' | '/=') expr
multi_assign = lvalue (',' lvalue)+ '=' expr (',' expr)+
lvalue = IDENT | postfix '.' IDENT
expr = if_expr | match_expr | while_expr | for_expr | lambda | binary
while_expr = 'while' expr block
for_expr = 'for' expr ':' '(' IDENT (',' IDENT)? ')' block
binary = unary (binop unary)*
unary = ('-' | '!' | 'xx' | 'cast' '(' type ')') postfix
| postfix
postfix = primary ('(' args? ')' | '.' IDENT | '.{' field_init_list '}')*
primary = INT | HEX_INT | BIN_INT | FLOAT | STRING | BOOL | IDENT | '---'
| '.' IDENT | '.' '{' field_init_list '}'
| '(' expr ')' | block | '#run' expr
field_init_list = field_init (',' field_init)*
field_init = IDENT '=' expr | IDENT | expr
if_expr = 'if' expr 'then' expr ('else' expr)?
| 'if' expr block ('else' block)?
match_expr = 'if' expr '==' '{' case_arm* else_arm? '}'
case_arm = 'case' pattern ':' (stmt* | 'break' ';')
else_arm = 'else' ':' stmt*
pattern = '.' IDENT | INT | BOOL | IDENT
lambda = '(' params? ')' ('->' type)? '=>' expr
args = expr (',' expr)*
type = '$' IDENT | 's32' | 'f32' | 'f64' | 'bool' | 'string'
| 'Any' | 'Type' | '..' type | '[' expr ']' type | IDENT
```
---
## 12. Open Questions
- **Nested functions**: Can functions be defined inside other functions?
- **Operator overloading**: Not shown — presumably no.
- **Top-level expressions**: Are bare expressions allowed at the top level or only declarations?