sx/specs.md
2026-02-16 01:58:30 +02:00


# sx language specification
## 1. Lexical Structure
### Comments
Line comments start with `//` and extend to end of line.
```sx
// this is a comment
```
### Identifiers
- Lowercase or mixed-case for variables, functions: `x`, `compute`, `main`
- UPPER_SNAKE_CASE for constants: `SOME_INT`, `SOME_STR`
- PascalCase for types: `Foo`
### Literals
| Kind | Examples | Type |
|-----------|---------------------|---------|
| Integer | `0`, `42`, `0xFF`, `0b1010` | `s64` |
| Float | `0.3`, `0.9` | `f32` |
| String | `"Hello"`, `"z: {z}"` | `string` (may span multiple lines) |
| Heredoc String | `#string END`...`END` | `string` |
| Boolean | `true`, `false` | `bool` |
| Enum | `.variant1` | inferred from context |
| Undefined | `---` | context-dependent |
String literals support escape sequences (`\n`, `\t`, `\r`, `\\`, `\"`, `\0`) and may span multiple lines directly:
```sx
shader_src := "#version 330 core
void main() {
gl_Position = vec4(0.0);
}
";
```
**Heredoc strings** use `#string DELIMITER` syntax (inspired by Jai). Content is completely raw — no escape processing. The delimiter is any identifier. Content starts after the newline following the delimiter and ends when the delimiter appears at column 0 of a line.
```sx
vert_src := #string GLSL
#version 330 core
void main() {
gl_Position = vec4(aPos, 1.0);
}
GLSL;
```
### Keywords
`if`, `else`, `then`, `while`, `for`, `break`, `continue`, `true`, `false`, `null`, `enum`, `struct`, `union`, `case`, `return`, `defer`, `xx`, `and`, `or`
> Note: `enum` is used for both payload-less and payload-bearing sum types (tagged unions). `union` is reserved for C-style untagged unions (memory overlays).
### Operators
| Operator | Meaning |
|----------|------------------|
| `+` | addition |
| `-` | subtraction / negation |
| `*` | multiplication |
| `/` | division |
| `%` | modulo |
| `==` | equality |
| `!=` | inequality |
| `<` | less than |
| `>` | greater than |
| `<=` | less or equal |
| `>=` | greater or equal |
| `&` | bitwise AND |
| `\|` | bitwise OR |
| `and` | logical AND (short-circuit) |
| `or` | logical OR (short-circuit) |
| `+=` | add-assign |
| `-=` | sub-assign |
| `*=` | mul-assign |
| `/=` | div-assign |
### Delimiters and Punctuation
| Token | Meaning |
|--------|--------------------------------------|
| `::` | constant binding / definition |
| `:=` | variable binding (mutable, inferred) |
| `:` | type annotation |
| `=` | assignment; initializer in a typed variable declaration |
| `;` | statement terminator |
| `,` | separator |
| `.` | field access / enum literal prefix |
| `->` | return type annotation |
| `=>` | lambda arrow |
| `$` | generic type parameter introduction |
| `---` | undefined value |
| `()` | grouping / params |
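| `{}` | blocks / bodies |
| `[]` | indexing / array and slice types |
| `..` | variadic parameter type / subslice range |
| `@` | address-of |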
---
## 2. Type System
### Primitive Types
- `s1`..`s64` — signed integers (1 to 64 bits). `s64` is the default for integer literals.
- `u1`..`u64` — unsigned integers (1 to 64 bits).
- `f32` — 32-bit floating point
- `f64` — 64-bit floating point
- `bool` — boolean (`true` / `false`)
- `string` — string of characters
- `Any` — type-erased value, represented as `{ i64, i64 }` (type tag + payload). Used for variadic arguments and runtime type dispatch.
- `Type` — compile-time type value. At runtime, represented as an `i64` type tag (same tag space as `Any`).
### Enum Types
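User-defined sum types with named variants. Variants may optionally carry typed data (tagged unions). By default, payload-less enums are represented as `i64` holding the variant value (the variant index unless explicit values or `flags` are used; see Enum Backing Type for other integer sizes). By default, enums with payloads are represented as `{ i64, [max_payload_size x i8] }` (tag + data); see Enum Layout Struct for overriding this layout.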
#### Declaration
```sx
// Payload-less enum
Color :: enum {
    red;
    green;
    blue;
}
// Enum with payloads (tagged union)
Shape :: enum {
    circle: f32;    // typed variant
    rect: s32;      // typed variant
    none;           // void variant
}
```
Variants are referenced with dot-prefix syntax (`.variant1`) when the enum type is known from context, or qualified with the type name (`Color.red`).
#### Construction
```sx
c := Color.red; // payload-less
s :Shape = .circle(3.14); // inferred from context
s = .none; // void variant
s = Shape.rect(42); // explicit prefix
```
#### Payload Access
```sx
r := s.circle; // load payload as f32 (undefined behavior if wrong variant active)
```
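A rough C picture of the default layout `{ i64, [max_payload_size x i8] }` for `Shape` can help when reasoning about payload access. This is a sketch, not compiler output. It assumes tags follow declaration order (0, 1, 2) and that the payload area is 4 bytes, the larger of `f32` and `s32`. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of
 *   Shape :: enum { circle: f32; rect: s32; none; }
 * using the default { i64 tag, [4 x i8] payload } layout.
 * Tag values are assumed to follow declaration order. */
enum { SHAPE_CIRCLE = 0, SHAPE_RECT = 1, SHAPE_NONE = 2 };

typedef struct {
    int64_t tag;      /* which variant is active */
    uint8_t data[4];  /* max(sizeof(float), sizeof(int32_t)) bytes */
} Shape;

/* Like `.circle(r)`: set the tag, then copy the payload bytes in. */
Shape shape_circle(float radius) {
    Shape s = { .tag = SHAPE_CIRCLE };   /* payload bytes start zeroed */
    memcpy(s.data, &radius, sizeof radius);
    return s;
}

/* Like `.none`: only the tag is meaningful. */
Shape shape_none(void) {
    Shape s = { .tag = SHAPE_NONE };
    return s;
}

/* Like `s.circle`: reinterpret the payload bytes as f32.
 * No tag check is performed, mirroring the rule that reading the
 * wrong variant is undefined behavior. */
float shape_circle_payload(const Shape *s) {
    float r;
    memcpy(&r, s->data, sizeof r);
    return r;
}
```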
#### Pattern Matching
```sx
if s == {
    case .circle: print("circle\n");
    case .rect:   print("rect\n");
    case .none:   print("none\n");
}
```
#### Payload Capture
Match arms can capture the variant's payload into a local variable:
```sx
if s == {
    case .circle: (radius) { print("radius: {}\n", radius); }
    case .rect: (size) => print("size: {}\n", size);
}
```
The `(name)` after the colon binds the payload. Two forms:
- Block: `case .variant: (name) { body }`
- Short: `case .variant: (name) => expr;`
#### Enum Interpolation
Payload-less enums print as `.variant`. Enums with payloads print as `.variant(value)` or `<TypeName tag=N>`:
```sx
print("{}", s); // .circle(3.140000)
```
### Union Types (Untagged)
C-style untagged unions for zero-cost memory overlays (type punning). All fields share the same memory — no tag, no runtime overhead. The LLVM representation is `[max_field_size x i8]`.
#### Declaration
```sx
Overlay :: union {
    f: f32;
    i: s32;
}
```
All fields must have types (unlike enums, which may have void variants).
#### Anonymous Struct Fields (Member Promotion)
Anonymous `struct` fields inside a union have their members promoted to the union namespace:
```sx
Vec2 :: union {
    data: [2]f32;
    struct { x, y: f32; };
}
```
Access promoted members directly: `v.x`, `v.y` — these are zero-cost GEPs into the same underlying memory as `v.data[0]`, `v.data[1]`.
#### Initialization
Unions must be initialized with `---` (undefined) and then assigned per-field:
```sx
o :Overlay = ---;
o.f = 3.14;
print("{}\n", o.i); // reinterpret bits as s32
```
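For readers coming from C, the overlay behaves like a C `union`. The following is a C analogue, not compiler output. It assumes `f32` is an IEEE-754 single-precision float and `s32` is a 32-bit two's-complement integer. The function name `float_bits` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* C analogue of `Overlay :: union { f: f32; i: s32; }`:
 * both members share the same 4 bytes and there is no tag. */
typedef union {
    float   f;
    int32_t i;
} Overlay;

/* Write through one member and read through the other. C permits
 * reading a union member other than the one last written; the stored
 * bytes are reinterpreted as the new type. */
int32_t float_bits(float value) {
    Overlay o;      /* like `o :Overlay = ---;` (contents undefined) */
    o.f = value;    /* like `o.f = 3.14;` */
    return o.i;     /* like reading `o.i` */
}
```

Under these assumptions, writing `3.14` through `f` and reading `i` yields the bit pattern `0x4048F5C3` (1078523331).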
#### Restrictions
- Pattern matching (`if x == { case ... }`) is not supported on unions.
- Unions cannot be printed directly via `print("{}", union_val)` — access individual fields instead.
### Struct Types
User-defined product types with named fields.
```sx
Vec4 :: struct {
    x, y, z, w: f32;
}
```
Fields are declared as `name1, name2: type;` (comma-separated names sharing a type, semicolon-terminated).
#### Field Defaults
Fields may have default values. Fields without an explicit default have a zero-value default. `---` marks a field as explicitly undefined.
```sx
Foo :: struct {
    a : u2;         // default is 0
    b : u8 = 42;    // default is 42
    c : u8 = ---;   // default is undefined
}
```
#### Struct Literals
```sx
// Positional (with type annotation — type inferred from annotation)
v1 : Vec4 = .{ 1, 2, 3, 0 };
// Positional (with type prefix)
v2 := Vec4.{ 4, 1, 1, 3 };
// Named fields (any order)
v3 := Vec4.{ w=0, x=2, y=3, z=4 };
// Mixed named + shorthand (bare identifier = field name matches variable name)
z := 5.0;
w := 6.0;
v4 := Vec4.{ y=3, x=9, w, z };
```
#### Field Access and Assignment
```sx
v1.x // read field x of struct v1
v1.x = 3.0; // assign to field x of struct v1
```
#### Struct Interpolation
Struct values in string interpolation print as `TypeName{field:value, ...}`:
```sx
print("{}", v1); // Vec4{x:1.0, y:2.0, z:3.0, w:0.0}
```
### Array Types
Fixed-size arrays with element type and length.
```sx
buffer : [5]f32 = .[0, 2, 3.5, 4, 0];
val := buffer[2]; // 3.5
buffer.len // 5 (compile-time constant, s64)
```
Arrays can also be constructed programmatically with the `Array` builtin:
```sx
MyArr :: Array(5, s32); // equivalent to [5]s32
```
### Slice Types
A slice `[]T` is a fat pointer `{ptr, i64}` referencing a contiguous sequence of `T` elements. Same runtime layout as `string`.
```sx
// Arrays implicitly coerce to slices at call sites
sortSlice :: (items: []s32) { ... }   // hypothetical slice-taking function
arr : [5]s32 = .[3, 1, 4, 1, 5];
sortSlice(arr);    // [5]s32 → []s32 coercion
// Slice operations (e.g. inside sortSlice, where items : []s32)
items[i]           // read element at index
items[i] = val;    // write element at index
items.len          // length (s64)
items.ptr          // raw pointer
```
Slices support generic type parameters: `[]$T` introduces type parameter `T` inferred from the element type of the argument (array or slice).
### Subslicing
Arrays, slices, and strings support subslice syntax to create zero-copy views:
```sx
arr : [5]s32 = .[3, 1, 4, 1, 5];
sub := arr[1..4]; // []s32 → [1, 4, 1]
head := arr[..3]; // []s32 → [3, 1, 4]
tail := arr[2..]; // []s32 → [4, 1, 5]
msg := "hello world";
word := msg[6..11]; // string → "world"
```
- `expr[start..end]` — elements from `start` (inclusive) to `end` (exclusive)
- `expr[start..]` — elements from `start` (inclusive) through the last element
- `expr[..end]` — elements from the first element up to `end` (exclusive)
- Result type: `[]T` for arrays/slices, `string` for strings
- No memory allocation — the result points into the original backing storage
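The slice and subslice rules can be modeled in C with a two-field struct. This sketch assumes `[]s32` lowers to `{ptr, i64}` as stated above. `SliceS32` and `subslice` are hypothetical names. The bounds check is an addition of the sketch, because the spec does not say whether sx checks bounds:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C view of an sx slice `[]s32`: a {ptr, i64} fat pointer. */
typedef struct {
    int32_t *ptr;
    int64_t  len;
} SliceS32;

/* Like `s[start..end]`: half-open range, no allocation.
 * The result points into the same backing storage as `s`. */
SliceS32 subslice(SliceS32 s, int64_t start, int64_t end) {
    assert(0 <= start && start <= end && end <= s.len);
    SliceS32 r = { s.ptr + start, end - start };
    return r;
}
```

Here `arr[1..4]` corresponds to `subslice(all, 1, 4)`, `arr[..3]` to `subslice(all, 0, 3)`, and `arr[2..]` to `subslice(all, 2, all.len)`. No copy is made, so writing through the result modifies `arr`.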
### Pointer Types
| Syntax | Meaning | `.len` | `[i]` |
|--------|---------|--------|-------|
| `*T` | pointer to one T | no | no |
| `[*]T` | many-pointer (buffer) | no | yes |
| `*[N]T` | pointer to array of N T | yes | yes |
| `*[]T` | pointer to slice | yes | yes |
**Address-of**: `@x` returns a pointer to the variable.
```sx
v := Vec2.{ 1.0, 2.0 };   // assuming Vec2 :: struct { x, y: f32; }
ptr := @v; // *Vec2
```
**Dereference**: `p.*` loads the value through the pointer.
```sx
copy := ptr.*; // Vec2
```
**Auto-deref**: `p.field` is sugar for `p.*.field`.
```sx
set_x :: (p: *Vec2, val: f32) {
    p.x = val;  // auto-deref: p.*.x = val
}
set_x(@v, 99.0);
```
**Null**: All pointer types are nullable. `null` is the null pointer literal.
```sx
np : *Vec2 = null;
```
**Many-pointer**: `[*]T` supports indexing for buffers of unknown size.
```sx
arr : [5]s32 = .[10, 20, 30, 40, 50];
mp : [*]s32 = @arr[0]; // *s32 → [*]s32 implicit
val := mp[2]; // 30
```
**Implicit conversions**:
- `*T` → `[*]T` (pointer to element → many-pointer)
- `*[N]T` → `[*]T` (pointer to array → many-pointer)
- `[N]T` → `[*]T` at call sites (array decays to many-pointer)
- `[]T` → `[*]T` (slice decays to many-pointer, extracts `.ptr`)
- `T` → `*T` at call sites (implicit address-of)
- `null` (`*void`) → any `*T`
**Fat pointer layout**: `[:0]u8`, `string`, and `[]T` are `{ptr, i64}` structs. The raw pointer is always the first field at offset 0. This means `*[:0]u8` works as C's `char**` — a C function dereferences through the outer pointer and reads the raw `char*` from offset 0.
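The `char**` claim follows from a C rule: a pointer to a struct, suitably converted, points to the struct's first member. Below is a C sketch of the layout and of the call. `Str`, `c_strlen_indirect`, and `call_with_fat_pointer` are hypothetical names. They stand in for `[:0]u8` and for a C API that takes `const char **`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of `[:0]u8` / `string`: pointer first, then length. */
typedef struct {
    const char *ptr;  /* offset 0 */
    int64_t     len;
} Str;

/* The layout guarantee the interop trick relies on. */
_Static_assert(offsetof(Str, ptr) == 0, "ptr must be the first field");

/* A C function expecting `const char **`; it only reads through the
 * outer pointer to reach the raw `char *`. */
size_t c_strlen_indirect(const char *const *pp) {
    return strlen(*pp);
}

/* Passing `@s` where `s : [:0]u8` is, in C terms, passing &s converted
 * to `const char **`: the callee lands on s.ptr at offset 0. */
size_t call_with_fat_pointer(Str *s) {
    return c_strlen_indirect((const char *const *)s);
}
```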
### C Interop Type Mapping
| C type | sx type | Notes |
|--------|---------|-------|
| `const char*` (input) | `[:0]u8` | compiler extracts `.ptr` at call site |
| `char*` (output buffer) | `[*]u8` | raw buffer, no length |
| `const char**` | `*[:0]u8` | address of a `[:0]u8`; its `.ptr` sits at offset 0 |
| `int*` (single out) | `*s32` | |
| `unsigned*` (single out) | `*u32` | |
| `float*` (buffer) | `[*]f32` | |
| `void*` (generic) | `*void` | only for truly opaque/generic data |
### Vector Types (SIMD)
LLVM SIMD vectors, parameterized by length and element type.
```sx
v := vec3(1, 3, 2); // Vector(3, f32)
```
**Arithmetic**: Element-wise `+`, `-`, `*`, `/` on two vectors of the same length.
```sx
add := v1 + v2; // element-wise addition
```
**Scalar broadcast**: Scalar operands are broadcast to match the vector.
```sx
scaled := v * 2.0; // [2.0, 6.0, 4.0]
```
**Negation**: Unary `-` negates each element.
```sx
neg := -v; // [-1.0, -3.0, -2.0]
```
**Element access**: `.x`, `.y`, `.z`, `.w` (aliases `.r`, `.g`, `.b`, `.a`) extract single components.
```sx
v.x // first element
v.z // third element
```
**Index access**: `v[i]` extracts by index.
```sx
v[0] // first element
```
**Built-in `sqrt`**: Lowers to the LLVM `llvm.sqrt.f32` / `llvm.sqrt.f64` intrinsic.
```sx
s := sqrt(9.0); // 3.0
```
### Function Types
Expressed as `(param_types) -> return_type`.
A function with no return type annotation returns void.
```sx
// type is (s32) -> s32
compute :: (x: s32) -> s32 { x * x; }
// type is () -> void
main :: () { }
```
### Type Aliases
A name bound to an existing type.
```sx
SOME_TYPE :: f64;
```
### Generic Functions (Monomorphization)
Functions can be parameterized over types using `$T` syntax. The `$` prefix introduces a type parameter; subsequent uses of the name reference it.
```sx
sum :: (a: $T, b: T) -> T {
    return a + b;
}
```
- `$T` in a parameter type **introduces** type parameter `T`
- Bare `T` (without `$`) **references** the introduced type parameter
- At call sites, type arguments are **inferred** from actual argument types:
```sx
sum(40, 2)      // T = s64
sum(1.5, 2.5) // T = f32
```
- Each unique set of concrete types produces a **separate specialized function** (monomorphization)
- Multiple type parameters are supported: `(a: $T, b: $U) -> T`
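Monomorphization can be pictured in C as one generated function per concrete type, plus a compile-time choice at each call site. The sketch below uses a macro and C11 `_Generic` as a stand-in. It is an analogy, not how the sx compiler is implemented, and `DEFINE_SUM`, `sum_s64`, `sum_f32`, and `sum_f64` are hypothetical. One difference from sx: C float literals such as `1.5` are `double`, whereas sx float literals default to `f32`.

```c
#include <assert.h>
#include <stdint.h>

/* Stamp out one concrete function per type, the way monomorphization
 * produces a specialized copy of
 *   sum :: (a: $T, b: T) -> T { return a + b; }
 * for each distinct T it is called with. */
#define DEFINE_SUM(T, NAME) \
    static T NAME(T a, T b) { return a + b; }

DEFINE_SUM(int64_t, sum_s64)   /* instance for T = s64 */
DEFINE_SUM(float,   sum_f32)   /* instance for T = f32 */
DEFINE_SUM(double,  sum_f64)   /* instance for T = f64 */

/* Call-site "inference": _Generic picks an instance from the type of
 * the first argument. Integer types fall through to the s64 instance,
 * loosely echoing sx's s64 default for integer literals. */
#define sum(a, b) _Generic((a), \
    float:   sum_f32,           \
    double:  sum_f64,           \
    default: sum_s64)((a), (b))
```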
### Variadic Functions
Functions can accept a variable number of arguments using `..Type` syntax:
```sx
print :: (fmt: string, args: ..Any) { ... }
```
- `..Any` means zero or more arguments, each boxed into `Any` (type tag + payload)
- The variadic parameter must be the last parameter
- At call sites, variadic arguments are automatically boxed: `print("x={}, y={}\n", x, y)`
- Inside the function body, `args` is accessed as a slice-like sequence
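One way to picture `..Any` is as an array of `{ i64, i64 }` pairs passed together with a count, with the callee dispatching on the tag. In this C sketch, the tag values `TAG_S64` and `TAG_BOOL` and the function `format_any` are invented for illustration. Real sx type tags are assigned by the compiler:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of `Any` = { i64 type tag, i64 payload }.
 * The tag values are invented for this sketch. */
enum { TAG_S64 = 1, TAG_BOOL = 2 };
typedef struct { int64_t tag; int64_t payload; } Any;

/* Copy fmt into out, replacing each `{}` with the next boxed argument.
 * Output is truncated (and always NUL-terminated) if cap is too small. */
void format_any(char *out, size_t cap, const char *fmt,
                const Any *args, size_t count) {
    size_t used = 0, next = 0;
    out[0] = '\0';
    for (const char *p = fmt; *p && used + 1 < cap; ) {
        if (p[0] == '{' && p[1] == '}' && next < count) {
            const Any *a = &args[next++];
            int n = (a->tag == TAG_BOOL)
                ? snprintf(out + used, cap - used, "%s", a->payload ? "true" : "false")
                : snprintf(out + used, cap - used, "%lld", (long long)a->payload);
            if (n < 0) break;
            /* advance by what was actually written, even if truncated */
            used += ((size_t)n < cap - used) ? (size_t)n : cap - used - 1;
            p += 2;
        } else {
            out[used++] = *p++;   /* ordinary character: copy it */
            out[used] = '\0';
        }
    }
}
```

For example, boxing `3` and `true` and formatting with `"x={}, ok={}"` produces `x=3, ok=true`.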
### Type Inference
- `::` bindings infer type from the right-hand side
- `:=` bindings infer type from the right-hand side
- Explicit annotation overrides inference: `NAME : f64 : 0.9;`
- Integer literals default to `s64`
- Float literals default to `f32`
- Enum literals (`.variant`) infer their enum type from context (expected type)
### Type Conversions
**Implicit (widening)** — allowed without annotation:
- Integer to wider integer of same signedness (`u8` → `u16`, `s8` → `s32`)
- Unsigned to strictly wider signed (`u8` → `s16`)
- Any integer to any float (`u8` → `f32`, `s32` → `f64`)
- Float to wider float (`f32` → `f64`)
- Integer and float literals can convert to any numeric type implicitly
**Explicit (narrowing)** — requires `xx` prefix:
- Integer to narrower integer (`s32` → `u8`)
- Signed to unsigned (`s32` → `u32`)
- Float to narrower float (`f64` → `f32`)
- Float to any integer (`f64` → `u16`)
- Unsigned to signed of same or narrower width (`u8` → `s8`)
The `xx` prefix operator marks an expression for auto-conversion to the expected type from context (assignment, declaration, argument, return):
```sx
large: f64 = 5999.5;
x : u16 = xx large; // f64 → u16
d : u8 = #run xx resolve(5); // s32 → u8 at compile time
```
Using `xx` outside a typed context (where the target type is known) is a compile error.
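The implicit-conversion rules above amount to a small predicate over (kind, bit width). This C sketch encodes them for non-literal operands only. Literals, which convert to any numeric type, are not modeled. `Num` and `implicit_ok` are hypothetical names and are not part of the compiler:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an sx numeric type: kind + bit width. */
typedef enum { K_SIGNED, K_UNSIGNED, K_FLOAT } Kind;
typedef struct { Kind kind; int bits; } Num;

/* True when `from` may convert to `to` without `xx`. */
bool implicit_ok(Num from, Num to) {
    if (from.kind == to.kind && from.bits == to.bits)
        return true;                          /* same type: nothing to convert */
    if (from.kind == K_FLOAT)                 /* float → wider float only */
        return to.kind == K_FLOAT && to.bits > from.bits;
    if (to.kind == K_FLOAT)                   /* any integer → any float */
        return true;
    if (from.kind == to.kind)                 /* same signedness, strictly wider */
        return to.bits > from.bits;
    if (from.kind == K_UNSIGNED)              /* unsigned → strictly wider signed */
        return to.bits > from.bits;
    return false;                             /* signed → unsigned needs xx */
}
```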
---
## 3. Declarations
### Constant Binding (immutable)
```sx
// inferred type
NAME :: value;
// explicit type
NAME : type : value;
```
The `::` operator creates an immutable binding. The value is evaluated at compile time when possible.
Examples:
```sx
SOME_INT :: 0;              // s64
SOME_STR :: "Hello";        // string
SOME_FLOAT :: 0.3;          // f32
SOME_DOUBLE : f64 : 0.9;    // f64 (explicit)
SOME_FUNC :: () => 42;      // () -> s64
SOME_TYPE :: f64; // type alias
```
### Variable Binding (mutable)
```sx
// inferred type
name := value;
// explicit type
name : type = value;
// default-initialized (type required)
name : type;
// undefined (type required)
name : type = ---;
```
The `:=` operator creates a mutable binding. The type is inferred unless explicitly annotated.
`name : type;` initializes using the type's defaults: zero for primitives, per-field defaults for structs (see Field Defaults).
`name : type = ---;` leaves the value undefined (uninitialized memory). Reading before writing is undefined behavior.
Examples:
```sx
x := 42;                    // s64, mutable
y := if true then 1 else 2;
c : Color = .green;         // Color, mutable, explicit type
a : Foo; // Foo, default-initialized (a=0, b=42, c=undef)
b : Foo = ---; // Foo, entirely undefined
```
### Function Definition
```sx
name :: (params) -> return_type {
    body
}
```
- Parameters: `name: type` separated by commas
- Return type: `-> type` (omit for void)
- Body: block of statements; the last expression is the implicit return value, so `return` is only needed for early exits (`return expr;`)
Examples:
```sx
compute :: (x: s32) -> s32 {
    x * x;
}
main :: () {
    // void return, no -> annotation
}
// Bare-block shorthand (equivalent to no-arg void function):
main :: {
    // same as main :: () { ... }
}
```
### Enum Definition
```sx
Name :: enum {
    variant1;
    variant2;
}
```
Defines a new enum type with the given variants. Trailing comma is allowed.
### Enum Backing Type
An optional backing type can be specified after the `enum` keyword (Jai-style):
```sx
Color :: enum u8 { red; green; blue; }
Status :: enum s16 { ok; error; timeout; }
```
Syntax: `Name :: enum [flags] [type] { ... }`
The backing type must be an integer type (`u8`, `u16`, `u32`, `s8`, `s16`, `s32`, `s64`, etc.). When omitted, the default is `s64`. This is useful for C interop (matching C enum sizes) and memory efficiency.
### Enum Layout Struct
For C interop with tagged unions (e.g. SDL_Event), a struct can be used as the backing type to specify the exact memory layout:
```sx
// Inline layout
SDL_Event :: enum struct { tag: u32; _: u32; payload: [30]u32; } {
    quit :: 0x100;
    key_down :: 0x300: SDL_KeyData;
    key_up :: 0x301: SDL_KeyData;
}
// Named layout
EventLayout :: struct { tag: u32; _: u32; payload: [30]u32; }
SDL_Event :: enum EventLayout {
    quit :: 0x100;
    key_down :: 0x300: SDL_KeyData;
}
```
The layout struct must have:
- A field named `tag` — integer type, the discriminant. Its type becomes the enum's backing type.
- A field named `payload` — array type, the variant data area. Its size determines the maximum payload capacity.
- Any other fields are treated as padding/reserved and positioned by the struct layout.
This gives explicit control over the memory layout instead of relying on automatic alignment. The total size equals the struct size. Without a layout struct, tagged enums use `{ tag, [max_payload_size x i8] }` with no padding.
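For the `SDL_Event` example, the layout struct corresponds to the C struct below. In this sketch, the `_Static_assert` checks restate the sizes implied by the field list (4 + 4 + 30 × 4 = 128 bytes). `is_key_event` is a hypothetical helper that shows a tag test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C equivalent of
 *   enum struct { tag: u32; _: u32; payload: [30]u32; }
 * `tag` is the discriminant, `_` is reserved, `payload` holds variant data. */
typedef struct {
    uint32_t tag;
    uint32_t _;
    uint32_t payload[30];
} EventLayout;

/* Discriminant values taken from the sx declaration above. */
enum { EV_QUIT = 0x100, EV_KEY_DOWN = 0x300, EV_KEY_UP = 0x301 };

/* The total size is the struct size; payload starts after tag and `_`. */
_Static_assert(sizeof(EventLayout) == 128, "layout must be 128 bytes");
_Static_assert(offsetof(EventLayout, payload) == 8, "payload starts at byte 8");

/* A variant test reduces to comparing the tag field. */
int is_key_event(const EventLayout *e) {
    return e->tag == EV_KEY_DOWN || e->tag == EV_KEY_UP;
}
```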
### Enum Flags
```sx
Perms :: enum flags {
    read;       // 1
    write;      // 2
    execute;    // 4
}
```
Flags can also specify a backing type:
```sx
SDL_InitFlags :: enum flags u32 {
    video :: 0x20;
    audio :: 0x10;
}
```
The `flags` modifier assigns auto power-of-2 values (1, 2, 4, 8, ...) instead of sequential indices (0, 1, 2, ...). Flags can be combined with `|` and tested with `&`:
```sx
p :Perms = .read | .write;
if p & .execute { ... }
print("{}\n", p); // .read | .write
```
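Flag values and the `.read | .write` rendering can be sketched in C as follows. The `1 << i` assignment mirrors the power-of-two rule. `format_perms` is a hypothetical helper. Rendering an empty set as an empty string is an assumption, because the spec does not cover that case:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C rendering of `Perms :: enum flags { read; write; execute; }`:
 * the i-th variant gets the value 1 << i. */
enum { PERM_READ = 1u << 0, PERM_WRITE = 1u << 1, PERM_EXECUTE = 1u << 2 };

static const char *const perm_names[] = { "read", "write", "execute" };

/* Render the set bits as `.name | .name`, in declaration order. */
void format_perms(char *out, size_t cap, uint64_t p) {
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < 3; i++) {
        if (p & (1u << i)) {
            int n = snprintf(out + used, cap - used, "%s.%s",
                             used ? " | " : "", perm_names[i]);
            if (n < 0 || (size_t)n >= cap - used) return;  /* truncated */
            used += (size_t)n;
        }
    }
}
```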
Explicit values use `::` syntax (Jai-style):
```sx
WindowFlags :: enum flags {
    vsync :: 64;
    resizable :: 4;
    hidden :: 128;
}
```
Restrictions:
- Flags enum variants cannot have payloads
- `flags` is a contextual identifier, not a keyword
### Bitwise Operators
`&` (bitwise AND) and `|` (bitwise OR) work on all integer types, not just flags. They sit at precedence level 3, between comparisons and logical operators.
```sx
x := 0xFF & 0x0F; // 15
y := 1 | 2 | 4; // 7
```
---
## 4. Expressions
Everything in `sx` is expression-oriented where possible.
### Operator Precedence
| Prec | Operators | Notes |
|------|-----------|-------|
| 6 (highest) | `*`, `/`, `%` | multiplication, division, modulo |
| 5 | `+`, `-` | addition, subtraction |
| 4 | `<`, `<=`, `>`, `>=`, `==`, `!=` | comparisons (chainable) |
| 3 | `&`, `\|` | bitwise AND, bitwise OR |
| 2 | `and` | logical AND (short-circuit) |
| 1 (lowest) | `or` | logical OR (short-circuit) |
### Arithmetic
Standard infix: `+`, `-`, `*`, `/` with usual precedence (`*`/`/` before `+`/`-`).
```sx
x * x
x + 2
```
### Chained Comparisons
Comparison operators can be chained. Each operand is evaluated exactly once.
```sx
0 <= x <= 100 // equivalent to: 0 <= x and x <= 100
1000 > x >= -100 // equivalent to: 1000 > x and x >= -100
a == b == c // equivalent to: a == b and b == c
```
Mixed operators are allowed: `a < b <= c > d` means `a < b and b <= c and c > d`.
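A C sketch of the desugaring shows why each operand is evaluated only once. The shared middle operand is stored in a temporary and reused by both comparisons, which are joined with short-circuit `&&`. `next_x`, `calls`, and `chain4` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many times the middle operand is evaluated. */
static int calls;

/* Stand-in for an operand with a side effect; always yields 50. */
static int next_x(void) {
    calls++;
    return 50;
}

/* `0 <= next_x() <= 100` desugared: the middle operand is evaluated
 * once into a temporary, then reused by both comparisons. */
bool in_range_chained(void) {
    int t = next_x();
    return 0 <= t && t <= 100;
}

/* `a < b <= c > d` means `a < b and b <= c and c > d`. */
bool chain4(int a, int b, int c, int d) {
    return a < b && b <= c && c > d;
}
```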
### Logical Operators
`and` and `or` are short-circuit boolean operators. The right operand is not evaluated if the left operand determines the result.
```sx
if 0 <= x <= 100 and 0 <= y <= 100 {
print("contained");
}
```
### If Expression (inline form)
```sx
if condition then consequent else alternate
```
Both branches are single expressions. The whole form produces a value.
```sx
x := if true then 1 else 2;
```
The `else` branch is optional. Without it, the form is a statement (no value):
```sx
if i == 2 then continue;
if done then break;
if err then return;
```
### If Expression (block form)
```sx
if condition {
    stmts
} else {
    stmts
}
```
Each branch is a block. The last expression in each block is the branch's value. Can be used inline within other expressions:
```sx
y := x + if false {
    7;
} else {
    12;
};
```
### Pattern Matching
```sx
if subject == {
    case pattern: body
    case pattern: body
    else: body          // optional default arm
}
```
Matches `subject` against each `case`. Patterns can be:
- **Enum literals**: `.variant` — matches a specific enum variant.
- **Integer/bool literals**: `42`, `true` — matches a specific value.
- **Type categories**: `int`, `struct`, `enum`, and the other categories listed under Type Category Matching match all types in that category (used with `type_of` values).
`break` exits a case arm without producing a value. The optional `else:` arm matches when no `case` pattern matches.
```sx
if z == {
    case .variant1: break;
    case .variant2:
        print("z: {z}");
    else:
        print("unknown");
}
```
#### Type Category Matching
When switching on a `Type` value (from `type_of`), category keywords match all registered types of that category:
```sx
type := type_of(val);
if type == {
    case int:    result = int_to_string(xx val);
    case struct: result = struct_to_string(cast(type) val);
    case enum:   result = enum_to_string(cast(type) val);
}
```
Available categories: `int`, `float`, `bool`, `string`, `struct`, `enum`, `vector`, `array`, `slice`, `pointer`, `type`.
> Note: `case enum:` matches both payload-less enums and tagged enums (enums with payloads). C-style untagged unions are not registered with the Any type system and cannot be matched by category.
Inside a category arm, `cast(type) val` performs **runtime generic dispatch**: the compiler generates a switch over all types in the category, monomorphizing the callee for each concrete type.
### While Loop
```sx
while condition {
    body
}
```
Repeats `body` as long as `condition` is true. `break;` exits the loop. `continue;` skips to the next iteration.
```sx
i := 0;
while i < 10 {
    i += 1;
    if i == 5 { continue; }
    if i == 8 { break; }
    print("{i}\n");
}
```
### For Loop
```sx
for iterable: (elem) { } // element alias (no copy)
for iterable: (elem, ix) { } // element + index
for iterable: (_, ix) { } // index only
```
Iterates over arrays and slices. The capture clause after `:` binds loop variables:
- The first name is the element capture (non-reassignable alias into the array/slice)
- The optional second name is the index (s64, starting at 0, also non-reassignable)
- Use `_` to discard a capture
The element capture is a direct alias — reads and field writes go to the original array element. Direct reassignment of the capture (`elem = x`) is a compile error.
`break;` exits the loop. `continue;` skips to the next iteration.
```sx
arr : [5]s32 = .[1, 2, 3, 4, 5];
for arr: (val, ix) {
    if ix == 2 { continue; }
    print("{}\n", val);
}
```
### Lambda
```sx
(params) => expr
(params) -> return_type => expr
```
Anonymous function. Produces a function value. Supports the same parameter features as named functions: `$` generic type params, `..` variadic params, and optional return type annotation.
```sx
SOME_FUNC :: () => 42; // () -> s64
double :: (x: $T) -> T => x + x; // generic lambda with return type
```
### Function Call
```sx
callee(args)
```
```sx
compute(6)
print("hello")
```
### Field Access
```sx
object.field
```
Used for module access (`std.print`) and struct member access.
### Enum Literal
```sx
.variant_name
```
The enum type is inferred from context (expected type from declaration or parameter).
---
## 5. Statements
Statements are terminated by `;`.
- **Declaration**: `name :: value;` / `name := value;`
- **Assignment**: `name = value;` / `name += value;` (and other compound assignments). Also supports field targets: `obj.field = value;`
- **Multi-target assignment**: `a, b = b, a;` — all RHS values are evaluated before any stores, enabling swaps without temporaries. Target count must equal value count. Only plain `=` is supported (no compound operators). Each target must be a valid lvalue (variable, field, index, dereference).
- **Expression statement**: `expr;` — evaluates the expression (last in a block = return value)
- **Return**: `return expr;` — returns from the enclosing function with the given value. `return;` returns void.
- **Break**: `break;` — exits a match arm, `while` loop, or `for` loop
- **Continue**: `continue;` — skips to the next iteration of a `while` or `for` loop
- **Defer**: `defer expr;` — defers execution of `expr` until the enclosing block exits (LIFO order)
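Multi-target assignment behaves as if every right-hand value is stored in a temporary before any target is written. That is what makes `a, b = b, a;` a swap: a naive `a = b; b = a;` would copy `b` into both. The function names in this C sketch are hypothetical:

```c
#include <assert.h>

/* `a, b = b, a;`: evaluate all right-hand sides first, then store. */
void swap_multi_assign(int *a, int *b) {
    int t0 = *b;   /* right-hand value 1 */
    int t1 = *a;   /* right-hand value 2 */
    *a = t0;       /* store to target 1 */
    *b = t1;       /* store to target 2 */
}

/* `a, b, c = b, c, a;` follows the same pattern: a rotation. */
void rotate3(int *a, int *b, int *c) {
    int t0 = *b, t1 = *c, t2 = *a;
    *a = t0;
    *b = t1;
    *c = t2;
}
```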
---
## 6. Blocks, Scoping, and Implicit Returns
A block `{ ... }` contains zero or more statements. The last expression in a block is its value (implicit return).
In function bodies, the last expression becomes the return value:
```sx
compute :: (x: s32) -> s32 {
    x * x;  // this is returned
}
```
### Scope Blocks
Bare blocks can be used as statements to introduce a new lexical scope. Variables declared inside a scope block are local to that block. No trailing `;` is required.
```sx
main :: {
    x := 42;
    {
        x := 6;                 // shadows outer x
        print("inner: {x}");    // prints 6
    }
    print("outer: {x}");        // prints 42
}
```
### Variable Shadowing
A variable declaration (`name :=`) inside an inner scope shadows any variable with the same name from outer scopes. The outer variable is restored when the inner scope exits.
### Defer
`defer expr;` schedules `expr` to execute when the enclosing scope block exits. Multiple defers in the same scope execute in reverse order (LIFO).
```sx
{
    defer print("second");
    defer print("first");
}
// prints: first, then second
```
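The LIFO rule can be modeled as a per-scope stack of deferred actions that is drained from the top when the scope exits. This C sketch is an illustration only. `DeferStack`, `defer_push`, and `scope_exit` are hypothetical, and a compiler can instead emit the deferred code directly at each scope exit point:

```c
#include <assert.h>
#include <string.h>

typedef void (*DeferFn)(void);

/* Deferred actions registered in one scope (fixed capacity for the sketch). */
typedef struct {
    DeferFn fns[8];
    int     count;
} DeferStack;

static void defer_push(DeferStack *s, DeferFn fn) {
    assert(s->count < 8);
    s->fns[s->count++] = fn;
}

/* Run deferred actions from most recent to oldest (LIFO). */
static void scope_exit(DeferStack *s) {
    while (s->count > 0)
        s->fns[--s->count]();
}

/* Record what runs so the order can be checked. */
static char log_buf[64];
static void print_second(void) { strcat(log_buf, "second;"); }
static void print_first(void)  { strcat(log_buf, "first;"); }

void demo_scope(void) {
    DeferStack s = { .count = 0 };
    defer_push(&s, print_second);   /* defer print("second"); */
    defer_push(&s, print_first);    /* defer print("first");  */
    scope_exit(&s);                 /* closing `}` of the block */
}
```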
---
## 7. Built-in Functions
Built-in functions are declared in `std.sx` with the `#builtin` suffix, which tells the compiler to generate the implementation internally rather than looking for a function body.
### I/O
- `write(str: string) -> void` — write a string to standard output
- `print(fmt: string, args: ..Any)` — formatted print. Parses `{}` placeholders in the format string and substitutes arguments. When all argument types are statically known, the compiler specializes the call at compile time (no `Any` boxing).
### Math
- `sqrt(x: $T) -> T` — square root (maps to LLVM intrinsic)
### Memory
- `alloc(size: s64) -> string` — allocate `size` bytes of memory, returned as a string slice
- `size_of($T: Type) -> s64` — size of type `T` in bytes
### Type Introspection
- `type_of(val: $T) -> Type` — returns the runtime type tag of a value
- `type_name($T: Type) -> string` — returns the name of type `T` as a string (e.g., `"Point"`)
- `field_count($T: Type) -> s64` — returns the number of fields (struct), variants (enum), or elements (vector) in type `T`
- `field_name($T: Type, idx: s64) -> string` — returns the name of the `idx`-th field (struct) or variant (enum) of type `T`
- `field_value(s: $T, idx: s64) -> Any` — returns the `idx`-th field (struct) or element (vector) of `s`, boxed as `Any`
- `field_index($T: Type, val: T) -> s64` — returns the sequential variant index for an explicit enum value (reverse of `field_value_int`). Returns `-1` if no variant matches.
### Type Conversion
- `cast(Type) expr` — prefix operator that converts `expr` to `Type`. Examples: `cast(s32) 3.14`, `cast(f64) n`. When `Type` is a runtime `Type` value inside a type-category match arm, the compiler generates a dispatch switch over all types in the category, monomorphizing the callee for each concrete type.
### Vectors
- `Vector($N: int, $T: Type) -> Type` — returns an LLVM vector type of `N` elements of type `T`
---
## 8. Compile-time Evaluation
### `#run` Directive
`#run expr` evaluates `expr` at compile time using lazy JIT execution. It can appear in two contexts:
**Compile-time constants** — bind a compile-time value to a name:
```sx
compute :: (x: s32) -> s32 { x * x; }
x :: #run compute(5); // x = 25, evaluated at compile time
```
Comptime globals are resolved lazily: the JIT executes only when the value is first referenced during code generation. Chained dependencies are resolved automatically.
**Side effects** — execute code at compile time for its side effects:
```sx
#run print("compiling...");
```
### `#insert` Directive
`#insert expr;` evaluates `expr` at compile time to obtain a string, then parses and compiles that string as inline code at the insertion point.
```sx
generate :: () -> string {
    return "print(\"hello from the other side\");";
}
main :: () {
    #insert #run generate();
    // equivalent to: print("hello from the other side");
}
```
The inserted string must contain valid `sx` statements (including semicolons). The statements are parsed and compiled in the same scope as the `#insert` site.
---
## 9. Modules / Imports
### `#import` Directive
The `#import` directive brings declarations from another `.sx` file into the current file. Paths are resolved relative to the importing file's directory.
**Flat import** — splices all declarations from the imported file into the current scope:
```sx
#import "modules/std/math.sx";
```
**Namespaced import** — wraps all declarations under a namespace name:
```sx
std :: #import "modules/std.sx";
```
Namespaced declarations are accessed with dot notation:
```sx
std.print("hello");
```
### Import Resolution
- Imports are resolved after parsing and before code generation.
- Paths are relative to the directory of the file containing the `#import`.
- Nested imports are supported (imported files may themselves contain `#import`).
- Circular imports are detected and silently skipped (each file is imported at most once).
- Generic functions in namespaced imports are supported (e.g., `std.mul(5, 2)` where `mul` is generic).
### Intra-module References
Functions within a namespaced import can call each other without the namespace prefix. When generating code for a namespaced module, unresolved function names are automatically tried with the namespace prefix.
### Example
```sx
// modules/std/math.sx
mul :: (base: $T, exp: T) -> T { base * exp; }
// modules/std.sx
print :: (str: string) -> void #builtin;
// main.sx
std :: #import "modules/std.sx";
#import "modules/std/math.sx";
main :: () {
    std.print("hello there");
    mul(5, 2);
}
```
---
## 10. Program Structure
A program is a sequence of top-level declarations and `#import` directives. Execution begins at `main`.
```sx
main :: () {
    // entry point
}
```
`main` takes no arguments and returns void. The process exit code is 0 unless otherwise specified.
---
## 11. Grammar (informal)
```
program         = top_level*
top_level       = decl | import_decl | '#run' expr ';'
import_decl     = '#import' STRING ';'
                | IDENT '::' '#import' STRING ';'
decl            = const_decl | var_decl | fn_decl | enum_decl | struct_decl | union_decl
const_decl      = IDENT '::' expr ';'
                | IDENT ':' type ':' expr ';'
var_decl        = IDENT ':=' expr ';'
                | IDENT ':' type '=' expr ';'
                | IDENT ':' type ';'
fn_decl         = IDENT '::' '(' params? ')' ('->' type)? block
                | IDENT '::' '(' params? ')' ('->' type)? '#builtin' ';'
                | IDENT '::' block
enum_decl       = IDENT '::' 'enum' 'flags'? type? '{' variant* '}'
variant         = IDENT ('::' expr)? (':' type)? ';'
struct_decl     = IDENT '::' 'struct' '{' field_group* '}'
union_decl      = IDENT '::' 'union' '{' (field_group | 'struct' '{' field_group* '}' ';')* '}'
field_group     = IDENT (',' IDENT)* ':' type ('=' expr)? ';'
params          = param (',' param)*
param           = IDENT ':' type
block           = '{' stmt* '}'
stmt            = decl | assignment ';' | multi_assign ';' | return_stmt | defer_stmt | insert_stmt
                | break_stmt | continue_stmt | block | expr ';'
return_stmt     = 'return' expr? ';'
break_stmt      = 'break' ';'
continue_stmt   = 'continue' ';'
defer_stmt      = 'defer' expr ';'
insert_stmt     = '#insert' expr ';'
assignment      = lvalue ('=' | '+=' | '-=' | '*=' | '/=') expr
multi_assign    = lvalue (',' lvalue)+ '=' expr (',' expr)+
lvalue          = IDENT | postfix '.' IDENT | postfix '[' expr ']' | postfix '.' '*'
expr            = if_expr | match_expr | while_expr | for_expr | lambda | binary
while_expr      = 'while' expr block
for_expr        = 'for' expr ':' '(' IDENT (',' IDENT)? ')' block
binary          = unary (binop unary)*
unary           = ('-' | '!' | '@' | 'xx' | 'cast' '(' type ')') postfix
                | postfix
postfix         = primary ( '(' args? ')' | '.' IDENT | '.' '*' | '.{' field_init_list '}'
                          | '[' expr ']' | '[' expr? '..' expr? ']' )*
primary         = INT | HEX_INT | BIN_INT | FLOAT | STRING | HEREDOC | BOOL | 'null' | IDENT | '---'
                | '.' IDENT | '.' '{' field_init_list '}' | '.' '[' args? ']'
                | '(' expr ')' | block | '#run' expr
field_init_list = field_init (',' field_init)*
field_init      = IDENT '=' expr | IDENT | expr
if_expr         = 'if' expr 'then' expr ('else' expr)?
                | 'if' expr block ('else' block)?
match_expr      = 'if' expr '==' '{' case_arm* else_arm? '}'
case_arm        = 'case' pattern ':' (capture_arm | stmt* | 'break' ';')
capture_arm     = '(' IDENT ')' (block | '=>' expr ';')
else_arm        = 'else' ':' stmt*
pattern         = '.' IDENT | INT | BOOL | IDENT | 'struct' | 'enum'
lambda          = '(' params? ')' ('->' type)? '=>' expr
args            = expr (',' expr)*
type            = '$' IDENT | 's1'..'s64' | 'u1'..'u64' | 'f32' | 'f64' | 'bool' | 'string'
                | 'Any' | 'Type' | '..' type | IDENT
                | '[' expr ']' type | '[' ']' type | '[' '*' ']' type | '[' ':' expr ']' type
                | '*' type | 'struct' '{' field_group* '}'
```
---
## 12. Open Questions
These are inferred gaps — things not shown in the readme that need decisions:
- **`return`**: Both `return expr;` and implicit return (last expression) are supported.
- **Else in match**: Resolved. An optional `else:` arm matches when no `case` pattern matches (see Pattern Matching).
- **Nested functions**: Can functions be defined inside other functions?
- **Mutability of params**: Are function parameters immutable by default?
- **Array/list types**: Fixed-size arrays and slices are specified (see Array Types and Slice Types); growable lists are not yet specified.
- **Struct types**: Implemented — named struct types with positional/named/shorthand literals.
- **Imports/modules**: `#import` directive supports flat and namespaced imports (see Section 9).
- **Operator overloading**: Not shown — presumably no.
- **Semicolons**: Required on all statements? What about the last expression in a block?
- **Top-level expressions**: Are bare expressions allowed at the top level or only declarations?