sx/readme.md
agra 116af2359e lang: multi-iterable for loops — drop ':', add '..=', open ranges, arrow bodies
The for header is now a comma-separated list of iterables with a
positional capture group and no ':' separator:

    for xs (x) { }                    // collection
    for 0..n (i) { }                  // range (end exclusive)
    for 1..=5 (a) { }                 // ..= inclusive end
    for xs, 0.. (x, i) { }            // index idiom (replaces (x, i))
    for xs, ys (x, y) { }             // parallel (zip) iteration
    for xs (x) => sum += x;           // arrow body (full statement)

First-iterable-wins: the first iterable's length drives the loop and
must be bounded; the other positions follow by their own cursors (a
non-first range's end is not consulted or evaluated; a shorter
non-first collection is read past its length on mismatch). The old
single-iterable index capture is replaced by the trailing open range.

Capture/call disambiguation is positional: the paren group immediately
before '{' or '=>' is the capture, every earlier top-level group is a
call. 'for zip(a, b) (x, y)' calls zip; 'for f(n) { }' reads (n) as
the capture and errors with a parenthesize/add-capture hint. The old
':' form errors with a migration hint.

Lowering is unified across forms: one cursor slot per position (ranges
start at their start, collections at 0), all advanced together, the
first position's bound terminating. inline for keeps the single
bounded comptime range.

Migrated the full corpus (examples, library modules, issue repros,
in-source test strings). New coverage: examples/0050 (the full feature
surface) and examples/1149-1155 (seven diagnostic faces). specs.md For
Loop section + grammar rewritten; readme teaser updated.
2026-06-10 20:30:55 +03:00

# sx
An experimental systems programming language with Jai-inspired syntax, compile-time execution, generics, closures, protocols, and an LLVM backend.
> **Status**: Highly experimental. The language and compiler are under active development.
## At a Glance
```sx
#import "modules/std.sx";
Point :: struct {
    x, y: s32;
    magnitude :: (self: *Point) -> f32 { sqrt(self.x * self.x + self.y * self.y); }
}
main :: () {
    p := Point.{ x = 3, y = 4 };
    print("point: {}, magnitude: {}\n", p, p.magnitude());
}
```
**Key characteristics:**
- Jai-inspired declaration syntax: `name :: value` for constants, `name := value` for variables
- Compiles to native code via LLVM 19
- Compile-time execution with `#run`
- Generics via monomorphization
- First-class closures with value capture
- Protocol-based polymorphism (traits)
- Pattern matching on enums, optionals, and type categories
- C interop via `#foreign` and `#import c`
- Targets: macOS (ARM64, x86_64), Linux (x86_64, ARM64), Windows (x86_64), WebAssembly
## Building
Requires **Zig 0.16+** and **LLVM 19+**.
```sh
zig build
```
On macOS with Homebrew LLVM:
```sh
# default path: /opt/homebrew/opt/llvm@19
zig build
```
Custom LLVM path:
```sh
zig build -Dllvm-prefix=/path/to/llvm
```
## Usage
```sh
sx run file.sx # compile and run
sx build file.sx # compile to binary
sx build file.sx -o out # compile with output path
sx ir file.sx # emit LLVM IR
sx lsp # start language server
```
Options:
```
--target <triple>   target platform (shortcuts: macos, linux, windows, wasm)
--opt <level>       optimization: none, less, default, aggressive
--cpu <name>        target CPU
-o <path>           output path
```
## Language Overview
### Types
| Type | Description |
|------|-------------|
| `s8`..`s64`, `u8`..`u64` | Signed/unsigned integers (default: `s64`) |
| `f32`, `f64` | Floating point (default: `f32`) |
| `bool` | `true` / `false` |
| `string` | UTF-8 fat pointer `{ptr, len}` |
| `[N]T` | Fixed-size array |
| `[]T` | Slice (fat pointer) |
| `*T`, `[*]T` | Single / many pointer |
| `?T` | Optional |
| `struct`, `enum`, `union` | Composite types |
| `Closure(args) -> ret` | Closure type |

**Numeric limits.** A field-like access on a builtin integer type name folds to
a compile-time constant of that type: `s64.max` → `9223372036854775807`,
`u8.min` → `0`, `s3.max` → `3`. It works for every width `s1`..`s64` / `u1`..`u64`
plus `usize`/`isize`, and is usable anywhere a constant of that type is — including
array dimensions (`[u8.max]T` is a 255-element array). The float types `f32`/`f64`
expose `.min` / `.max` too (with `.min` = most-negative finite = `-max`, **not**
C's `DBL_MIN`) plus the float-only `.epsilon` (ULP of 1.0, not C#'s denormal
`Epsilon`), `.min_positive` (smallest normal = C `DBL_MIN`), `.true_min` (smallest
subnormal — beware flush-to-zero CPU modes), `.inf`, and `.nan`. A float-only
accessor on an integer (`s32.epsilon`), or any accessor on a non-numeric type, is
a clean compile error. The fold applies only to a bare type-name receiver: a raw
identifier that binds a value shadowing a type name (`` `f64 := … `` then
`` `f64.epsilon ``) reads the value's field, not the limit — for a local, global,
or module-constant binding alike. This stays an ordinary *runtime* field read
even when it flows into an integer binding or an array dimension: the binding
truncates the field's value and the dimension is a non-constant count, never the
builtin limit. See
`specs.md` → Numeric Limits.
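
The folds described above can be exercised with a short program. This is a minimal sketch that uses only accessors named in this section; the variable names are illustrative, and the comments restate the values the prose gives rather than verified compiler output.

```sx
#import "modules/std.sx";

main :: () {
    // Integer limits fold to compile-time constants of the receiver type.
    print("{}\n", s64.max);     // folds to 9223372036854775807
    print("{}\n", u8.min);      // folds to 0
    print("{}\n", s3.max);      // folds to 3

    // A limit is usable wherever a constant is, including a dimension:
    // [u8.max]u8 is the type of a 255-element byte array.

    // Float-only accessors (names taken from the paragraph above).
    eps  := f64.epsilon;        // ULP of 1.0
    tiny := f64.min_positive;   // smallest normal
    print("{} {}\n", eps, tiny);

    // bad := s32.epsilon;      // compile error: float-only accessor on an integer
}
```
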
### Declarations
```sx
// Constants (compile-time when possible)
PI :: 3.14159;
MAX : s32 : 100;
// Variables (mutable)
x := 42; // inferred type
y : s32 = 0; // explicit type
z : s32 = ---; // uninitialized
```
A typed constant's initializer must be compatible with its annotation — an
integer fits any integer or float, a float a float type, a string `string`,
`null` a pointer/optional. The check is type-based, so it covers a literal and a
constant expression alike: both `N : string : 4` and `N : string : M + 2` are a
compile-time `type mismatch` error, not a silently-accepted constant. Mixed
int+float arithmetic promotes to the float in either operand order (`n + 0.5` and
`0.5 + n` are both `f64`), so `C : s64 : M + 0.5` is rejected regardless of order
while `F : f64 : M + 0.5` folds to `2.5`.

**Float → integer narrowing (unified rule).** A float flowing into an
integer-typed binding *without* a cast follows the same integral-fold rule an
array dimension uses: an **integral** compile-time float folds to its integer, and
a **non-integral** one is a compile error.

The rule holds whether the value is a literal or *any* compile-time-constant float
expression, including:

- one that references a float-typed const (`F : f64 : 2.5; y : s64 = F + 1.5` → `4`);
- a builtin float numeric-limit accessor (`f64.max - f64.max` → `0`, while
  `f64.true_min + 0.5` errors);
- a float `%` (`6.0 % 4.0` → `2`, while `5.5 % 2.0` = `1.5` errors);
- a float `/` (`6.0 / 2.0` → `3`, while `5.0 / 2.0` = `2.5` errors; a float `/` is
  always float division, never integer truncation, even with integral operands).

The compile-time float evaluator recognises every leaf shape the integer one does,
so no constant float form escapes the rule at one site while folding at another.
The rule is also uniform across a typed local, a parameter default, a struct field
default, a call argument, a typed constant, **and an array dimension / count**:
`y : s64 = 4.0`, `K : s64 : 4.0`, `y : s64 = M + 2.0`, and `[F + 1.5]s64`
(≡ `[4]s64`, whether written directly, through a const, or via a type alias) all
give `4`, while `y : s64 = 1.5`, `N : s64 : 1.5`, `y : s64 = M + 0.5`,
`y : s64 = F + 0.25` (= `2.75`), and `[F + 0.25]s64` all error. The binding sites
share one wording, `cannot implicitly narrow non-integral float …`; a dimension
instead reports `array dimension must be an integer, but '…' is a non-integral
float`, since the cast escape does not apply in a count position.

An explicit `xx` / `cast(s64)` is the escape hatch and always truncates
(`y : s64 = xx 1.5` → `1`, `y : s64 = xx (M + 0.5)` → `2`); a genuine runtime
float is likewise unaffected.

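
The narrowing rule is easiest to see side by side. The sketch below reuses the constants from the text (`M :: 2` is an assumption implied by `M + 0.5` folding to `2.5`); the local names `a` through `h` are illustrative, and the erroring lines are commented out.

```sx
#import "modules/std.sx";

M :: 2;
F : f64 : 2.5;

main :: () {
    a : s64 = 4.0;           // integral literal: folds to 4
    b : s64 = F + 1.5;       // integral constant expression: folds to 4
    c : s64 = 6.0 / 2.0;     // float division with an integral result: folds to 3
    d : s64 = xx 1.5;        // explicit cast always truncates: 1
    e : s64 = xx (M + 0.5);  // explicit cast of 2.5: 2

    // f : s64 = 1.5;            // error: cannot implicitly narrow non-integral float
    // g : s64 = 5.0 / 2.0;      // error: 2.5 is non-integral
    // h : [F + 0.25]s64 = ---;  // error: array dimension must be an integer

    print("{} {} {} {} {}\n", a, b, c, d, e);
}
```
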
Builtin type names (`s2`, `u8`, `bool`, `string`, …) are reserved and a *bare*
spelling can't be used as an identifier at a **value-binding or declaration-name**
site — a value binding (`:=` / typed local / parameter), a `::` constant or
function declaration, an `impl` method definition, or a `::` type declaration
(`struct` / `enum` / `union` / alias / `protocol` / …) — each is an error
(`s2 :: 5` and `s2 :: (n) { … }` are rejected just like `s2 := 5`). **Member-name
positions are exempt**: a struct *field*, a union *tag*, and a protocol
*method-signature* may be a bare reserved spelling (`struct { s2: s64 }`,
`union { u8: … }`, `protocol { s2 :: () -> s64 }`) — they are reached via `obj.name`,
so they never mis-lower. The bare exemption covers only the identifier-classified
reserved names (`s1`..`s64`, `u1`..`u64`, `bool`, `string`, `void`, `usize`,
`isize`, `Any`); `f32` and `f64` are lexer keywords, so even in a member slot they
need the backtick (`` struct { `f32: s64 } ``). A leading backtick escapes one into
a **raw identifier**:
`` `name `` is the literal identifier `name` (the backtick drops out of the text),
usable in **every** position — value, declaration, and type, and optional in the
exempt member positions. It is the only way handwritten sx can spell a reserved
name in a binding or declaration site.
```sx
`s2 := 2.5; // identifier "s2", distinct from the s2 type
print("{}\n", `s2); // 2.5 (or bare `s2` in value position)
`s2 :: struct { x: s64; } // declare a type named with a reserved spelling
v : `s2 = ---; // and reference it as a type — resolves to the struct
x : s2 = 3; // bare `s2` in type position is still the int type
```
It works in every identifier position — local, global, parameter, struct field,
union tag, function name, type/alias/import name, a top-level or struct-body
constant, and the control-flow / capture / binding forms (destructure, `if`/`while`
binding, `for` capture, match capture, `catch`/`onfail` tag) — and a reserved-spelled
function is bare-callable (`s2(10)`). A backtick name used as a type resolves to a
`` `name ``-declared type — including a parameterized template (`` `s2(s64) ``) and
under pointer/optional wrappers — else a normal `unknown type` error.
Foreign declarations from `#import c { … }` are exempt automatically: C names that
collide with reserved type names (e.g. `s1`, `s2`) import unedited, and a foreign
reserved-name function is bare-callable by its C name.
### Structs
```sx
Vec3 :: struct {
    x, y, z: f32;
    length :: (self: *Vec3) -> f32 {
        sqrt(self.x * self.x + self.y * self.y + self.z * self.z);
    }
}
v := Vec3.{ x = 1, y = 2, z = 3 };
v2 := Vec3.{ 1, 2, 3 }; // positional
print("{}\n", v.length());
```
Structs support field defaults, `#using` for composition, and methods defined in the body.
### Enums (Tagged Unions)
```sx
Shape :: enum {
    circle: f32;
    rect: struct { w, h: f32; };
    none;
}
area :: (s: Shape) -> f32 {
    if s == {
        case .circle: (r) => 3.14159 * r * r;
        case .rect: (r) => r.w * r.h;
        case .none: 0;
    }
}
```
Flag enums with power-of-2 values:
```sx
Perms :: enum flags { read; write; execute; }
rw := Perms.read | Perms.write;
```
### Optionals
```sx
x: ?s32 = 42;
y: ?s32 = null;
val := x ?? 0; // null coalescing
forced := x!; // force unwrap (traps on null)
if v := x { // safe unwrap
    print("{}\n", v);
}
// Optional chaining
node: ?Node = get_node();
name := node?.name ?? "unknown";
```
### Generics
```sx
max :: (a: $T, b: T) -> T {
    if a > b then a else b;
}
List :: struct ($T: Type) {
    items: [*]T;
    len: s64;
    append :: (self: *List(T), item: T) { ... }
}
```
Generic constraints via protocols:
```sx
are_equal :: ($T: Type/Eq, a: T, b: T) -> bool { a.eq(b); }
```
### Closures
```sx
make_adder :: (n: s64) -> Closure(s64) -> s64 {
    closure((x: s64) -> s64 => x + n);
}
add5 := make_adder(5);
print("{}\n", add5(100)); // 105
```
Closures capture by value. Bare functions auto-promote to closures when needed.
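
A small sketch of what value capture implies, built only from forms shown in this README. It assumes the copy is taken when `closure(...)` is evaluated and that a closure can be created from a local; under those assumptions, reassigning `n` afterwards should not change what `add_n` sees (5, so the call would yield 6). `add_n` is an illustrative name.

```sx
#import "modules/std.sx";

main :: () {
    n := 5;
    add_n := closure((x: s64) -> s64 => x + n);  // n is copied into the closure
    n = 100;                                     // later writes do not reach the copy
    print("{}\n", add_n(1));
}
```
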
### Protocols
```sx
Drawable :: protocol {
    draw :: (x: s32, y: s32);
}
impl Drawable for Circle {
    draw :: (self: *Circle, x: s32, y: s32) { ... }
}
shape : Drawable = xx my_circle; // type erasure via xx
shape.draw(10, 20); // dynamic dispatch
```
`#inline` protocols store function pointers directly (no vtable indirection):
```sx
Allocator :: protocol #inline {
    alloc :: (size: s64) -> *void;
    dealloc :: (ptr: *void);
}
```
### Pattern Matching
```sx
// On enums
if shape == {
    case .circle: (r) => print("radius: {}\n", r);
    case .rect: (r) => print("{}x{}\n", r.w, r.h);
    case .none: print("nothing\n");
}
// On optionals
if opt == {
    case .some: (val) => use(val);
    case .none: fallback();
}
// On type categories (via Any)
if type_of(val) == {
    case int: print("integer\n");
    case string: print("string\n");
    case struct: print("struct\n");
}
```
### Control Flow
```sx
// Chained comparisons
if 0 <= x <= 100 { ... }
// While
while i < 10 { i += 1; }
// For — collections, ranges, and parallel iteration
for items (val) { print("{}\n", val); }
for items, 0.. (val, idx) { print("[{}] = {}\n", idx, val); }
for 1..=5, 0.. (a, b) { print("{}:{}\n", a, b); } // a: 1 through 5; b follows from 0
for items (val) => total += val; // arrow body
// Defer
f := open("file.txt");
defer close(f);
// Multi-target assignment (atomic swap)
a, b = b, a;
```
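
The `for` forms compose in one program. The sketch below assumes the first-iterable-wins behaviour (the first iterable's length drives the loop and the later positions follow by their own cursors); the array contents and the names `total` and `range_sum` are illustrative.

```sx
#import "modules/std.sx";

main :: () {
    xs : []s64 = .[10, 20, 30];
    ys : []s64 = .[1, 2, 3];

    // Parallel (zip) iteration: xs drives the loop, ys follows.
    total := 0;
    for xs, ys (x, y) { total += x * y; }

    // Index idiom: a trailing open range supplies the counter.
    for xs, 0.. (x, i) { print("[{}] = {}\n", i, x); }

    // Range with an exclusive end and an arrow body.
    range_sum := 0;
    for 0..4 (i) => range_sum += i;

    print("total = {}, range_sum = {}\n", total, range_sum);
}
```

Keeping the bounded collection in the first position matters here: an open range such as `0..` has no end, so it can only appear in a later position.
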
### Pipe Operator
```sx
result := data |> parse() |> transform() |> serialize();
// equivalent to: serialize(transform(parse(data)))
```
### Compile-Time Execution
```sx
// Evaluate at compile time
FIBONACCI_10 :: #run fib(10);
// Generate code at compile time
#insert #run generate_lookup_table();
```
### C Interop
Foreign functions:
```sx
libc :: #library "c";
printf :: (fmt: [:0]u8, args: ..Any) -> s32 #foreign libc;
write_fd :: (fd: s32, buf: [*]u8, count: u64) -> s64 #foreign libc "write";
```
Direct C header import:
```sx
#import c {
    #include "vendors/mylib/api.h";
    #source "vendors/mylib/impl.c";
};
```
### Modules
```sx
#import "modules/std.sx"; // flat import
math :: #import "modules/math.sx"; // namespaced import
```

When two flat-imported modules each define a function of the same name, every
module's own code binds its OWN author: a bare call resolves to the same-name
function in the caller's module (or in its single flat import that provides it).
A bare call to a name that two or more flat imports both provide is ambiguous and
is rejected; qualify it with a namespaced import (`m :: #import …; m.fn()`).

A **namespaced** import only binds its alias: reach the module's members as
`m.name`. Bare-name visibility joins over flat (`#import "…"`) imports, never over
a namespaced alias.

That join is **non-transitive for every bare member kind: functions, constants,
AND types alike**. A flat import of a flat import is NOT bare-visible (when `A`
imports `B` and `B` imports `C`, `A` does not see `C`'s top-level names, including
its types, so qualify them, or `#import "C"` directly if you reference them). This
holds for a *parameterized* type head too: a generic struct / parameterized
protocol / type-returning function used as `Box(s64)` is gated exactly like a bare
leaf type, so the constructor head must be reachable over your own or a direct
flat import, not two hops away.

A bare reference to a namespaced-only import's member (function, module constant,
or **type**, whether leaf or generic head) is likewise not visible and is rejected
(`type 'X' is not visible; #import the module that declares it`); qualify it as
`m.name`. The type gate holds wherever a bare type name is named, not just in
plain annotations: a value/field annotation, a reflection / type-arg slot
(`size_of(T)`, `size_of(*T)`), a typed array-literal head (`T.[…]`), a
parameterized head (`Box(s64)`), or a type-as-value / type-match arm.

**Own-wins** holds at every one of those sites too, exactly like a bare call: when
the querying module declares its OWN same-name type, that bare reference resolves
to ITS author, never a same-name flat import. Ambiguity is enforced at every one of
those sites as well: a bare type (including a type-returning function head) that
two or more flat imports each declare, with no own author to win, is **ambiguous
and rejected** (`type 'X' is ambiguous: it is declared in multiple flat-imported
modules; qualify the reference or remove the duplicate import`), never a silent
pick of one author.

Qualifying the reference is a real escape hatch for a **generic head** too:
`ns.Box(args)` selects the template AUTHORED by `ns`'s module, so two namespaces
each declaring a same-name `Box($T)` with different layouts stay distinct types
(`a.Box(s64)` and `b.Box(s64)` instantiate their own author's fields), never the
global last-wins template. (A library's own *internal* type references still
resolve: a generic struct / pack fn / protocol body is instantiated in the module
that defines it, so e.g. `List(T).append`'s `alloc: Allocator` is visible there
regardless of the call site.)
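
The qualified generic-head case can be sketched as follows. The modules `a.sx` and `b.sx` are hypothetical: each is assumed to declare its own `Box($T)` with a different layout, matching the `a.Box(s64)` / `b.Box(s64)` example in the text.

```sx
#import "modules/std.sx";        // flat: its top-level names are bare-visible here
a :: #import "a.sx";             // hypothetical module that declares Box($T)
b :: #import "b.sx";             // hypothetical module with a different Box($T)

main :: () {
    x : a.Box(s64) = ---;        // instantiates the template authored by a.sx
    y : b.Box(s64) = ---;        // a distinct type, built from b.sx's fields

    // z : Box(s64) = ---;       // rejected: Box is reachable only through a
    //                           // namespaced alias, so the bare head is not visible
}
```
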
### Implicit Context
Every program gets an implicit `context` with a default allocator:
```sx
// No boilerplate needed — context is auto-initialized
main :: () {
    list := List(s64).create(); // uses context.allocator
    list.append(42);
}
// Override allocator for a scope
push Context.{ allocator = my_arena } {
    do_work(); // all allocations use my_arena
}
```
## Quick Sort Example
```sx
#import "modules/std.sx";
quick_sort :: (items: []$T) {
    partition :: (items: []T, lo: s64, hi: s64) -> s64 {
        pivot := items[hi];
        i := lo - 1;
        j := lo;
        while j < hi {
            if items[j] < pivot {
                i += 1;
                items[i], items[j] = items[j], items[i];
            }
            j += 1;
        }
        i += 1;
        items[i], items[hi] = items[hi], items[i];
        i;
    }
    sort :: (items: []T, lo: s64, hi: s64) {
        if lo < hi {
            pi := partition(items, lo, hi);
            sort(items, lo, pi - 1);
            sort(items, pi + 1, hi);
        }
    }
    sort(items, 0, items.len - 1);
}
main :: () {
    arr : []s64 = .[333, 2, 3, 5, 2, 2, 3, 4, 5, 6, 6, 1];
    quick_sort(arr);
    print("{}\n", arr);
    // [1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 333]
}
```
## Standard Library
The standard library (`modules/std.sx`) provides:
- **I/O**: `print(fmt, args...)`, `out(str)`
- **Collections**: `List($T)` (dynamic array)
- **Strings**: `concat`, `substr`, `int_to_string`, `uint_to_string`, `float_to_string`, `cstring`
- **Memory**: `Allocator` protocol, `GPA` (general purpose), `Arena` (bump allocator)
- **Math**: `sqrt`, `sin`, `cos`
- **Introspection**: `type_of`, `type_name`, `type_is_unsigned`, `type_eq`, `field_count`, `field_name`, `field_value`, `size_of`, `align_of`, `is_flags` — the type-only builtins (`size_of`, `align_of`, `field_count`, `type_name`, `type_eq`, `type_is_unsigned`, `is_flags`) require a type argument (a spelled type or a generic `T`); passing a value is a compile-time error. A runtime `Type` value (`type_of(x)`) is currently accepted by `type_name` and `type_is_unsigned` only — the other five are compile-time-only (runtime reflection is deferred)
### Command-line interface (`modules/std/cli.sx`)
`std.cli` builds command-line front-ends over an explicit logical argv
(`[]string`): `os_args(buf)` reads the real process argv, and
`parse(args, commands, diag) -> !Parsed` does subcommand dispatch + `--flag`
parsing. On top of that it defines the small **exit-code / `--json` contract**
a CLI program (e.g. `dist`) relies on:
```sx
#import "modules/std/cli.sx";
p, e := parse(args, cmds, @diag); // (Parsed, !CliError)
if e == error.UnknownCommand {
    log.err("unknown command '{}'", diag.token); // human text -> stderr
    exit_usage(); // usage error -> exit 64
}
if p.json { /* emit ONLY machine output on stdout */ }
```
- **Named exit codes** — `EX_OK` (0), `EX_USAGE` (64, the sysexits.h
command-line-usage code), `EX_UNAVAILABLE` (70, unsupported platform).
- **Terminators** — `exit_ok()` / `exit_usage()` end the process with the
matching code; both route through the canonical `process.exit(code: u8)`.
- **`--json` mode** — the reserved global `--json` flag surfaces as
`parsed.json` (true iff `--json` is in the argv). Convention: in json mode
stdout carries only the machine result; human diagnostics go to stderr.
## Cross-Compilation
```sh
sx build app.sx --target linux # Linux x86_64
sx build app.sx --target macos-arm # macOS ARM64
sx build app.sx --target windows # Windows x86_64
sx build app.sx --target wasm # WebAssembly
```
## Acknowledgments
- [Jonathan Blow](https://en.wikipedia.org/wiki/Jonathan_Blow) for Jai, the language that inspired this one
- [Andrew Kelley](https://andrewkelley.me) for Zig, which made this compiler a joy to write
## License
MIT