fix(diagnostics): reject reserved/builtin type names used as identifiers (issue 0076)

A value binding (local/global `var` or a parameter) spelled as a
reserved/builtin type name parses as a `.type_expr` rather than an
`.identifier` (parser.zig, via `Type.fromName`), so the address-of
family in lower.zig never saw a scoped local and mis-lowered it —
loading the aggregate and passing it by value to a `ptr` parameter
(LLVM verifier abort, or a silent `*self`-mutation-losing copy).

Add a declaration-site diagnostic in semantic_diagnostics.zig
(`UnknownTypeChecker.checkBindingName`): reject any parameter name or
`var` binding name (`:=` / typed-local / global forms) whose spelling
collides with a reserved type name. `isReservedTypeName` defers to the
parser's own classifier (`types.Type.fromName`) so the rejected set
never drifts from the set that would parse as a type — the named
builtins (bool/string/void/f32/f64/usize/isize/Any) and `[su]N` over
sx's 1-64 range. Bare value names (`s`, `self`, `index`) are untouched.
No lowering special-case; the `.identifier`-only address-of paths are
correct once type-shaped names can never be bound. The rejected
attempt-1 `bareVarName` approach was never landed.

Tests:
- 0125-types-type-named-var-rejected: `:=` form (s2) rejected
  (repurposed from the old test that asserted the now-illegal behavior).
- 1119-diagnostics-reserved-type-name-as-identifier: parameter (u8),
  typed-local (s64, bool), `:=` (string) forms rejected.
- 0135-types-self-streaming-nonreserved: positive — `*self` streaming
  with non-reserved names accumulates correctly via both call styles.
- 0904-optionals: renamed incidental locals s1/s2 -> filled/empty.
This commit is contained in:
agra
2026-06-03 19:00:39 +03:00
parent 4ab3608f77
commit f49a49cd07
18 changed files with 262 additions and 31 deletions

View File

@@ -0,0 +1,125 @@
# 0076 — builtin/reserved type name wrongly accepted as an identifier
> **Status: RESOLVED.**
>
> **Root cause:** the language accepted a value binding (local/global `var` or a
> parameter) spelled as a reserved/builtin type name. The parser turns such a
> spelling into a `.type_expr` rather than an `.identifier` (`parser.zig`, via
> `Type.fromName`), so the address-of family in `src/ir/lower.zig` never saw a
> scoped local and fell through to value lowering — loading the whole aggregate
> and passing it by value to a `ptr` parameter (LLVM verifier abort, or a silent
> `*self`-mutation-losing copy).
>
> **Fix:** a declaration-site diagnostic in the existing semantic pass
> `src/ir/semantic_diagnostics.zig` (`UnknownTypeChecker`). New
> `checkBindingName` rejects any parameter name or `var` binding name (local or
> global, `:=` / typed-local forms) whose spelling collides with a reserved type
> name; `isReservedTypeName` defers to the parser's own classifier
> (`types.Type.fromName`) so the rejected set never drifts from the set that
> would parse as a type — the named builtins (`bool`, `string`, `void`, `f32`,
> `f64`, `usize`, `isize`, `Any`) and `[su]N` over sx's 164 range. Bare value
> names (`s`, `self`, `index`) are untouched. No lowering special-case is added;
> the `.identifier`-only address-of paths are correct once type-shaped names can
> never be bound. The rejected `bareVarName` approach was never landed.
>
> **Regression tests:**
> - `examples/0125-types-type-named-var-rejected.sx` — `:=` form (`s2`) rejected.
> - `examples/1119-diagnostics-reserved-type-name-as-identifier.sx` — parameter
> (`u8`), typed-local (`s64`, `bool`), and `:=` (`string`) forms rejected.
> - `examples/0135-types-self-streaming-nonreserved.sx` — positive: `*self`
> streaming with non-reserved names (`hasher`, `ctx`) accumulates correctly via
> both `update(@h, …)` and `h.update(…)`.
>
> Pre-existing example `examples/0904-...` declared locals `s1`/`s2` (incidental
> names); renamed to `filled`/`empty`. Scope: main-file decls only, matching the
> pass's existing trusted-imports convention.
## Symptom (how it first surfaced)
A local variable whose name is lexically a type — e.g. `s2` (the `sN`
arbitrary-width signed-int syntax: `Type.fromName("s2")``s(2)`), or `u8`,
`s64`, etc. — is accepted as a variable. Because such a name parses as a
`.type_expr` (not `.identifier`), the address-of family of lowering sites
(`@s2`, the autoref `s2.update(...)` receiver, a bare `f(s2)` at a `*T` param,
global function-pointer args) does NOT recognize it as a scoped local and falls
through to value lowering — loading the whole aggregate and passing it **by
value** to a `ptr` parameter:
```
LLVM verification failed: Call parameter type does not match function signature!
call void @update(ptr @__sx_default_context,
{ [8 x i64], [64 x i8], i64, i64 } %load, ...)
```
For some struct shapes it compiles but silently passes a **copy** (callee
`*self` mutations lost). A non-type-shaped name (`hasher`, `ctx`) never triggers
any of this — the `.identifier` paths already work correctly.
## Root cause
The language is **accepting reserved/builtin type names as identifiers** in the
first place. `sN`/`uN` (arbitrary-width ints) and the named builtins
(`bool`, `string`, `void`, `f32`, `f64`, `s8`/`s16`/`s32`/`s64`,
`u8`/`u16`/`u32`/`u64`, …) are reserved type names; declaring a variable with
such a name is meaningless and produces the mis-lowering above. Patching each
address-of site to tolerate the name (the rejected `bareVarName` approach) is
whack-a-mole — there is always another site, and it entrenches a name that
should never have been allowed.
## Proper fix (the required direction)
Emit a **diagnostic error** when an identifier is declared with a name that
collides with a **builtin/reserved type name** — including the arbitrary-width
`[su][0-9]+` (`sN`/`uN`) family AND the named builtins (`bool`, `string`,
`void`, `f32`, `f64`, the fixed-width int types, etc.). Scope ruling (Agra):
**all builtin/reserved type names** are rejected as identifiers. (User-defined
struct/type-name shadowing, if intentionally supported elsewhere, is out of
scope for this issue — this is specifically about builtin/reserved type names.)
Diagnostic at the declaration site, e.g.:
`error: 'u8' is a reserved type name and cannot be used as an identifier`
with the declaration's span.
Suspected area: name binding / declaration handling — where a `:=` / typed
local / parameter name is introduced. Reject the name there, before it ever
reaches lowering. Do NOT add lowering special-cases for type-shaped names; the
`.identifier`-only checks at the address-of sites are then correct as-is (no
type-shaped name can reach them).
## Reproduction
```sx
#import "modules/std.sx";
Sha256 :: struct { h:[8]u64; block:[64]u8; block_len:s64=0; total_len:u64=0; }
init :: () -> Sha256 { s:Sha256=---; s.block_len=0; s.total_len=0; s }
update :: (self:*Sha256, data:string) { self.total_len += data.len; }
main :: () -> s32 { s2 := init(); update(@s2, "."); print("total_len={}\n", s2.total_len); return 0; }
```
`./zig-out/bin/sx run <file>` today → LLVM verifier abort.
**Expected after fix:** a clean compile-time diagnostic that `s2` is a reserved
type name and cannot be an identifier (exit non-zero, readable error — NOT an
LLVM abort, NOT a silent copy). The same program with a non-reserved name
(`hasher := init(); update(@hasher, ".")`) must compile and print `total_len=1`.
## Verification
1. Pinned diagnostics test(s) asserting the error for representative reserved
names used as identifiers: `s2`, `u8`, `s64`, `bool`, `string` (declaration
forms: `:=`, typed local, and a parameter name). Capture the diagnostic text
in `expected/`.
2. A positive test: the same `*self` streaming pattern with NON-reserved names
(`hasher`, `ctx`) compiles and accumulates state correctly via both
`update(@h, ...)` and `h.update(...)` — proving the `.identifier` paths are
correct and no lowering special-case is needed.
3. `zig build && zig build test && bash tests/run_examples.sh` all green. If any
existing example/test declares a variable with a reserved type name, it is now
illegal — fix the test's variable name (do NOT weaken the diagnostic). Report
how many such sites existed.
## Provenance
Discovered by the `distribution` flow (P1.2 pure-sx SHA-256), whose minimal repro
happened to name a local `s2`. Real SHA-256 code with names like `hasher`/`ctx`
is unaffected on the current compiler — so the P1.2 "blocker" was a
naming artifact, and this issue is really a missing-diagnostic correctness bug.