lang: introduce cstring — the C-boundary string (Odin model)
cstring is ONE pointer to a null-terminated u8 buffer, i.e. C's char*. It is thin (8 bytes, no length; cstring_len walks to the terminator) and crosses #foreign boundaries verbatim in both directions. ?cstring is the nullable case and lowers to the same bare pointer (null = absent).

Conversion discipline mirrors Odin: a string LITERAL coerces implicitly (its bytes are terminated constants); any other string is rejected with a diagnostic naming to_cstring (it may be an unterminated view); and cstring never coerces to string implicitly. from_cstring(c) is the explicit zero-copy view, so the strlen cost is visible at the call site.

Plumbing: TypeId/TypeInfo builtin slot 18 (first_user 19), name classifiers, size/align/name tables, LLVM ptr lowering, the ?T pointer niche, the xx pointer ladder, the literal-gated coercion plan (isConstString + data_ptr), and the reserved-spelling set.

std gains cstring_len/from_cstring/to_cstring (in fmt.sx, re-exported). The old cstring(size) allocator helper is renamed alloc_string everywhere. getenv migrates to (name: cstring) -> ?cstring as the canonical user, and env() drops its manual strlen/memcpy.

Pinned: examples/1222 (FFI in both directions, literal coercion, ?cstring null paths, round trip) and examples/1173 (both coercion diagnostics); both FAIL before this change. The alloc_string rename and the getenv signature change shift the .ir snapshots, which are regenerated. zig build test 426/426; run_examples 604/604.

Spec: reserved spelling, cstring section, and C-interop rows.
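The conversion rules can be modeled outside sx to show why only literals coerce. Below is a minimal sketch using Python's standard `ctypes` module. It assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `strlen` (Linux/macOS, not Windows). `to_cstring` and `from_cstring` here are hypothetical Python stand-ins that mimic the contracts described above, not the sx std implementations. The (ptr, len) window plays the role of a non-literal `string`.

```python
import ctypes

# Assumption: POSIX libc is reachable through the process's global symbols.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_void_p]
libc.strlen.restype = ctypes.c_size_t


def to_cstring(data: bytes):
    """Stand-in for to_cstring: an owned copy with a trailing NUL byte."""
    return ctypes.create_string_buffer(data)  # len(data) + 1 bytes


def from_cstring(addr: int):
    """Stand-in for from_cstring: a zero-copy view whose only cost is one strlen.
    The caller must keep the underlying buffer alive while the view is used."""
    n = libc.strlen(addr)
    return (ctypes.c_char * n).from_address(addr)


# Terminated bytes, standing in for a literal's constant data.
text = ctypes.create_string_buffer(b"hello world")
base = ctypes.addressof(text)

# A (ptr, len) window over the first 5 bytes is NOT terminated at index 5.
# If `base` were handed to C as char*, C would see all 11 bytes.
window_ptr, window_len = base, 5
print(libc.strlen(window_ptr))  # 11: strlen runs past the 5-byte window

# to_cstring: copy exactly the window's bytes and terminate them.
owned = to_cstring(ctypes.string_at(window_ptr, window_len))
print(libc.strlen(ctypes.addressof(owned)))  # 5

# from_cstring: the view aliases `owned`'s memory; nothing is copied.
view = from_cstring(ctypes.addressof(owned))
print(view.raw, ctypes.addressof(view) == ctypes.addressof(owned))  # b'hello' True
```

The first `strlen` shows why a non-literal `string` cannot coerce silently. The pointer itself is valid, but C ignores the length, so an explicit copy is the price `to_cstring` makes visible.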
specs.md | 34
@@ -16,7 +16,7 @@ Line comments start with `//` and extend to end of line.
 #### Reserved type names
 
 A spelling that names a builtin type — the arbitrary-width integers `i1`..`i64` /
-`u1`..`u64`, plus `bool`, `string`, `void`, `f32`, `f64`, `usize`, `isize`, `Any` —
+`u1`..`u64`, plus `bool`, `string`, `cstring`, `void`, `f32`, `f64`, `usize`, `isize`, `Any` —
 is reserved. A bare reserved spelling is rejected at **value-binding and
 declaration-name sites**: a value binding (`:=` / typed local / parameter), a
 `::` **constant** or **function** declaration, an `impl` method **definition**,
@@ -39,8 +39,8 @@ slot), so a reserved-spelled impl method still needs the backtick
 (`` `i2 :: (self) ``), exactly like a free function. See `examples/0158`.
 
 The bare member-name exemption applies only to the **identifier-classified**
-reserved spellings — `i1`..`i64`, `u1`..`u64`, `bool`, `string`, `void`, `usize`,
-`isize`, `Any` — which all lex as ordinary identifiers. The two
+reserved spellings — `i1`..`i64`, `u1`..`u64`, `bool`, `string`, `cstring`, `void`,
+`usize`, `isize`, `Any` — which all lex as ordinary identifiers. The two
 **keyword-classified** reserved spellings, `f32` and `f64`, are lexer keywords, and
 member-name slots require an identifier token; a bare `f32` / `f64` is therefore
 rejected at parse (`expected field name in struct`) even in a member position. Use
@@ -1043,6 +1043,29 @@ the chain leaves the checked zone.
 
 **Fat pointer layout**: `[:0]u8`, `string`, and `[]T` are `{ptr, i64}` structs. The raw pointer is always the first field at offset 0. This means `*[:0]u8` works as C's `char**` — a C function dereferences through the outer pointer and reads the raw `char*` from offset 0.
 
+### cstring
+
+`cstring` is the C-boundary string: ONE pointer to a null-terminated u8
+buffer — exactly C's `char *`. It is thin (8 bytes, no length field;
+`cstring_len` walks to the terminator, O(n)) and crosses `#foreign`
+boundaries verbatim in BOTH directions. `?cstring` is the nullable case
+and lowers to the same bare pointer (null = absent) — the natural type
+for `getenv`-style returns and optional `char *` parameters.
+
+Conversion discipline (Odin's model):
+
+- A string **literal** coerces to `cstring` implicitly — literal bytes
+are terminated constants in the binary, so the conversion is free.
+- Any **other** `string` does NOT coerce: it may be an unterminated view
+(`string.{ptr, len}` windows, writer output). Materialize an owned,
+terminated copy with `to_cstring(s)`.
+- `cstring` does not coerce to `string` implicitly — the length is an
+O(n) strlen the code must ask for. `from_cstring(c)` is the zero-copy
+view (shares C's buffer); `substr(from_cstring(c), 0, n)` the owned
+copy.
+- `xx` bit-casts `cstring` ↔ `*u8` / `[*]u8` / integer-pointer values
+for low-level interop.
+
 ### Optional Types
 
 Optional types represent values that may or may not be present.
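The `?cstring` rule (null pointer = absent) matches how C's `getenv` reports a missing variable. The sketch below uses Python's standard `ctypes`, whose `c_char_p` return type already behaves this way: a NULL result arrives as `None`. It assumes a POSIX libc reachable via `ctypes.CDLL(None)`, and `SX_CSTRING_DEMO` is a hypothetical variable name chosen for the demo. Unlike sx's `from_cstring`, `ctypes` copies the returned bytes into a Python `bytes` object.

```python
import ctypes
import os

# Assumption: POSIX libc via the process's global symbols (not Windows).
libc = ctypes.CDLL(None)
# c_char_p stands in for ?cstring: a NULL return becomes None.
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p

NAME = "SX_CSTRING_DEMO"  # hypothetical variable name for this demo

os.environ[NAME] = "on"  # CPython also calls putenv(), so C's getenv sees it
present = libc.getenv(NAME.encode())

del os.environ[NAME]  # ... and unsetenv() on deletion
absent = libc.getenv(NAME.encode())  # the null-pointer case

print(present, absent)  # b'on' None
```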
@@ -1182,7 +1205,10 @@ write_fd :: (fd: i32, buf: [*]u8, count: u64) -> i64 #foreign libc "write";
 
 | C type | sx type | Notes |
 |--------|---------|-------|
-| `const char*` (input) | `[:0]u8` | compiler extracts `.ptr` at call site |
+| `const char*` (input) | `cstring` | the pointer, verbatim; literals coerce |
+| `const char*` (input, legacy) | `[:0]u8` | compiler extracts `.ptr` at call site |
+| `const char*` (return) | `cstring` | the pointer, verbatim; `from_cstring` to view |
+| nullable `const char*` (both directions) | `?cstring` | null pointer = `null` |
 | `char*` (output buffer) | `[*]u8` | raw buffer, no length |
 | `const char**` | `*[:0]u8` | address of `[:0]u8` — `.ptr` at offset 0 |
 | `int*` (single out) | `*i32` | |
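The `const char**` row works only because the fat pointer's `ptr` field sits at offset 0. A `cstring` needs no such arrangement, since it already is the bare `char*`. The ctypes sketch below assumes that `FatStr`, a hypothetical name, mirrors the `{ptr, i64}` layout the spec gives for `[:0]u8`. It reinterprets a pointer to that struct as `char**` and reads element 0, which is what a C callee would do with the argument.

```python
import ctypes


class FatStr(ctypes.Structure):
    # Assumed mirror of the {ptr, i64} fat-pointer layout.
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int64)]


data = ctypes.create_string_buffer(b"argv0")  # terminated bytes, like [:0]u8 data
fat = FatStr(ctypes.addressof(data), 5)

# A `const char **` callee dereferences once and reads a char* from offset 0.
as_char_pp = ctypes.cast(ctypes.pointer(fat), ctypes.POINTER(ctypes.c_char_p))
print(FatStr.ptr.offset, as_char_pp[0])  # 0 b'argv0'
```

If `len` came first, the same cast would read the length as a pointer. That is why the spec pins the raw pointer to the first field.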