PLAN-HTTPZ — Stream HTTPZ (production HTTP-server readiness)
STATUS: 🟡 PLANNED — not started. This stream is being (re)established as a tracked stream. The HTTP/socket/thread work to date shipped ad-hoc under phase tags in source comments (`S2` socket nonblocking, `S6` pthreads, `S7a` http server, `S7b` thread-pool handlers, `C3` per-OS selection) but never had a PLAN/CHECKPOINT file — the comments in socket.sx:3 / thread.sx:23 reference a "PLAN-HTTPZ" that did not exist until now. Progress tracked in CHECKPOINT-HTTPZ.md.
Goal: the low-level guarantees needed to run a long-lived HTTP service on Linux in production — not a web framework. Survive malformed clients, slow clients, overload, restarts, memory pressure, and a normal Linux deployment without every app author rediscovering the same failure modes. Driven by the user's production-readiness checklist (P0 blockers, P1 hardening, P2 ergonomics), mapped below to concrete sx work.
Cadence (IMPASSIBLE): no commit both adds a test AND makes it pass (lock-to-bail, then
flip to green); zig build && zig build test green after every step; never regen snapshots
while red; scope regens with -Dname=examples/<cat>/<file>.sx -Dupdate-goldens + review the
diff. HTTP corpus lives in examples/http/ (16xx/http category) + examples/event/.
Stress/fuzz/load harnesses live OUTSIDE the corpus (the corpus runner has a 10s/example
timeout and no network sandbox — see corpus_run.test.zig).
Audit of record (grounded against the tree, 2026-06-26)
What already exists, so the next session does not redo discovery. Two layers of P0 #1 are already done to a high standard; one piece is a literal blocker.
✅ Solid / Linux-validated — do NOT rebuild
- event.sx — `Loop` fully branches `OS == .linux` (epoll, lines ~85–251) vs kqueue (~251–323). Ran 6/6 green on real aarch64 Linux in an Apple `container` VM (kernel 6.18); ABI corpus-locked by `examples/event/1633`.
- net/epoll.sx — arch-aware `EpollEvent` layout (12B packed x86_64 / 16B aligned aarch64 via the u32-split trick), correct flags, EINTR retry, `__errno_location`.
- net/kqueue.sx — macOS-only, correct.
- sched.sx — M:1 fiber runtime; epoll/kqueue fd-readiness fully branched incl. `EPOLL_CTL_DEL` after `EPOLLONESHOT`. Linux-tested.
- json.sx — streaming writer, zero-copy views, explicit allocators, stable key order. (Integers only — no floats.)
❌ BROKEN on Linux — the keystone blocker (Phase C3)
- socket.sx — Darwin-only, no `OS` branching:
  - `SockAddr` (:32) carries Darwin's `sin_len:u8` at offset 0; Linux `sockaddr_in` has no such field → family/port written to wrong offsets, addresses corrupted.
  - `O_NONBLOCK = 4` — Linux is `2048`; `set_nonblocking` sets the wrong bit.
  - errno constants are macOS values (`EAGAIN=35` → Linux 11, `EINPROGRESS=36` → 115, `ECONNRESET=54` → 104, …) → WouldBlock/reset detection silently breaks.
  - `errno_slot` binds `__error` (:52) — Linux is `__errno_location`.
- thread.sx — Darwin pthread struct sizes: `MutexBuf = 64B` (:44) is Darwin's `pthread_mutex_t`; glibc is 40B → `pthread_mutex_init` overflows the buffer by 24B. Heap corruption on first mutex init under the thread pool. (`CondBuf = 48B` happens to match glibc — fragile coincidence.)
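The constants and layouts listed above can be read from the platform headers instead of being hard-coded. The following is a minimal C sketch of such a probe, intended to be run once on macOS and once on Linux and the output compared. `abi_probe_t`, `abi_probe`, and `abi_probe_print` are hypothetical names used only for illustration; nothing here is part of the sx tree.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical record of the values that socket.sx / thread.sx hard-code. */
typedef struct {
    size_t sockaddr_in_size;
    size_t family_offset;   /* offsetof(struct sockaddr_in, sin_family) */
    size_t port_offset;     /* offsetof(struct sockaddr_in, sin_port)   */
    int    o_nonblock;
    int    eagain;
    int    einprogress;
    int    econnreset;
    size_t mutex_size;      /* sizeof(pthread_mutex_t) for this libc/arch */
    size_t cond_size;       /* sizeof(pthread_cond_t) for this libc/arch  */
} abi_probe_t;

/* Collect the values as the local headers define them. */
abi_probe_t abi_probe(void) {
    abi_probe_t p;
    p.sockaddr_in_size = sizeof(struct sockaddr_in);
    p.family_offset    = offsetof(struct sockaddr_in, sin_family);
    p.port_offset      = offsetof(struct sockaddr_in, sin_port);
    p.o_nonblock       = O_NONBLOCK;
    p.eagain           = EAGAIN;
    p.einprogress      = EINPROGRESS;
    p.econnreset       = ECONNRESET;
    p.mutex_size       = sizeof(pthread_mutex_t);
    p.cond_size        = sizeof(pthread_cond_t);
    /* Sanity: the port field must lie inside the struct. */
    assert(p.sockaddr_in_size >= p.port_offset + sizeof(in_port_t));
    return p;
}

/* Print one line per group so two platforms can be diffed. */
void abi_probe_print(const abi_probe_t *p) {
    printf("sockaddr_in: size=%zu family@%zu port@%zu\n",
           p->sockaddr_in_size, p->family_offset, p->port_offset);
    printf("O_NONBLOCK=%d EAGAIN=%d EINPROGRESS=%d ECONNRESET=%d\n",
           p->o_nonblock, p->eagain, p->einprogress, p->econnreset);
    printf("pthread_mutex_t=%zu pthread_cond_t=%zu\n",
           p->mutex_size, p->cond_size);
}
```

The pthread sizes depend on both libc and architecture, so a probe like this is best run on every target that C3b intends to support rather than on one representative Linux box.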
⚠️ Works, unhardened — http.sx
Single-worker event loop; inline (`thread_pool_count = 0`) + pooled handlers; keep-alive + pipelining; delivery timeouts (`timeout_request_ms` / `timeout_keepalive_ms`); conn cap (`max_conn`) + per-conn request cap (`request_count`); emits 400/413/431/503. Connection state machine: `CONN_FREE/READING/WRITING/KEEPALIVE/HANDLING` with `gen` counter. Gaps (this stream's HTTP work):
- Parser: `Content-Length` only (no `Transfer-Encoding`); no per-header-line size limit; no header-count limit; no request-line/version syntax validation; no duplicate `Content-Length` rejection; no `Content-Length` overflow guard.
- Memory: `Server.close()` (:317) frees neither the `conns` array, the `PoolState` struct, nor `ps.done` → shutdown leaks.
- Shutdown: `run()` is an infinite loop; no `Server.stop()`; `close()` is abrupt (no drain).
- Timeouts: delivery-only. No handler-execution timeout — a hung handler blocks the loop (inline) or pins a pool worker forever; no cancellation.
- Observability: none. All accept/read/write/loop faults close the connection silently — no log hook, no counters/metrics.
- Response: whole response built in one allocation; no streaming, no body-size backpressure; alloc-failure path unhandled.
❌ Absent entirely
- CI: no `.github/workflows`, no Linux CI. Local `zig build test` on macOS only.
- Fuzz / sanitizers / leak-check: none. tests/stress-http.sh is broken (references deleted `examples/32-http-server.sx`).
- Releases: no git tags, no CHANGELOG, no stability tiers.
- Security: no SECURITY.md, no disclosure process, no posture statement.
- Deploy docs: cross-compile/static-link documented in readme.md; no systemd/Docker/reverse-proxy/health-check/graceful-shutdown examples.
- TLS: none yet — to be added natively via an mbedTLS FFI binding (Phase T). Proxy deployment stays documented as an option (D1).
- Routing / query / form helpers: manual `if req.path == …` dispatch only.
Phases (dependency-ordered; checklist item in parens)
C3 is the keystone — until socket.sx + thread.sx are correct on Linux, nothing in P0 is honestly testable on Linux, so Phase C precedes all else regardless of how the rest is sliced.
Phase C — Linux foundation (P0 #1) — unblocks everything
- C3a — `socket.sx` per-OS. Branch `SockAddr` (drop `sin_len` on Linux), `O_NONBLOCK`, the errno constants, and `errno_slot` (`__error` vs `__errno_location`) on `OS`/`ARCH`, mirroring the `inline if OS ==` pattern already proven in `event.sx`/`sched.sx`. No silent fallback defaults (CLAUDE.md rule). Lock a Linux-vs-Darwin layout/const test red, then green.
- C3b — `thread.sx` per-OS. Correct `MutexBuf`/`CondBuf` sizes per glibc (40/48) vs Darwin (64/48), branched. Memory-safety fix, not cosmetics. Validate under the Apple `container` Linux VM that the pool no longer corrupts the heap.
- C4 — Linux CI. A workflow building + running `zig build test` (incl. the HTTP corpus) on Linux. The Apple-`container` path is proven for local validation; CI needs a real Linux runner (GH Actions `ubuntu` and/or self-hosted aarch64). First CI of any kind for the repo.
- C5 — Linux socket I/O corpus. Examples covering accept/read/write/close/error on Linux (today only the macOS-friendly `1633` covers the happy path). Threaded-handler example included.
- Acceptance: basic server compiles + runs on Linux; HTTP suite passes on Linux; accept/read/write/close/error paths covered; threaded mode correct.
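The C3a rule (branch per OS, fail loudly on anything unhandled, lock the layout with a test) can be illustrated in C, where the system headers are available as the ground truth. This is a sketch under stated assumptions: `mirror_sockaddr_in` and `mirror_make` are hypothetical names standing in for a hand-written `SockAddr`, and the compile-time checks play the role of the "layout/const test" mentioned above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hand-written mirror of sockaddr_in, one explicit arm per OS. */
#if defined(__linux__)
typedef struct {
    uint16_t family;    /* 16-bit family at offset 0, no length byte */
    uint16_t port_be;   /* network byte order */
    uint32_t addr_be;   /* network byte order */
    uint8_t  zero[8];
} mirror_sockaddr_in;
#elif defined(__APPLE__)
typedef struct {
    uint8_t  len;       /* Darwin's sin_len */
    uint8_t  family;
    uint16_t port_be;
    uint32_t addr_be;
    uint8_t  zero[8];
} mirror_sockaddr_in;
#else
/* No "reasonable-looking" default: an unhandled OS stops the build. */
#error "unsupported OS: add an explicit sockaddr_in layout"
#endif

/* Lock the mirror against the real struct at compile time. */
_Static_assert(sizeof(mirror_sockaddr_in) == sizeof(struct sockaddr_in),
               "mirror size differs from struct sockaddr_in");
_Static_assert(offsetof(mirror_sockaddr_in, family) ==
               offsetof(struct sockaddr_in, sin_family), "family offset");
_Static_assert(offsetof(mirror_sockaddr_in, port_be) ==
               offsetof(struct sockaddr_in, sin_port), "port offset");
_Static_assert(offsetof(mirror_sockaddr_in, addr_be) ==
               offsetof(struct sockaddr_in, sin_addr), "addr offset");

/* Build an IPv4 address from host-order inputs. */
mirror_sockaddr_in mirror_make(uint32_t addr_host, uint16_t port_host) {
    mirror_sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
#if defined(__APPLE__)
    sa.len = (uint8_t)sizeof sa;
#endif
    sa.family  = AF_INET;
    sa.port_be = htons(port_host);
    sa.addr_be = htonl(addr_host);
    return sa;
}
```

In sx the same check would presumably live in a corpus example rather than a compile-time assertion, but the shape is the same: the test fails on any platform whose arm is missing or wrong.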
Phase H — HTTP hardening (P0 #2–6)
- H1 — Parser hardening (#2). Max header-line size, max header count, strict request-line + version validation, CRLF strictness, `Content-Length` overflow guard + duplicate/conflicting rejection, `Transfer-Encoding: chunked` → 501 (full impl in S1), slowloris coverage (delivery-timeout already mitigates). Outcomes: 400 / 413 / 431 / 501 / safe-close. Unit + fuzz-seed corpus.
- H2 — Memory lifecycle (#3). Fix `Server.close()` to free `conns`, `PoolState`, and `ps.done`. Document allocator ownership (long-lived containers must capture their owner — CLAUDE.md rule; the read buffers are intentionally per-conn). Leak gate: start/stop loop with GPA counters asserting zero + repaired stress script for RSS-over-churn.
- H3 — Graceful shutdown (#4). `Server.stop()` — stop accepting, drain in-flight within a timeout, close idle keep-alives, return from `run()` cleanly. Tests: start/stop/restart in one process; no FD leak; no mem leak.
- H4 — Explicit errors + observability hooks (#5, #9). Route accept/read/write/loop faults through a pluggable log/error hook instead of silent close; add counters (active / accepted / closed conns, requests served, parser errors, timeouts, rejected, 4xx/5xx, pool queue depth, optional request duration). Hook-based — no forced logging format. (#5 and #9 interlock; land together.)
- H5 — Handler timeout + cancellation (#6). Per-request deadline enforced in BOTH inline and pool modes; bound time in `CONN_HANDLING`; timed-out → 504 or safe close. A never-returning handler must not permanently consume capacity.
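The `Content-Length` overflow guard and duplicate rejection in H1 could look roughly like the following C sketch. All names (`cl_result`, `cl_state`, `parse_content_length`, `cl_accept`) are hypothetical. Two assumptions are made here that the plan does not settle: any repeated `Content-Length` header is rejected, even when the values agree, and the body limit is checked during parsing so an oversized value maps to 413 rather than 400.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    CL_OK = 0,
    CL_BAD_SYNTAX,   /* empty or non-digit input: respond 400 */
    CL_TOO_LARGE,    /* overflow or above max_body: respond 413 */
    CL_DUPLICATE     /* second Content-Length header: respond 400 */
} cl_result;

/* Parse a Content-Length value that must consist of ASCII digits only.
   The value is written to *out only when CL_OK is returned. */
cl_result parse_content_length(const char *s, size_t len,
                               uint64_t max_body, uint64_t *out) {
    assert(s != NULL && out != NULL);
    if (len == 0) return CL_BAD_SYNTAX;
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < '0' || c > '9') return CL_BAD_SYNTAX;
        uint64_t d = (uint64_t)(c - '0');
        /* v * 10 + d would wrap past UINT64_MAX: stop before it does. */
        if (v > (UINT64_MAX - d) / 10) return CL_TOO_LARGE;
        v = v * 10 + d;
        if (v > max_body) return CL_TOO_LARGE;
    }
    *out = v;
    return CL_OK;
}

/* Per-request state: remembers whether a Content-Length was already seen. */
typedef struct {
    int      seen;
    uint64_t value;
} cl_state;

/* Fold one Content-Length header into the request state. */
cl_result cl_accept(cl_state *st, const char *s, size_t len,
                    uint64_t max_body) {
    assert(st != NULL);
    if (st->seen) return CL_DUPLICATE;
    uint64_t v = 0;
    cl_result r = parse_content_length(s, len, max_body, &v);
    if (r != CL_OK) return r;
    st->seen  = 1;
    st->value = v;
    return CL_OK;
}
```

Inputs such as `+5`, `0x10`, an empty value, or a 23-digit number make good seeds for the unit and fuzz-seed corpus, since each exercises a different rejection path.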
Phase T — Native TLS via mbedTLS (#15) — revises the proxy-only posture
Native in-process HTTPS by binding a vetted C library (mbedTLS) over FFI — not a
pure-sx TLS stack (out of scope: security-critical, multi-year). Slotted after H because
TLS folds into the same connection state machine + read_more/write_more paths, which
must be stable first. Backend: mbedTLS (small pure-C, clean WANT_READ/WANT_WRITE
non-blocking API, static-links cleanly into --self-contained musl ELF; Apache-2.0).
- T1 — mbedTLS FFI binding. New `library/modules/ffi/mbedtls.sx` (or `std/tls.sx`): `extern "c"` decls for `mbedtls_ssl_{init,setup,handshake,read,write,close_notify}`, `mbedtls_ssl_config`, `mbedtls_x509_crt`, `mbedtls_pk_context`, `mbedtls_ctr_drbg` + `mbedtls_entropy`, `mbedtls_ssl_set_bio`, and the `WANT_READ`/`WANT_WRITE` error constants. Loud failure on any setup error (no silent default — CLAUDE.md rule).
- T2 — Transport abstraction in `http.sx`. Introduce a transport seam so `read_more`/`write_more` go through plaintext (today's `socket.*_nb`) OR TLS, instead of calling the socket directly. mbedTLS BIO callbacks bridge to the non-blocking fd: map socket `WouldBlock` → `MBEDTLS_ERR_SSL_WANT_READ`/`WANT_WRITE`.
- T3 — Handshake state + event-loop integration. New `CONN_TLS_HANDSHAKE` state before `CONN_READING`; drive `mbedtls_ssl_handshake` incrementally, mapping `WANT_READ` → `loop.add_read`, `WANT_WRITE` → `loop.add_write`; handshake deadline (reuse `timeout_request_ms`); graceful `close_notify` on shutdown (ties into H3).
- T4 — TLS config surface. `Config` gains `tls_enabled`, cert/key/chain paths, min version (default TLS 1.2+, prefer 1.3), optional ALPN, SNI (single default cert first; multi-cert later). Cert/key load failure is a loud `HttpErr`, never a silent fallthrough.
- T5 — Tests + static-link + Linux validation. TLS corpus example: in-process mbedTLS client handshakes against the server over loopback with a self-signed cert fixture (under `examples/http/16xx-…/`); cover bad-cert, handshake-failure, and mid-handshake client-abort paths. Verify a `--self-contained` static build links mbedTLS; run on macOS + aarch64 Linux (Apple `container`). Document the per-target mbedTLS static-archive requirement for self-contained builds (vendor vs system).
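The T2 bridge between a non-blocking fd and the mbedTLS BIO callbacks can be sketched in C as below. The callback shapes follow the send/recv function types that `mbedtls_ssl_set_bio` expects. To keep the sketch buildable without mbedTLS installed, it defines its own `SKETCH_*` constants: the two WANT values are intended to mirror `MBEDTLS_ERR_SSL_WANT_READ` and `MBEDTLS_ERR_SSL_WANT_WRITE` and should be taken from `<mbedtls/ssl.h>` in a real binding, while `SKETCH_IO_ERROR`, `bio_ctx`, `bio_recv`, `bio_send`, and `set_nonblocking` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* not defined on macOS; SIGPIPE needs other handling there */
#endif

/* Assumed to mirror the mbedTLS values; verify against <mbedtls/ssl.h>. */
#define SKETCH_WANT_READ  (-0x6900)
#define SKETCH_WANT_WRITE (-0x6880)
#define SKETCH_IO_ERROR   (-1)      /* hypothetical catch-all for hard errors */

/* Hypothetical per-connection context handed to the callbacks. */
typedef struct { int fd; } bio_ctx;

/* Put the fd into non-blocking mode using the platform's own O_NONBLOCK. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* recv callback: bytes read, 0 on peer close, WANT_READ when it would block. */
int bio_recv(void *ctx, unsigned char *buf, size_t len) {
    assert(ctx != NULL && buf != NULL);
    int fd = ((bio_ctx *)ctx)->fd;
    if (len > INT_MAX) len = INT_MAX;   /* the return type is int */
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) return (int)n;
        if (errno == EINTR) continue;   /* retry interrupted calls */
        if (errno == EAGAIN || errno == EWOULDBLOCK) return SKETCH_WANT_READ;
        return SKETCH_IO_ERROR;
    }
}

/* send callback: bytes written, WANT_WRITE when the socket buffer is full. */
int bio_send(void *ctx, const unsigned char *buf, size_t len) {
    assert(ctx != NULL && buf != NULL);
    int fd = ((bio_ctx *)ctx)->fd;
    if (len > INT_MAX) len = INT_MAX;
    for (;;) {
        ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
        if (n >= 0) return (int)n;
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) return SKETCH_WANT_WRITE;
        return SKETCH_IO_ERROR;
    }
}
```

The event loop side of T3 then only has to react to the two WANT codes by registering read or write interest and re-entering the handshake when the fd becomes ready.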
Phase S — Streaming, stress, stability (P1)
- S1 — Streaming responses + chunked out (#10). Explicit `Content-Length`; stream large bodies without buffering the whole response; write-backpressure-aware send; header-set / status / content-type helpers. (Builds on H1's chunked scaffolding.)
- S2 — Request-body streaming (#11). Incremental body reader, configurable max, early reject, mid-body-disconnect handling, backpressure-aware reads. Enables real inbound chunked bodies.
- S3 — Fuzz harness (#7). libFuzzer/AFL targets: request-line, header, `Content-Length`, keep-alive + pipeline state machine, partial reads, malformed bodies, random close timing. Runs manually + in CI. Crash/panic/hang = bug.
- S4 — Load/stress suite (#8). Repair + expand the stress scripts: many short-lived, many keep-alive, slow clients, large bodies at the limit, pool saturation, FD exhaustion → 503/backpressure (not crash), RSS-over-time. Document expected overload behavior.
- S5 — Concurrency model docs (#12). Write up the allocator/thread-safety rules already asserted in thread.sx:11 + the http.sx header: handler execution model, per-request lifetime, what may be retained after a handler returns, misuse cases.
- S6 — API stability + security posture (#13, #14). Tag a milestone; define the stable std subset (`http`/`socket`/`event`/`thread`/`mem`); SECURITY.md + disclosure process + the "reverse-proxy-only, not for direct internet exposure" posture statement + known limitations.
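For the "chunked out" half of S1, each piece of a streamed body is framed as a hexadecimal size line, the data, and a trailing CRLF, with a zero-size chunk closing the body. A minimal C sketch of that framing follows; `chunk_encode` is a hypothetical helper, it omits trailers and chunk extensions, and it copies into a caller-supplied buffer, whereas a real send path might avoid the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Frame one chunk as "<hex-size>\r\n<data>\r\n" into out.
   Returns the number of bytes written, or 0 if out is too small.
   len == 0 yields the terminating "0\r\n\r\n" (no trailers). */
size_t chunk_encode(const void *data, size_t len, char *out, size_t cap) {
    assert(out != NULL && (data != NULL || len == 0));
    /* Up to 2 hex digits per byte of size_t, plus CRLF and the NUL. */
    char head[2 * sizeof(size_t) + 3];
    int h = snprintf(head, sizeof head, "%zx\r\n", len);
    if (h < 0 || (size_t)h >= sizeof head) return 0;
    if (len > cap) return 0;                 /* also rules out wrap below */
    size_t total = (size_t)h + len + 2;
    if (total > cap) return 0;
    memcpy(out, head, (size_t)h);
    if (len > 0) memcpy(out + h, data, len);
    memcpy(out + h + len, "\r\n", 2);
    return total;
}
```

A caller that receives 0 should treat it as "flush and retry with more space", which lines up with the write-backpressure handling S1 calls for.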
Phase D — Deploy & ergonomics (P2)
- D1 — Reverse-proxy + deployment docs (#15, #20). With native TLS shipping in Phase T, proxy deployment is now an option, not the only option — document both. Cover proxy TLS termination, forwarded headers, client-IP, size limits, timeouts, keep-alive, recommended proxy settings; AND native-TLS direct-exposure guidance (cert rotation, cipher/version policy). Plus systemd unit, Docker, health-check endpoint, graceful-shutdown, logging examples; release-binary build + static/dynamic linking notes (incl. the mbedTLS static-archive note from T5; cross-compile already in readme.md).
- D2 — Routing + query/form helpers (#16, #17). Thin layer over manual dispatch: method + path routing, path params, query parsing, 404/405, per-route limits/timeouts; form (urlencoded/multipart) + JSON request/response helpers over the existing json.sx.
- D3 — Honest benchmarks (#18). Revive bench/run.sh: plain-text, JSON, keep-alive, concurrency, pool, slow-client vs a baseline server; record hardware/OS/flags/command; measure latency, throughput, memory, error rate.
- D4 — Compiler-confidence framing (#19). Largely already true (corpus +
issues/regressions, subprocess-isolated runner). Add the "supported vs experimental" labelling for language + std features; ensure production-critical features have corpus coverage.
Decisions Log (HTTPZ specifics)
- Native TLS via an mbedTLS FFI binding (Phase T) — supersedes the original
reverse-proxy-only posture (2026-06-26). The server gains in-process HTTPS; reverse-proxy
deployment stays supported and documented (D1) as an option. No pure-sx TLS stack —
TLS is security-critical and is delegated to the vetted C library. mbedTLS chosen over
OpenSSL/LibreSSL for its small pure-C footprint, clean non-blocking
`WANT_READ`/`WANT_WRITE` API, and clean static-linking into `--self-contained` musl builds (Apache-2.0).
- `Transfer-Encoding: chunked`: reject (501) in H1, implement in S1/S2. Pragmatic P0 minimum is explicit rejection; full chunked support is gated on the streaming work.
- Stress/fuzz/load live outside the corpus. The corpus runner has a 10s/example timeout and no network sandbox; long-running adversarial harnesses are separate scripts wired into CI, not `examples/`.
- No silent fallback defaults in any C3 branching (CLAUDE.md REJECTED PATTERNS): a failed/unhandled OS or arch arm bails loudly, never picks a "reasonable-looking" Darwin default.