# PLAN-HTTPZ: Stream HTTPZ (production HTTP-server readiness)

> **STATUS: 🟡 PLANNED, not started.** This stream is being (re)established as a
> *tracked* stream. The HTTP/socket/thread work to date shipped ad-hoc under phase
> tags in source comments (`S2` socket nonblocking, `S6` pthreads, `S7a` http server,
> `S7b` thread-pool handlers, `C3` per-OS selection) but **never had a PLAN/CHECKPOINT
> file**: the comments in [socket.sx:3](../library/modules/std/socket.sx#L3) /
> [thread.sx:23](../library/modules/std/thread.sx#L23) reference a "PLAN-HTTPZ" that did
> not exist until now. Progress tracked in [CHECKPOINT-HTTPZ.md](CHECKPOINT-HTTPZ.md).

**Goal:** the low-level guarantees needed to run a long-lived HTTP service on Linux in
production, *not* a web framework. Survive malformed clients, slow clients, overload,
restarts, memory pressure, and a normal Linux deployment without every app author
rediscovering the same failure modes. Driven by the user's production-readiness checklist
(P0 blockers, P1 hardening, P2 ergonomics), mapped below to concrete sx work.

**Cadence (IMPASSIBLE):** no commit both adds a test AND makes it pass (lock-to-bail, then
flip to green); `zig build && zig build test` green after every step; never regen
snapshots while red; scope regens with `-Dname=examples//.sx -Dupdate-goldens` + review
the diff. HTTP corpus lives in `examples/http/` (`16xx`/`http` category) +
`examples/event/`. Stress/fuzz/load harnesses live OUTSIDE the corpus (the corpus runner
has a 10s/example timeout and no network sandbox; see
[corpus_run.test.zig](../src/corpus_run.test.zig)).

---

## Audit of record (grounded against the tree, 2026-06-26)

What already exists, so the next session does not redo discovery.
**Two layers of P0 #1 are already done to a high standard; one piece is a literal
blocker.**

### ✅ Solid / Linux-validated: do NOT rebuild

- **[event.sx](../library/modules/std/event.sx):** `Loop` fully branches `OS == .linux`
  (epoll, lines ~85–251) vs kqueue (~251–323). Ran **6/6 green on real aarch64 Linux** in
  an Apple `container` VM (kernel 6.18); ABI corpus-locked by `examples/event/1633`.
- **[net/epoll.sx](../library/modules/std/net/epoll.sx):** arch-aware `EpollEvent` layout
  (12B packed x86_64 / 16B aligned aarch64 via the u32-split trick), correct flags, EINTR
  retry, `__errno_location`.
- **[net/kqueue.sx](../library/modules/std/net/kqueue.sx):** macOS-only, correct.
- **[sched.sx](../library/modules/std/sched.sx):** M:1 fiber runtime; epoll/kqueue
  fd-readiness fully branched incl. `EPOLL_CTL_DEL` after `EPOLLONESHOT`. Linux-tested.
- **[json.sx](../library/modules/std/json.sx):** streaming writer, zero-copy views,
  explicit allocators, stable key order. (Integers only, no floats.)

### ❌ BROKEN on Linux: the keystone blocker (Phase C3)

- **[socket.sx](../library/modules/std/socket.sx):** Darwin-only, no `OS` branching:
  - `SockAddr` ([:32](../library/modules/std/socket.sx#L32)) carries Darwin's `sin_len:u8`
    at offset 0; Linux `sockaddr_in` has no such field → family/port written to wrong
    offsets, addresses corrupted.
  - `O_NONBLOCK = 4`, but Linux is `2048`; `set_nonblocking` sets the wrong bit.
  - errno constants are macOS values (`EAGAIN=35`→Linux 11, `EINPROGRESS=36`→115,
    `ECONNRESET=54`→104, …) → WouldBlock/reset detection silently breaks.
  - `errno_slot` binds `__error` ([:52](../library/modules/std/socket.sx#L52)); Linux is
    `__errno_location`.
- **[thread.sx](../library/modules/std/thread.sx):** Darwin pthread struct sizes:
  - `MutexBuf = 64B` ([:44](../library/modules/std/thread.sx#L44)) is Darwin's
    `pthread_mutex_t`; glibc is **40B** → `pthread_mutex_init` overflows the buffer by 24B.
    **Heap corruption on first mutex init under the thread pool.** (`CondBuf = 48B`
    happens to match glibc, a fragile coincidence.)

### ⚠️ Works, unhardened: [http.sx](../library/modules/std/http.sx)

Single-worker event loop; inline (`thread_pool_count = 0`) + pooled handlers; keep-alive +
pipelining; delivery timeouts (`timeout_request_ms`/`timeout_keepalive_ms`); conn cap
(`max_conn`) + per-conn request cap (`request_count`); emits 400/413/431/503. Connection
state machine: `CONN_FREE/READING/WRITING/KEEPALIVE/HANDLING` with `gen` counter.

**Gaps (this stream's HTTP work):**

- **Parser:** `Content-Length` only (no `Transfer-Encoding`); no per-header-line size
  limit; no header-count limit; no request-line/version syntax validation; no duplicate
  `Content-Length` rejection; no `Content-Length` overflow guard.
- **Memory:** `Server.close()` ([:317](../library/modules/std/http.sx#L317)) frees neither
  the `conns` array, the `PoolState` struct, nor `ps.done` → shutdown leaks.
- **Shutdown:** `run()` is an infinite loop; no `Server.stop()`; `close()` is abrupt (no
  drain).
- **Timeouts:** delivery-only. **No handler-execution timeout**: a hung handler blocks the
  loop (inline) or pins a pool worker forever; no cancellation.
- **Observability:** **none.** All accept/read/write/loop faults close the connection
  silently: no log hook, no counters/metrics.
- **Response:** whole response built in one allocation; no streaming, no body-size
  backpressure; alloc-failure path unhandled.

### ❌ Absent entirely

- **CI:** no `.github/workflows`, no Linux CI. Local `zig build test` on macOS only.
- **Fuzz / sanitizers / leak-check:** none. [tests/stress-http.sh](../tests/stress-http.sh)
  is broken (references deleted `examples/32-http-server.sx`).
- **Releases:** no git tags, no CHANGELOG, no stability tiers.
- **Security:** no SECURITY.md, no disclosure process, no posture statement.
- **Deploy docs:** cross-compile/static-link documented in [readme.md](../readme.md); no
  systemd/Docker/reverse-proxy/health-check/graceful-shutdown examples.
- **TLS:** none yet; to be added natively via an mbedTLS FFI binding (Phase T). Proxy
  deployment stays documented as an option (D1).
- **Routing / query / form helpers:** manual `if req.path == …` dispatch only.

---

## Phases (dependency-ordered; checklist item in parens)

C3 is the keystone: until socket.sx + thread.sx are correct on Linux, **nothing in P0 is
honestly testable on Linux**, so Phase C precedes all else regardless of how the rest is
sliced.

### Phase C: Linux foundation (P0 #1), unblocks everything

- **C3a: `socket.sx` per-OS.** Branch `SockAddr` (drop `sin_len` on Linux), `O_NONBLOCK`,
  the errno constants, and `errno_slot` (`__error` vs `__errno_location`) on `OS`/`ARCH`,
  mirroring the `inline if OS ==` pattern already proven in `event.sx`/`sched.sx`. No
  silent fallback defaults (CLAUDE.md rule). Lock a Linux-vs-Darwin layout/const test red,
  then green.
- **C3b: `thread.sx` per-OS.** Correct `MutexBuf`/`CondBuf` sizes per glibc (40/48) vs
  Darwin (64/48), branched. The 40B figure is glibc's x86_64 `pthread_mutex_t`; glibc on
  aarch64 defines it as 48B, so the branch needs `ARCH` as well as `OS`. Memory-safety
  fix, not cosmetics. Validate under the Apple `container` Linux VM that the pool no
  longer corrupts the heap.
- **C4: Linux CI.** A workflow building + running `zig build test` (incl. the HTTP corpus)
  on Linux. The Apple-`container` path is proven for local validation; CI needs a real
  Linux runner (GH Actions `ubuntu` and/or self-hosted aarch64). First CI of any kind for
  the repo.
- **C5: Linux socket I/O corpus.** Examples covering accept/read/write/close/error on
  Linux (today only the macOS-friendly `1633` covers the happy path). Threaded-handler
  example included.
- *Acceptance:* basic server compiles + runs on Linux; HTTP suite passes on Linux;
  accept/read/write/close/error paths covered; threaded mode correct.
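
The C3a rule (one constant table per OS, no default arm) can be illustrated outside sx. The Python below is only a sketch, not sx and not the project's API: `SOCKET_CONSTS`, `socket_consts`, and `host_mismatches` are hypothetical names, and the numbers are the Darwin and Linux values quoted in the audit. The host check assumes a mainstream x86_64 or aarch64 machine.

```python
import errno
import os
import sys

# Per-OS values as quoted in the audit section. All names here are hypothetical.
SOCKET_CONSTS = {
    "darwin": {
        "O_NONBLOCK": 4,
        "EAGAIN": 35,
        "EINPROGRESS": 36,
        "ECONNRESET": 54,
        "errno_symbol": "__error",
        "sockaddr_has_sin_len": True,
    },
    "linux": {
        "O_NONBLOCK": 2048,
        "EAGAIN": 11,
        "EINPROGRESS": 115,
        "ECONNRESET": 104,
        "errno_symbol": "__errno_location",
        "sockaddr_has_sin_len": False,
    },
}


def socket_consts(os_name):
    """Return the constant table for os_name, or fail loudly.

    There is deliberately no default arm: an unhandled OS raises instead of
    quietly inheriting the Darwin values.
    """
    try:
        return SOCKET_CONSTS[os_name]
    except KeyError:
        raise NotImplementedError(f"no socket constants for OS {os_name!r}") from None


def host_mismatches():
    """List constants whose table value differs from what the running OS reports."""
    os_name = "linux" if sys.platform.startswith("linux") else sys.platform
    table = socket_consts(os_name)
    actual = {
        "O_NONBLOCK": os.O_NONBLOCK,
        "EAGAIN": errno.EAGAIN,
        "EINPROGRESS": errno.EINPROGRESS,
        "ECONNRESET": errno.ECONNRESET,
    }
    return [name for name, value in actual.items() if table[name] != value]


if __name__ == "__main__":
    # An empty list means the table matches the host; an unsupported OS raises.
    print(host_mismatches())
```

The same shape applies to the "lock red, then green" test: a check like `host_mismatches` fails on Linux while the table still carries Darwin values and passes once the Linux arm exists.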
### Phase H: HTTP hardening (P0 #2–6)

- **H1: Parser hardening (#2).** Max header-line size, max header count, strict
  request-line + version validation, CRLF strictness, `Content-Length` overflow guard +
  duplicate/conflicting rejection, **`Transfer-Encoding: chunked` → 501** (full impl in
  S1/S2), slowloris coverage (delivery-timeout already mitigates). Outcomes: 400 / 413 /
  431 / 501 / safe-close. Unit + fuzz-seed corpus.
- **H2: Memory lifecycle (#3).** Fix `Server.close()` to free `conns`, `PoolState`, and
  `ps.done`. Document allocator ownership (long-lived containers must capture their owner,
  per the CLAUDE.md rule; the read buffers are intentionally per-conn). Leak gate:
  start/stop loop with GPA counters asserting zero + repaired stress script for
  RSS-over-churn.
- **H3: Graceful shutdown (#4).** `Server.stop()`: stop accepting, drain in-flight within
  a timeout, close idle keep-alives, return from `run()` cleanly. Tests:
  start/stop/restart in one process; no FD leak; no mem leak.
- **H4: Explicit errors + observability hooks (#5, #9).** Route accept/read/write/loop
  faults through a pluggable log/error hook instead of silent close; add counters (active
  / accepted / closed conns, requests served, parser errors, timeouts, rejected, 4xx/5xx,
  pool queue depth, optional request duration). Hook-based; no forced logging format. (#5
  and #9 interlock; land together.)
- **H5: Handler timeout + cancellation (#6).** Per-request deadline enforced in BOTH
  inline and pool modes; bound time in `CONN_HANDLING`; timed-out → 504 or safe close. A
  never-returning handler must not permanently consume capacity.

### Phase T: Native TLS via mbedTLS (#15), revises the proxy-only posture

Native in-process HTTPS by binding a vetted C library (mbedTLS) over FFI, **not** a
pure-sx TLS stack (out of scope: security-critical, multi-year).
Slotted after H because TLS folds into the same connection state machine +
`read_more`/`write_more` paths, which must be stable first. Backend: **mbedTLS** (small
pure-C, clean `WANT_READ`/`WANT_WRITE` non-blocking API, static-links cleanly into
`--self-contained` musl ELF; Apache-2.0).

- **T1: mbedTLS FFI binding.** New `library/modules/ffi/mbedtls.sx` (or `std/tls.sx`):
  `extern "c"` decls for `mbedtls_ssl_{init,setup,handshake,read,write,close_notify}`,
  `mbedtls_ssl_config`, `mbedtls_x509_crt`, `mbedtls_pk_context`, `mbedtls_ctr_drbg` +
  `mbedtls_entropy`, `mbedtls_ssl_set_bio`, and the `WANT_READ`/`WANT_WRITE` error
  constants. Loud failure on any setup error (no silent default, per the CLAUDE.md rule).
- **T2: Transport abstraction in `http.sx`.** Introduce a transport seam so
  `read_more`/`write_more` go through plaintext (today's `socket.*_nb`) OR TLS, instead of
  calling the socket directly. mbedTLS BIO callbacks bridge to the non-blocking fd: map
  socket `WouldBlock` → `MBEDTLS_ERR_SSL_WANT_READ/WANT_WRITE`.
- **T3: Handshake state + event-loop integration.** New `CONN_TLS_HANDSHAKE` state before
  `CONN_READING`; drive `mbedtls_ssl_handshake` incrementally, mapping `WANT_READ` →
  `loop.add_read`, `WANT_WRITE` → `loop.add_write`; handshake deadline (reuse
  `timeout_request_ms`); graceful `close_notify` on shutdown (ties into H3).
- **T4: TLS config surface.** `Config` gains `tls_enabled`, cert/key/chain paths, min
  version (default TLS 1.2+, prefer 1.3), optional ALPN, SNI (single default cert first;
  multi-cert later). Cert/key load failure is a loud `HttpErr`, never a silent
  fallthrough.
- **T5: Tests + static-link + Linux validation.** TLS corpus example: in-process mbedTLS
  *client* handshakes against the server over loopback with a self-signed cert fixture
  (under `examples/http/16xx-…/`); cover bad-cert, handshake-failure, and mid-handshake
  client-abort paths.
  Verify a `--self-contained` static build links mbedTLS; run on macOS + aarch64 Linux
  (Apple `container`). Document the per-target mbedTLS static-archive requirement for
  self-contained builds (vendor vs system).

### Phase S: Streaming, stress, stability (P1)

- **S1: Streaming responses + chunked out (#10).** Explicit `Content-Length`; stream large
  bodies without buffering the whole response; write-backpressure-aware send; header-set /
  status / content-type helpers. (Builds on H1's chunked scaffolding.)
- **S2: Request-body streaming (#11).** Incremental body reader, configurable max, early
  reject, mid-body-disconnect handling, backpressure-aware reads. Enables real inbound
  chunked bodies.
- **S3: Fuzz harness (#7).** libFuzzer/AFL targets: request-line, header,
  `Content-Length`, keep-alive + pipeline state machine, partial reads, malformed bodies,
  random close timing. Runs manually + in CI. Crash/panic/hang = bug.
- **S4: Load/stress suite (#8).** Repair + expand the stress scripts: many short-lived,
  many keep-alive, slow clients, large bodies at the limit, pool saturation, FD exhaustion
  → 503/backpressure (not crash), RSS-over-time. Document expected overload behavior.
- **S5: Concurrency model docs (#12).** Write up the allocator/thread-safety rules already
  asserted in [thread.sx:11](../library/modules/std/thread.sx#L11) + the http.sx header:
  handler execution model, per-request lifetime, what may be retained after a handler
  returns, misuse cases.
- **S6: API stability + security posture (#13, #14).** Tag a milestone; define the stable
  std subset (`http`/`socket`/`event`/`thread`/`mem`); SECURITY.md + disclosure process +
  the "reverse-proxy-only, not for direct internet exposure" posture statement + known
  limitations.

### Phase D: Deploy & ergonomics (P2)

- **D1: Reverse-proxy + deployment docs (#15, #20).** With native TLS shipping in Phase T,
  proxy deployment is now *an option, not the only option*; document both.
  Cover proxy TLS termination, forwarded headers, client-IP, size limits, timeouts,
  keep-alive, recommended proxy settings; AND native-TLS direct-exposure guidance (cert
  rotation, cipher/version policy). Plus systemd unit, Docker, health-check endpoint,
  graceful-shutdown, logging examples; release-binary build + static/dynamic linking notes
  (incl. the mbedTLS static-archive note from T5; cross-compile already in readme.md).
- **D2: Routing + query/form helpers (#16, #17).** Thin layer over manual dispatch: method
  + path routing, path params, query parsing, 404/405, per-route limits/timeouts; form
  (urlencoded/multipart) + JSON request/response helpers over the existing json.sx.
- **D3: Honest benchmarks (#18).** Revive [bench/run.sh](../bench/run.sh): plain-text,
  JSON, keep-alive, concurrency, pool, slow-client vs a baseline server; record
  hardware/OS/flags/command; measure latency, throughput, memory, error rate.
- **D4: Compiler-confidence framing (#19).** Largely already true (corpus + `issues/`
  regressions, subprocess-isolated runner). Add the "supported vs experimental" labelling
  for language + std features; ensure production-critical features have corpus coverage.

---

## Decisions Log (HTTPZ specifics)

- **Native TLS via an mbedTLS FFI binding (Phase T):** supersedes the original
  reverse-proxy-only posture (2026-06-26). The server gains in-process HTTPS;
  reverse-proxy deployment stays supported and documented (D1) as an option. **No pure-sx
  TLS stack**: TLS is security-critical and is delegated to the vetted C library. mbedTLS
  chosen over OpenSSL/LibreSSL for its small pure-C footprint, clean non-blocking
  `WANT_READ`/`WANT_WRITE` API, and clean static-linking into `--self-contained` musl
  builds (Apache-2.0).
- **`Transfer-Encoding: chunked`: reject (501) in H1, implement in S1/S2.** Pragmatic P0
  minimum is explicit rejection; full chunked support is gated on the streaming work.
- **Stress/fuzz/load live outside the corpus.** The corpus runner has a 10s/example
  timeout and no network sandbox; long-running adversarial harnesses are separate scripts
  wired into CI, not `examples/`.
- **No silent fallback defaults in any C3 branching** (CLAUDE.md REJECTED PATTERNS): a
  failed/unhandled OS or arch arm bails loudly, never picks a "reasonable-looking" Darwin
  default.
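
The body-framing rules from H1, together with the chunked decision logged above, can be sketched as a single decision function. This is Python for illustration only, not sx and not the `http.sx` parser: `body_framing`, `MAX_BODY_BYTES`, and the `(status, length)` return shape are hypothetical. Two choices here are assumptions rather than decisions recorded in this plan: any `Transfer-Encoding` header is answered with 501, and a `Content-Length` that does not fit in an unsigned 64-bit integer is answered with 400.

```python
MAX_BODY_BYTES = 1 << 20  # hypothetical 1 MiB body limit
U64_MAX = (1 << 64) - 1   # stand-in for the overflow guard of a fixed-width parser


def body_framing(headers, max_body=MAX_BODY_BYTES):
    """Decide how a request body is framed.

    headers is a list of (name, value) pairs. Returns (status, length):
    status 0 means accept and read `length` body bytes, otherwise status is
    the HTTP error code to send and length is 0.
    """
    lengths = []
    for name, value in headers:
        lname = name.strip().lower()
        if lname == "transfer-encoding":
            return 501, 0  # assumption: reject every Transfer-Encoding for now
        if lname == "content-length":
            lengths.append(value.strip(" \t"))

    if not lengths:
        return 0, 0  # no body declared
    if len(lengths) > 1:
        return 400, 0  # duplicate or conflicting Content-Length

    text = lengths[0]
    # ASCII digits only: rejects "", "+5", "-1", "0x10", "1 2", non-ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return 400, 0

    n = int(text)
    if n > U64_MAX:
        return 400, 0  # would overflow a 64-bit length field
    if n > max_body:
        return 413, 0  # syntactically valid but over the configured limit
    return 0, n
```

For example, `body_framing([("Content-Length", "12")])` returns `(0, 12)`, while a request carrying two `Content-Length` headers returns `(400, 0)` even when both values agree.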