F2.2: reject raw control bytes (U+0000..U+001F) in JSON strings

parse_string scanned for `"` and `\` but accepted every other byte,
including raw control characters. RFC 8259 §7 requires those bytes to be
escaped inside a string; an unescaped one is invalid JSON and must surface
a parse error, not be silently accepted.

Add `BadControlChar` to JsonParseError and reject any unescaped byte < 0x20
in the string body scan (which gates the decode path too, so escaped forms
like \t/\n/\u0009 still decode correctly; 0x20 and 0x7F are not over-rejected).

Regression test in examples/0714: raw 0x09/0x0A/0x00 each raise
BadControlChar via `?`/`!`; a positive case proves the escaped forms still
decode to the right bytes. All prior assertions kept.
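
A minimal Python sketch of the scan rule described above, for illustration
only. It is not the project's code: `scan_string_body` and both exception
classes are hypothetical names, and the project's parser is written in its own
language. The sketch assumes the scan starts at the byte after the opening
quote and only locates the closing quote; decoding escapes is a separate step.

```python
class BadControlChar(ValueError):
    """Unescaped byte < 0x20 inside a string body (hypothetical name)."""


class UnexpectedEnd(ValueError):
    """Input ended before the closing quote (hypothetical name)."""


def scan_string_body(src: bytes, start: int) -> tuple[int, bool]:
    """Scan a JSON string body beginning at `start`.

    Returns (index of the closing quote, whether any escape was seen).
    """
    i = start
    has_escape = False
    while i < len(src):
        c = src[i]
        if c == 0x22:  # closing quote ends the scan
            return i, has_escape
        if c == 0x5C:  # backslash: step over the byte it escapes
            has_escape = True
            i += 1
            if i >= len(src):
                raise UnexpectedEnd(f"escape truncated at offset {i}")
        elif c < 0x20:
            # RFC 8259 section 7: raw control bytes must be escaped.
            raise BadControlChar(f"raw control byte 0x{c:02X} at offset {i}")
        i += 1
    raise UnexpectedEnd("unterminated string")
```

Because the rejection sits in the scan, an escaped `\t` (backslash followed by
`t`) never reaches the `< 0x20` branch, while 0x20, 0x7F and bytes >= 0x80 fall
through untouched.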
agra
2026-06-04 02:32:32 +03:00
parent 301e966bcf
commit 2871342c0a
3 changed files with 42 additions and 4 deletions


@@ -349,9 +349,11 @@ write_to_file :: (v: Value, file: *File, staging: []u8) -> !JsonError {
//
// NOT SUPPORTED (rejected, not silently accepted): a fraction or exponent
// in a number (`1.5`, `1e9`) → `BadNumber`; a number outside s64 →
-// `BadNumber`; a leading-zero integer (`01`) → `BadNumber`. UNESCAPED raw
-// control bytes (< 0x20) inside a string are passed through verbatim (the
-// minimal-reader leniency the manifest / db.json never exercise).
+// `BadNumber`; a leading-zero integer (`01`) → `BadNumber`. An UNESCAPED
+// raw control byte (U+0000..U+001F) inside a string → `BadControlChar`
+// (RFC 8259 §7 requires those bytes to be escaped); the escaped forms
+// (`\t`, `\n`, `\u0009`, …) stay valid and decode normally. Bytes >= 0x20,
+// including 0x7F (DEL) and UTF-8 continuation bytes (>= 0x80), pass through.
//
// HEAP DISCIPLINE (binding, see heap-discipline.md). Exactly two kinds of
// allocation happen, both through the EXPLICIT `alloc` parameter, never
@@ -377,7 +379,7 @@ write_to_file :: (v: Value, file: *File, staging: []u8) -> !JsonError {
// The reader's failure contract. Meaningful variants so a caller can tell
// a truncated document from a bad escape from trailing junk.
-JsonParseError :: error { UnexpectedToken, UnexpectedEnd, BadEscape, BadNumber, TrailingGarbage }
+JsonParseError :: error { UnexpectedToken, UnexpectedEnd, BadEscape, BadNumber, TrailingGarbage, BadControlChar }
// Lowercase/uppercase hex nibble value (0..15) of an ASCII byte; a non-hex
// byte in a `\uXXXX` escape is a `BadEscape`.
@@ -518,6 +520,11 @@ Parser :: struct {
has_escape = true;
i += 1;
if i >= self.src.len { raise error.UnexpectedEnd; }
+} else if c < 32 {
+// RFC 8259 §7: a raw control byte (U+0000..U+001F) must be
+// escaped inside a string; an unescaped one is invalid JSON.
+self.pos = i;
+raise error.BadControlChar;
}
i += 1;
}