F2.2: reject raw control bytes (U+0000..U+001F) in JSON strings

parse_string scanned for `"` and `\` but accepted every other byte,
including raw control characters. RFC 8259 §7 requires those bytes to be
escaped inside a string; an unescaped one is invalid JSON and must surface
a parse error, not be silently accepted.

Add `BadControlChar` to JsonParseError and reject any unescaped byte < 0x20
in the string body scan (which gates the decode path too, so escaped forms
like \t/\n/	 still decode correctly; 0x20 and 0x7F are not over-rejected).

Regression test in examples/0714: raw 0x09/0x0A/0x00 each raise
BadControlChar via `?`/`!`; a positive case proves the escaped forms still
decode to the right bytes. All prior assertions kept.
This commit is contained in:
agra
2026-06-04 02:32:32 +03:00
parent 301e966bcf
commit 2871342c0a
3 changed files with 42 additions and 4 deletions

View File

@@ -42,6 +42,15 @@ raises :: (src: string, want: JsonParseError, alloc: Allocator) -> bool {
e == want
}
// True when parsing `"a<b>b"` (a string holding the RAW control byte `b`)
// raises BadControlChar. Built from a byte buffer because a raw control
// byte can't appear in an sx string literal.
ctrl_raises :: (b: u8, alloc: Allocator) -> bool {
raw : [5]u8 = ---;
raw[0] = 34; raw[1] = 97; raw[2] = b; raw[3] = 98; raw[4] = 34; // "a<b>b"
return raises(string.{ ptr = @raw[0], len = 5 }, error.BadControlChar, alloc);
}
main :: () -> ! {
gpa := GPA.init();
arena := Arena.init(xx gpa, 8192);
@@ -125,6 +134,21 @@ main :: () -> ! {
report("err-overflow", raises("9223372036854775808", error.BadNumber, xx arena));
report("err-unterminated", raises("\"abc", error.UnexpectedEnd, xx arena));
// ── 7. RFC 8259 §7: unescaped control bytes (U+0000..U+001F) ──────
// A RAW control byte inside a string is invalid JSON -> BadControlChar.
report("err-raw-tab", ctrl_raises(9, xx arena)); // raw 0x09
report("err-raw-lf", ctrl_raises(10, xx arena)); // raw 0x0A
report("err-raw-nul", ctrl_raises(0, xx arena)); // raw 0x00
// POSITIVE: the ESCAPED control forms stay valid and decode to the
// exact bytes. JSON "\t\n\u0009" -> 0x09 0x0A 0x09 (3 bytes).
esc := try parse("\"\\t\\n\\u0009\"", xx arena);
es := esc.str;
report("esc-ctrl-len", es.len == 3);
report("esc-tab", es[0] == 0x09); // \t
report("esc-lf", es[1] == 0x0A); // \n
report("esc-u", es[2] == 0x09); // \u0009
print("=== DONE ===\n");
return;
}

View File

@@ -33,4 +33,11 @@ err-fraction: ok
err-leading-zero: ok
err-overflow: ok
err-unterminated: ok
err-raw-tab: ok
err-raw-lf: ok
err-raw-nul: ok
esc-ctrl-len: ok
esc-tab: ok
esc-lf: ok
esc-u: ok
=== DONE ===