F2.2: reject raw control bytes (U+0000..U+001F) in JSON strings

parse_string scanned for `"` and `\` but accepted every other byte, including raw control characters. RFC 8259 §7 requires U+0000..U+001F to be escaped inside a string; an unescaped one is invalid JSON and must surface a parse error rather than be silently accepted.

Add `BadControlChar` to JsonParseError and reject any unescaped byte < 0x20 in the string body scan. That scan also gates the decode path, and escaped forms such as `\t`, `\n`, and `\u0009` still decode correctly; 0x20 and 0x7F are not over-rejected.

Regression test in examples/0714: raw 0x09, 0x0A, and 0x00 each raise BadControlChar via `?`/`!`; a positive case checks that the escaped forms still decode to the right bytes. All prior assertions are kept.
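
For illustration only, here is a minimal Python sketch of the kind of body scan described above. It is not the project's sx code: `scan_string_body`, the exception classes, and the "skip two bytes on backslash" handling are all assumptions made for this example, and escape validation is assumed to happen in a later decode step.

```python
class JsonParseError(Exception):
    """Hypothetical stand-in for the parser's error set."""


class BadControlChar(JsonParseError):
    """An unescaped byte below 0x20 appeared inside a string."""


class UnexpectedEnd(JsonParseError):
    """The input ended before the closing quote."""


def scan_string_body(src: bytes, start: int) -> int:
    """Return the index of the closing quote for a string body starting at `start`.

    `start` is the index just past the opening quote. Escape sequences are
    only skipped here, not validated (assumed to be the decoder's job).
    """
    i = start
    while i < len(src):
        b = src[i]
        if b == 0x22:  # '"' closes the string
            return i
        if b == 0x5C:  # '\' starts an escape: skip it and the byte after it
            i += 2
            continue
        if b < 0x20:  # raw control byte: RFC 8259 section 7 requires an escape
            raise BadControlChar(f"raw control byte 0x{b:02X} at offset {i}")
        i += 1
    raise UnexpectedEnd("string not terminated")


if __name__ == "__main__":
    # Escaped tab: backslash followed by 't' is accepted by the scan.
    print(scan_string_body(b'"a\\tb"', 1))
    # Raw tab (0x09) between 'a' and 'b' is rejected.
    try:
        scan_string_body(bytes([34, 97, 9, 98, 34]), 1)
    except BadControlChar as e:
        print("rejected:", e)
```

The comparison is `b < 0x20`, so 0x20 (space) and 0x7F (DEL) pass through the scan, matching the "not over-rejected" note in the message.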
@@ -42,6 +42,15 @@ raises :: (src: string, want: JsonParseError, alloc: Allocator) -> bool {
    e == want
}

// True when parsing `"a<b>b"` (a string holding the RAW control byte `b`)
// raises BadControlChar. Built from a byte buffer because a raw control
// byte can't appear in an sx string literal.
ctrl_raises :: (b: u8, alloc: Allocator) -> bool {
    raw : [5]u8 = ---;
    raw[0] = 34; raw[1] = 97; raw[2] = b; raw[3] = 98; raw[4] = 34;  // "a<b>b"
    return raises(string.{ ptr = @raw[0], len = 5 }, error.BadControlChar, alloc);
}

main :: () -> ! {
    gpa := GPA.init();
    arena := Arena.init(xx gpa, 8192);

@@ -125,6 +134,21 @@ main :: () -> ! {
    report("err-overflow", raises("9223372036854775808", error.BadNumber, xx arena));
    report("err-unterminated", raises("\"abc", error.UnexpectedEnd, xx arena));

    // ── 7. RFC 8259 §7: unescaped control bytes (U+0000..U+001F) ──────
    // A RAW control byte inside a string is invalid JSON -> BadControlChar.
    report("err-raw-tab", ctrl_raises(9, xx arena));   // raw 0x09
    report("err-raw-lf", ctrl_raises(10, xx arena));   // raw 0x0A
    report("err-raw-nul", ctrl_raises(0, xx arena));   // raw 0x00

    // POSITIVE: the ESCAPED control forms stay valid and decode to the
    // exact bytes. JSON "\t\n\u0009" -> 0x09 0x0A 0x09 (3 bytes).
    esc := try parse("\"\\t\\n\\u0009\"", xx arena);
    es := esc.str;
    report("esc-ctrl-len", es.len == 3);
    report("esc-tab", es[0] == 0x09);   // \t
    report("esc-lf", es[1] == 0x0A);    // \n
    report("esc-u", es[2] == 0x09);     // \u0009

    print("=== DONE ===\n");
    return;
}

@@ -33,4 +33,11 @@ err-fraction: ok
err-leading-zero: ok
err-overflow: ok
err-unterminated: ok
err-raw-tab: ok
err-raw-lf: ok
err-raw-nul: ok
esc-ctrl-len: ok
esc-tab: ok
esc-lf: ok
esc-u: ok
=== DONE ===