F2.2: reject raw control bytes (U+0000..U+001F) in JSON strings
parse_string scanned for `"` and `\` but accepted every other byte, including raw control characters. RFC 8259 §7 requires those bytes to be escaped inside a string; an unescaped one is invalid JSON and must surface a parse error rather than be silently accepted.

Add `BadControlChar` to JsonParseError and reject any unescaped byte < 0x20 in the string body scan. The scan gates the decode path as well, so raw control bytes never reach it, while escaped forms like `\t` and `\n` still decode correctly; 0x20 and 0x7F are not over-rejected.

Regression test in examples/0714: raw 0x09, 0x0A, and 0x00 each raise BadControlChar via `?`/`!`; a positive case proves the escaped forms still decode to the right bytes. All prior assertions are kept.
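For readers who do not know the project's language, the scan rule described above can be sketched in Python. This is an illustrative approximation only, not the project's code: `scan_string_body`, `BadControlChar`, and `UnexpectedEnd` here are hypothetical Python stand-ins, and the sketch assumes the scan starts at the byte after the opening quote.

```python
class BadControlChar(ValueError):
    """Raised for an unescaped byte below 0x20 inside a string body."""


class UnexpectedEnd(ValueError):
    """Raised when the input ends before the closing quote."""


def scan_string_body(src: bytes, start: int) -> tuple[int, bool]:
    """Scan a JSON string body beginning at `start` (just past the opening quote).

    Returns (index of the closing quote, whether any backslash escape was seen).
    """
    i = start
    has_escape = False
    while i < len(src):
        c = src[i]
        if c == 0x22:  # closing double quote ends the body
            return i, has_escape
        if c == 0x5C:  # backslash: step over the byte it escapes
            has_escape = True
            i += 1
            if i >= len(src):
                raise UnexpectedEnd(i)
        elif c < 0x20:  # raw control byte: RFC 8259 section 7 forbids it
            raise BadControlChar(i)
        i += 1
    raise UnexpectedEnd(i)
```

With this sketch, `scan_string_body(b'a\\tb"', 0)` accepts the escaped tab, while `scan_string_body(b'a\tb"', 0)` raises `BadControlChar` at index 1. A body containing 0x7F passes, because only bytes below 0x20 are rejected.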
@@ -349,9 +349,11 @@ write_to_file :: (v: Value, file: *File, staging: []u8) -> !JsonError {
 //
 // NOT SUPPORTED (rejected, not silently accepted): a fraction or exponent
 // in a number (`1.5`, `1e9`) → `BadNumber`; a number outside s64 →
-// `BadNumber`; a leading-zero integer (`01`) → `BadNumber`. UNESCAPED raw
-// control bytes (< 0x20) inside a string are passed through verbatim (the
-// minimal-reader leniency the manifest / db.json never exercise).
+// `BadNumber`; a leading-zero integer (`01`) → `BadNumber`. An UNESCAPED
+// raw control byte (U+0000..U+001F) inside a string → `BadControlChar`
+// (RFC 8259 §7 requires those bytes to be escaped); the escaped forms
+// (`\t`, `\n`, `\u0009`, …) stay valid and decode normally. Bytes >= 0x20,
+// including 0x7F (DEL) and UTF-8 continuation bytes (>= 0x80), pass through.
 //
 // HEAP DISCIPLINE (binding, see heap-discipline.md). Exactly two kinds of
 // allocation happen, both through the EXPLICIT `alloc` parameter, never
@@ -377,7 +379,7 @@ write_to_file :: (v: Value, file: *File, staging: []u8) -> !JsonError {

 // The reader's failure contract. Meaningful variants so a caller can tell
 // a truncated document from a bad escape from trailing junk.
-JsonParseError :: error { UnexpectedToken, UnexpectedEnd, BadEscape, BadNumber, TrailingGarbage }
+JsonParseError :: error { UnexpectedToken, UnexpectedEnd, BadEscape, BadNumber, TrailingGarbage, BadControlChar }

 // Lowercase/uppercase hex nibble value (0..15) of an ASCII byte; a non-hex
 // byte in a `\uXXXX` escape is a `BadEscape`.
@@ -518,6 +520,11 @@ Parser :: struct {
         has_escape = true;
         i += 1;
         if i >= self.src.len { raise error.UnexpectedEnd; }
+    } else if c < 32 {
+        // RFC 8259 §7: a raw control byte (U+0000..U+001F) must be
+        // escaped inside a string; an unescaped one is invalid JSON.
+        self.pos = i;
+        raise error.BadControlChar;
     }
     i += 1;
 }
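As an independent cross-check of the rule this commit enforces, Python's standard `json` module applies the same restriction by default: with `strict=True` it rejects raw characters in the 0..31 range inside strings, and with `strict=False` it allows them. The helper name `accepts` below is hypothetical and exists only for this illustration; the sketch says nothing about how examples/0714 is written.

```python
import json


def accepts(doc: str, strict: bool = True) -> bool:
    """Return True if the standard json module parses `doc` without error."""
    try:
        json.loads(doc, strict=strict)
        return True
    except json.JSONDecodeError:
        return False


raw_tab = '"a\tb"'        # a real TAB (0x09) between the quotes: invalid JSON
escaped_tab = '"a\\tb"'   # backslash followed by 't': valid JSON
raw_del = '"a\x7fb"'      # DEL (0x7F) is outside the 0..31 range, so it is allowed

results = {
    "raw_tab": accepts(raw_tab),
    "escaped_tab": accepts(escaped_tab),
    "raw_del": accepts(raw_del),
    "raw_tab_lenient": accepts(raw_tab, strict=False),
}
```

Here `results["raw_tab"]` is `False` while the other three entries are `True`, which matches the behaviour the commit describes: raw control bytes are rejected, escaped forms and 0x7F are accepted.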