P2.2: fix put_file content-addressing — hash the published bytes (single source read)
put_file hashed the file at the source path, then copied the source in a second read. If the source was mutated between the two reads, put_file would publish bytes whose digest != the returned key, breaking the content-addressed invariant.

Now put_file copies the source once into a provisional staging file, derives the key from the SHA-256 of that staged file (the exact bytes published), then either dedups or atomically renames it into place. This guarantees key == digest(published object) with a single source read.

The acceptance test is extended to:
- re-hash the stored object and assert it equals the returned key (and the std.hash / shasum digest of the fixture);
- assert cross-path dedup (put_file and put_bytes of identical content share one object);
- assert the staging temp is cleaned up on both the success and dedup paths.
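For readers who do not know the store's own language, the flow described in the commit message can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation: the `Store` class, `sha256_of_file`, and `_publish` here are hypothetical stand-ins, and `os.replace` plays the role of the atomic rename (atomic only when staging and objects live on one filesystem).

```python
import hashlib
import itertools
import os
import shutil
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


class Store:
    """Hypothetical Python stand-in for the content-addressed store."""

    def __init__(self, root: Path) -> None:
        self.objects = Path(root) / "objects"
        self.staging = Path(root) / "staging"
        self._seq = itertools.count(1)  # names staging/incoming-<n>

    def _publish(self, staged: Path, key: str) -> str:
        dest = self.objects / key
        if dest.exists():
            staged.unlink()  # dedup: drop the staged copy, keep the object
        else:
            self.objects.mkdir(parents=True, exist_ok=True)
            os.replace(staged, dest)  # rename makes the object visible
        return key

    def put_bytes(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # key is known up front
        if (self.objects / key).exists():
            return key
        self.staging.mkdir(parents=True, exist_ok=True)
        staged = self.staging / key
        staged.write_bytes(data)
        return self._publish(staged, key)

    def put_file(self, src: Path) -> str:
        self.staging.mkdir(parents=True, exist_ok=True)
        staged = self.staging / f"incoming-{next(self._seq)}"
        shutil.copyfile(src, staged)  # the only read of the source
        key = sha256_of_file(staged)  # hash the bytes that get published
        return self._publish(staged, key)
```

Because the key is computed from the staged copy rather than from the source, a later change to the source cannot make the returned key disagree with the stored object.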
@@ -6,13 +6,20 @@
 // `<root>/objects/<digest>`. This key is what populates an
 // Artifact.sha256 / Artifact.storage_key at the domain boundary.
 //
-// Publish is a two-phase write: bytes are first written to
-// `<root>/staging/<key>`, then atomically renamed into
-// `<root>/objects/<key>`. The rename is the only operation that makes an
-// object visible at its final path, so an interrupted or failed write
-// never leaves a torn object — a half-written staging file is not
-// reachable as `objects/<key>`. Staging and objects share `<root>` (one
-// filesystem), so the rename is atomic.
+// Publish is a two-phase write: bytes are first written under
+// `<root>/staging/`, then atomically renamed into `<root>/objects/<key>`.
+// The rename is the only operation that makes an object visible at its
+// final path, so an interrupted or failed write never leaves a torn
+// object — a half-written staging file is not reachable as
+// `objects/<key>`. Staging and objects share `<root>` (one filesystem),
+// so the rename is atomic.
+//
+// `put_bytes` stages the in-memory bytes at `staging/<key>` (the key is
+// known up front). `put_file` reads its source exactly once: it copies
+// the source into a provisional `staging/incoming-<n>`, then derives the
+// key from the SHA-256 of THAT staged file — the exact bytes that get
+// published. So `key == digest(published object)` holds even if the
+// source is mutated after the copy; the source is never read twice.
 //
 // Dedup: identical bytes hash to the same key, so a put whose object
 // already exists returns immediately without re-staging or rewriting.
@@ -56,9 +63,12 @@ digest_of_file :: (path: string) -> (string, !StoreErr) {
 
 Store :: struct {
     root: string;
+    // Monotonic per-store counter naming `put_file`'s provisional staging
+    // files, so concurrent file puts don't clobber each other's temp copy.
+    seq: s64;
 
     init :: (root: string) -> Store {
-        return Store.{ root = root };
+        return Store.{ root = root, seq = 0 };
     }
 
     objects_dir :: (self: *Store) -> string { return path_join(self.root, "objects"); }
@@ -80,10 +90,14 @@ Store :: struct {
         return sp;
     }
 
-    // Phase 1 (file source): copy `src`'s bytes into `staging/<key>`.
-    stage_copy :: (self: *Store, key: string, src: string) -> (string, !StoreErr) {
+    // Phase 1 (file source): copy `src` once into a provisional staging
+    // file `staging/incoming-<n>`. The key isn't known until these staged
+    // bytes are hashed, so the name is a per-put sequence — never
+    // `objects/<key>`, so an interrupted copy is never a published object.
+    stage_temp_copy :: (self: *Store, src: string) -> (string, !StoreErr) {
         if !fs.create_dir_all(self.staging_dir()) { raise error.Stage; }
-        sp := self.staging_path(key);
+        self.seq += 1;
+        sp := self.staging_path(concat("incoming-", int_to_string(self.seq)));
         if !fs.copy_file(src, sp) { raise error.Stage; }
         return sp;
     }
@@ -106,11 +120,18 @@ Store :: struct {
         return key;
     }
 
-    // Store a file's bytes and return their storage key. Dedup as above.
+    // Store a file's bytes and return their storage key. The source is
+    // read exactly once — copied into staging, then hashed there — so the
+    // returned key is the SHA-256 of the bytes actually published, not of a
+    // separate read that could disagree. Dedup: if the object already
+    // exists, the staged copy is dropped and the existing key returned.
     put_file :: (self: *Store, path: string) -> (string, !StoreErr) {
-        key := try digest_of_file(path);
-        if self.has(key) { return key; }
-        sp := try self.stage_copy(key, path);
+        sp := try self.stage_temp_copy(path);
+        key := try digest_of_file(sp);
+        if self.has(key) {
+            fs.delete_file(sp);
+            return key;
+        }
         try self.publish(sp, key);
         return key;
     }
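The hazard that the removed `put_file` body was exposed to can be reproduced with a small Python sketch. It assumes a simplified model in which a callback stands in for a concurrent writer mutating the source between steps; `hash_then_copy`, `copy_then_hash`, and `demo` are hypothetical names for illustration and do not appear in the repository.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Hex SHA-256 of a small file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_then_copy(src: Path, dest: Path, between) -> str:
    """Old shape: the first read hashes the source, the second copies it."""
    key = sha256_hex(src)
    between()  # stands in for a writer touching the source mid-put
    dest.write_bytes(src.read_bytes())
    return key


def copy_then_hash(src: Path, dest: Path, between) -> str:
    """New shape: one read of the source, then hash the copy."""
    dest.write_bytes(src.read_bytes())
    between()
    return sha256_hex(dest)


def demo() -> tuple[bool, bool]:
    """Return (old_key_matches_object, new_key_matches_object)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src.bin"
        old_dest = Path(tmp) / "old.bin"
        new_dest = Path(tmp) / "new.bin"

        def mutate() -> None:
            src.write_bytes(b"version 2")

        src.write_bytes(b"version 1")
        old_key = hash_then_copy(src, old_dest, mutate)

        src.write_bytes(b"version 1")
        new_key = copy_then_hash(src, new_dest, mutate)

        return old_key == sha256_hex(old_dest), new_key == sha256_hex(new_dest)


if __name__ == "__main__":
    print(demo())
```

In this model the hash-then-copy order returns a key for the old contents while storing the new ones, whereas the copy-then-hash order keeps the key and the stored bytes in agreement regardless of when the mutation lands.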