Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize look-write-rename for writing blocks #181

Open
sourcefrog opened this issue Aug 13, 2022 · 1 comment
Open

Optimize look-write-rename for writing blocks #181

sourcefrog opened this issue Aug 13, 2022 · 1 comment

Comments

@sourcefrog
Copy link
Owner

Yep, it does currently

  1. See if the block is already present, in which case we don't need to do the work to compress it.
  2. Write to a temporary file, so that if the process is interrupted we won't be left with an incomplete file under the final name. (This is not 100% guaranteed by the filesystem, but it's the usual pattern.)
  3. Rename into place.

This is pretty reasonable (although perhaps not optimal) locally but not good if the filesystem is very high latency.

A few options:

  1. Just issue more parallel IO.
  2. Remember which blocks are referenced by the basis index: we can already assume they're present and should not need to check. (The most common case, of an unchanged file, does not check, but there might be other edge cases. This should be pretty rare.)
  3. Similarly, remember blocks that we've already seen are present. (Cache in RAM for presence of blocks #106)
  4. If we have a Transport API for the remote filesystem, then in some cases that may already support a reliable atomic write that cannot leave the file half-written. For example this should be possible on S3. Then we don't need the rename.
  5. Even on Unix or Windows maybe a faster atomic write is possible?

Originally posted by @sourcefrog in #177 (comment)

@sourcefrog
Copy link
Owner Author

The rename happens inside the Transport, in

fn write_file(&self, relpath: &str, content: &[u8]) -> io::Result<()> {
let full_path = self.full_path(relpath);
let dir = full_path.parent().unwrap();
let mut temp = tempfile::Builder::new()
.prefix(crate::TMP_PREFIX)
.tempfile_in(dir)?;
if let Err(err) = temp.write_all(content) {
let _ = temp.close();
return Err(err);
}
if let Err(persist_error) = temp.persist(&full_path) {
persist_error.file.close()?;
Err(persist_error.error)
} else {
Ok(())
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant