Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilePath.setContent should do an atomic rename on Windows where possible #3004

Open
twisted-trac opened this issue Jan 22, 2008 · 15 comments
Open

Comments

@twisted-trac
Copy link

glyph's avatar @glyph reported
Trac ID trac#3004
Type defect
Created 2008-01-22 12:04:16Z

It would really be helpful to have someone who knows a lot of Windows stuff look at this ticket and figure out what the right thing to do is.

Using the python stdlib's default os.rename, the right thing to do would be to prefer ending up in a more-consistent state. Currently setContent will delete the old file before moving the new file into place. It might also be advisable to have a "repair" method to complete setContent operations which died after the file was written but before it was moved into place. This should be a separate ticket since UNIX can leave extraneous temporary files around too, but whether it needs to differ per platform depends on whether we can use this:

It looks like the MoveFileWithProgress (or MoveFileEx, or even MoveFileTransacted on Vista) may provide an atomic-rename facility. If we can invoke these without external dependencies it might be better to forego the use of the os module in this case.

Searchable metadata
trac-id__3004 3004
type__defect defect
reporter__glyph glyph
priority__normal normal
milestone__ 
branch__ 
branch_author__ 
status__new new
resolution__None None
component__core core
keywords__ 
time__1201003456000000 1201003456000000
changetime__1559919782134300 1559919782134300
version__None None
owner__ 
cc__exarkun cc__Trent.Nelson cc__zooko@... cc__davidsarah
@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

I don't think there could be a repair method which completes the operation. There's no way to know if the temporary file has been fully written or not. If some additional functionality should be provided, it should be to clean up the temporary file (ie, delete it). However, since it's easy to determine the name of the temporary file, I'm not sure this really needs specific support by a new API.

For the particular case where setContent dies on Windows between renaming the original file out of the way and deleting it, it's possible to complete the operation, since it is known that all the data is present in that case. This would be a no-op on other platforms though, and if we use some Windows API that supports the rename atomically, then there's no need for it on Windows either.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

Replying to exarkun:

I don't think there could be a repair method which completes the operation. There's no way to know if the temporary file has been fully written or not.

I was referring to the case that you refer to here:

For the particular case where setContent dies on Windows between renaming the original file out of the way and deleting it, it's possible to complete the operation, since it is known that all the data is present in that case. This would be a no-op on other platforms though, and if we use some Windows API that supports the rename atomically, then there's no need for it on Windows either.

On other platforms, or in the case where we have an atomic rename on Windows (the APIs I referred to don't clearly describe their error behavior) it would simply delete any temporary files that were created by not finished. The important thing is that you could call this method on a directory without necessarily understanding the steps involved in setContent, and get something consistent back.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

Replying to glyph:

temporary files that were created by not finished.

but not finished, I meant.

@twisted-trac
Copy link
Author

khorn's avatar @khorn commented

This appears to be a duplicate of #3454

@twisted-trac
Copy link
Author

glyph's avatar @glyph set owner to PenguinOfDoom

@twisted-trac
Copy link
Author

Automation's avatar Automation removed owner

@twisted-trac
Copy link
Author

davidsarah's avatar davidsarah commented

As far as I understand, MoveFileEx or MoveFileWithProgress with the MOVEFILE_REPLACE_EXISTING flag does not do an atomic rename, and I don't see any reason to believe that this would be an improvement over what setContent currently does.

MoveFileTransacted with MOVEFILE_REPLACE_EXISTING (and not MOVEFILE_COPY_ALLOWED or MOVEFILE_DELAY_UNTIL_REBOOT) does do an atomic rename on Vista or later, on an NTFS filesystem.

It would also be useful to expose the atomic rename directly as well as using it in setContent.

I don't think it would be a good idea to have FilePath.setContent depend on pywin32; use ctypes for this.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

Replying to davidsarah:

I don't think it would be a good idea to have FilePath.setContent depend on pywin32; use ctypes for this.

If it did use pywin32, it would be an optional dependency, falling back to the current behavior. The advantage of pywin32 is that you get 64-bit support for free, and it's somebody else's job to make sure it doesn't segfault.

Thanks for the info though, and I look forward to your ctypes-using patch :).

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

pywin32 is already a fairly significant dependency on Windows. I think we should depend on it if we are going to provide this functionality.

"Atomic update, unless you (or a naive end user) forgot to install a dependency" seems like very unuseful behavior to offer in this API.

@twisted-trac
Copy link
Author

zooko's avatar @zooko commented

For context, we recently succeeded at eliminating pywin32 from the dependencies of Tahoe-LAFS, allowing us to remove instructions to our users to manually install pywin32 if they are on Windows. This is very important to me—it has taken about five years, and we've gradually eliminated every instruction to manually install a dependency other than Python. pywin32 was the last one to go.

This was possible because David-Sarah wrote code using ctypes to do what we previously used pywin32 for. (And, unfortunately, because we changed to use subprocess instead of getProcessOutput.)

For Tahoe-LAFS's purposes, I would rather suffer an unreliable setContent than add a dependency on pywin32. Of course, knowing David-Sarah they would probably write a reliable setContent for us using ctypes and then we would end up maintaining that in the Tahoe-LAFS source code.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

Replying to zooko:

For context, we recently succeeded at eliminating pywin32 from the dependencies of Tahoe-LAFS, allowing us to remove instructions to our users to manually install pywin32 if they are on Windows. This is very important to me—it has taken about five years, and we've gradually eliminated every instruction to manually install a dependency other than Python. pywin32 was the last one to go.

So if pywin32 fixed this bug, your objection would go away?

This was possible because David-Sarah wrote code using ctypes to do what we previously used pywin32 for. (And, unfortunately, because we changed to use subprocess instead of getProcessOutput.)

This strikes me as a bad tradeoff. Am I wrong that the choice is writing a bunch of new code vs. setting up an alternative distribution name on PyPI for pywin32 with a metadata pointer at the correct files, and then declaring it as a dependency? Presumably such a fork would motivate the actual pywin32 maintainers to add a step to their release process, or give you the relevant credentials.

But, I wouldn't reject a ctypes-based implementation out of hand; ctypes is always available and the relevant APIs are all system-level things that shouldn't change much, so by all means submit a patch :).

For Tahoe-LAFS's purposes, I would rather suffer an unreliable setContent than add a dependency on pywin32. Of course, knowing David-Sarah they would probably write a reliable setContent for us using ctypes and then we would end up maintaining that in the Tahoe-LAFS source code.

If you really want to avoid the possibility of a future required dependency on pywin32, we should probably have a Twisted buildbot without it installed. Would you like to supply or maintain one? This may be a little tricky, because Buildbot itself requires parts of Twisted that need pywin32; the Python environments would need to be isolated from each other.

Ideally you could just fix the issue with pywin32's installation though, and skip this whole mess :).

@twisted-trac
Copy link
Author

zooko's avatar @zooko commented

Replying to glyph:

So if pywin32 fixed this bug, your objection would go away?

I had forgotten that I almost got that working in 2009!

Yes, my objection is all about installation.

I should definitely try to replicate my success from that comment from 2009. I'm not sure what exactly would be required to make it work for us. Ideally doing python setup.py bdist_egg from a pywin32 source tree on Windows would suffice, but of course we would want that to be automated on a buildbot.

If you really want to avoid the possibility of a future required dependency on pywin32, we should probably have a Twisted buildbot without it installed. Would you like to supply or maintain one? This may be a little tricky, because Buildbot itself requires parts of Twisted that need pywin32; the Python environments would need to be isolated from each other.

Here is a ticket about testing, on the Tahoe-LAFS buildbot, that Tahoe-LAFS doesn't depend on pywin32 [http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1334 Tahoe-LAFS #10915]. I wonder if virtualenv or exocet would help.

Here is a ticket about testing, on the Twisted buildbot, that Twisted doesn't depend on any packages which aren't declared in a machine-parseable format in the Twisted packaging metadata: #4438 . Currently that ticket is closed as invalid but I hope it can be re-opened as a valid thing to test for.

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

The atomicwrites library claims to provide this functionality on Windows.

@Avasam
Copy link

Avasam commented Mar 15, 2024

So if pywin32 fixed this bug, your objection would go away?

Upstream issue has been moved to mhammond/pywin32#334

Ideally doing python setup.py bdist_egg from a pywin32 source tree on Windows would suffice

Relevant: mhammond/pywin32#2208

@adiroiban
Copy link
Member

adiroiban commented Mar 16, 2024

Thanks for update.

I think that win32 API is no longer an issue.

Stdlib should be good enough.

I think that with modern python (3.3 and newer) we can use os.replace for atomic rename on linux and windows.

The current code uses os.rename

sib = self.temporarySibling(ext)
with sib.open("w") as f:
f.write(content)
if platform.isWindows() and exists(self.path):
os.unlink(self.path)
os.rename(sib.path, self.asBytesMode().path)

Looking at the Python source code, I can see that os.replace used MoveFileExW on windows with the OVEFILE_REPLACE_EXISTING flag.
The MOVEFILE_COPY_ALLOWED flat is not used, so this is atomic and will fail if you try to rename across filesystems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants