Blob.download_to_file() with no chunk size set causes the file to be pulled entirely into RAM before starting to write it to the supplied file object. This is due to google/resumable_media/download.py's "sans I/O" approach.
I can work around it by setting the chunk size parameter; however, this changes both what I am billed for (multiple GETs rather than a single GET) and the application's latency profile, just to paper over an obvious deficiency in the client library. So if you don't consider this a design flaw, then at least consider it a performance bug.
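For reference, the workaround looks roughly like this (a sketch, not production code: the bucket, object, and destination names are placeholders, the 10 MiB chunk size is an arbitrary choice, and google-cloud-storage requires the chunk size to be a multiple of 256 KiB):

```python
# Workaround sketch: force a chunked download by setting chunk_size on the blob.
# Requires google-cloud-storage installed and credentials configured.

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB; must be a multiple of 256 KiB

def download_chunked(bucket_name, blob_name, dest_path):
    from google.cloud import storage  # imported here so the sketch stands alone
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name, chunk_size=CHUNK_SIZE)
    with open(dest_path, "wb") as fh:
        blob.download_to_file(fh)  # issues multiple ranged GETs, bounding RAM use
```

With chunk_size set, peak memory is one chunk rather than the whole object, at the cost of one GET per chunk.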
Either way, I've lost 24 hours of data to an OOM kill triggered by this. At a minimum, the docstring should be updated to warn the user that the library cares more about some trend from last week than doing what you'd expect the simplest possible HTTP client talking to Gcloud could have done in the early 90s.
Apologies for the angst, but I'm leaving it in, because you really need to know how disappointing it is to have a crash of this sort caused by a library function this conceptually simple.
> the library cares more about some trend from last week than doing what you'd expect the simplest possible HTTP client
I don't think that's called for, though I do appreciate you taking the time to file the issue. I designed google-resumable-media-python, so you can put the blame 100% on my shoulders.
The sans-I/O approach has nothing to do with the "missing feature" of piping the HTTP payload directly into a file. What's more, the behavior of the old code (i.e. google.cloud.streaming) is mostly the same, and that was written 3+ years ago.
The real issue is that the google-cloud-storage library doesn't do any kind of "smart chunking". That's definitely worth discussing. The reason there is no smart chunking is that we may not know the size of the file a priori, and it would cost an extra round trip to determine the size before the first request. (Right now the google-cloud-storage package does a chunked or full download depending only on the chunk size.)
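To make the trade-off concrete, a hypothetical "smart chunking" policy (this is a sketch of the idea being discussed, not the library's actual behavior, and the 64 MiB threshold is an arbitrary assumption) might look like:

```python
# Hypothetical policy sketch: choose between one GET and ranged GETs based
# on object size, accepting an extra metadata round trip when the size is
# not already known.

SINGLE_GET_LIMIT = 64 * 1024 * 1024  # arbitrary 64 MiB threshold for this sketch

def choose_strategy(size_bytes, limit=SINGLE_GET_LIMIT):
    """Return 'single' for small objects, 'chunked' for large or unknown sizes."""
    if size_bytes is None:
        return "chunked"  # size unknown: stream in bounded ranges to cap RAM
    return "single" if size_bytes <= limit else "chunked"
```

The open question is whether the extra metadata round trip (to learn the size) is worth paying on every download just to pick the cheaper strategy.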
We have discussed strategies to reduce the amount of a file that ends up in RAM, but haven't had a chance to implement any of them yet.
It seems the sans-I/O approach requires a logical abstraction for the entire response, which necessitates it living temporarily on disk or in RAM.
I'm confused about this concept of smart chunking: both the producer and the consumer of the stream already apply chunking (libc / the Python io library's buffering on the consumer side, the kernel's TCP windowing on the producer side). It doesn't seem like you should be implementing any kind of policy in your code, considering both the kernel and the IO library are trivial to adjust as needs require.
To repeat, your library should not be buffering whatsoever, and the current implementation should never have been checked in.
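What I'd expect is essentially the following stdlib-only sketch, assuming the response body is exposed as a readable file-like object (the function name and 1 MiB default are my own, purely for illustration):

```python
import shutil

def stream_response_to_file(response_body, fh, chunk_bytes=1 << 20):
    # Copy in bounded chunks; peak memory is one chunk, never the whole payload.
    shutil.copyfileobj(response_body, fh, length=chunk_bytes)
```

No policy decisions required: the kernel and the IO library handle buffering, and memory use stays constant regardless of payload size.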
Apologies for the grumpy bug report earlier, but this has cost me dearly, and most of that code should not have existed in the first place.