Website crashing due to memory leaks. #91

JrtPec opened this issue Mar 23, 2016 · 5 comments

@JrtPec
Member

JrtPec commented Mar 23, 2016

Yesterday we succeeded in getting CSVs to generate from TMPO live on the website and send them to the browser. However, we noticed that each request uses some memory and fails to free it afterwards. After a few requests the server inevitably crashes.

We have tried the following things to reduce the memory load and free it up after the request, but none have really worked.

  1. Writing a wrapper to close the file buffer after the request has completed. (link)
  2. Using a temporary file to store the CSV and serving it instead of using StringIO or cStringIO.
  3. Setting the Flask flag app.use_x_sendfile = True, so that nginx serves the file directly instead of the app. (I did not test this thoroughly, so I am not sure of its effect.)
  4. Deleting the Pandas DataFrame after the CSV is written, using del df.
  5. Calling the garbage collector after the delete: import gc; gc.collect() (link). See the sketch after this list.
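A minimal sketch of attempts 4 and 5, assuming a hypothetical fetch_dataframe() helper in place of the actual TMPO query:

```python
import gc
from io import StringIO

from flask import Flask, Response

app = Flask(__name__)

@app.route("/download")
def download():
    # fetch_dataframe() is a hypothetical stand-in for the TMPO call
    # that builds the pandas DataFrame for the requested sensors/period.
    df = fetch_dataframe()
    buf = StringIO()
    df.to_csv(buf)
    payload = buf.getvalue()

    # Attempt 4: drop the reference to the DataFrame.
    del df
    # Attempt 5: force a garbage-collection pass on top of that.
    gc.collect()

    return Response(payload, mimetype="text/csv")
```

Note that even when gc.collect() frees the objects, CPython's allocator does not necessarily return the memory to the operating system, which may explain why the resident memory keeps growing across requests.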

Does anybody have other ideas we could try? The download page is live, but hidden at opengrid.be/download. The status quo is that it does work, but after a few runs it crashes the server, which then immediately restarts.

@icarus75

Tmpo blocks consist of gzipped JSON. So why not just put the tmpo blocks directly on the wire and offload the CSV conversion work to the browser? With the proper HTTP encoding set, the browser will take care of inflating the gzip.
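A sketch of that idea, assuming a hypothetical load_tmpo_block() helper that returns the stored gzipped-JSON bytes of one block (the route parameters are illustrative):

```python
from flask import Flask, Response

app = Flask(__name__)

@app.route("/block/<sid>/<int:rid>/<int:lvl>/<int:bid>")
def tmpo_block(sid, rid, lvl, bid):
    # load_tmpo_block() is a hypothetical helper returning the raw
    # gzipped-JSON bytes of a tmpo block, exactly as stored.
    gz = load_tmpo_block(sid, rid, lvl, bid)
    # Ship the compressed bytes as-is; with Content-Encoding set,
    # the browser inflates the gzip before handing it to the page.
    return Response(gz, mimetype="application/json",
                    headers={"Content-Encoding": "gzip"})
```

This way the server never decompresses or parses anything, so per-request memory stays bounded by the block size.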

@JrtPec
Member Author

JrtPec commented Mar 23, 2016

We could do it that way, but then you could only download raw data, right? People would have to convert epoch timestamps, interpolate, resample... while the exact purpose of the CSV download page was to let non-programmers import data into Excel or similar and experiment on their own. I don't know if raw data would be very useful to them...

@JrtPec
Member Author

JrtPec commented Mar 24, 2016

I'm going to try to write a generator that creates small DataFrames and streams them, like this.
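A sketch of that approach, assuming a hypothetical fetch_chunk(start, end) helper that queries TMPO for one small slice at a time:

```python
import pandas as pd
from flask import Flask, Response

app = Flask(__name__)

def generate_csv(start, end, freq="1D"):
    # Walk the requested period in small windows so that only one
    # small DataFrame is in memory at any time.
    edges = pd.date_range(start, end, freq=freq)
    first = True
    for lo, hi in zip(edges[:-1], edges[1:]):
        # fetch_chunk() is a hypothetical stand-in for the TMPO call
        # restricted to the window [lo, hi).
        df = fetch_chunk(lo, hi)
        # Emit the CSV header only for the first chunk.
        yield df.to_csv(header=first)
        first = False

@app.route("/download")
def download():
    # Streaming the generator keeps peak memory roughly constant,
    # independent of the length of the requested period.
    return Response(generate_csv("2016-01-01", "2016-03-01"),
                    mimetype="text/csv")
```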

@saroele
Member

saroele commented Mar 30, 2017

@JrtPec we discussed this last meeting. What is the status now that our droplet has more memory and swap?

@JrtPec
Member Author

JrtPec commented Apr 3, 2017

It seems to be much better, but I can still crash the site by selecting a large time period.
We could put a cap on the time period, or figure out some clever way to call tmpo in chunks and stream the CSV in blocks.
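A sketch of the cap option; MAX_SPAN is an assumed value that would have to be tuned to the droplet's memory:

```python
from datetime import timedelta

# Assumed cap; tune to what the droplet can handle without swapping.
MAX_SPAN = timedelta(days=90)

def clamp_period(start, end):
    # Trim oversized requests so that a single download can no longer
    # exhaust the server's memory; alternatively, return an error.
    if end - start > MAX_SPAN:
        end = start + MAX_SPAN
    return start, end
```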
