Run a separate celery cluster from the webserver? #3917

Closed
rpwils opened this issue Nov 21, 2017 · 9 comments

@rpwils

rpwils commented Nov 21, 2017

Make sure these boxes are checked before submitting your issue - thank you!

  • [X] I have checked the superset logs for python stacktraces and included it here as text if any
  • [X] I have reproduced the issue with at least the latest released version of superset
  • [X] I have checked the issue tracker for the same issue and I haven't found one similar

This is a question:
Would it be recommended to run a cluster of Celery workers separate from the web server cluster?

Superset version

0.20.1

Expected results

Actual results

Steps to reproduce

@xrmx
Contributor

xrmx commented Nov 21, 2017

Could you please explain what you mean? Celery workers are by definition separate from gunicorn workers.

@rpwils
Author

rpwils commented Nov 21, 2017

From the documentation, it seems the workers run on the same servers as gunicorn. In production environments, is it recommended to set up the workers on different nodes, similar to Airflow?

@uhjish

uhjish commented Nov 21, 2017

Yes, as they are otherwise going to be competing for CPU and other resources. In fact, you should probably also separate your Redis or AMQP setup that you're using as your Celery broker. Also, for production, be sure to stick an Nginx in front of your Gunicorns.

@rpwils
Author

rpwils commented Nov 22, 2017

Thank you, that helps!

@mistercrunch
Member

It's designed as two different processes so that you can choose how to set things up in your environment. I recently set up 2 * c4.4xlarge nodes at Lyft to get started and decided to run both celery and the web server on each box. It's nice to have >=2 boxes for high availability. As we get more traffic I may specialize the roles.

@rpwils
Author

rpwils commented Nov 22, 2017

We currently have ours on an autoscaling cluster of t2.mediums. Do you know, in general, which resource is typically utilized more: CPU, memory, etc.?

@rpwils
Author

rpwils commented Nov 26, 2017

We are not having much luck getting requests to go to the Celery cluster. We used this in superset_config for both the web server cluster and the worker (Celery) cluster. Does this look correct?

from werkzeug.contrib.cache import RedisCache

class CeleryConfig(object):
  # Broker the web server queues tasks on and the workers consume from
  BROKER_URL = 'redis://xyz.cache.amazonaws.com:6379/0'
  # Module that defines the SQL Lab tasks
  CELERY_IMPORTS = ('superset.sql_lab', )
  # Where Celery stores task state
  CELERY_RESULT_BACKEND = 'redis://xyz.cache.amazonaws.com:6379/0'
  CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
CELERY_CONFIG = CeleryConfig

# Cache where async query results are stored
RESULTS_BACKEND = RedisCache(
    host='xyz.cache.amazonaws.com', port=6379, key_prefix='superset_results')
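
Since the web server cluster and the worker cluster both have to point at the same Redis instance, one quick sanity check is to parse the URLs from the config and compare them. The following is only a rough sketch using the standard library; `redis_endpoint` and `same_endpoint` are hypothetical helper names, not part of Superset or Celery.

```python
from urllib.parse import urlsplit


def redis_endpoint(url):
    """Return (host, port, db) parsed from a redis:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got %r" % url)
    if not parts.hostname:
        raise ValueError("no hostname in %r" % url)
    # Fall back to Redis's conventional port 6379 and database 0 when omitted.
    port = parts.port or 6379
    db = parts.path.lstrip("/") or "0"
    return parts.hostname, port, db


def same_endpoint(url_a, url_b):
    """True when two redis:// URLs name the same host, port and database."""
    return redis_endpoint(url_a) == redis_endpoint(url_b)
```

For example, `same_endpoint(CeleryConfig.BROKER_URL, CeleryConfig.CELERY_RESULT_BACKEND)` should return True when both settings name the same Redis instance, and printing `redis_endpoint(...)` on each node makes a mistyped hostname easy to spot.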

@mistercrunch
Member

Did you set your database as "Allow Async"?

@rpwils
Author

rpwils commented Dec 2, 2017

Thanks for the help.
