This article continues the series on building a continuous deployment environment using Python and Django:
- Starting Your First Django Project
- Testing and Django
- Mock and Coverage
- Using Fabric for Painless Scripting
- Using Celery to handle Asynchronous Processes
- Deployment/Monitoring Strategies
Celery is an open source asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
Simply put, processes can be run asynchronously and distributed across one or more machines, instead of by the main app. This allows complex calculations, heavy data processing, or calls to third-party services to run without blocking the main Django/Python app, and, when run on a remote server, without consuming the main Django app's resources. Celery is a Python project, but there is an app, django-celery, which plugs into Django.
Getting ready
To install Celery, enter your virtualenv and call:
$ pip install celery
You will also need a technology to manage the Celery queue (RabbitMQ is recommended). Full RabbitMQ installation instructions are available at http://www.rabbitmq.com/install.html. To install RabbitMQ on debian/ubuntu:
$ sudo apt-get install rabbitmq-server
RabbitMQ will start automatically upon installation. To start/stop RabbitMQ manually on debian/ubuntu:
$ invoke-rc.d rabbitmq-server start
$ invoke-rc.d rabbitmq-server stop
On most systems the log file for RabbitMQ can be found at /var/log/rabbitmq/rabbit.log.
To install RabbitMQ on OSX (this will take a long time):
$ sudo brew install rabbitmq
RabbitMQ will be installed to /usr/local/sbin, so add this directory to your PATH, if you haven't already done so.
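For example, to add it for the current session (append the same line to your shell profile to make it permanent):
$ export PATH=$PATH:/usr/local/sbin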
To start/stop RabbitMQ manually on OSX:
$ sudo rabbitmq-server
$ rabbitmqctl stop
To install Celery for Django:
$ pip install django-celery
How to do it…
Setup RabbitMQ for use with Celery, as sketched below. Don't use any periods in the {CELERY_VHOST} or Celery won't be able to connect to RabbitMQ. If the server cannot connect on OSX, see Broker Installation for additional setup information.
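The exact commands depend on the names you choose; a minimal sketch, where {CELERY_USER}, {CELERY_USER_PASS}, and {CELERY_VHOST} are the same placeholders used in the configuration files below:
$ rabbitmqctl add_user {CELERY_USER} {CELERY_USER_PASS}
$ rabbitmqctl add_vhost {CELERY_VHOST}
$ rabbitmqctl set_permissions -p {CELERY_VHOST} {CELERY_USER} ".*" ".*" ".*"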
Create your first task, tasks.py:
from celery.task import task

@task
def add(x, y):
    return x + y
Create a configuration file to run your Celery task, celeryconfig.py:
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "{CELERY_USER}"
BROKER_PASSWORD = "{CELERY_USER_PASS}"
BROKER_VHOST = "{CELERY_VHOST}"
CELERY_RESULT_BACKEND = "amqp"
CELERY_IMPORTS = ("tasks",)
If RabbitMQ is running remotely, change localhost to the name of the remote server.
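For example, assuming a hypothetical broker host named rabbit.example.com:
BROKER_HOST = "rabbit.example.com"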
Now to test, start Celery (it will print to the console):
$ celeryd --loglevel=INFO
In another terminal, open the Python shell and run:
from tasks import add
from celery.execute import send_task

result = add.apply_async(args=[2, 2], kwargs={})
print result.get()

result = send_task('tasks.add', [3, 3])
print result.get()

result = add.delay(4, 4)
You should see something like the following in the terminal window running Celery:
[2011-08-31 23:43:44,242: INFO/MainProcess] Got task from broker: tasks.add[f5f5ee81-fef5-46d2-87de-0da005d588d0]
[2011-08-31 23:43:44,294: INFO/MainProcess] Task tasks.add[f5f5ee81-fef5-46d2-87de-0da005d588d0] succeeded in 0.0111658573151s: 4
[2011-08-31 23:43:44,301: INFO/MainProcess] Got task from broker: tasks.add[cd2de0d1-35ad-4d7a-8212-47e800cd85bc]
[2011-08-31 23:43:44,298: INFO/MainProcess] Task tasks.add[cd2de0d1-35ad-4d7a-8212-47e800cd85bc] succeeded in 0.0115258693695s: 6
[2011-08-31 23:43:44,301: INFO/MainProcess] Got task from broker: tasks.add[cd2de0d1-35ad-4d7a-8212-47e900cd85bc]
[2011-08-31 23:43:44,329: INFO/MainProcess] Task tasks.add[cd2de0d1-35ad-4d7a-8212-47e900cd85bc] succeeded in 0.0115258693695s: 8
To run celery as a daemon process on debian/ubuntu, install the init script from https://github.com/ask/celery/tree/master/contrib/generic-init.d/, and run:
$ /etc/init.d/celeryd {start|stop|restart|status}
To configure your process, edit /etc/default/celeryd:
# Name of nodes to start; here we have a single node
CELERYD_NODES="w1"
# or we could have three nodes:
#CELERYD_NODES="w1 w2 w3"

# Where to chdir at start.
CELERYD_CHDIR="/opt/Myproject/"

# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=300 --concurrency=8"

# Name of the celery config module.
CELERY_CONFIG_MODULE="celeryconfig"

# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"

# Workers should run as an unprivileged user.
CELERYD_USER="celery"
CELERYD_GROUP="celery"

For daemon scripts on other operating systems and for more information on configuration, see Celery Daemonizing.
Django-Celery
If you use django-celery, you won't need the celeryconfig.py file, as the Celery configuration will live in the project's settings.py.
To start, add djcelery to INSTALLED_APPS, then add the following to settings.py:
import djcelery
djcelery.setup_loader()

CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
CELERY_RESULT_BACKEND = "amqp"
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "{CELERY_USER}"
BROKER_PASSWORD = "{CELERY_USER_PASS}"
BROKER_VHOST = "{CELERY_VHOST}"
To create the necessary database tables:
$ python manage.py syncdb
For those using mod_wsgi, add the following to your *.wsgi file:
import os
os.environ["CELERY_LOADER"] = "django"
Celery will automatically look for files named tasks.py in other installed apps and process them accordingly. There is some additional information available at http://ask.github.com/django-celery/. Another great use for Celery is periodic tasks:
from datetime import timedelta
from celery.decorators import periodic_task

@periodic_task(run_every=timedelta(days=1))
def update_users():
    aggregate_user_data()  # not defined here, just an example
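Schedules aren't limited to simple intervals; here is a sketch using Celery's crontab schedule (the task name and the 7:30 AM run time are just illustrative):
from celery.schedules import crontab
from celery.decorators import periodic_task

# Run every day at 07:30; crontab also supports day_of_week and other fields.
@periodic_task(run_every=crontab(hour=7, minute=30))
def send_daily_digest():
    pass  # hypothetical task body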
Lastly, start a process to snapshot the workers, so you can use Django to monitor celery:
$ nohup python manage.py celerycam &
More information on monitoring is available at Celery Monitoring.
How it works…
Celery uses tasks that are executed concurrently on one or more workers. Tasks can execute asynchronously (fire-and-forget) or synchronously, blocking until the result is ready (result.get() or result.wait()).
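As a small sketch of the difference, using the add task from earlier:
from tasks import add

# Asynchronous: returns an AsyncResult immediately while a worker runs the task.
result = add.delay(2, 2)

# Check for completion without blocking.
if result.ready():
    print result.result

# Synchronous: block the current thread until the result is available.
print result.get()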
A third-party service is used for managing the queueing of tasks and storing of task results. RabbitMQ is recommended for the queue, although many DB technologies are supported, and I prefer AMQP or Redis for the results backend.
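If you go with Redis for results, only the backend settings change; a sketch, assuming a local Redis server and the Celery 2.x-era setting names (verify against your version's documentation):
CELERY_RESULT_BACKEND = "redis"
CELERY_REDIS_HOST = "localhost"  # assumed setting names for the 2.x series
CELERY_REDIS_PORT = 6379
CELERY_REDIS_DB = 0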
To create a Celery task, first create a file named tasks.py and then add task functions there. Decorate each function that will be explicitly called with @task, and functions that execute periodically with @periodic_task(run_every=timedelta({FREQUENCY})). Tasks, including periodic tasks, can be called explicitly using mytask.delay(*args, **kwargs). This returns a result object, which can be used to block the current thread until the task is completed, using result.wait() or response = result.get().

The Celery worker process will log all activity, so monitor it for errors (usually
/var/log/celeryd.log). Try to make sure task functions gracefully handle errors, so that no data is ever lost; one such pattern is sketched below.
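A common way to avoid losing work is to retry a task when a transient error occurs; a sketch (fetch_url and its retry policy are illustrative, not from the original post):
import urllib2
from celery.task import task

@task(max_retries=3, default_retry_delay=60)
def fetch_url(url):
    try:
        return urllib2.urlopen(url).read()
    except urllib2.URLError, exc:
        # Re-queue the task instead of dropping it; gives up after max_retries.
        fetch_url.retry(args=[url], exc=exc)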
The django-celery package manages your Celery commands and configuration, adds an admin tool, and discovers tasks.py automatically in all your apps. To monitor the Celery workers via Django, start the celerycam process, which will take periodic snapshots of the workers and write to the djcelery_taskstate table.
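Because the snapshots land in an ordinary Django model, you can also query them directly; a sketch, assuming django-celery's TaskState model (the same data backs the admin tool):
from djcelery.models import TaskState

# Show the ten most recent failures recorded by celerycam.
for state in TaskState.objects.filter(state="FAILURE").order_by("-tstamp")[:10]:
    print state.task_id, state.name, state.tstamp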
This should provide enough information for you to start using Celery in your own projects. Feel free to leave any questions you may have.
There’s more…
Much of this article was inspired by the Celery documentation. I recommend starting there if you have any questions.

This post is an acknowledgement to: http://mattsnider.com/using-celery-to-handle-asynchronous-processes/