Discussion:
IMPORTANT on cron jobs, scheduled jobs and delayed jobs
Massimo Di Pierro
2011-08-08 14:28:00 UTC
## preamble

I have been working on porting django-celery to web2py-celery.
http://code.google.com/p/web2py-celery
There are a few issues to resolve and I am working on them.

Yet I found it to be overkill for most users. It has lots of
dependencies (for example RabbitMQ) and it is not easy to manage. If
you do not need a huge number of worker nodes there may be a better
solution.

So I added this to trunk:

gluon/scheduler.py

This email is a request for comments, as I think this should replace the
current cron mechanism.

## What is it?
It is a lightweight replacement for celery that uses the database
instead of queues to schedule tasks and uses the default web2py admin
interface to allow you to schedule tasks. It consists of a single file
and has no dependencies.

## How does it work?

For any existing app:

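Create File: app/models/scheduler.py
======
from gluon.scheduler import Scheduler

def demo1(*args, **vars):
    # print the arguments stored with the task and return a result
    print('you passed args=%s and vars=%s' % (args, vars))
    return 'done!'

def demo2():
    # always raises ZeroDivisionError, to demonstrate a failing task
    1/0

# register the tasks by name; db is the app's DAL connection (in the
# scaffolding app it is defined in models/db.py, which runs first because
# models are executed in alphabetical order)
scheduler = Scheduler(db, dict(demo1=demo1, demo2=demo2))
=====================================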

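Create File: app/modules/scheduler.py
======
# executed with the app's models loaded (-M), so "scheduler" is already defined;
# worker_loop() keeps polling the database for due tasks and runs them
scheduler.worker_loop()
=====================================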

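## run worker nodes
with:
python web2py.py -S app -M -N -R applications/app/modules/scheduler.py
(-S app runs in the app's environment, -M loads its models, -N disables cron, -R runs the given script.)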

## schedule jobs
using:
http://127.0.0.1:8000/scheduler/appadmin/insert/db/task_scheduled

## monitor scheduled jobs
http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_scheduled.id%3E0

## view completed jobs
http://127.0.0.1:8000/scheduler/appadmin/select/db?query=db.task_run.id%3E0

(In these URLs the app is called "scheduler"; substitute your own app's name.)

Compared to celery it lacks the ability to bind tasks to workers,
remotely interrupt tasks and set timeouts, yet these features can be
added easily and I will do so eventually.

Please let me know what you think.

Massimo
Fran
2011-08-08 14:59:47 UTC
Looks very interesting :)

This appears to be an easier way to meet most of the needs for which I was
investigating Celery, since I'm not looking at massively scalable systems,
but rather just a way to have longer-running requests (e.g. onaccepts)
pushed asynchronously, to give users a more responsive system and avoid
browser timeouts. I also want a way to easily build a graphical scheduler
which works even in Win32 service mode (which the current cron doesn't).
Post by Massimo Di Pierro
It has lots of dependencies (for example RabbitMQ)
This is a little unfair: whilst RabbitMQ is the recommended broker for
production systems, there are many other supported brokers, and porting
django-kombu to web2py-kombu shouldn't be hard.
Post by Massimo Di Pierro
Compared to celery it lacks the ability to bind tasks and workers,
This isn't important to me.
Post by Massimo Di Pierro
remotely interrupt tasks and set timeout
These would indeed be useful in time :)
Post by Massimo Di Pierro
Please let me know what you think.
I'll give it a try :)

Many thanks,
Fran.
Massimo Di Pierro
2011-08-08 15:55:50 UTC
Post by Fran
 It has lots of dependencies (for example RabbitMQ)
This is a little unfair - whilst RabbitMQ is the recommended Broker for
Production systems, there are many other supported & porting django-kombu to
web2py-kombu shouldn't be hard.
I agree.
Our private discussions on the topic helped me understand better how
celery works and what the needs of some users are.
celery is great and django-celery (ported to web2py) is fantastic
too. Yet I found out that IF the communication between broker and
workers is not a bottleneck (and often it is not) and you want to use
django (or, in our case, web2py) to schedule tasks, the tasks have to be
stored in a database anyway. So for a small number of workers it is not a
big overhead if they pick the tasks directly from the database. This is an
enormous simplification.

Yet it is so much simpler that it may actually scale well for many
practical apps where you may have a few tasks per minute or less.

The db used for the scheduler does not need to be the same as the
main db, which reduces the load on the main db.

Massimo
Ross Peoples
2011-08-08 15:00:32 UTC
I certainly like the minimalistic one file approach. Talking to the tasks
seems like a pretty important feature. I would certainly love to see this in
action. Will the scheduler be started automatically, like the built-in cron,
or would you need to start it manually (like from external cron)?
Massimo Di Pierro
2011-08-08 15:37:26 UTC
P.S. I added task groups and task timeouts.
Post by Ross Peoples
I certainly like the minimalistic one file approach. Talking to the tasks
seems like a pretty important feature. I would certainly love to see this in
action. Will the scheduler be started automatically, like the built-in cron,
or would you need to start it manually (like from external cron)?
Currently, my understanding is that celery does not allow talking to
the tasks, only stopping and killing them. This is implemented using OS
signals. It is possible to add this feature to the current
implementation by sending and catching OS signals. I will wait until
this is tested more before I implement it.
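Massimo mentions both task timeouts and stopping tasks with OS signals. One way a worker could enforce a timeout is to ask the kernel for a SIGALRM after the deadline and turn it into an exception inside the running task. The sketch below is a hypothetical helper (`run_with_timeout` and `TaskTimeout` are illustrative names), not how gluon/scheduler.py implements timeouts; it assumes a POSIX system and must be called from the main thread.

```python
import signal
import time


class TaskTimeout(Exception):
    """Raised inside the task when its time budget runs out."""


def _on_alarm(signum, frame):
    # Signal handler: abort whatever the task is doing right now.
    raise TaskTimeout()


def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs); give up after `timeout` seconds (POSIX only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)  # one-shot timer, float seconds
    try:
        return 'completed', func(*args, **kwargs)
    except TaskTimeout:
        return 'timeout', None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer if still pending
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


if __name__ == '__main__':
    print(run_with_timeout(lambda: 6 * 7, 1.0))  # ('completed', 42)
    print(run_with_timeout(time.sleep, 0.2, 5))  # ('timeout', None)
```

Running each task in a separate process and killing it on timeout is the more robust alternative, since it also survives tasks stuck in C code that never return to the interpreter.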

Massimo
pbreit
2011-08-08 16:55:43 UTC
I definitely like the idea of something simpler. Even though Celery is
pitched as somewhat easy, I could never make heads or tails of it. I look
forward to giving this a try.

What are the web2py dependencies? Do you foresee bundling DAL and whatever
to make it standalone?

Is SQLite a reasonable DB or will this likely need something that works
better with concurrency?

What is the mechanism to start the scheduler, start on reboot, monitor it,
etc?
Ross Peoples
2011-08-08 17:04:35 UTC
Great questions from pbreit. I also have a quick one: Would it be easy for
plugins to use the Scheduler? Imagine how many cool plugins would be
possible (mail queues, session cleanups, statistical analysis, image
manipulation, etc.) with a standardized web2py scheduler.
Massimo Di Pierro
2011-08-08 22:33:50 UTC
Post by pbreit
I definitely like the idea of something simpler. Even though Celery is
pitched as somewhat easy, I could never make heads or tails of it. I look
forward to giving this a try.
What are the web2py dependencies? Do you foresee bundling DAL and whatever
to make it standalone?
It only needs dal.py and globals.py. It could be used standalone; it
would need a main(), and I may build it later today. It should not take
much.
Post by pbreit
Is SQLite a reasonable DB or will this likely need something that works
better with concurrency?
If running a task takes longer than retrieving it (as it should, else
there is no reason to use this), the db access is not an issue.
Post by pbreit
What is the mechanism to start the scheduler, start on reboot, monitor it,
etc?
You just need to start web2py and start the background process. There
is nothing else to do. There are some differences from celery.

In celery the celerybeat daemon pushes tasks to the celeryd services
(workers). In gluon/scheduler.py the background processes (workers)
pull the tasks from the database. There is no daemon dealing with
scheduling.
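The pull model can be sketched with Python's built-in sqlite3 module. The table name and the `status`/`next_run_time` columns come from this thread; `func` and `assigned_worker` are hypothetical column names. The conditional UPDATE (claim only if the row is still queued) is one common way to stop two workers from grabbing the same task; it is an assumption here, not necessarily what gluon/scheduler.py does.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the task_scheduled table (columns trimmed down)."""
    db = sqlite3.connect(':memory:')
    db.execute(
        "CREATE TABLE task_scheduled ("
        " id INTEGER PRIMARY KEY,"
        " func TEXT,"                     # hypothetical: name of the function to run
        " status TEXT DEFAULT 'queued',"
        " next_run_time TEXT,"            # ISO timestamps compare correctly as text
        " assigned_worker TEXT)")         # hypothetical: which worker claimed it
    return db


def claim_next_task(db, worker_name, now):
    """Pull the earliest due, queued task and mark it running for this worker."""
    while True:
        row = db.execute(
            "SELECT id, func FROM task_scheduled"
            " WHERE status = 'queued' AND next_run_time <= ?"
            " ORDER BY next_run_time LIMIT 1", (now,)).fetchone()
        if row is None:
            return None  # nothing is due; the worker should sleep and poll again
        # Only succeeds if no other worker changed the status in the meantime.
        cursor = db.execute(
            "UPDATE task_scheduled SET status = 'running', assigned_worker = ?"
            " WHERE id = ? AND status = 'queued'", (worker_name, row[0]))
        db.commit()
        if cursor.rowcount == 1:
            return row
        # Lost the race to another worker: look for the next candidate.


if __name__ == '__main__':
    db = make_db()
    db.executemany(
        "INSERT INTO task_scheduled (func, next_run_time) VALUES (?, ?)",
        [('demo1', '2011-08-08 10:00:00'),
         ('demo2', '2011-08-08 09:00:00'),
         ('demo1', '2011-08-09 09:00:00')])
    db.commit()
    now = '2011-08-08 12:00:00'
    print(claim_next_task(db, 'w1', now))  # (2, 'demo2')
    print(claim_next_task(db, 'w2', now))  # (1, 'demo1')
    print(claim_next_task(db, 'w1', now))  # None
```

No central daemon is involved: whichever worker polls first claims the oldest due task, and the third task stays queued until its next_run_time arrives.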

There are three tables:
* task_scheduled stores a list of tasks: when you want them to run
(next_run_time), how often (period), how many times (repeats,
times_run), within what time frame (start_time, stop_time), the max
timeout, etc.
* task_run stores the output of each task run. One task_scheduled record
with repeats=10 will generate 10 task_run records.
* worker_heartbeat stores the heartbeat of the workers, i.e. the time
when they last polled for tasks.

Each task_scheduled record can be:
- queued (waiting to be picked up)
- running (the task was picked up by a worker)
- completed (it was run as many times as requested)
- failed (the task failed and will not be run again)
- overdue (the task has not reported back, probably because a worker died
in the middle of it; this should not happen under normal conditions)
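The worker_heartbeat table is what makes the overdue state detectable. A minimal sketch, assuming (the thread does not say this) that a running task counts as overdue when its worker has been silent for longer than some threshold; `find_overdue`, the `worker` key, and `max_silence` are hypothetical.

```python
from datetime import datetime, timedelta


def find_overdue(tasks, heartbeats, now, max_silence):
    """Return the ids of running tasks whose worker has stopped sending heartbeats.

    tasks      -- list of dicts with 'id', 'status' and 'worker' keys
    heartbeats -- dict mapping worker name -> datetime of its last poll
    """
    overdue = []
    for task in tasks:
        if task['status'] != 'running':
            continue
        last_seen = heartbeats.get(task['worker'])
        # A worker we never heard from, or one silent for too long, is presumed dead.
        if last_seen is None or now - last_seen > max_silence:
            overdue.append(task['id'])
    return overdue


if __name__ == '__main__':
    now = datetime(2011, 8, 8, 12, 0, 0)
    beats = {'w1': now - timedelta(seconds=5), 'w2': now - timedelta(minutes=10)}
    tasks = [
        {'id': 1, 'status': 'running', 'worker': 'w1'},
        {'id': 2, 'status': 'running', 'worker': 'w2'},
        {'id': 3, 'status': 'queued', 'worker': None},
    ]
    print(find_overdue(tasks, beats, now, timedelta(minutes=1)))  # [2]
```

The threshold should be comfortably larger than the workers' polling interval, otherwise a healthy but busy worker would be flagged.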
A task that does not fail and is scheduled to run 3 times will go
through:
queued -> running -> queued -> running -> queued -> running ->
completed
Tasks only run if they are queued and due to run.
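These transitions fit in a few lines. The sketch below uses hypothetical function names, ignores timing and the overdue state, and assumes repeats is a positive count; it only reproduces the status sequence described above for a task with repeats=3.

```python
def status_after_run(times_run, repeats, succeeded):
    """Status of a task_scheduled record once a worker finishes one run."""
    if not succeeded:
        return 'failed'      # failed tasks are not run again
    if times_run >= repeats:
        return 'completed'   # run as many times as requested
    return 'queued'          # wait for the next due time


def simulate(repeats):
    """Trace the statuses of a task that never fails."""
    status, times_run = 'queued', 0
    history = [status]
    while status == 'queued':
        history.append('running')  # a worker picks the task up
        times_run += 1
        status = status_after_run(times_run, repeats, succeeded=True)
        history.append(status)
    return history


if __name__ == '__main__':
    print(' -> '.join(simulate(3)))
    # queued -> running -> queued -> running -> queued -> running -> completed
```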

This design allows you to do what you normally do with
cron, but with some differences:
- cron is at the web2py level; gluon/scheduler.py is at the app level
(although some apps may share a scheduler)
- cron spawns a process for each task and this created problems for
some users. gluon/scheduler.py runs tasks sequentially in a fixed
number of processes (one in the example).
- tasks can be managed from the admin interface (schedule, start,
stop, restart, change input, read output, etc).
- the same task cannot overlap with itself, so it is easier to
manage
- tasks are not executed exactly when due, but as close as possible,
in a FIFO order based on the requested schedule and the workload and
resources available. More like celery than cron.
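Because each worker pulls one task, runs it, and only then polls again, concurrency is bounded by the number of worker processes you start. A hypothetical sketch of such a loop; the callbacks, `poll_interval` and `max_idle_polls` are illustrative assumptions, and the real `worker_loop()` in gluon/scheduler.py may differ.

```python
import time


def worker_loop(pick_task, run_task, heartbeat, poll_interval=3.0, max_idle_polls=None):
    """Poll for work forever (or until idle too long), running one task at a time.

    pick_task()    -- returns the next due task, or None if nothing is due
    run_task(task) -- executes the task and records its output
    heartbeat()    -- records that this worker is alive
    """
    idle_polls = 0
    while True:
        heartbeat()
        task = pick_task()
        if task is None:
            idle_polls += 1
            if max_idle_polls is not None and idle_polls >= max_idle_polls:
                return
            time.sleep(poll_interval)  # nothing due: wait before polling again
            continue
        idle_polls = 0
        run_task(task)  # sequential: the next task waits until this one finishes


if __name__ == '__main__':
    due = ['task-a', 'task-b', 'task-c']  # already sorted by next_run_time (FIFO)
    done, beats = [], []
    worker_loop(
        pick_task=lambda: due.pop(0) if due else None,
        run_task=done.append,
        heartbeat=lambda: beats.append(time.time()),
        poll_interval=0,
        max_idle_polls=1,
    )
    print(done, len(beats))  # ['task-a', 'task-b', 'task-c'] 4
```

The heartbeat is written on every poll, including idle ones, so a silent worker can be told apart from a worker that simply has nothing to do.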

Hope this makes sense.

gluon/scheduler.py is 170 lines of code and you may want to take a
look at what it does.

Massimo
Marin Pranjic
2011-08-08 17:12:08 UTC
It looks good.
How to add a task which will repeat infinite times?
What are Start time and Stop time used for? Just to clarify...

Massimo Di Pierro
2011-08-08 22:34:34 UTC
For now just set an insanely large value for repeats. We could agree to
use -1 for "repeat forever".
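Start and stop times bound the time frame within which a task may run at all. A sketch of the check a worker could make before picking up a task, assuming -1 is adopted as "repeat forever" (it is only proposed here) and that period is measured in seconds (also an assumption); `should_run` and `schedule_next` are hypothetical helpers operating on plain dicts shaped like task_scheduled records.

```python
from datetime import datetime, timedelta

FOREVER = -1  # the "repeat forever" value proposed in this thread


def should_run(task, now):
    """True if a task_scheduled-like dict may be picked up at time `now`."""
    if task['status'] != 'queued':
        return False
    if task['repeats'] != FOREVER and task['times_run'] >= task['repeats']:
        return False                     # already ran as often as requested
    if task['start_time'] is not None and now < task['start_time']:
        return False                     # time frame has not opened yet
    if task['stop_time'] is not None and now > task['stop_time']:
        return False                     # time frame has closed
    return now >= task['next_run_time']  # due yet?


def schedule_next(task, finished_at):
    """After a successful run, push next_run_time forward by `period` seconds."""
    task['times_run'] += 1
    task['next_run_time'] = finished_at + timedelta(seconds=task['period'])


if __name__ == '__main__':
    t0 = datetime(2011, 8, 8, 12, 0)
    task = dict(status='queued', repeats=FOREVER, times_run=0, period=60,
                start_time=t0, stop_time=t0 + timedelta(days=1), next_run_time=t0)
    print(should_run(task, t0))                          # True
    schedule_next(task, t0)
    print(should_run(task, t0 + timedelta(seconds=30)))  # False
    print(should_run(task, t0 + timedelta(seconds=60)))  # True
```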
David J
2011-08-08 22:46:29 UTC
How do I use this from uwsgi? Currently I am launching the web2py app using
the web2py uwsgi handler.
Massimo Di Pierro
2011-08-09 00:12:32 UTC
All you need is to follow the example and run the worker task (the
background task).

Whatever web server you use, your web2py apps will be able to queue
and schedule tasks using admin.

BTW... download it again as I added some stuff.
Andrew
2011-08-09 02:07:46 UTC
I don't know anything about Celery, but I am interested in scheduling
functionality for web2py. When you mention it doesn't have
dependencies, I assume you mean the code itself.

One of my next tasks was to look at open source schedulers,
particularly the ability to handle job/task dependencies, in that a
"start" job is kicked off by cron and then initiates a stream of
work with multiple jobs, sometimes parallel, sometimes serial. Is
this possible?

I see a database-driven approach as a good thing. It may perhaps be a
little slower, but the scheduler overhead can be a very small component
of the overall workload. (My target app is Data Integration.)

Would love to know more and to give it a try.

thanks
Andrew
Massimo Di Pierro
2011-08-09 07:50:12 UTC
Post by Andrew
I don't know anything about Celery, but I am interested in scheduling
functionality for web2py.  When you mention it doesn't have
dependencies, I assume you mean the code itself.
One of my next tasks was to look at open source schedulers,
particularly the ability to do Job/tasks dependencies, in that a
"start" job is kicked off by the cron, that then initiates a stream of
work with multiple jobs, sometimes parallel, sometiems serial.   Is
this possible ?
Yes. Basically that is what it is for.
You create a task_scheduled record which tells web2py which function
you want to run, its arguments in JSON, and when it should run (once,
twice, 1000 times, starting now, starting later, every day, etc.), and
worker nodes will pick up the tasks when scheduled and run them. Right
now it uses admin as a web interface and it works fine, but we could
come up with something sleeker.
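The record stores a function name plus JSON-encoded arguments, and the worker maps the name back to a callable registered with the Scheduler (the `dict(demo1=demo1, demo2=demo2)` in the first post). A stdlib-only sketch of that dispatch step; `run_named_task`, the separate args/vars JSON strings, and the `(ok, output)` return shape are assumptions for illustration, not the actual gluon/scheduler.py code.

```python
import json
import traceback


def demo1(*args, **vars):
    return 'you passed args=%s and vars=%s' % (args, vars)


def demo2():
    1/0


TASKS = dict(demo1=demo1, demo2=demo2)  # name -> function, as passed to Scheduler


def run_named_task(tasks, func_name, args_json='[]', vars_json='{}'):
    """Look up a registered function by name and call it with JSON-encoded arguments.

    Returns (ok, output): the return value on success, the traceback text on failure.
    """
    try:
        func = tasks[func_name]
        result = func(*json.loads(args_json), **json.loads(vars_json))
        return True, result
    except Exception:
        return False, traceback.format_exc()


if __name__ == '__main__':
    print(run_named_task(TASKS, 'demo1', '[1, 2]', '{"a": 3}'))
    # (True, "you passed args=(1, 2) and vars={'a': 3}")
    ok, output = run_named_task(TASKS, 'demo2')
    print(ok, output.strip().splitlines()[-1])
    # False ZeroDivisionError: division by zero
```

Storing only a name (not pickled code) keeps the database records readable in admin and limits workers to functions the app has explicitly registered.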

It is really easy to try. I will try to make a video about it.