Getting started with Beanstalkd

Introduction to Beanstalkd, a messaging middleware system

Beanstalkd is a simple messaging middleware system written in C. It offers an efficient way to reduce latency in systems that involve time-consuming processes. Generally, queues of this type let you store metadata for jobs that will be processed later. They support service-oriented architectures by giving you the flexibility to defer tasks to separate processes. When applied correctly, queues can dramatically improve the user experience of websites by reducing load times.

Beanstalkd describes itself as a simple, fast work queue. Before diving into what beanstalkd is and how to get started with it, let’s look at a few real-world problems of web applications.

Imagine you are writing a web application that involves a time-consuming task, such as sending bulk email or generating reports. What happens to the interactive part of your application when the HTTP request times out because of that task? Or suppose you work around it by sending the request data in chunks, but the user closes the browser after some chunks have been processed and before the rest reach the server? A system that behaves this way can’t be considered a mature or well-designed application.

A queue-based system is an effective solution to such problems. It not only makes the dependent service respond faster but also helps you build a system that can be debugged easily and scaled on the fly.

These implementations still have limitations: each queued job has to wait until the job ahead of it is completed, and the system must be designed with queues in mind.

There are alternatives to beanstalkd, such as Amazon SQS and IronMQ. These are managed services, whereas Beanstalkd is open source and can be hosted on your own server.

The service that consumes, or processes, the queued items is called a worker or queue-worker; workers implement the time-consuming tasks. As the number of jobs increases, you can run many workers on the same queue, so the system scales horizontally. A queue-based system is useful when the worker task (the time-consuming task) can be completed asynchronously, as in publish/subscribe systems, where the requester doesn’t have to worry about how long the request takes to process. Even then, throughput can be increased by adding more workers.
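
The idea of several workers draining one queue can be sketched without beanstalkd at all. The following is a minimal in-process illustration, assuming Python's standard queue.Queue stands in for the queue and threads stand in for workers; slow_task, run_workers and the job values are hypothetical names chosen for this example.

```python
import queue
import threading
import time


def slow_task(job):
    """Stand-in for a time-consuming task (hypothetical)."""
    time.sleep(0.01)
    return job * 2


def run_workers(jobs, worker_count):
    """Process all jobs with `worker_count` workers sharing one queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            outcome = slow_task(job)
            with lock:  # protect the shared results list
                results.append(outcome)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)  # "queue it now"
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()  # wait until every queued item has been handled
    for t in threads:
        t.join()
    return sorted(results)
```

Calling run_workers with a larger worker_count lets more jobs run at the same time, which is the in-process analogue of starting more worker processes on the same queue.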

The key advantages of these queues are:

  • Asynchronous - Queue it now, run it later.
  • Decoupling - Separates application logic.
  • Resilience - Won’t take down your whole application if part of it fails.
  • Redundancy - Can retry jobs if they fail.
  • Guarantees - Makes sure that jobs will be processed.
  • Scalable - Many workers can process individual jobs in a queue.
  • Profiling - Can help identify performance issues.

Getting started with beanstalkd

Beanstalkd communicates via PUSH sockets, providing instant communication between providers and workers. When a provider enqueues a job, a worker can reserve it immediately if it is connected and ready. Jobs stay reserved until the worker sends a response (delete, bury, etc.). Client libraries are available for all major languages, and many tools exist to help you manage beanstalkd.
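
The enqueue, reserve and respond cycle described above corresponds to a few lines of beanstalkd's text protocol. The sketch below only builds and parses the wire messages (put, reserve's reply, delete), so it runs without a server. The helper names and the default priority, delay and time-to-run values are assumptions made for illustration, not recommendations.

```python
def build_put(body, priority=1024, delay=0, ttr=60):
    """Encode a `put` command: a header line, then the job body, each ending in CRLF."""
    header = "put {} {} {} {}\r\n".format(priority, delay, ttr, len(body))
    return header.encode("ascii") + body + b"\r\n"


def parse_inserted(line):
    """Return the job id from an `INSERTED <id>` reply line."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "INSERTED":
        raise ValueError("unexpected reply: {!r}".format(line))
    return int(parts[1])


def parse_reserved(line):
    """Return (job id, body length) from a `RESERVED <id> <bytes>` reply line."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "RESERVED":
        raise ValueError("unexpected reply: {!r}".format(line))
    return int(parts[1]), int(parts[2])


def build_delete(job_id):
    """Encode the `delete` command a worker sends once a job is finished."""
    return "delete {}\r\n".format(job_id).encode("ascii")
```

In this sketch a provider would send the bytes from build_put and read back an INSERTED line, while a worker would send reserve, read the RESERVED line plus the job body, do the work, and then send the bytes from build_delete.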

Just like most applications, beanstalkd comes with its own terminology to explain its parts.

  • Tubes/Queues - Beanstalkd tubes correspond to queues in other messaging applications. They are the channels through which jobs (or messages) are transferred to consumers (i.e. workers).
  • Jobs/Messages - Since Beanstalkd is a “work queue”, what is transferred through tubes is referred to as jobs, which are similar to messages being sent.
  • Producers/Senders - Producers, similar to the Advanced Message Queuing Protocol’s definition, are applications that create and send jobs (or messages). The jobs are then used by the consumers.
  • Consumers/Receivers - Consumers are the applications of the stack that get a job, created by a producer, from the tube for processing.

Installing beanstalkd

On Ubuntu, you can run sudo apt-get install beanstalkd to install beanstalkd; with Homebrew, run brew install beanstalkd. If you are using another distro or looking for alternative methods, refer to the beanstalkd project's installation instructions.

Working with beanstalkd

By default beanstalkd listens on port 11300, and you can connect to it by telnetting to this port. In production, it is highly recommended to set up a firewall that blocks external connections to the port Beanstalkd is running on, because no authentication is required to connect to beanstalkd: providers can enqueue jobs and workers can reserve jobs without passing through any security model.
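
What telnet does by hand can also be done from a script. The following is a rough sketch using only Python's standard socket module; connect, read_line and send_command are hypothetical helper names, and the host and timeout defaults are assumptions. It handles single-line replies only, not replies that carry a body.

```python
import socket


def connect(host="127.0.0.1", port=11300, timeout=5.0):
    """Open a plain TCP connection; beanstalkd asks for no credentials."""
    return socket.create_connection((host, port), timeout=timeout)


def read_line(sock):
    """Read one CRLF-terminated reply line and return it without the CRLF."""
    buf = b""
    while not buf.endswith(b"\r\n"):
        chunk = sock.recv(1)  # byte by byte: slow but simple for a sketch
        if not chunk:
            raise ConnectionError("connection closed before a full line arrived")
        buf += chunk
    return buf[:-2].decode("ascii")


def send_command(sock, command):
    """Send one command line and return the first line of the reply."""
    sock.sendall(command.encode("ascii") + b"\r\n")
    return read_line(sock)
```

With a server running locally, send_command(connect(), "use emails") would be expected to come back with USING emails, where emails is just an example tube name. Since anyone who can reach the port can do the same, this also shows why the firewall advice matters.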

Once you have started a telnet session, you can check the stats by typing stats. This returns quite a lot of output, including how many jobs are waiting, in progress or failed. From the telnet prompt you can look at a number of things:

  • list-tubes - shows which tubes are available/in use
  • use [tube] - use a specific tube
  • stats-tube [tube] - get the stats for a single tube
  • peek-ready - shows the next job to be processed in the current tube

When you are done, type quit to return to your prompt. These commands let you check in on your queue whenever you need to see its stats.
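
The output of stats is a flat list of key: value lines, so it is easy to turn into numbers in a script. Here is a small sketch that assumes the reply body has already been read as text; parse_stats and queue_summary are hypothetical helpers, and the sample values in the usage note are made up.

```python
def parse_stats(body):
    """Parse the flat `key: value` body of a `stats` reply into a dict."""
    stats = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line == "---":  # skip blanks and the YAML document marker
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Keep counters as integers; leave everything else as text.
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def queue_summary(stats):
    """Pick out the counters for waiting, in-progress and buried jobs."""
    return {
        "ready": stats.get("current-jobs-ready", 0),
        "reserved": stats.get("current-jobs-reserved", 0),
        "buried": stats.get("current-jobs-buried", 0),
    }
```

For example, feeding parse_stats a body containing current-jobs-ready: 3 and current-jobs-reserved: 1 and passing the result to queue_summary gives ready 3, reserved 1 and buried 0.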


Ritesh Shrivastav