Multi-Threaded Libevent Server Example

Recently I had a need to write a socket server in C. In the past I’ve done lots of these in Java, and some in C. Being a huge fan and avid user of memcached, and knowing that memcached uses libevent at its core, I decided to go the libevent route this time. So I looked for some examples. I found cliserver and echoserver, both of which were helpful (I apologize for not providing links, as the links I had before are now dead). So, I went about implementing my server using these two as examples for how to put libevent to work dispatching events and doing non-blocking I/O for me. So far, so good.

Libevent is a nice library for handling and dispatching events, as well as doing nonblocking I/O. This is fine, except that it is basically single-threaded — which means that if you have multiple CPUs or a CPU with hyperthreading, you’re really under-utilizing the CPU resources available to your server application because your event pump is running in a single thread and therefore can only use one CPU core at a time.

The solution is to create one libevent event queue (AKA event_base) per active connection, each with its own event pump thread. This project does exactly that, giving you everything you need to write high-performance, multi-threaded, libevent-based socket servers.

There are mentions of running libevent in a multi-threaded setup, but working implementations are very difficult (if not impossible) to find. So, I went about developing one: a working, multi-threaded, libevent-based socket server.

The overall architecture is a standard, multi-threaded socket server. The main thread listens on a socket and accepts new connections, then farms the actual handling of those connections out to a pool of worker threads. Each connection has its own isolated event queue and runs on a single worker thread.

You can download the code from the GitHub project page here:

The server itself simply echoes whatever you send to it. Start it up, then telnet to it:
telnet localhost 5555
Everything you type should be echoed back to you.

One advantage of using libevent with a handful of event pump threads is that in many cases you don’t need hundreds or thousands of threads to achieve good performance under load. In theory, for maximum performance, the number of worker threads should be set to the number of CPU cores available; feel free to experiment with this. If your request processing tends to block while waiting for a database or other blocking operations to complete, you’ll still need hundreds or thousands of threads to perform well under load, because a blocked thread isn’t doing any work. But if your request processing is a compute-intensive task which does not block while waiting for external operations or resources, matching the number of threads to the number of CPU cores should theoretically give you optimum performance.
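If you want to size the pool to the core count automatically, POSIX exposes the number of online cores via sysconf. This is a small sketch, not part of the original project:

```c
#include <unistd.h>

/* Size the worker pool to the number of online CPU cores, per the
 * guideline above. Falls back to 1 if the count is unavailable.
 * (Sketch only; not part of the original project.) */
static int choose_worker_count(void) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return (cores > 0) ? (int)cores : 1;
}
```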

Also note that the server includes a multithreaded work queue implementation, which can be re-used for other purposes.
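The project's own workqueue API isn't reproduced here, but the general shape of such a reusable pthread work queue is a mutex-guarded linked list of jobs plus a condition variable that idle workers sleep on, with a shutdown flag checked inside the wait loop. All names in this sketch are illustrative:

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Minimal reusable work queue sketch (illustrative names). */
typedef struct job {
    void (*function)(void *arg);
    void *arg;
    struct job *next;
} job_t;

typedef struct {
    job_t *head;
    int shutdown;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} jobqueue_t;

void jobqueue_init(jobqueue_t *q) {
    q->head = NULL;
    q->shutdown = 0;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
}

void jobqueue_push(jobqueue_t *q, job_t *job) {
    pthread_mutex_lock(&q->mutex);
    job->next = q->head; /* LIFO, for brevity */
    q->head = job;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Worker loop. Note the shutdown check inside the wait loop: without
 * it, a broadcast at shutdown would just put the worker back to sleep. */
void *jobqueue_worker(void *arg) {
    jobqueue_t *q = (jobqueue_t *)arg;
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        while (q->head == NULL && !q->shutdown)
            pthread_cond_wait(&q->cond, &q->mutex);
        if (q->shutdown) {
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        job_t *job = q->head;
        q->head = job->next;
        pthread_mutex_unlock(&q->mutex);
        job->function(job->arg); /* run the job outside the lock */
        free(job);
    }
}

void jobqueue_shutdown(jobqueue_t *q) {
    pthread_mutex_lock(&q->mutex);
    q->shutdown = 1;
    pthread_cond_broadcast(&q->cond); /* wake every sleeping worker */
    pthread_mutex_unlock(&q->mutex);
}

/* Minimal end-to-end check: one worker, three jobs. */
static int demo_counter = 0;
static void demo_job(void *arg) { (void)arg; __sync_fetch_and_add(&demo_counter, 1); }

int jobqueue_demo(void) {
    jobqueue_t q;
    pthread_t th;
    jobqueue_init(&q);
    pthread_create(&th, NULL, jobqueue_worker, &q);
    for (int i = 0; i < 3; i++) {
        job_t *j = malloc(sizeof(*j));
        j->function = demo_job;
        j->arg = NULL;
        jobqueue_push(&q, j);
    }
    while (__sync_fetch_and_add(&demo_counter, 0) < 3)
        sched_yield(); /* wait for the worker to drain the queue */
    jobqueue_shutdown(&q);
    pthread_join(th, NULL);
    return demo_counter;
}
```

jobqueue_demo() shows the intended call pattern: start a worker, push jobs, then shut down and join.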

Since the code is BSD licensed, you are free to use the source code however you wish, either in whole or in part.

Some inspiration and coding ideas came from echoserver and cliserver, both of which are single-threaded, libevent-based servers.


21 responses to “Multi-Threaded Libevent Server Example”

  1. When I compile your multi-threaded-libevent-server, I get the following errors.
    Could you help me check them? Thank you.
    $ gcc -o echoserver_threaded echoserver_threaded.c workqueue.c -levent -lpthread
    echoserver_threaded.c: In function ‘buffered_on_read’:
    echoserver_threaded.c:129:19: error: dereferencing pointer to incomplete type
    echoserver_threaded.c:131:23: error: dereferencing pointer to incomplete type
    echoserver_threaded.c:131:56: error: dereferencing pointer to incomplete type
    workqueue.c: In function ‘workqueue_init’:
    workqueue.c:74:3: warning: passing argument 3 of ‘pthread_create’ from incompatible pointer type [enabled by default]
    /usr/include/pthread.h:225:12: note: expected ‘void * (*)(void *)’ but argument is of type ‘int (*)(struct workqueue_t *, int)’

    • That’s very strange. I’ve compiled on both CentOS 64 bit and Ubuntu 64 bit, and never encountered any errors. Can you please post the details of your OS?

          • Hi guys,

            you can modify the while loop as follows:

            do {
            nbytes = evbuffer_remove(bev->input, data, 4096);
            if (nbytes == -1) break;
            evbuffer_add(client->output_buffer, data, nbytes);
            if (nbytes < 4096) break;
            } while (1);
            It should work.

          • Are you seeing a bug? I’m using this code in a high-volume server and it’s been working fine as-is for over two years. If you’re seeing a problem, please provide code to reproduce it and I’ll see if I can fix it. Should be good as-is though.

  2. I’m facing the same issue; here are the details. Thanks in advance!

    /Tools/libevent-thread$ gcc -o echoserver_threaded echoserver_threaded.c workqueue.c -levent -lpthread
    echoserver_threaded.c: In function ‘buffered_on_read’:
    echoserver_threaded.c:132:19: error: dereferencing pointer to incomplete type
    echoserver_threaded.c:134:23: error: dereferencing pointer to incomplete type
    echoserver_threaded.c:134:56: error: dereferencing pointer to incomplete type

    /Tools/libevent-thread$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 11.10
    Release: 11.10
    Codename: oneiric

  3. You don’t wait for the created threads to finish, or am I missing something? There is no pthread_exit(). Shouldn’t workqueue.jobs_mutex and workqueue.jobs_cond also be destroyed when the program ends?

    First, I got those errors too; I am using libevent 2.0.21-stable. For getting data from the socket I used bufferevent_read(), and for sending data I used bufferevent_write().

    • event_base_dispatch(evbase_accept) should be blocking until the server is terminated by calling killServer(), which is called when the process receives a signal. I just pkill -9 the thing when I want it to die.

  4. Forgot one little thing…

    while (worker->workqueue->waiting_jobs == NULL) {
    pthread_cond_wait(&worker->workqueue->jobs_cond, &worker->workqueue->jobs_mutex);
    }

    You should check worker->terminate inside this loop; otherwise it gets stuck here. This is observable when you terminate the program with C-c while the jobs linked list is empty.

    Quite nice and useful piece of code though.

    • The workqueue_shutdown() function should be taking care of this by setting all workers to terminate, removing all jobs, then notifying all workers by calling pthread_cond_broadcast(&workqueue->jobs_cond).

      void workqueue_shutdown(workqueue_t *workqueue) {
        worker_t *worker = NULL;

        /* Set all workers to terminate. */
        for (worker = workqueue->workers; worker != NULL; worker = worker->next) {
          worker->terminate = 1;
        }

        /* Remove all workers and jobs from the work queue, and wake up
         * all workers so that they will terminate. */
        workqueue->workers = NULL;
        workqueue->waiting_jobs = NULL;
        pthread_cond_broadcast(&workqueue->jobs_cond);
      }
      • I think it really is a problem.

        As the code stands, workqueue_shutdown() sets `workqueue->waiting_jobs = NULL;` and notifies the worker threads with `pthread_cond_broadcast(&workqueue->jobs_cond);`, but I don’t think that can actually succeed. If a worker thread is blocked in `pthread_cond_wait()`, then when it receives the broadcast it will wake up, lock the mutex (ignoring contention), check `worker->workqueue->waiting_jobs == NULL` again, find it still true, and block in `pthread_cond_wait()` again. So you can’t shut the worker thread down as you intend.

        So I think that testing `worker->terminate == 1` inside the `while (worker->workqueue->waiting_jobs == NULL)` loop, as Adrian said, will solve the problem.

        Thanks for your code anyway 🙂
        If there is something wrong with my reasoning, please let me know.

        • I believe you are correct. I’ve added an extra check for the terminate boolean, so it should be fixed now. I committed the change into the git repo, and also created a new release download.

          I’m noticing that newer versions of libevent have multi-threading support built in. Not all Linux distros have these newer libevent versions though. So…the usefulness of my little example may be diminishing as time goes by. All I can say for sure is that it works for me, at least in the limited (but high volume) project in which I used it.

  5. Thanks for the example code. I try to learn libevent and am still a complete noob.

    However, I don’t get why there is an event_base per client. Doesn’t that mean you can only handle ‘number of threads’ clients in parallel, since ‘event_base_dispatch’ blocks until the client is done? You use something like libevent precisely to avoid that problem, right? Or am I misunderstanding the code?

    Wouldn’t having an event_base per thread, and then assigning threads to clients, be better? Or, even better, receiving all events on a single thread and handing all the work to the work queue (or does that not work because libevent is not thread-safe)?

    Thanks and regards

    • The events are asynchronous and non-blocking. The idea is that you initiate your processing and then immediately return. When the processing completes, you get a callback; you then populate a buffer for the response and queue it up to be asynchronously delivered to the client. This is the point of asynchronous I/O and libevent: to eliminate the overhead of hundreds or thousands of threads (and the costly context switches which go with them) and to achieve higher levels of throughput.

      If you’ve ever used Node.js, the asynchronous I/O model should be very familiar. Node.js does nearly everything asynchronously.

      For example, let’s say you had a server which takes in some parameters in the request, queries MySQL, and then returns some data to the client. Maybe a header, a bit of data per row returned from the query, and a footer containing some totals. With the asynchronous model, when the request-received callback fires, you queue up the header to be sent back to the client, then you initiate the MySQL query and immediately return. As rows come back from the MySQL query, you get callbacks to another function which you provided when you initiated the query. In each of those callbacks, you take the row data, format it, and queue it up to be sent back to the client. When the query completes, you get a callback to another function. In that function, you send out the footer with totals, then end the response and return. At that point, all that’s left is for the asynchronous I/O to finish draining the buffers you queued up containing the response to the client, then close the connection.

      This is an over-simplification, I realize. But it serves as an illustration of how a server works when using asynchronous I/O in an event-based model. Because everything is asynchronous and nothing ever blocks, you need far fewer threads. Because you save the expensive thread context switches, you get higher concurrency and throughput than with a threaded server.
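The header/rows/footer flow described above can be made concrete with a toy, synchronous stand-in for the database driver. Every name here is invented for illustration; a real asynchronous driver would fire these callbacks later, from the event loop, rather than inline:

```c
#include <stdio.h>
#include <string.h>

/* Toy illustration of the callback flow: a fake "query" invokes
 * row/done callbacks. All names are invented for this example; a
 * real async driver would invoke the callbacks from the event loop. */
typedef void (*row_cb)(const char *row, void *ctx);
typedef void (*done_cb)(int total_rows, void *ctx);

static void fake_query(const char *sql, row_cb on_row, done_cb on_done, void *ctx) {
    (void)sql;
    const char *rows[] = { "alice", "bob" };
    for (int i = 0; i < 2; i++) on_row(rows[i], ctx);
    on_done(2, ctx);
}

typedef struct { char response[256]; } request_t;

static void on_row(const char *row, void *ctx) {
    request_t *req = (request_t *)ctx;
    strcat(req->response, row);    /* queue one chunk of the body */
    strcat(req->response, "\n");
}

static void on_done(int total, void *ctx) {
    request_t *req = (request_t *)ctx;
    char footer[64];
    snprintf(footer, sizeof(footer), "total: %d\n", total);
    strcat(req->response, footer); /* footer with totals, then end */
}

void handle_request(request_t *req) {
    strcpy(req->response, "header\n"); /* queue the header first */
    fake_query("SELECT name FROM users", on_row, on_done, req);
    /* in a real async server, we would return here immediately */
}
```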

  6. I’m studying libevent. And I believe the example is what I’m looking for to start with. Thanks, Ron. This is really awesome.

    During my testing I found a trivial cleanup issue; hope this helps. It looks like a mutex unlock is missing around line #41 of workqueue.c (the check of the worker-specific ‘terminate’ variable) in worker_function. It would be a bit more graceful if the mutex were released before the thread quits, so as to give other threads the chance to do their cleanup as well. Also, event_base_loopexit prevents every thread from getting cleaned up.

    • Thanks for the feedback!

      I’ve added code to release the mutex before exiting the worker thread’s main loop.

      Also, it should be noted that at the time when I wrote this, the multi-threaded version of libevent was only available on certain Linux distros. By now (over three years later), it’s probably everywhere. So this little project may be completely irrelevant at this time.

      Also, if you’re at all familiar with JavaScript, I highly recommend writing your libevent-like server projects in Node.js. It’s so much easier and faster to get things going. The non-blocking, event-based callback architecture is built into every aspect of Node, so you end up writing a lot less code than you would when using libevent directly.

  7. Ron, Have a question on concurrency of connections being handled at any point.

    The number of connections handled at any point equals the number of threads created. Consider the scenario where there are more connections than threads and each connection stays active for a long duration. For this scenario, what would you change to keep connections from starving while they wait to receive a worker thread?

    • Robert, that is an excellent question. With event-based server programming, you use non-blocking I/O, and any processing which requires blocking for a long time would generally be handled in its own thread or in a separate process.

      So, when you’re not doing I/O, each libevent thread is free to service other requests. That’s the beauty of event-driven/non-blocking server development. By using non-blocking I/O and an event-driven architecture, you eliminate the need for the kernel context switching required in a one-thread-per-request or one-process-per-request architecture such as Apache. As a result, you can service more simultaneous requests with less hardware, as long as the requests don’t require a lot of computing to service.

      If you’re going to perform a regression on 20 years of daily stock quotes, and try to do that inside an event-driven server architecture such as libevent or Node.js, you’ll want to either outsource the calculations to something external, or break it into short-running steps where each step does a bit of processing and then initiates some I/O request which triggers a callback when the data is ready. The callback then executes the next chunk, which triggers another callback, and so on, until the request is satisfied.

      By writing everything in a non-blocking fashion, you eliminate the kernel overhead required for context switching in either a multi-threaded or multi-process server architecture. The gain in throughput is significant.

      Node.js even has a non-blocking MySQL driver. So, at any point in servicing your request, you can issue a query to MySQL and return immediately. You provide it with callbacks which it calls under various conditions (error callback, result row available callback [which gets called for each row in the result set], query complete callback). In each of those callbacks, you do a bit of processing and (optionally) trigger a callback which initiates the next phase of the request processing, then return immediately (freeing up the CPU to service other requests). When every step of the request is complete, you output the response and end the request.

      This model, when implemented properly, makes it easy to handle far more concurrent requests than in a multi-threaded or multi-process architecture.

      The reason for the multiple event pump threads when using libevent, is to be able to take advantage of multiple CPU cores. So, you’d probably never want to create more event pump threads than you have CPU cores. If you do, then you re-introduce the context switching expense, and your throughput could suffer.

      Hope that helps!

  8. Great example, thanks. I read the complete libevent documentation but was not clear on how to use it in a multi-threaded setup for a server. Thanks a million.

    • Glad it was helpful for you. I read long ago that the latest libevent supports multiple event-pump threads. Not sure whether they ever got that working well though.

      I migrated the original project, for which I developed this example, to Node.js and haven’t looked back. For me, it’s so much easier/faster for developing event-based server daemons, since you don’t have to re-compile each time you make a change, or do your own memory management as with C/C++.
