Asynchronous message handling with RabbitMQ, MongoDB and Python

In this post, we'll go through how to process RabbitMQ messages in an asynchronous way and save them in MongoDB with an asynchronous connection.

Real Case Scenario

We had to build a distributed service that needs to scale quickly and process data fast. Its primary purpose was to migrate data between platforms and continuously listen for new updates.

So, the service must consume from multiple channels (platforms), pre-process messages with specific rules, and publish them back to other related channels as new updates. All of this has to be fast enough to keep queues from piling up and to maintain high throughput.

RabbitMQ covers most of these requirements, since it supports asynchronous messaging for internal operations. However, we also need to make the surrounding code run concurrently: connecting channels, declaring queues, processing messages, and performing database operations.

Environment Configurations

First, let's install the required dependencies; then we'll configure docker-compose to run the RabbitMQ and MongoDB containers.

requirements.txt

aio-pika==8.1.1
uvloop==0.16.0
ujson
urllib3
motor

There are two main packages to pay attention to (a short usage sketch follows this list):

  • aio-pika is an asyncio wrapper around aiormq. It allows concurrent work with RabbitMQ connections, channels and queues, which improves performance.

  • motor is the asynchronous Python driver for MongoDB.
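
To see how these two fit together, here's a minimal sketch that consumes a queue with aio-pika and writes each message body to MongoDB with motor. The queue, database and collection names and the connection URLs are placeholders for illustration only, not part of the project:

import asyncio

import aio_pika
from motor.motor_asyncio import AsyncIOMotorClient


async def demo() -> None:
    # Async MongoDB client and target collection (placeholder names).
    mongo = AsyncIOMotorClient("mongodb://localhost:27017")
    collection = mongo["demo_db"]["messages"]

    # Async RabbitMQ connection, channel and queue (placeholder names).
    connection = await aio_pika.connect_robust("amqp://guest:guest@localhost/")
    async with connection:
        channel = await connection.channel()
        queue = await channel.declare_queue("demo_queue", durable=True)

        # The queue iterator raises asyncio.TimeoutError after 5 idle seconds.
        try:
            async with queue.iterator(timeout=5) as messages:
                async for message in messages:
                    async with message.process():  # acks on success, rejects on exception
                        await collection.insert_one({"body": message.body.decode()})
        except asyncio.TimeoutError:
            pass  # no more messages; let the connection close cleanly


if __name__ == "__main__":
    asyncio.run(demo())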

Now, create a docker-compose file to run the services:

docker-compose.yml

version: '3'
services:

  mongodb:
    image: mongo
    ports:
      - "27017:27017"
    command: mongod --bind_ip 0.0.0.0

  rabbitmq:
    container_name: rabbitmq
    image: rabbitmq:3-management-alpine
    ports:
        - 5672:5672
        - 15672:15672
    networks:
        - rabbitmq_net

networks:
  rabbitmq_net:

We will not dockerize the actual service, since it's much easier to debug it through VS Code (or another editor) and see each step in detail. You can create a virtual environment and install the requirements above to run the project locally.

launch.json (for VSCode)

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Consumer",
            "type": "python",
            "request": "launch",
            "program": "${workspaceRoot}/src/ingest.py",
            "env": { "PYTHONPATH": "${workspaceRoot}"},
            "console": "integratedTerminal",
            "justMyCode": false,
        },

        {
            "name": "Producer",
            "type": "python",
            "request": "launch",
            "program": "${workspaceRoot}/src/publisher_manual.py",
            "env": { "PYTHONPATH": "${workspaceRoot}"},
            "console": "integratedTerminal",
            "justMyCode": false,
        }

    ]
}

Implementation of Consumer

In this part, we will implement the consumer service along with its helper functions and abstract classes. Let's start by adding an abstract RabbitMQ consumer class:

src/utils/abstract_rabbitmq_consumer.py

import asyncio
from abc import ABCMeta, abstractmethod

from aio_pika.message import IncomingMessage
from aio_pika.queue import Queue


async def mark_message_processed(orig_message: IncomingMessage):
    """Notify message broker about message being processed."""
    try:
        await orig_message.ack()
    except Exception:
        await orig_message.nack()
        raise


class RabbitMQConsumer(metaclass=ABCMeta):
    """RabbitMQ consumer abstract class responsible for consuming data from the queue."""

    def __init__(
        self,
        queue: Queue,
        iterator_timeout: int = 5,
        iterator_timeout_sleep: float = 5.0,
        *args,
        **kwargs,
    ):
        """
        Args:
            queue (Queue): aio_pika queue object
            iterator_timeout (int): In seconds.
                The queue iterator raises TimeoutError if no message comes for this time and iterating starts again.
            iterator_timeout_sleep (float): In seconds. Time for sleeping between attempts of iterating.
        """
        self.queue = queue
        self.iterator_timeout = iterator_timeout
        self.iterator_timeout_sleep = iterator_timeout_sleep

        self.consuming_flag = True

    async def consume(self):
        """Consumes data from RabbitMQ queue forever until `stop_consuming()` is called."""
        async with self.queue.iterator(timeout=self.iterator_timeout) as queue_iterator:
            while self.consuming_flag:
                try:
                    async for orig_message in queue_iterator:
                        await self.process_message(orig_message)

                        if not self.consuming_flag:
                            break  # Breaks the queue iterator
                except asyncio.exceptions.TimeoutError:
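                    # No message arrived within `iterator_timeout`; run the finish hook
                    # and, if still consuming, pause before starting a new iteration.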
                    await self.on_finish()
                    if self.consuming_flag:
                        await asyncio.sleep(self.iterator_timeout_sleep)
                finally:
                    await self.on_finish()

    @abstractmethod
    async def process_message(self, orig_message: IncomingMessage):
        raise NotImplementedError()


    def stop_consuming(self):
        """Stops the consuming gracefully"""
        self.consuming_flag = False

    async def on_finish(self):
        """Called after the message consuming finished."""
        pass

The abstract class contains the main logic for consuming messages from the queue and processing them asynchronously. You can take a look at the simple consumer structure in the aio-pika documentation.

Next, we need to add the actual consumer that will inherit from the abstract class.

src/consumer.py

import logging

from aio_pika.message import IncomingMessage
from aio_pika.queue import Queue
from motor.motor_asyncio import AsyncIOMotorClient
from src.utils.abstract_rabbitmq_consumer import RabbitMQConsumer

logger = logging.getLogger(__name__)

class Consumer(RabbitMQConsumer):
    def __init__(
            self,
            queue: Queue,
            db_client: AsyncIOMotorClient,
    ):
        super().__init__(queue=queue)
        # Keep the async MongoDB client for CRUD operations after processing messages.
        self.db_client = db_client

    async def process_message(self, orig_message: IncomingMessage):
        logger.info(orig_message)

The naming here matters, depending on what the consumer will be used for; you can rename the file and the class as you like. The consumer class is initialized with two main parameters:

  • The subscribed queue that messages will be consumed from

  • The asynchronous MongoDB client used for CRUD operations after messages are processed

The process_message method from the abstract class must be overridden in each consumer service.

As a best practice, it's recommended to separate the DB services from the actual consuming structure: messages should be processed in the consuming part, while the database functions live in a separate service that only handles DB operations. A sketch of that separation follows.
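
As a rough sketch of that separation (the PostService name, the database and collection names, and the assumption that the message body is JSON are all illustrative, not part of the original code), it could look like this:

import json
import logging

from aio_pika.message import IncomingMessage
from aio_pika.queue import Queue
from motor.motor_asyncio import AsyncIOMotorClient
from src.consumer import Consumer

logger = logging.getLogger(__name__)


class PostService:
    """Only knows about MongoDB; no RabbitMQ details leak in here."""

    def __init__(self, db_client: AsyncIOMotorClient):
        self._collection = db_client["blog"]["posts"]

    async def save_post(self, document: dict) -> None:
        await self._collection.insert_one(document)


class PostConsumer(Consumer):
    """Decodes the message in the consuming layer and delegates persistence."""

    def __init__(self, queue: Queue, db_client: AsyncIOMotorClient):
        super().__init__(queue=queue, db_client=db_client)
        self.post_service = PostService(db_client)

    async def process_message(self, orig_message: IncomingMessage):
        try:
            document = json.loads(orig_message.body)
            await self.post_service.save_post(document)
            await orig_message.ack()
        except Exception:
            logger.exception("Failed to process message")
            await orig_message.nack(requeue=False)

With requeue=False, a rejected message is routed to the dead letter exchange described later in this post instead of being requeued endlessly.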

Main Ingest Structure

In this section, we'll add the starting point of the service: the base structure of the ingesting mechanism, where the consumers and the database client are initialized.

src/ingest.py

import os
import asyncio
import logging

import aio_pika
from aio_pika.channel import Channel
from aio_pika.queue import Queue
from motor.motor_asyncio import AsyncIOMotorClient
from src.consumer import Consumer

DEFAULT_QUEUE_PARAMETERS = {
    "durable": True,
    "arguments": {
        "x-queue-type": "quorum",
    },
}


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


async def _prepare_consumed_queue(channel: Channel) -> Queue:
    if os.environ.get('RABBITMQ_DEAD_LETTER_EXCHANGE_NAME'):
        # auto reroute nacked messages to DLX
        DEFAULT_QUEUE_PARAMETERS["arguments"]["x-dead-letter-exchange"] = os.environ.get('RABBITMQ_DEAD_LETTER_EXCHANGE_NAME')
    queue = await channel.declare_queue(
        os.environ.get('RABBITMQ_QUEUE_NAME'),
        **DEFAULT_QUEUE_PARAMETERS,
    )

    await queue.bind(os.environ.get('RABBITMQ_EXCHANGE_NAME'), "blog.posts.create")

    return queue


async def _prepare_dead_letter_queue(channel: Channel) -> Queue:
    dead_letter_queue: Queue = await channel.declare_queue(
        os.environ.get("RABBITMQ_DEAD_LETTER_QUEUE_NAME"),
        **DEFAULT_QUEUE_PARAMETERS,
    )
    # RABBITMQ_QUEUE_BINDINGS is expected to be a comma-separated list of routing keys
    bindings = os.environ.get('RABBITMQ_QUEUE_BINDINGS', '').split(',')
    for routing_key in bindings:
        await dead_letter_queue.bind(os.environ.get('RABBITMQ_DEAD_LETTER_EXCHANGE_NAME'), routing_key)

    return dead_letter_queue


async def main(consumer_class) -> None:
    mongo_db_client = AsyncIOMotorClient("mongodb://localhost:27017")
    rabbitmq_connection = await aio_pika.connect_robust(
        "amqp://guest:guest@localhost/"
    )

    try:
        async with rabbitmq_connection.channel() as channel:
            await channel.set_qos(prefetch_count=100)

            if os.environ.get('RABBITMQ_DEAD_LETTER_ENABLED'):
                await _prepare_dead_letter_queue(channel)

            queue = await _prepare_consumed_queue(channel)

            consumer = consumer_class(
                queue=queue,
                db_client=mongo_db_client,
            )

            await consumer.consume()
    finally:
        await rabbitmq_connection.close()


if __name__ == '__main__':
    try:
        asyncio.run(
            main(Consumer)
        )
    except asyncio.CancelledError:
        logger.info('Main task cancelled')
    except Exception:
        logger.exception('Something unexpected happened')
    finally:
        logger.info("Shutdown complete")

Basically, we're combining asyncio, aio-pika and motor to start the ingesting process: the RabbitMQ assets, the database client and the consumers are all initialized here. Note that the script reads RABBITMQ_QUEUE_NAME, RABBITMQ_EXCHANGE_NAME and the optional dead-letter variables from the environment, so make sure they're set before running.

Sometimes messages are unprocessable for various reasons, such as an invalid body, excessive queue length, or some internal logic. However, these messages must be processed again at some point to prevent data loss.

As a solution, create an exchange (a topic exchange works well) that will redirect unprocessable messages to the related queues, and bind one or more dead letter queues to it. Then declare the consumed queue with {"x-dead-letter-exchange": "my-dlx-exchange"} as an extra argument; this tells RabbitMQ where to republish the queue's rejected messages.

Once a message is rejected (nacked without requeue) or expires, RabbitMQ automatically publishes it to the dead letter exchange, which routes it to the corresponding queues.
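
Here's a minimal sketch of that wiring with aio-pika; the exchange, queue and routing key names are placeholders, and in this project the same effect is achieved through the RABBITMQ_DEAD_LETTER_* environment variables in ingest.py:

import aio_pika
from aio_pika.channel import Channel


async def setup_dead_lettering(channel: Channel) -> None:
    # 1. Declare the dead letter exchange (placeholder name).
    dlx = await channel.declare_exchange(
        "my-dlx-exchange", aio_pika.ExchangeType.TOPIC, durable=True
    )

    # 2. Declare the queue that stores dead-lettered messages and bind it to the DLX.
    dead_letter_queue = await channel.declare_queue("blog.posts.dead-letter", durable=True)
    await dead_letter_queue.bind(dlx, routing_key="blog.posts.#")

    # 3. Declare the normal queue and point its rejected messages at the DLX.
    await channel.declare_queue(
        "blog.posts",
        durable=True,
        arguments={"x-dead-letter-exchange": "my-dlx-exchange"},
    )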

Manual Publisher

Finally, we just need a simple script to push some messages to queues.

src/publisher_manual.py

import asyncio
import json
import os

import aio_pika
from bson import json_util



async def main() -> None:
    connection = await aio_pika.connect_robust(
        "amqp://guest:guest@localhost/",
    )

    async with connection:
        routing_key = "blog.posts.create"

        channel = await connection.channel()

        # Publish to the exchange the consumer queue is bound to in ingest.py,
        # so the routing key actually matches the declared binding.
        exchange = await channel.get_exchange(os.environ.get("RABBITMQ_EXCHANGE_NAME"))

        await exchange.publish(
            aio_pika.Message(body=json.dumps({"ping": "pong"}, default=json_util.default).encode()),
            routing_key=routing_key,
        )


if __name__ == "__main__":
    asyncio.run(main())

We can run it through the VS Code debug configuration provided at the beginning of this post.

Conclusion

Now, we're ready to run the project with the following steps:

  • Run the Docker containers (rabbitmq, mongodb) with docker-compose up.

  • Run the Consumer configuration through the VS Code debugger; the consumer will start listening for incoming messages.

  • Run the Producer configuration to start the manual publisher script, which sends messages to the queue.

You can use this initial structure for real distributed projects; however, you'll need to extend its functionality as your project requires.

Support

If you feel like you've unlocked new skills, please share this post with your friends and stay connected :)