Testing Sagas with Real Failure Scenarios

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1


    In the previous post, I walked through the compensation logic in each service. The code looks clean on paper. But sagas have a lot of moving parts, and bugs tend to hide in the transitions between services, not inside a single service.


    This post covers how I test the saga system: unit tests for each service, orchestrator routing tests, and the edge cases that caught me off guard.


    Testing the Orchestrator Routing

    The orchestrator's state transition table is the most critical piece. If it routes to the wrong topic, the entire saga breaks. I test every (source, status) combination:






    @Test
    void shouldReturnNextTopicGivenValidSourceAndSuccessStatus() {
        // Payment succeeded, so the saga moves forward to inventory.
        setEvent(PAYMENT_SERVICE.toString(), SUCCESS);

        TopicsEnum topic = sagaExecutionController.getNextTopic(event);

        assertEquals(INVENTORY_SUCCESS, topic);
    }

    @Test
    void shouldReturnFailTopicGivenValidSourceAndFailStatus() {
        // Payment failed, so the previous step (product validation) must compensate.
        setEvent(PAYMENT_SERVICE.toString(), FAIL);

        TopicsEnum topic = sagaExecutionController.getNextTopic(event);

        assertEquals(PRODUCT_VALIDATION_FAIL, topic);
    }

    @Test
    void shouldReturnRollbackTopic() {
        // A ROLLBACK from product validation routes to its own fail topic.
        setEvent(PRODUCT_VALIDATION_SERVICE.toString(), ROLLBACK);

        TopicsEnum topic = sagaExecutionController.getNextTopic(event);

        assertEquals(PRODUCT_VALIDATION_FAIL, topic);
    }







    These tests are fast and deterministic: no Kafka, no databases, just the lookup logic. If someone adds a new service to the saga and forgets to update the table, the routing test for that (source, status) pair fails with "Topic not found!" instead of the gap surfacing at runtime.
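
    To make the "just the lookup logic" point concrete, here is a minimal sketch of a routing table as a pure lookup. It is not the repository's code: the class name, the nested enums, and the use of IllegalArgumentException (instead of the project's ValidationException) are assumptions for illustration. Only the three routes and the error messages come from the tests in this post.

```java
import java.util.Map;

// Sketch only: names and types here are assumptions, not the repo's actual classes.
final class RoutingTableSketch {
    enum Source { PRODUCT_VALIDATION_SERVICE, PAYMENT_SERVICE }
    enum Status { SUCCESS, ROLLBACK, FAIL, TIMEOUT }
    enum Topic { INVENTORY_SUCCESS, PRODUCT_VALIDATION_FAIL }

    // Only the three routes exercised by the tests in the post are listed.
    private static final Map<Source, Map<Status, Topic>> TABLE = Map.of(
        Source.PAYMENT_SERVICE, Map.of(
            Status.SUCCESS, Topic.INVENTORY_SUCCESS,
            Status.FAIL, Topic.PRODUCT_VALIDATION_FAIL),
        Source.PRODUCT_VALIDATION_SERVICE, Map.of(
            Status.ROLLBACK, Topic.PRODUCT_VALIDATION_FAIL));

    // Pure function: same inputs always give the same topic, no I/O involved.
    static Topic nextTopic(Source source, Status status) {
        if (source == null || status == null) {
            throw new IllegalArgumentException("Source and status must be informed.");
        }
        Topic topic = TABLE.getOrDefault(source, Map.of()).get(status);
        if (topic == null) {
            // An unmapped pair fails loudly instead of being dropped.
            throw new IllegalArgumentException("Topic not found!");
        }
        return topic;
    }
}
```

    Because the lookup has no dependencies, a test only needs to call nextTopic and compare the result, which is why this layer can be covered without any infrastructure.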


    Edge Cases in Routing

    Two cases that caught me early on:






    @Test
    void shouldThrowValidationExceptionWhenSourceIsNull() {
        setEvent(null, SUCCESS);

        ValidationException ex = assertThrows(ValidationException.class, () -> {
            sagaExecutionController.getNextTopic(event);
        });

        assertEquals("Source and status must be informed.", ex.getMessage());
    }

    @Test
    void shouldThrowValidationExceptionWhenTopicNotFound() {
        setEvent(PAYMENT_SERVICE.toString(), TIMEOUT);

        ValidationException ex = assertThrows(ValidationException.class, () -> {
            sagaExecutionController.getNextTopic(event);
        });

        assertEquals("Topic not found!", ex.getMessage());
    }







    The TIMEOUT status exists in the enum but has no mapping in the saga table. If the lookup returned nothing instead of throwing, a timeout event would silently disappear. The exception makes the gap visible immediately, and this test pins that behavior down.
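
    One way to find gaps like TIMEOUT before they bite is to enumerate every (source, status) pair and list the ones with no route. The sketch below assumes a simplified table with placeholder topic names ("topic-a" and so on are invented, as are the class and method names); it only illustrates the enumeration idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: the table contents and topic names are placeholders.
final class RoutingCoverageSketch {
    enum Source { PRODUCT_VALIDATION_SERVICE, PAYMENT_SERVICE }
    enum Status { SUCCESS, ROLLBACK, FAIL, TIMEOUT }

    // TIMEOUT is deliberately left unmapped, mirroring the case described above.
    static final Map<Source, Map<Status, String>> TABLE = Map.of(
        Source.PRODUCT_VALIDATION_SERVICE, Map.of(
            Status.SUCCESS, "topic-a", Status.ROLLBACK, "topic-b", Status.FAIL, "topic-c"),
        Source.PAYMENT_SERVICE, Map.of(
            Status.SUCCESS, "topic-d", Status.ROLLBACK, "topic-e", Status.FAIL, "topic-f"));

    // Walks the full cross product of the two enums and collects pairs without a route.
    static List<String> unmappedPairs() {
        List<String> missing = new ArrayList<>();
        for (Source source : Source.values()) {
            for (Status status : Status.values()) {
                if (!TABLE.getOrDefault(source, Map.of()).containsKey(status)) {
                    missing.add(source + "/" + status);
                }
            }
        }
        return missing;
    }
}
```

    A test could compare unmappedPairs() against an explicit allow-list, so that adding a new status or service forces a conscious decision about its route.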


    Testing the OrchestrationService

    The orchestration layer adds history entries and publishes to Kafka. I mock the producer and verify that the event is sent to the correct topic:






    @Test
    void shouldStartSagaSuccessfully() {
        // The controller is mocked, so only the orchestration layer is under test.
        when(sagaExecutionController.getNextTopic(event))
            .thenReturn(TopicsEnum.PRODUCT_VALIDATION_SUCCESS);

        orchestrationService.startSaga(event);

        verify(producer).sendEvent(eq("product-validation-success"), eq("{json}"));
        assertEquals("ORCHESTRATOR", event.getSource());
        assertEquals(SUCCESS, event.getStatus());
        assertTrue(event.getEventHistory().stream()
            .anyMatch(h -> h.getMessage().contains("Saga started")));
    }

    @Test
    void shouldFinishSagaWithFailure() {
        orchestrationService.finishSagaFail(event);

        // A failed saga ends by notifying the final topic, with the failure recorded.
        verify(producer).sendEvent(eq("notify-ending"), eq("{json}"));
        assertEquals(FAIL, event.getStatus());
        assertTrue(event.getEventHistory().stream()
            .anyMatch(h -> h.getMessage().contains("with errors")));
    }







    The history assertion is important. It verifies that each step leaves a trace. If a saga fails and the history is empty, debugging becomes guesswork.
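
    For readers who have not seen the event model, here is a rough sketch of how an event could accumulate history entries. The field names, the History class, and the addHistory helper are assumptions made for illustration. The repository's real classes may differ.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified event with an append-only history (names are assumed).
final class EventHistorySketch {
    static final class History {
        final String source;
        final String status;
        final String message;
        final Instant createdAt;

        History(String source, String status, String message) {
            this.source = source;
            this.status = status;
            this.message = message;
            this.createdAt = Instant.now();
        }
    }

    static final class Event {
        String source;
        String status;
        final List<History> eventHistory = new ArrayList<>();

        // Snapshots the current source and status next to the message.
        void addHistory(String message) {
            eventHistory.add(new History(source, status, message));
        }
    }

    // Same idea as the stream-based assertions in the tests above.
    static boolean historyContains(Event event, String fragment) {
        return event.eventHistory.stream().anyMatch(h -> h.message.contains(fragment));
    }
}
```

    Each entry records who wrote it and in which state, so a failed saga can be read back step by step.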


    Testing Payment: The Happy and Sad Paths

    The payment-service has the most complex logic. It validates amounts, checks fraud scores, simulates gateway responses, and handles refunds. Here's how I test the main scenarios:


    Payment Success





    @Test
    void shouldRealizePaymentSuccessfully_givenValidOrderAndAmount() {
        givenNoExistingPayment();
        givenPaymentFound();
        givenJsonSerialization();

        paymentService.realizePayment(event);

        // Status drives the saga, history explains it, and the event must be published.
        assertEquals(SUCCESS, event.getStatus());
        assertEquals("PAYMENT_SERVICE", event.getSource());
        assertEquals(20.0, event.getOrder().getTotalAmount());
        assertHistoryContains("Payment realized successfully");
        verify(producer).sendEvent("{json}");
    }







    Amount Below Minimum





    @Test
    void shouldRollback_givenAmountIsLessThanMinimum() {
        event = buildEvent(0.0, 1); // unit value = 0.0
        payment = buildPayment(0.0, 1);
        givenNoExistingPayment();
        givenPaymentFound();
        givenJsonSerialization();

        paymentService.realizePayment(event);

        assertEquals(ROLLBACK, event.getStatus());
        assertHistoryContains("minimal amount");
    }







    Duplicate Transaction





    @Test
    void shouldRollback_givenTransactionAlreadyExists() {
        when(paymentRepository.existsByOrderIdAndTransactionId(any(), any()))
            .thenReturn(true);
        givenJsonSerialization();

        paymentService.realizePayment(event);

        assertEquals(ROLLBACK, event.getStatus());
        assertHistoryContains("transactionId");
    }
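
    The two rollback tests above (amount below minimum, duplicate transaction) exercise guards that run before a payment is recorded. Here is a hedged sketch of what such guards might look like. The 0.1 threshold, the order of the checks, the Outcome type, and the exact message wording are all assumptions. Only the ROLLBACK status and the "minimal amount" / "transactionId" fragments come from the tests.

```java
// Sketch only: threshold, check order, and messages are assumed for illustration.
final class PaymentGuardSketch {
    static final double MIN_AMOUNT = 0.1; // assumed minimum, not taken from the repo

    static final class Outcome {
        final String status;
        final String message;

        Outcome(String status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    // Returns ROLLBACK with a reason when a guard trips, SUCCESS otherwise.
    static Outcome validate(boolean transactionAlreadyExists, double totalAmount) {
        if (transactionAlreadyExists) {
            // Idempotency guard: the same order/transaction must not be charged twice.
            return new Outcome("ROLLBACK", "There is already a payment for this transactionId.");
        }
        if (totalAmount < MIN_AMOUNT) {
            return new Outcome("ROLLBACK", "The minimal amount available is " + MIN_AMOUNT);
        }
        return new Outcome("SUCCESS", "Payment realized successfully!");
    }
}
```

    Returning the reason alongside the status is what lets a test assert on both, as the tests in this post do.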







    Refund (Compensation)





    @Test
    void shouldRealizeRefund_whenPaymentExists() {
        when(paymentRepository.findByOrderIdAndTransactionId(any(), any()))
            .thenReturn(Optional.of(payment));
        givenJsonSerialization();

        paymentService.realizeRefund(event);

        assertEquals(FAIL, event.getStatus());
        assertEquals(PaymentStatus.REFUND, payment.getStatus());
        assertHistoryContains("Rollback executed for payment");
        verify(paymentRepository).save(payment);
    }







    Refund Failure (Compensation of the Compensation)

    This is the tricky one. What if the refund itself fails? The payment-service still publishes FAIL so the saga can continue rolling back. It just logs that the refund didn't execute:






    @Test
    void shouldHandleRefundFailureGracefully_whenPaymentNotFound() {
        // Simulate the payment lookup itself failing (here, a database error).
        when(paymentRepository.findByOrderIdAndTransactionId(any(), any()))
            .thenThrow(new RuntimeException("DB error"));
        givenJsonSerialization();

        paymentService.realizeRefund(event);

        // The event is still published with FAIL so the rollback chain continues.
        assertEquals(FAIL, event.getStatus());
        assertHistoryContains("Rollback not executed for payment");
        verify(producer).sendEvent("{json}");
    }







    The saga doesn't get stuck. The refund failure is recorded in the history so someone can intervene manually later.
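
    The behavior described here (record the failure, publish FAIL anyway) boils down to a try/catch around the refund with the publish outside it. A minimal sketch follows, with a hypothetical Producer interface, a simplified Event, and the refund passed in as a Runnable. None of these are the repository's actual types.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: simplified types to show the "never block the rollback chain" idea.
final class RefundSketch {
    interface Producer {
        void sendEvent(String payload);
    }

    static final class Event {
        String source;
        String status;
        final List<String> history = new ArrayList<>();
    }

    static void realizeRefund(Event event, Runnable refundAction, Producer producer) {
        event.source = "PAYMENT_SERVICE";
        event.status = "FAIL";
        try {
            refundAction.run();
            event.history.add("Rollback executed for payment!");
        } catch (RuntimeException ex) {
            // The refund did not happen; keep the reason for manual follow-up.
            event.history.add("Rollback not executed for payment: " + ex.getMessage());
        }
        // Published in both cases so upstream services still compensate.
        producer.sendEvent(event.source + ":" + event.status);
    }
}
```

    The trade-off is that a failed refund is only visible in the history, so something (a person or a job) has to actually look for "Rollback not executed" entries.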


    Testing Inventory Rollback

    The inventory tests follow the same pattern. The interesting case is restoring stock to its previous value:






    @Test
    void shouldRollbackInventorySuccessfully() {
        // Record written by the forward step: stock went from 10 to 5.
        OrderInventory orderInventory = OrderInventory.builder()
            .inventory(inventory)
            .oldQuantity(10)
            .newQuantity(5)
            .orderId("order-1")
            .transactionId("tx-123")
            .build();

        when(orderInventoryRepository.findByOrderIdAndTransactionId("order-1", "tx-123"))
            .thenReturn(List.of(orderInventory));

        inventoryService.rollbackInventory(event);

        assertEquals(FAIL, event.getStatus());
        assertEquals(10, inventory.getAvailable()); // restored to old value
        assertHistoryContains("Rollback executed for inventory");
    }







    The oldQuantity was 10, the forward action reduced it to 5, and the rollback restores it to 10. Without the OrderInventory record that saves both values, this rollback would be impossible.
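
    The restore step itself is small. A sketch under simplified assumptions (plain fields instead of the builder, no repository, class names invented for illustration):

```java
import java.util.List;

// Sketch only: simplified stand-ins for the inventory entities.
final class InventoryRollbackSketch {
    static final class Inventory {
        int available;

        Inventory(int available) {
            this.available = available;
        }
    }

    // Written by the forward step so the compensation knows what to restore.
    static final class OrderInventory {
        final Inventory inventory;
        final int oldQuantity;
        final int newQuantity;

        OrderInventory(Inventory inventory, int oldQuantity, int newQuantity) {
            this.inventory = inventory;
            this.oldQuantity = oldQuantity;
            this.newQuantity = newQuantity;
        }
    }

    // Sets each product's stock back to the value saved before the order.
    static void rollback(List<OrderInventory> records) {
        for (OrderInventory record : records) {
            record.inventory.available = record.oldQuantity;
        }
    }
}
```

    One caveat with restoring an absolute value: if another order changed the same stock in between, writing back oldQuantity overwrites that change. Adding back the difference (oldQuantity - newQuantity) is an alternative worth considering when orders can overlap.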


    A Helper That Saves Time

    I use the same assertion helper across all service tests:






    private void assertHistoryContains(String expectedMessage) {
        assertTrue(event.getEventHistory().stream()
                .anyMatch(h -> h.getMessage().toLowerCase()
                    .contains(expectedMessage.toLowerCase())),
            "Expected message not found in history: " + expectedMessage);
    }







    This checks, case-insensitively, that the service added the right message to the event history. Every test verifies both the status AND the history. The status controls the saga flow. The history tells you why.


    What I'd Do Differently

    Looking back, there are a few things I'd add:


    Integration tests with embedded Kafka. The unit tests mock the producer, so they don't catch serialization bugs or topic misconfiguration. An embedded Kafka setup would let me publish a real event and verify the full chain.


    Testcontainers for the databases. The unit tests mock the repositories. A Testcontainers setup with real PostgreSQL and MongoDB would catch schema issues and migration bugs.


    Chaos testing. Kill a service mid-saga and verify recovery. Introduce network delays between services. These are the scenarios that break sagas in production, and they're hard to test with mocks alone.


    These are on the roadmap. For now, the unit tests cover the routing logic and compensation flows well enough to catch regressions.


    Wrapping Up

    The saga orchestrator pattern works because each piece is testable in isolation. The state transition table is a pure function. Each service's forward and compensation logic can be tested with mocked dependencies. The event history gives you a built-in audit trail.


    The full test suite runs in seconds because nothing touches real infrastructure. That's the payoff of keeping the orchestrator stateless and the services decoupled.


    The repo (with all tests): github.com/pedrop3/saga-orchestration






