Transaction Management: Making ACID Real

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • MyrinNew
    Senior Member
    • Feb 2024
    • 5168

    #1

    Transaction Management: Making ACID Real

    Hello, I'm Maneshwar. I'm working on FreeDevTools online currently building **one place for all dev tools, cheat codes, and TLDRs* — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.*


    In the previous post, we established what transactions are and why they are unavoidable in critical writes.

    We treated transactions as logical units of consistency and introduced ACID as the contract a DBMS promises to uphold.


    Now we go one level deeper.


    Knowing that transactions exist is not enough. The real challenge is how a DBMS actually manages them, especially when applications run concurrently and failures are inevitable.


    This is where transaction management begins.


    Transactions vs Processes: A Crucial Separation

    Applications run as processes or threads, managed by the operating system. But transactions do not, this distinction is fundamental.


    The operating system:
    • Understands processes and threads
    • Tracks them using process IDs, descriptors, and kernel data structures
    • Schedules CPU time and manages memory


    But the operating system:
    • Does not understand transactions
    • Does not know which database operations belong together
    • Cannot enforce Atomicity, Isolation, or Durability


    From the OS’s perspective, a database write is just another instruction.


    From the DBMS’s perspective, it may be one step in a much larger logical operation that must either fully succeed or fully disappear.


    That responsibility belongs entirely to the DBMS.


    Transactions as DBMS-Level Work Units

    In a DBMS:
    • Transactions are the true units of work
    • Not processes
    • Not threads


    A single process or thread can:
    • Execute multiple transactions
    • Even have multiple active transactions concurrently
    • Access the same or different databases


    To manage this, the DBMS assigns:
    • A transaction identifier
    • A transaction descriptor
    • Additional transaction control metadata


    These structures live entirely in user space, inside the DBMS.


    The operating system is unaware of them.


    Transaction management is therefore pure DBMS engineering, not OS magic.


    Who Ensures ACID?

    Applications play a surprisingly small role.


    Except for Consistency, applications do not actively enforce ACID properties.
    • Atomicity → DBMS responsibility
    • Isolation → DBMS responsibility
    • Durability → DBMS responsibility


    Applications:
    • Define transaction boundaries
    • Issue operations
    • Decide when to commit or abort


    They do not:
    • Implement rollback logic
    • Handle crashes
    • Prevent concurrent interference


    This is intentional.


    If every application had to reimplement concurrency and recovery logic, database systems would be unusable.


    The Illusion of Atomicity

    One uncomfortable truth underlies transaction management:


    No database operation is truly atomic.


    Not even a single CPU instruction is fully atomic in the physical sense.


    Disk writes happen in steps.

    Memory updates can be interrupted.

    Crashes can occur between any two instructions.


    Yet users observe transactions as:
    • Indivisible
    • Instantaneous


    Atomicity is an illusion deliberately constructed by the DBMS.


    To build this illusion, DBMS designers must answer two fundamental questions:

    1. When can the results of a transaction be made visible to others?
    2. How can partial execution be completely undone?


    The answers define the heart of transaction management.


    Two Pillars of Transaction Management

    Transaction management rests on two major subsystems:

    1. Concurrency Control
    2. Failure Recovery


    Together, they enforce:
    • Atomicity
    • Isolation
    • Durability


    Let’s examine them one by one.


    Concurrency Control: Correctness Under Overlap

    Concurrency means:
    • Multiple transactions executing at the same time
    • Or interleaving their operations


    Concurrency is essential:
    • Improves resource utilization
    • Reduces response time
    • Increases throughput


    But it is also dangerous.


    The Core Problem

    When transactions execute concurrently:
    • They may access the same data items
    • Their operations may interleave unpredictably


    This can lead to:
    • Transactions seeing temporary inconsistent states
    • Correct applications producing incorrect behavior


    Crucially:


    The DBMS must never cause correct applications to malfunction.


    This is not optional. It is a correctness requirement.


    Serializability: The Gold Standard

    If transactions ran one after another:
    • Each would see a consistent database
    • Life would be simple
    • Performance would be terrible


    Instead, the DBMS allows concurrency but enforces the illusion that:


    The execution is equivalent to some serial execution of the same transactions.


    This property is called serializability.


    A history of transactions is serializable if:
    • All committed transactions
    • Perform the same operations
    • And observe the same results
    • As in some sequential execution


    Serializability is the formal foundation of transaction isolation.


    Lock-Based Concurrency Control

    Most practical DBMSs enforce serializability using locking.


    A dedicated subsystem called the lock manager:
    • Maintains locks on data items
    • Decides which operations may proceed
    • Forces others to wait or abort


    Basic Locking Rules

    • Read operations do not modify data
      → Multiple reads can coexist
    • Write operations modify data
      → Writes conflict with:
      • Other writes
      • Reads


    So:
    • A write lock is exclusive
    • A read lock is shared


    Before accessing a data item:
    • A transaction must acquire the appropriate lock


    If it cannot:
    • It waits
    • Or aborts and rolls back


    Strict Two-Phase Locking (Strict 2PL)

    A widely used protocol is strict two-phase locking:

    1. A transaction:
      • Acquires locks as needed
      • Never releases them early
    2. All locks are released:
      • Only at commit or abort


    This guarantees:
    • Serializable executions
    • No other transaction ever sees partial effects


    Strict 2PL is the workhorse behind isolation and atomicity in most systems.


    Failure Recovery: Correctness Despite Crashes

    Concurrency control assumes a “perfect world” with no failures.


    Reality disagrees.


    Failures happen:
    • Transactions abort unexpectedly
    • Processes crash
    • Machines reboot
    • Power is lost


    When this happens, the DBMS must guarantee:
    • Effects of committed transactions survive
    • Effects of uncommitted transactions disappear


    This is the domain of failure recovery.


    Logging: The Foundation of Recovery

    To recover correctly, the DBMS maintains a journal, also called a log.


    The log:
    • Is a sequential record of changes
    • Stored separately from database files
    • Contains log records


    Each log record includes:
    • A descriptor (transaction ID, operation type)
    • A payload with undo and/or redo information


    A dedicated log manager:
    • Writes log records
    • Flushes them to stable storage
    • Coordinates with transaction execution


    Write-Ahead Logging (WAL)

    To avoid corruption, DBMSs follow a critical rule:


    Log first. Modify data later.


    This is the Write-Ahead Logging (WAL) protocol.


    Before changing a database page:
    • The DBMS writes an undo log record to stable storage


    This guarantees:
    • Even if a crash happens mid-write
    • The change can be undone


    Flush-Log-at-Commit

    When a transaction commits:
    • All its redo log records must be flushed to stable storage


    Only then is the commit acknowledged.


    This ensures:
    • Durability
    • Committed updates can always be redone after a crash


    Recovery After a Crash

    When the DBMS restarts after a failure:
    • It does not immediately accept new transactions
    • It first runs recovery


    During recovery:
    • Log records are scanned
    • Effects of committed transactions are redone
    • Effects of incomplete transactions are undone


    A critical requirement:
    • Recovery must be idempotent
    • Running recovery multiple times yields the same final state


    Depending on design, systems may use:
    • Physical logging
    • Logical logging
    • Physiological logging
    • Undo-only, redo-only, or undo-redo logs


    All aim for the same goal:


    Restore the database to a correct, consistent state.


    Where We Are Now

    We can now see how ACID stops being abstract:
    • Concurrency control enforces Isolation and helps Atomicity
    • Recovery enforces Atomicity and Durability
    • Logging bridges volatile execution and persistent storage


    Transactions are no longer just a concept.

    They are actively tracked, regulated, logged, and repaired.


    Stay tuned for knowing the next stage of learning: database interaction.





    👉 Check out: FreeDevTools


    Any feedback or contributors are welcome!


    It’s online, open-source, and ready for anyone to use.


    ⭐ Star it on GitHub: freedevtools




    More...
Working...