[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1

    [Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements

    [Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements

    Intro

    Day 3: I'm going to hand a year of credit card statements over to a local LLM and see what it can do.


    This is experiment #3.


    What I'm using today: DGX Spark + Ollama + Qwen2.5 (comparing 7B vs 72B). Ollama is the de-facto local-LLM runtime, and Qwen2.5 is a multilingual model from Alibaba (China) that handles Japanese reasonably well, apparently.





    Today's setup

    • Data: 12 months of credit card statements from a single card.
    • Volume: 383 transactions, ¥2,761,555 in total spend.
    • Goal: get the AI to spot waste patterns and propose savings.
    • Comparison axes:
      • Model size: 7B (light) vs 72B (heavy)
      • Input format: raw CSV vs pandas-aggregated summary
      • 4 patterns total


    Takeaway: "If you ask an AI to aggregate raw data, the numbers come out way off." / "If you pre-aggregate with a spreadsheet tool first and then feed the AI, you get fast and accurate results." A small but practical finding.





    1. Get the CSVs onto the DGX

    Log into the credit card company's web statements page on myPC1 (my Windows laptop), download 12 months of CSVs, then push them to the DGX.


    I deliberately skipped GitHub for the transfer this time — once you push something, it's in the history forever, and credit card data shouldn't be there even briefly. Instead, I used direct PC-to-PC transfer over SSH (one command, finishes in seconds; details in the collapsibles at the end). The .gitignore excludes private-data/ too, so accidental commits are ruled out.





    2. Install Ollama

    Ollama is the de-facto runtime for local LLMs. One command should be enough.


    There was a small password hiccup during install (details below), but eventually it was up and running.


    The DGX Spark specs really show through:
    • Memory: 121 GB
    • Default context window: ~262,144 tokens


    In other words: "throw a whole book at it, no problem" territory. Reassuring.





    3. Two model sizes: Qwen2.5 7B vs 72B

    The strategy: same model family, different sizes. That way the differences come from size, not architecture.
    • 7B (light): ~4.7 GB, downloads in 5 minutes. Fast.
    • 72B (heavy): ~47 GB, 25 minutes to download. Slow but smart.


    What does "B" mean? Short for Billion. It's the number of "weights" inside the AI — more weights, more it remembers, basically. So 7B has 7 billion weights, 72B has 72 billion.


    Loading both onto the DGX simultaneously, memory usage looks like:


    qwen2.5:72b 61 GB
    qwen2.5:7b 8.2 GB
    Total 69 GB


    69 GB. Spacious!





    4. Prepping the CSVs

    Once I had the CSVs in hand, three small headaches before they were ready for the AI:
    • Headache 1: An older encoding (Windows Japanese flavor) → needs converting to modern UTF-8
    • Headache 2: Some merchant names contain commas, which breaks naive CSV parsing
    • Headache 3: Each file has a "monthly total" line at the end that isn't actually data


    Details in the collapsible. After cleanup, the 12 files merge into a single dataset:


    Transactions 383
    Period 12 months (1 year)
    Total spend ¥2,761,555
    Avg per tx ¥7,210
    Median per tx ¥3,000
    Largest single tx ¥209,283 (overseas flight)
    Smallest ¥-3,980 (refund)


    Now to feed this to 7B and 72B and see what each of them says.





    5. Experiment 1: Throw the raw CSV at the AI

    No tricks: all 383 rows, straight at the AI. Prompt is the full ask: "As a household budget consultant, output category breakdown / monthly trend / waste patterns / savings suggestions / lifestyle hypothesis."


    7B's answer (75 seconds)

    ...this is where the numbers go wildly off.


    Amazon total ¥2,014,386 (257 tx) ¥693,663 (166 tx)
    Amazon Downloads ¥2,014,386 (257 tx) ¥80,323 (50 tx)
    Outdoor brand ¥495,740 ¥154,820
    A local recreation venue "¥49,574" cited (a different small charge actually exists)


    None of the numbers line up. Amazon total is roughly 3× off, Amazon Downloads about 25× off, and the cited venue context is a different charge entirely.


    Reading 383 rows of CSV and computing totals turned out to be a heavy lift for the 7B model.


    72B's answer (12m 9s)

    What if we throw size at the problem? After 12 minutes of patience:


    Amazon total ¥635,792 (104 tx) ¥693,663 (166 tx)
    AI/dev tools ¥193,629 (21 tx) ¥176,850 (24 tx)
    Travel ¥487,555 (43 tx) ¥416,268 (8 tx)


    Not exact, but the off-by amounts are within ~10%, and there are no fabricated venues. A real improvement.


    However — when asked about the monthly trend, here's what 72B said:


    Month 1: ¥316,789 → Month 2: ¥229,600 → Month 3: ¥237,500 → ... → Month 12: ¥291,500

    (Gradually increasing.)


    The actual range is ¥69,961 (low) to ¥493,072 (high) — a chaotic up-and-down waveform. "Gradually increasing" isn't quite right. Even 72B isn't great at aggregating distributed data over a long CSV.





    6. Experiment 2: Aggregate first, then feed the AI

    If the AI struggles with aggregation, do the aggregation in a different tool first and only hand the AI the result.


    The flow:






    📥 Raw CSV (22,132 chars, 383 rows)

    🔧 Pre-aggregate with a spreadsheet tool (Python's pandas)

    📋 Aggregate summary (1,884 chars, ~90% smaller)

    🤖 Hand it to the AI (let it interpret and propose)







    Python's pandas = a spreadsheet-like library, but ~10,000× more powerful than Excel functions, used for tabular data analysis.


    7B + pre-aggregated input (50 seconds)

    Numbers are fully accurate now.


    Amazon total ¥693,663 ¥693,663
    AI/dev tools ¥176,850 ¥176,850
    Monthly max ¥493,072 ¥493,072
    Monthly min ¥69,961 ¥69,961


    Quoting straight from the pre-aggregated numbers, the hallucinations vanished.


    And 7B did this in 50 seconds — better quality than the 72B + raw CSV at 12 minutes. Quietly remarkable.


    Time 75s 50s
    Numbers wildly off exact
    Verdict not usable as-is quote directly


    72B + pre-aggregated input (12m 13s)

    72B's numbers also match exactly (well, since they're being quoted from pre-aggregated data, that's expected). The proposal quality was the strongest of the four patterns:


    Reduce Amazon dependency
    • Current: online shopping (Amazon family) is 25.1% of total (¥693,663).
    • Suggestion: stick to essentials only, regular review, avoid impulse buys.
    • Expected savings: ¥57,805/month average (25% reduction) → ¥693,660/year


    ...wait, hold on. Annual Amazon spend was ¥693,663. The "savings" 72B suggests is ¥693,660. That's basically the same number. So the proposal is effectively "stop buying on Amazon entirely (100%)" — definitely not 25%. Apparently 72B's percentage arithmetic isn't bulletproof either.


    That aside, the lifestyle hypothesis section was kind of striking. Here's what 72B observed:

    • Heavy reliance on apps and subscriptions: "App/subscription" category is 10.5% of total
    • Frequent international travel: "Travel/airline" is 15.1%, with notable overseas charges
    • Frequent online shopping: "Online (Amazon)" is 25.1% of total


    It's just one card's data, so this isn't a complete picture — but if I fed an AI my full household financials, the analysis and advice would probably go a lot deeper.


    Summary: 4 patterns

    1 7B Raw CSV 75s ❌ Numbers way off
    2 72B Raw CSV 12m 9s △ Misread monthly trend
    3 7B Aggregated 50s ✅ Exact ○ Some repetition
    4 72B Aggregated 12m 13s ✅ Exact ◎ Best (mind the % math)


    Quietly notable: 72B takes ~12 minutes regardless of input size (shrinking the prompt didn't change wall-clock time much). Output generation is the bottleneck. Which strengthens the case for "small model + pre-aggregate" as the cost-effective default.





    7. Cross-check: the actual graphs

    Before trusting any of the AI output, let me put the real numbers on charts using the spreadsheet tool (pandas).


    Monthly spending




    Average ¥230,130/month, but the range is ¥69,961 (lowest) to ¥493,072 (highest) — about a 7× spread. The 72B's "gradually increasing" claim was a bit off the mark; the reality is bouncy.


    Category share




    "Other" being 32% is because my categorization rule is sloppy. I just wrote a simple "if the merchant name contains keyword X, bucket Y" rule, and lots of merchants didn't match any keyword and ended up in "Other." Reading meaning from a merchant name is exactly the kind of thing AI is good at, so next time I'll let the AI do the categorization itself.


    Top 15 merchants




    Amazon at ¥421,978 (105 tx) is far and away #1. Amazon really is too convenient...


    Weekday rhythm




    Tuesday alone is ¥692,549 — way above the rest. Probably because that's when most of the subscription auto-charges land.





    8. Today's takeaways

    Separate "aggregation" from "interpretation"

    Multi-row sum/average (numbers go wildly off) Categorization (interpreting fuzzy meaning)
    Percentage math (saw "25% off → 100% off") Pattern recognition / hypothesis generation
    Distributed aggregation like monthly totals Narrative interpretation, savings proposals


    Aggregation is the spreadsheet tool's job; interpretation is the AI's. When you split the work, things go fast and accurate. "Data prep matters before analysis" — yeah, that old saying really is true. Note to self.


    Sometimes input quality beats raw size

    "7B + pre-aggregated input in 50 seconds" outperformed "72B + raw CSV in 12 minutes". Sometimes you don't need a bigger model — you need cleaner input. Felt that one today.


    The local-LLM angle

    Feeding 12 months of raw credit card data to an AI without a single byte going to the cloud — it was surprisingly stress-free. This is one of the spots local LLMs really shine. Got personal info, or anything cloud-uncomfortable? This is the place for them.





    9. Tech details (Claude explains)

    The technical bits, written up by my AI pair.

    1. SCP transfer to the DGX (mDNS, no IP needed)


    NVIDIA Sync auto-configures a Host alias in ~/AppData/Local/NVIDIA Corporation/Sync/config/ssh_config:



    Host spark-XXXX.local
    Hostname spark-XXXX.local
    User [user]
    Port 22
    IdentityFile "...\\nvsync.key"




    Which means I can SSH/SCP using spark-XXXX.local without ever looking up an IP. The .local suffix uses mDNS (Multicast DNS) for hostname resolution within the LAN.


    Transfer command (one line, from PowerShell on the Windows side):



    scp -r "C:\Users\[user]\Desktop\docs\dgx\csv" spark-XXXX.local:/home/[user]/personal/dgx-100-experiments/private-data/credit-card-csv



    1. Ollama install + the sudo-TTY catch + GPU detection log


    Ollama install:



    curl -fsSL https://ollama.com/install.sh | sh




    Running this through Claude Code's Bash, it errored at the sudo password prompt — an interactive TTY is required there:



    sudo: a terminal is required to read the password




    Reopened a separate SSH session, ran the same command manually, and it went through.


    Once installed, systemd auto-starts the service. The GPU detection log via journalctl -u ollama:



    inference compute id=GPU-986c194b... name=CUDA0 description="NVIDIA GB10"
    total="121.7 GiB" available="79.0 GiB"
    default_num_ctx=262144
    • VRAM (DGX Spark unified memory): 121.7 GiB
    • Default context: 262,144 tokens


    Compared with a typical RTX 4090 (24 GB VRAM, 8K–32K default context), the gap is significant.

    1. Loading both models simultaneously



    ollama pull qwen2.5:7b # 4.7 GB
    ollama pull qwen2.5:72b # 47 GB




    After loading both, ollama ps shows:



    NAME SIZE PROCESSOR CONTEXT
    qwen2.5:72b 61 GB 100% GPU 32768
    qwen2.5:7b 8.2 GB 100% GPU 32768




    Total ~69 GB used out of 79 GB available. Both models stay resident, switching between them is instant.

    1. Custom CSV parser for the credit card data


    Three quirks needed handling: CP932 encoding, no quotes (commas in some merchant names break parsing), and a trailing summary row in each file.



    def parse_line(line: str) -> list[str] | None:
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 7 or not fields[0]:
    return None # skip blank/summary rows
    if len(fields) > 7:
    merchant = ",".join(fields[1:-5])
    fields = [fields[0], merchant] + fields[-5:]
    return fields


    def load_one(path: Path) -> pd.DataFrame:
    rows = []
    with path.open(encoding="cp932") as f:
    next(f) # skip header (cardholder metadata)
    for line in f:
    parsed = parse_line(line)
    if parsed is not None:
    rows.append(parsed)
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["利用日"] = pd.to_datetime(df["利用日"], format="%Y/%m/%d")
    df["利用金額"] = df["利用金額"].astype(int)
    return df



    1. Japanese fonts in matplotlib


    japanize-matplotlib doesn't work on Python 3.12 — it imports distutils, which was removed from the standard library.


    The modern replacement is matplotlib-fontja:



    pip install matplotlib-fontja






    import matplotlib_fontja # noqa: F401 ← just importing it sets up IPAexGothic



    1. Calling Ollama from Python


    The official ollama Python client is straightforward:



    import ollama

    client = ollama.Client()
    stream = client.chat(
    model="qwen2.5:72b",
    messages=[
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_prompt},
    ],
    options={"temperature": 0.3},
    stream=True,
    )
    for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)




    Streaming makes long generation easier to watch unfold.





    Tomorrow: Day 4

    Day 4 plan: let a local AI sort 20,000 iPhone photos.


    The actual goal is to have a local image-recognition model (CLIP family?) clean up my photo library so I can stop paying iCloud for storage upgrades...!





    100ExperimentsWithDGX #LocalLLM #Ollama



    More...
Working...