Field note · Used hardware

Six months on a used RTX 3090 — what actually broke

Bought it for $820 in November. It's now May. This is what happened.

I bought a used RTX 3090 from eBay in November 2025. The card was an EVGA FTW3 Ultra, listed by the original owner with the original box and a 30-day return window. $820 plus $40 shipping. Seller rating 100% positive on 1,400 transactions. Photos showed the fan side, the underside of the cooler, and both 8-pin connectors. I asked about duty cycle; the seller said "gaming maybe 4 hours a week for two years, no mining." That sounded honest: a seller trying to move a worn card fast would have priced it lower.

Six months later — daily inference workload, ~3-5 hours a day of mixed Llama 3.3 70B Q4 chat plus occasional ComfyUI image generation — here's the failure log, in chronological order.

Month 1: nothing

First 30 days were uneventful. nvidia-smi showed a steady 320-340W under sustained inference, GPU temps of 72-78°C, and fans ramping up to about 65% at 75°C. Idle draw was 22-25W. On the bench the card behaved exactly like a new one. I felt smug about the $1,000 I saved vs a new 4090.

Month 2: a coil whine I hadn't noticed on the test bench

Started hearing a faint coil whine during long inference sessions. Not loud — about as audible as the fans at idle — but present, specifically during model loading, when memory bandwidth was peaking. I'd missed it on the test bench because that bench sits in a noisy room and I was running single-prompt benchmarks, not 8K-context Llama loads.

Coil whine isn't a failure mode. It's manufacturing variance in the inductor windings: some 3090s have it, some don't, and you can't tell from photos. If I were buying again I'd ask explicitly: "Does it whine under load? Can you record a 20-second video at 80% utilization?" Most flippers won't. That refusal is data too.

Month 3: the fan bearing started ticking

At about month 3 — call it 200 hours of accumulated inference time — the middle fan developed a faint tick at low RPM, specifically while the fans were ramping down from high load back to idle. At full speed it was inaudible; at 30-40% it was intermittent; below 1,000 RPM it was unmistakably bearing wear.

This is the most predictable failure mode on a used Ampere card. The fans are sleeve-bearing 80mm units, and 200-300 hours of sustained AI inference is more continuous duty than the gaming loads they were designed around. Replacement fans run $20-30 for a triple-pack on AliExpress; the swap is a 30-minute job with a Phillips screwdriver and patience for the screw under the GPU sticker. I haven't done it yet because the noise hasn't crossed the threshold of bothering me, even though the card sits about five feet from my chair in the room where I work.

If you're noise-sensitive, factor $30 of replacement fans and an evening of work into the used-3090 budget. The math doesn't change much; the planning does.

Month 4: the surprise that wasn't a surprise

Hit a CUDA out-of-memory error I didn't expect. I'd been running Llama 3.3 70B Q4_K_M with a 16K context window for months without trouble. One Wednesday afternoon, after a background task had been quietly accumulating embeddings into a Postgres pgvector store, I tried to reload the model and got OOM at the load step.

The cause turned out to be the embedding model. I'd left bge-large-en-v1.5 resident on the GPU between batches, which eats about 600 MB. With KV cache headroom for 16K context plus a draft model for speculative decoding, the budget was tight. Adding 600 MB pushed me over. Solved by unloading the embedding model between batches; nothing about the GPU actually failed.
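
For reference, here's a minimal sketch of the unload-between-batches pattern I settled on, assuming a sentence-transformers setup for bge-large-en-v1.5 (the function name and batch size are mine, not from any particular framework):

```python
import gc
import torch
from sentence_transformers import SentenceTransformer

def embed_batch(texts):
    """Load the embedding model, run one batch, then give the VRAM back."""
    model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")
    vectors = model.encode(texts, batch_size=32, normalize_embeddings=True)
    # Drop the Python references, then ask PyTorch to return the freed
    # blocks to the driver so the next LLM load sees the headroom.
    del model
    gc.collect()
    torch.cuda.empty_cache()
    return vectors
```

The reload costs a few seconds per batch, which at my embedding cadence is cheaper than keeping 600 MB permanently resident.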

The lesson — and I knew this in theory, but had not internalized it operationally — is that VRAM budget is not "model size." VRAM budget is the total resident allocation across every model and every cache. A 24 GB card running 70B Q4 with a 16K-context KV cache plus a small embedding model is at roughly 95% of its VRAM; the next thing you ask of it OOMs. I now keep a 4 GB headroom budget and check `nvidia-smi --query-gpu=memory.free --format=csv` before starting any new resident model. Should have been doing that from day one.
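
The check itself is a one-liner with the CLI, but if you want it inside a Python loader, here's a rough sketch using the nvidia-ml-py (pynvml) bindings; the 4 GB threshold is my own budget, not a magic number:

```python
import pynvml

HEADROOM_GB = 4.0  # personal floor before starting another resident model

def vram_headroom_ok(device_index=0, headroom_gb=HEADROOM_GB):
    """Return (ok, free_gb): whether the GPU has at least headroom_gb free."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        free_gb = mem.free / 1024**3
        return free_gb >= headroom_gb, free_gb
    finally:
        pynvml.nvmlShutdown()

ok, free_gb = vram_headroom_ok()
if not ok:
    raise RuntimeError(f"Only {free_gb:.1f} GB free; unload something first")
```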

Month 5: the connector check I should have done in month 0

Pulled the card out for a planned case airflow improvement. Inspected both 8-pin power connectors for the first time. The plastic on one of the pins on the upper connector was slightly discolored — not melted, not catastrophic, but warmer than I wanted. The PSU side of the same cable was fine. Used a contact-cleaner spray on both ends, reseated the connector, and watched temperatures for the next two weeks. Discoloration didn't progress. Probably caused by the original owner's PSU sag at peak load over the years before I bought it.

Should've inspected on day one. The PCIe 8-pin failure mode is less dramatic than the 12VHPWR connector on the 4090, but it's still real on used cards that have run hot for years. Worth a five-minute inspection on receipt of any used card.

Month 6: the throughput drift I'd missed

At about month 6 I noticed Llama 70B Q4 decode throughput had dropped from ~14-15 tok/s when I bought the card to ~12-13 tok/s. Looked at temperatures — same as before, 72-78°C sustained. Looked at clocks — 1,420-1,480 MHz, about 100 MHz lower than month 1. The card was holding its boost less aggressively over the same workload.

Two probable causes I haven't disambiguated: (1) thermal paste is starting to degrade after roughly 4 years of total card age, or (2) the PyTorch / CUDA stack on this machine has drifted through several updates since November and a regression somewhere in the kernel selection logic is shaving 5-8% throughput. I suspect both. The repaste is a $5 part and an evening of work; I haven't done it.

For comparison: the same workload on the same card with the same PyTorch + CUDA combination on a fresh OS install runs at 13.5-14 tok/s. So somewhere between 0.5 and 1.5 tok/s of the drift is software cruft on this machine, and the rest is mechanical. That split is directionally useful and operationally useless — both causes need addressing and I haven't gotten to either.
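
If you want to quantify drift like this instead of eyeballing it, a fixed benchmark prompt run the same way each month is enough. A rough sketch, assuming llama-cpp-python and a local GGUF file (the path and prompt are placeholders, and the timing lumps prefill and decode together, which is fine as long as the prompt never changes):

```python
import time
from llama_cpp import Llama

llm = Llama(
    model_path="/models/llama-3.3-70b-instruct-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # or however many layers your VRAM budget allows
    n_ctx=16384,
    verbose=False,
)

prompt = "Summarize the plot of Moby-Dick in five paragraphs."
start = time.perf_counter()
out = llm(prompt, max_tokens=512, temperature=0.0)
elapsed = time.perf_counter() - start

tokens = out["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / elapsed:.2f} tok/s")
```

Log the number somewhere you'll see it again; one figure per month is all the trend detection this needs.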

Things I expected to break that didn't

  • The VRAM thermal pads. Based on community lore, I went in expecting to need a repad by month 6-9. Instead the card has held memory temperatures at ~85-90°C under sustained image generation, which is hot but not in the territory where GDDR6X errors start to surface. I'll inspect at month 12.
  • The 12VHPWR-style failure. Doesn't apply — this is the older dual 8-pin connector setup, so the 4090's melting-connector risk doesn't carry over.
  • Any dramatic noise increase. The card is louder than a new 4090 with stock fans, but only marginally — 38-42 dBA at 1m sustained, vs maybe 35-38 dBA on the new card. Not enough to drive a fan replacement on its own.

Would I buy a used 3090 again in 2026?

Yes — but with a different process. Six months ago I treated the purchase as a one-time decision. With hindsight, it should have been the start of a maintenance plan I committed to before the card arrived. Specifically:

  • Order a triple-pack of replacement 80mm fans on the same day I order the card. They cost $30 and ship in a week. I should have had them on hand before month 3.
  • Order Arctic MX-6 thermal paste and Gelid GP-Extreme thermal pads at the same time. ~$25 combined. Plan a repaste at month 6-9. That's standard maintenance for a used card, not a panic move.
  • Inspect both 8-pin connectors on receipt. Take photos. If there's any discoloration, return the card within the seller's window. Don't try to "see if it's okay."
  • Do a 1-hour stress test at full inference load on receipt and log nvidia-smi output (a minimal logging loop is sketched after this list). That establishes baseline throughput and temperature so you can detect drift later. I didn't do this, and I now can't cleanly separate normal aging from my own software regression in the month-6 throughput numbers.
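
The logging half of that last item needs nothing fancier than nvidia-smi's query mode in a loop. A sketch of what I'd run for the first hour on receipt, with the field list trimmed to what I actually care about:

```python
import csv
import subprocess
import time

FIELDS = "timestamp,power.draw,temperature.gpu,clocks.sm,utilization.gpu,memory.used"

with open("baseline_3090.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(FIELDS.split(","))
    for _ in range(720):  # one sample every 5 seconds for an hour
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        writer.writerow([v.strip() for v in out.stdout.strip().split(",")])
        time.sleep(5)
```

Run the inference workload in another terminal while this collects; the CSV plus a throughput number from the benchmark above is the whole baseline.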

What this didn't change my mind about

The used 3090 is still the highest-leverage AI buy in 2026 for buyers in the $700-1,000 range who need 24 GB of CUDA VRAM. Six months in, the card has paid back its purchase price several times over in inference I'd otherwise have done in cloud. The maintenance overhead is real but small — call it $50-80/year in parts and a couple of evenings of work. The new-card alternative at the same VRAM tier is $1,800+ for a 4090 or $2,000-2,500 for a 5090.

What changed is how I'd frame the buy to someone else. Six months ago I'd have said "used 3090 is the value play." Today I'd say "used 3090 is the value play if you're willing to treat it as a maintenance commitment, not a plug-and-play purchase." Most buyers are willing. Some aren't. Knowing which one you are is more important than the spec sheet.


This is a field note — a specific deployment story, not a buyer guide. There's no ranked picks block at the bottom and no "check current price" CTA. If you want the buyer-guide framework for used GPUs, the best used GPU for local AI guide covers it. If you want our editorial principles, see editorial philosophy.