A slow PostgreSQL database under production pressure invites guessing, and guessing is how rescues go wrong. Adding an index, raising a memory setting or restarting a server can hide the real bottleneck and introduce new risk. The work that actually holds up starts from evidence and changes one thing at a time.
When the problem appears
The call usually comes when something visible breaks: pages time out at peak, background jobs pile up, a nightly report no longer finishes, or connection counts spike until the application cannot open new sessions. Often nothing changed in the query, only the data volume. Sometimes a deploy added a query that looks harmless and runs a sequential scan over a table that has quietly grown to tens of millions of rows. The first job is not to fix anything. It is to describe the incident precisely.
Common failure modes
Tuning settings before reading queries. Raising work_mem or shared_buffers feels productive, but if one query plan is wrong, the server-wide knob just spreads the cost around. Evidence first, settings second.
Indexing by intuition. Indexes that match no real query add write overhead and bloat without helping reads. Unused indexes are a cost, not insurance.
Ignoring locks and long transactions. A single long-running transaction can block autovacuum, hold locks and stall everything behind it. The slow symptom is often a lock problem wearing a performance costume.
Letting bloat and autovacuum drift. Tables and indexes bloat when autovacuum cannot keep up, and bloat slowly degrades every query against them. It is invisible until you measure it.
Changing production blind. A fix that works on a small dataset can behave very differently against production-sized data. Without a production-like test, “it worked locally” is not reassurance.
How to investigate
Start with the shape of the incident, then narrow with evidence.
Describe the symptom precisely
Slow pages, delayed jobs, lock waits, high CPU, IO pressure, connection exhaustion or growing latency each point down a different path. Naming the symptom narrows the search before you touch anything.
Read the queries
Find the slow statements, look at their execution plans, compare row estimates against reality, and check for missing indexes, unused indexes and repeated full scans. A handful of plans usually tells a clearer story than any generic tuning advice.
Check the operational layer
Inspect locks, long transactions, autovacuum health, table and index bloat, recent migration behavior, connection pooling and background job pressure. A slow database is frequently a system problem, not only a SQL problem.
Operational checklist
- The incident has a named symptom, not just “the database is slow.”
- Slow queries are identified from evidence, with plans captured.
- Row estimates are compared to actual rows to spot stale statistics.
- Locks and long-running transactions have been ruled in or out.
- Autovacuum health and bloat are measured, not assumed.
- Connection pooling and job concurrency are accounted for.
- Every proposed change has a rollback and a way to verify it.
A safe path forward
Measure, isolate one bottleneck, change the smallest useful thing, verify under production-like conditions, and keep rollback options clear. Resist the urge to apply five changes at once; if something improves, you will not know which change did it, and if something breaks, you will not know which to undo. One change, one measurement, repeat.
This database work usually arrives as part of modernization and infrastructure rescue, and it often sits next to the real-time operational dashboards that made the slowdown visible in the first place. It connects to our broader services and to the modernization thinking in the Rails modernization checklist. If a production database is under pressure, tell us what is breaking and we will start from evidence.
This article is general engineering guidance, not advice for one specific system. A real rescue should always begin with measurements from your own production environment.

