Soft Deletes in Multi-Tenant PostgreSQL: Patterns and Pitfalls

Q: What is the naive implementation and why does it fail at scale?

The naive approach looks like this ``sql ALTER TABLE orders ADD COLUMN deletedat TIMESTAMPTZ; CREATE INDEX orderstenantidx ON orders (tenantid, created_at); ``

Q: How do you use partial indexes to fix this?

A partial index indexes only the rows that match a WHERE clause. For soft deletes, the right pattern is ``sql CREATE INDEX ordersactiveidx ON orders (tenantid, createdat DESC) WHERE deleted_at IS NULL; ``

Q: What is the index bloat and VACUUM problem in practice?

PostgreSQL MVCC means that when you run UPDATE orders SET deleted_at = NOW() WHERE id = $1, the original row does not disappear. PostgreSQL creates a new row version and marks the old one as dead. VACUUM reclaims the space from dead rows.

Soft deletes look simple: add a deleted_at column, filter on NULL, done. In a multi-tenant PostgreSQL database at scale, the naive implementation creates index bloat, breaks your Row Level Security policies, and leaks deleted data in COUNT queries. This post covers the correct patterns and the specific PostgreSQL behaviors you need to know before shipping soft deletes to production.

Soft deletes look simple: add a deleted_at column, filter on IS NULL, and you are done. In a multi-tenant PostgreSQL database at scale, the naive implementation creates index bloat, breaks your Row Level Security policies, leaks deleted data in aggregate queries, and eventually turns your VACUUM runs from routine maintenance into fire drills. This post covers the correct patterns for soft deletes in a multi-tenant Go SaaS backend, and the specific PostgreSQL behaviors you need to understand before shipping this to production.

Why do teams reach for soft deletes in the first place?

Soft deletes are a data integrity pattern. Instead of removing a row from the database with DELETE, you stamp it with a timestamp (deleted_at = NOW()) and filter it out in all queries going forward. The row stays in the database forever, which means you get audit history for free, you can recover accidentally deleted records without restoring a backup, and foreign key references never become dangling.

For Lebanese and MENA SaaS products that serve industries with regulatory obligations like healthcare, legal services, and finance, soft deletes are often a requirement. A clinic management platform cannot let a doctor permanently delete a patient record. A legal billing system cannot let an operator wipe an invoice. The row stays, it just becomes invisible to normal application queries.

What is the naive implementation and why does it fail at scale?

The naive approach looks like this:

ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMPTZ;
CREATE INDEX orders_tenant_idx ON orders (tenant_id, created_at);

And in every query:

SELECT * FROM orders
WHERE tenant_id = $1
  AND deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 50;

This works fine at low volume. At 500,000 rows per tenant, it breaks in two ways.

First, the index orders_tenant_idx includes every row including the deleted ones. If 30 percent of your rows are soft-deleted, your index is 30 percent larger than it needs to be for active queries. Scans take longer, page cache usage is higher, and your index maintenance overhead grows with every delete.

Second, deleted_at IS NULL is not a simple equality predicate. PostgreSQL can use a regular B-tree index to filter by tenant_id = $1, but the IS NULL check forces it to scan every row in that tenant's partition of the index and check the condition. At scale, this stops using the index efficiently.

How do you use partial indexes to fix this?

A partial index indexes only the rows that match a WHERE clause. For soft deletes, the right pattern is:

CREATE INDEX orders_active_idx
    ON orders (tenant_id, created_at DESC)
    WHERE deleted_at IS NULL;

This index contains only the active (non-deleted) rows. It is dramatically smaller than a full index on the table, and PostgreSQL can use it directly for the common query pattern WHERE tenant_id = $1 AND deleted_at IS NULL. Query planning is simpler, index scans are faster, and VACUUM has less dead tuple overhead to manage for this specific index.

The tradeoff is that this index is useless for queries over deleted rows. If you have an admin panel that needs to show deleted records, you need a separate index or a full table scan with an appropriate filter.

-- For admin recovery queries, a separate partial index on deleted rows
CREATE INDEX orders_deleted_idx
    ON orders (tenant_id, deleted_at DESC)
    WHERE deleted_at IS NOT NULL;

How do soft deletes interact with Row Level Security?

In a multi-tenant PostgreSQL setup, you are likely using Row Level Security to enforce tenant isolation. The standard RLS policy looks like this:

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY orders_tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

This policy correctly isolates tenants. But it does not filter soft-deleted rows. Your application layer filters on deleted_at IS NULL, which is correct for most reads, but any query that bypasses your ORM or service layer and hits the database directly will still see deleted rows within the same tenant.

For stricter isolation where deleted rows should be completely invisible unless explicitly requested, add the soft-delete filter to the RLS policy:

CREATE POLICY orders_active_tenant ON orders
    USING (
        tenant_id = current_setting('app.tenant_id')::uuid
        AND deleted_at IS NULL
    );

The downside of baking soft-delete filtering into RLS is that you now need a separate role with elevated privileges to run recovery queries or admin operations that need to see deleted rows. We typically implement this with a second database role called app_admin that has a permissive policy:

CREATE POLICY orders_admin_all ON orders
    TO app_admin
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

The app_admin role sees deleted rows. The app_user role does not.

What happens to foreign keys and joins with soft-deleted rows?

This is where soft deletes get complicated. If you soft-delete a parent record, the child records are still active and their foreign keys still point to the (now soft-deleted) parent. Your application needs to decide what to do here, and PostgreSQL will not enforce it for you.

Three common approaches:

Cascade soft-delete. When you soft-delete an order, automatically soft-delete all order items. This requires a service-layer implementation or a trigger:

func (s *OrderService) SoftDelete(ctx context.Context, orderID, tenantID uuid.UUID) error {
    return s.db.Transaction(func(tx *sqlx.Tx) error {
        now := time.Now()
        _, err := tx.ExecContext(ctx,
            `UPDATE orders SET deleted_at = $1
             WHERE id = $2 AND tenant_id = $3 AND deleted_at IS NULL`,
            now, orderID, tenantID,
        )
        if err != nil {
            return err
        }
        _, err = tx.ExecContext(ctx,
            `UPDATE order_items SET deleted_at = $1
             WHERE order_id = $2 AND deleted_at IS NULL`,
            now, orderID,
        )
        return err
    })
}

Restrict soft-delete. Block soft-deleting a parent if active children exist. This keeps referential integrity explicit in your service layer.

Allow orphans. Accept that soft-deleted parent rows will have active children. Useful when children are independently meaningful records.

We default to cascade soft-delete for most MENA SaaS clients because it produces the cleanest audit trail: every related record shows the same deletion timestamp.

What is the index bloat and VACUUM problem in practice?

PostgreSQL MVCC means that when you run UPDATE orders SET deleted_at = NOW() WHERE id = $1, the original row does not disappear. PostgreSQL creates a new row version and marks the old one as dead. VACUUM reclaims the space from dead rows.

In a SaaS system with high delete volume, soft deletes compound this. You have two categories of dead tuples: old versions from normal updates, and rows that your application considers deleted but PostgreSQL still stores. Both need VACUUM.

For high-churn tables (orders, sessions, events), configure autovacuum aggressively:

ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum when 1% of rows are dead
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_delay = 2
);

The default autovacuum_vacuum_scale_factor = 0.2 means autovacuum waits until 20 percent of rows are dead. On a 1M-row orders table with 15 percent monthly soft-delete rate, you will wait months before autovacuum triggers. The index bloat builds silently until query times degrade.

For one Lebanese e-commerce platform we work with, this was the difference between a 12ms p99 query time and a 380ms p99 query time on the orders listing endpoint. Same data volume, same indexes, same hardware. The only difference was autovacuum configuration.

Key lessons from production

Soft deletes are the right pattern for SaaS systems where data needs to be auditable and recoverable. The naive implementation works at low scale and breaks predictably as data grows. Partial indexes cut index bloat significantly. RLS integration with a separate admin role gives you clean isolation without sacrificing recovery capability. Aggressive autovacuum configuration on high-churn tables prevents the slow query degradation that comes from dead tuple accumulation.

The test for whether your soft-delete implementation is production-ready: run EXPLAIN ANALYZE on your most common list query after inserting one million rows with 30 percent soft-deleted. If the index scan rows exceed the actual result rows by more than 5x, your index strategy needs work.

Not sure where to start?

Voxire builds multi-tenant SaaS backends on PostgreSQL for companies across Lebanon and the MENA region. If you are designing your data layer and want to avoid the common pitfalls before your table sizes make them expensive to fix, we can help.

https://voxire.com/get-a-quote/

Why do teams reach for soft deletes in the first place?

What is the naive implementation and why does it fail at scale?

How do you use partial indexes to fix this?

How do soft deletes interact with Row Level Security?

What happens to foreign keys and joins with soft-deleted rows?

What is the index bloat and VACUUM problem in practice?

Key lessons from production

Not sure where to start?

Keep reading

بناء نظام موارد بشرية وكشوف رواتب بـ Go للشركات في لبنان والمنطقة

Service-to-Service Authentication in Go Microservices: Internal Tokens and Shared Secrets in Production

Structuring a Go SaaS Backend: Package Design, Interfaces, and Dependency Injection That Scales