Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Introduction

IFI recommends 4TB 6TB of disk space for an on-site PostgreSQL instance. Normally, this will accommodate approximately 3 years of database growth (depending on your subscription level). If you are running out of disk space more quickly than anticipated, the reason may be an increased number of deletes occurring during the update process. This is especially likely to occur when IFI replaces content at a more aggressive rate than usual, as we have been doing since October 2019 in order to replace our old translations with higher-quality translations from Google. The blog Understanding of Bloat and VACUUM in PostgreSQL does a good job of explaining why your database may be growing disproportionately to actual pipeline data flow. This blog will help diagnose whether bloat is the cause of your problem and provide some potential solutions.

...

Code Block
languagesql
SELECT schemaname || '.' || relname as tblnam,
    n    n_dead_tup,
        (n_dead_tup::float / n_live_tup::float) * 100 as pfrag
FROM pg_stat_user_tables
WHERE schemaname = 'xml' and n_dead_tup > 0 and n_live_tup > 0 order by pfrag desc;

...

After you run this, you may want to adjust the aggressiveness of your autovacuum settings to avoid accumulating more bloat in the future. The following settings should be considered:

  • autovacuum_max_workers  = 5 - This is is the recommended setting recommended setting for a 16-core machine. If you have, for example, a 4-core, a setting of 2 would be the best available option.
  • autovacuum_vacuum_scale_factor = 0.02 - This setting indicates the threshold which determines when autovacuum needs to run per table. It refers to the fraction of the table size. So if the percentage of dead tuples (the pfrag column in the SQL query above) is greater than 2%, this spawns an autovacuum process. The default value for this setting is 20%, which will create massive bloat in your database before the autovacuum process even starts.
  • autovacuum_analyze_scale_factor = 0.01 - Although not specific to table size and vacuum cleanup, this setting, which is also a percentage of fragmentation of the table, tells the autovacuum process to analyze the table (i.e., update query planner statistics) when the fragmentation percentage reaches 1% (the default is 10%).

...

1. Create a hot standby on a separate machine with +2TB disk, i.e., 6TB 8TB instead of 4TB6TB. See Hot Standby for more information.

...