Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

The PostgreSQL component is the heart of CLAIMS Direct. It contains the XML for the entire data warehouse collection, processes updates from the primary and functions as data source for the optional SOLR index.

...

work indexAll indices pertaining to loading.30GB (variable)
work textAll raw table data queued for loading.100GB (variable)
xml indexAll permanent indices for the data warehouse400GB
xml textAll permanent text for the data warehouse1TB
pg dataThe cluster meta data, reporting and logging directory5GB (variable)
pg xlogLog shipping for replication50GB (variable)

...

As mentioned above, the CLAIMS Direct postgresql cluster can utilize TABLESPACES to separate text, index, and work table data. An optimal (but not mandatory) layout will have each of the following paths on separate disk groups where "disk group" is understood to be a discreet disk or set of disks exposed to the operating system as a device capable of supporting an ext4 file system.

Please note, these are only suggestions. Your environment and disk sub-system naming may be different, or , you can choose not to use TABLESPACES at all. A postgresql cluster running on a 2TB RAID0 sub-system exposed as one device, for example, wouldn't benefit as noticeably using TABLESPACES as a mixed RAID environment with multiple devices.

If you chose choose to implement TABLESPACEs, you will need to edit alexandria-dwh.sql and insert your custom tablespace definitions. These are clearly marked per segment as:

...

 

su - postgres
 
# -D, --pgdata: specify location of pg data segment
# -X, --xlogdir: sepcify specify location of pg xlog segment
initdb -A trust \
       -D /var/lib/pgsql/data \
       -E utf8 \
       -X /pgxlog
 

 

The following configuration files are to be edited

...

 

# Allow trusted local access
local   all             all                                     trust
 
# IPv4 local connections
host    all             all             127.0.0.1/32            trust
 
# Other hosts on subnets that may require access, for example
# host    all             all             192.168.10.0/24         trust
 

 

/var/lib/pgsql/data/postgresql.conf

 

listen_addresses = '*'                  # what IP address(es) to listen on;
 

 

If you already have an initialized cluster, please be certain that local access is enabled for stand-alone installation. In either distributed install, if a separate services machine is created, its IP address needs access. This is imperative for the client update procedures.

...

 

# on systemd based systems:
systemctl enable postgresql.service
 
systemctl start postgresql.service
 

 

CLAIMS Direct postgresql role

...

Although specifying exact postgresql.conf configuration parameters may seem beneficial, in reality, every installation is different. There are many factors in tuning your cluster including system memory, resource contention from other services running on the server, available disk space, disk types et al. The primary cluster supporting your updates was tuned using the tool pgtune. It is recommended to run the tool and apply the changes before bulk loading the data.

Load the Tables

...

The process of loading CLAIMS Direct data into postgresql tables will take 3-5 days on most hardware configurations.

 

...

The following code block outlines the loading of the CLAIMS Direct data into postgresql tables

...

:

 

Code Block
# for each table, load ...
gunzip -c data/<table>.gz | psql -Ualexandria alexandria

...

 
# sample batch script
for tbldata in $(ls data/*.gz); do
  gunzip -c $tbldata | psql -Ualexandria alexandria
done

 

Creating the Credentials Database for CDWS

TODO: sql package, installation and population instructions

...