The PostgreSQL component is the heart of CLAIMS Direct. It contains the XML for the entire data warehouse collection, processes updates from the primary, and functions as data source for the optional SOLR index.
Hardware Requirements
Requirement | Recommended |
---|---|
CPU | 4-cores |
System Memory | 24GB |
Storage Capacity | 4TB (SSD preferred) |
Software Requirements
RequirementMinimum | VersionSupported Versions | Notes |
---|---|---|
Operating System | RHEL 7, CentOS 7, Amazon Linux AMI | |
PostgreSQL | 9.2-6, 10 | yum -y install/ \Note:Versions 10 and above are incompatible with CLAIMS Direct. |
IFI CLAIMS Repository | RHEL/CentOS 7 Note: CentOS 7 needs an additional repository: Amazon Linux |
Note | ||
---|---|---|
| ||
Some CLAIMS Direct loading and maintenance code utilizes the PostgreSQL perl extension (plperl) as well as a heavy reliance on the libxml2 XML parsing library. IFI CLAIMS has produced a patched release of libxml2 as an RPM. It is highly recommended to update libxml2 from the IFI CLAIMS software repository. For additional distributions, please contact support@ificlaims.com. |
PostgreSQL
InstallationRegardless of installation type, careful planning of disk resources is important for efficient data loading into and extraction out of PostgreSQL. There are 6 logical segments inside the CLAIMS Direct data warehouse.
Each of these segments can be allocated discrete disk space through the use of TABLESPACES. Although not required, the use of TABLESPACES will improve loading and extraction performance. The total logical size of the data warehouse is approximately 2TB after initial loading.
Suggested PostgreSQL Disk Layout
As mentioned above, the CLAIMS Direct PostgreSQL cluster can utilize TABLESPACES to separate text, index, and work table data. An optimal (but not mandatory) layout will have each of the following paths on separate disk groups where "disk group" is understood to be a discrete disk or set of disks exposed to the operating system as a device capable of supporting an ext4 file system.
Please note, these are only suggestions. Your environment and disk sub-system naming may be different, or you can choose not to use TABLESPACES at all. A PostgreSQL cluster running on a 2TB RAID0 sub-system exposed as one device, for example, wouldn't benefit as noticeably using TABLESPACES as a mixed RAID environment with multiple devices.
If you choose to implement TABLESPACES, you will need to edit alexandria-dwh.sql
and insert your custom tablespace definitions. These are clearly marked per segment as:
-- TABLESPACE definition
for
work text
-- TABLESPACE definition
for
xml text
-- TABLESPACE definition
for
work index
-- TABLESPACE definition
for
xml index
-- insert your TABLESPACE name in the appropriate section
set default_tablespace =
''
tblspc_wrkidx | work index | /wrkidx/_pg | 30GB |
tblspc_wrktxt | work text | /wrktxt/_pg | 100GB |
tblspc_xmlidx | xml index | /xmlidx/_pg | 400GB |
tblspc_xmltxt | xml text | /xmltxt/_pg | 2000GB |
During cluster initialization (initdb), you can further define the location for the segments: pg data and pg xlog.
PostgreSQL Initial Configuration
CLAIMS Direct requires a working PostgreSQL cluster. If you do not have an initialized cluster, the following steps will initialize the cluster and give you rudimentary authentication and access levels needed to run CLAIMS Direct.
su - postgres
# -D, --pgdata: specify location of pg data segment
# -X, --xlogdir: specify
location of pg xlog segment
initdb -A trust \
-D /var/lib/pgsql/data \
-E utf8 \
-X /pgxlog
The following configuration files are to be edited:
/var/lib/pgsql/data/pg_hba.conf
# Allow trusted local access
local all all trust
# IPv4 local connections
host all all
127.0
.
0.1
/
32
trust
# Other hosts on subnets that may require access,
for
example
# host all all
192.168
.
10.0
/
24
trust
/var/lib/pgsql/data/postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on; |
If you already have an initialized cluster, please be certain that local access is enabled for stand-alone installation. In either distributed install, if a separate services machine is created, its IP address needs access. This is imperative for the client update procedures.
Finally, enable and start the PostgreSQL cluster:
# on systemd based systems:
systemctl enable postgresql.service
systemctl start postgresql.service
CLAIMS Direct PostgreSQL Role
CLAIMS Direct requires the role alexandria:
psql -Upostgres postgres=#>
Schema and Tools
In an effort to further streamline CLAIMS Direct PostgreSQL schema versioning, we are introducing a schema and tools package into the CLAIMS Direct yum
repository. The new package will offer the main CLAIMS Direct database schema as well as the supporting users
database to support deployment of on-site web services. The new package can be installed as follows:
Code Block | ||
---|---|---|
| ||
# If you haven't already, install the repository:
# CentOS, RHEL 7
sudo yum -y install https://repo.ificlaims.com/ifi-claims-direct/centos/7/x86_64/ifi-claims-direct-beta-0.0.2-1.el7.centos.x86_64.rpm
# CentOS, RHEL 8
sudo yum -y install https://repo.ificlaims.com/ifi-claims-direct/centos/8/x86_64/ifi-claims-direct-beta-0.0.2-1.el8.x86_64.rpm
# Install the schema and tools
sudo yum -y install alexandria-schema-tools |
The package contains the SQL needed to create the database(s) supporting PostgreSQL versions 9.2 - 10.x, as well as tools to check installation, populate the tables, and perform extracts. To create the databases, simply load the SQL via psql
into the instance.
Code Block | ||
---|---|---|
| ||
echo "create role alexandria with superuser login; |
Creating the Database
The PostgreSQL data warehouse portion of CLAIMS Direct is delivered in two parts:
- PostgreSQL database schema (alexandria-dwh.sql)
- <table>.gz files located in the sub-directory data
cat alexandria-dwh.sql
" \ | psql -Upostgres postgres gunzip -c /usr/share/alexandria-schema-tools/alexandria-dbs-10.x.gz \ | psql -Ualexandria -h <POSTGRESQL-HOST> postgres # There's also support for 9.x, # gunzip -c /usr/share/alexandria-schema-tools/alexandria-dbs-9.x.gz \ # | psql -Ualexandria -h <POSTGRESQL-HOST> postgres |
Tuning the Data Warehouse
pgtune (optional)
Although specifying exact postgresql.conf configuration parameters may seem beneficial, in reality, every installation is different. There are many factors in tuning your cluster including system memory, resource contention from other services running on the server, available disk space, disk types et al. The primary cluster supporting your updates was tuned using the tool pgtune. It is recommended to run the tool and apply the changes before bulk loading the data. Note that this requires Python.In addition to the schema, there is a collection of tools to help determine instance loading feasibility, populating, bulk extracting, counts and sizes of tables, and more. All tools accept the same parameters with reasonable defaults for a Type-1 CLAIMS Direct installation.
Code Block | ||
---|---|---|
| ||
yumUsage: -y install pgtune pgtune -i postgresql.conf -T DW -c 500 [ paste added configuration options at end of postgres.conf ] |
Redirecting errors to LOG.2 (optional)
If desired, you can redirect errors (if any) to LOG.2:
Code Block | ||
---|---|---|
| ||
cat alexandria-dwh.sql | psql -Ualexandria postgres 2>LOG.2 |
Pre-flight check
Use the pre-flight check to ensure the database and operating system are properly configured before loading the tables:
Code Block |
---|
./pre-flight-check.sh |
The sample output of a properly configured system would look like this:
Code Block |
---|
# CD-TOOL OPTIONS Option Description Default -------------------------------------------------------- -H|--host specify host localhost -P|--port specify port 5432 -d|--database specify database alexandria -u|--user specify database user name alexandria -h|--help print this usage and exit |
Tool | Description | |||||
---|---|---|---|---|---|---|
cd-pre-flight-check.sh | Use this tool to test the viability of populating the newly created database. It runs a few simple checks.
|
|
|
|
|
|
|
Loading the Tables
The process of loading CLAIMS Direct data into PostgreSQL tables will take 1-2 days on most hardware configurations.
The following code block outlines the loading of the CLAIMS Direct data into PostgreSQL tables:
cd-load.sh | You can use this tool to populate the initial data delivery. After unpacking or otherwise making the initial data available,
Note: This tool expects there to be a | |||||
cd-count.sh | This tool counts the rows in all tables in the main schema (cdws, xml)
| |||||
cd-analyze.sh | Analyze all tables in the main schema | |||||
cd-table-size.sh | Calculate the on-disk size of each table in the main schema
| |||||
cd-extract.sh | Extract all relevant table data to |
Panel | ||
---|---|---|
| ||
Once the data has been loaded, proceed to:
|