...
Note | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||||||||||||
Some CLAIMS Direct loading and maintenance code utilizes the PostgreSQL perl extension (plperl) as well as a heavy reliance on the libxml2 XML parsing library. The following table lists some inconsistent behavior with disparate versions of PostgreSQL and libxml2.
No PostgreSQL version compiled with libxml2 < 2.7.8 works and additionally, PostgreSQL 9.1.7 fails even with libxml2 2.7.8 IFI Claims CLAIMS has produced a patched release of libxml2-2.9.2 as an RPM. It is recommended to locally install this package replacing the package in the distribution. The RPM can be downloaded at the URL: http://alexandria.fairviewresearch.com/software/libxml2/f20/libxml2-2.9.2-1.fc20.x86_64.rpm. For additional versions, please contact support@ificlaims.com. |
PostgreSQL Installation
Regardless of installation type, careful planning of disk resources is important for efficient data loading into and extraction out of PostgreSQL. There are 6 logical segments inside the CLAIMS Direct data warehouse.
Segment | Description | Size |
---|---|---|
work index | All indices pertaining to loading | 30GB (variable) |
work text | All raw table data queued for loading | 100GB (variable) |
xml index | All permanent indices for the data warehouse | 400GB |
xml text | All permanent text for the data warehouse | 1TB |
pg data | The cluster meta datametadata, reporting, and logging directory | 5GB (variable) |
pg xlog | Log shipping for replication | 50GB (variable) |
...
/var/lib/pgsql/data/pg_hba.conf
# Allow trusted local access local all all trust # IPv4 local connections host all all 127.0 . 0.1 / 32 trust # Other hosts on subnets that may require access, for example # host all all 192.168 . 10.0 / 24 trust |
...
/var/lib/pgsql/data/postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on; |
...
Finally, enable and start the PostgreSQL cluster:
# on systemd based systems: systemctl enable postgresql.service systemctl start postgresql.service |
...
Creating the Database
The PostgreSQL data warehouse portion of CLAIMS Direct is delivered in 2 two parts:
- PostgreSQL database schema (alexandria-dwh.sql)
- <table>.gz files located in the sub-directory data
# create the database |
Tuning the Data Warehouse
Although specifying exact postgresql.conf configuration parameters may seem beneficial, in reality, every installation is different. There are many factors in tuning your cluster including system memory, resource contention from other services running on the server, available disk space, disk types et al. The primary cluster supporting your updates was tuned using the tool pgtune. It is recommended to run the tool and apply the changes before bulk loading the data.
...
Loading the Tables
The process of loading CLAIMS Direct data into PostgreSQL tables will take 3-5 days on most hardware configurations.
...
# for each table, load ... gunzip -c data/<table>.gz | psql -Ualexandria alexandria # sample batch script for tbldata in $(ls data/*.gz); do gunzip -c $tbldata | psql -Ualexandria alexandria done |
Creating the Credentials Database for CDWS
Coming soon: SQL package, installation, and population instructions