...
Requirement | Recommended |
---|---|
CPU | 4-cores |
System Memory | 24GB |
Storage Capacity | 4TB (SSD preferedpreferred) |
Software Requirements
Requirement | Minimum Version | Notes |
---|---|---|
Operating System | RHEL 6, Fedora 20, Centos 6 | |
Development Tools | Distribution version | yum|dnf groupinstall "Development Tools" |
PostgreSQL | Distribution version | yum|dnf install \ |
System Libraries | Distribution version | yum|dnf install \# Please see note below regarding libxml2 |
Perl and Modules | Distribution version | yum|dnf install \ |
Note | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||||||||||||
Some CLAIMS Direct loading and maintenance code utilizes the postgresql perl extension (plperl) as well as a heavy reliance on the libxml2 XML parsing library. The following table lists some inconsistent behavior with disparate versions of postgresql and libxml2.
No postgresql version compiled with libxml2 < 2.7.8 works and additionally, postgresql 9.1.7 fails even with libxml2 2.7.8 IFI Claims has produced a patched release of libxml2-2.9.2 as an RPM. It is recommended to locally install this package replacing the package in the distribution. The RPM can be downloaded at the URL: http://alexandria.fairviewresearch.com/software/libxml2/f20/libxml2-2.9.2-1.fc20.x86_64.rpm. For additional versions, please contact support@ificlaims.com. |
...
PostgreSQL Installation
Regardless of installation type, careful planning of disk resources is important for efficient data loading into and extraction out of postgresql. There are 6 logical segments inside the CLAIMS Direct data warehouse.
...
Each of these segments can be allocated discrete disk through the use of TABLESPACES. Although not required, the use of TABLESPACES will improve loading and extraction performance. The total logical size of the data warehouse is approximately 2TB after initial loading.
Suggested
...
PostgreSQL Disk Layout
As mentioned above, the CLAIMS Direct postgresql PostgreSQL cluster can utilize TABLESPACES to separate text, index, and work table data. An optimal (but not mandatory) layout will have each of the following paths on separate disk groups where "disk group" is understood to be a discreet disk or set of disks exposed to the operating system as a device capable of supporting an ext4 file system.
...
If you choose to implement TABLESPACEs, you will need to edit alexandria-dwh.sql
and and insert your custom tablespace definitions. These are clearly marked per segment as:
|
...
-- TABLESPACE |
...
definition for xml text -- TABLESPACE |
...
definition for work index -- TABLESPACE |
...
definition for xml index -- insert your TABLESPACE name in the appropriate section set default_tablespace = '' |
tblspc_wrkidx | work index | /wrkidx/_pg | 30GB |
tblspc_wrktxt | work text | /wrktxt/_pg | 100GB |
tblspc_xmlidx | xml index | /xmlidx/_pg | 400GB |
tblspc_xmltxt | xml text | /xmltxt/_pg | 2000GB |
During cluster initialization (initdb), you can further define the location for the segments: pg data and pg xlog.
...
PostgreSQL Initial Configuration
CLAIMS Direct requires a working postgresql PostgreSQL cluster. If you do not have an initialized cluster, the following steps will initialize the cluster and give you rudimentary authentication and access levels needed to run CLAIMS Direct.
su - postgres
# -D, --pgdata: specify location of pg data segment # -X, --xlogdir: |
...
location of pg xlog segment initdb -A trust \ -D /var/lib/pgsql/data \ -E utf8 \ -X /pgxlog |
The following configuration files are to be edited:
/var/lib/pgsql/data/pg_hba.conf
# Allow trusted local access local all all trust # IPv4 local connections host all |
...
all 127.0 . 0.1 / 32 |
...
trust # Other hosts on subnets that may require access, for example # host all |
...
all 192.168 . 10.0 / 24 |
...
trust |
/var/lib/pgsql/data/postgresql.conf
listen_addresses = '*' |
...
# what IP address(es) to listen on; |
If you already have an initialized cluster, please be certain that local access is enabled for stand-alone installation. In either distributed install, if a separate services machine is created, its IP address needs access. This is imperative for the client update procedures.
...
Finally, enable and start the postgresql PostgreSQL cluster:
# on systemd based systems: systemctl enable postgresql.service systemctl start postgresql.service |
CLAIMS Direct
...
PostgreSQL Role
CLAIMS Direct requires the role alexandria
psql -Upostgres postgres |
Creating the Database
The
...
PostgreSQL data warehouse portion of CLAIMS Direct is delivered in 2 parts:
- postgresql database schema (alexandria-dwh.sql)
- <table>.gz files located in the sub-
...
- directory data
...
# create the database |
Tuning the Data Warehouse
Although specifying exact postgresql.conf configuration parameters may seem beneficial, in reality, every installation is different. There are many factors in tuning your cluster including system memory, resource contention from other services running on the server, available disk space, disk types et al. The primary cluster supporting your updates was tuned using the tool tool pgtune. It is recommended to run the tool and apply the changes changes before bulk bulk loading the data.
Load the Tables
The process of loading CLAIMS Direct data into postgresql PostgreSQL tables will take 3-5 days on most hardware configurations.
The following code block outlines the loading of the CLAIMS Direct data into postgresql PostgreSQL tables:
...
...
# for |
...
each |
...
table, |
...
load |
...
... |
...
gunzip |
...
-c |
...
data/<table>.gz |
...
| |
...
psql |
...
-Ualexandria |
...
alexandria |
...
|
...
# |
...
sample |
...
batch |
...
script |
...
for |
...
tbldata |
...
in |
...
$(ls |
...
data/*.gz); |
...
do gunzip -c |
...
$tbldata |
...
| |
...
psql |
...
-Ualexandria |
...
alexandria |
...
done |
Creating the Credentials Database for CDWS
TODOComing soon: sql SQL package, installation, and population instructions