The PostgreSQL component is the heart of CLAIMS Direct. It contains the XML for the entire data warehouse collection, processes updates from the primary, and functions as the data source for the optional SOLR index.
...
Requirement | Recommended |
---|---|
CPU | 4-cores |
System Memory | 24GB |
Storage Capacity | 4TB (SSD preferred) |
Software Requirements
Requirement | Minimum Version | Notes |
---|---|---|
Operating System | RHEL 6, Fedora 20, CentOS 6 | |
Development Tools | Distribution version | yum|dnf groupinstall "Development Tools" |
PostgreSQL | Distribution version | yum|dnf install \ |
System Libraries | Distribution version | yum|dnf install \ # Please see the note below regarding libxml2 |
Perl and Modules | Distribution version | yum|dnf install \ |
Note
Some CLAIMS Direct loading and maintenance code uses the PostgreSQL Perl extension (plperl) and relies heavily on the libxml2 XML parsing library. Inconsistent behavior has been observed with disparate versions of PostgreSQL and libxml2.
No PostgreSQL version compiled with libxml2 < 2.7.8 works, and PostgreSQL 9.1.7 fails even with libxml2 2.7.8. IFI Claims has produced a patched release of libxml2-2.9.2 as an RPM. It is recommended to install this package locally, replacing the package in the distribution. The RPM can be downloaded at http://alexandria.fairviewresearch.com/software/libxml2/f20/libxml2-2.9.2-1.fc20.x86_64.rpm. For additional versions, please contact support@ificlaims.com.
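
The version constraint in the note can be checked before installation. The following is a minimal sketch, not part of the CLAIMS Direct distribution: the helper names `version_lt` and `check_libxml2` are hypothetical, and the comparison assumes GNU `sort` with the `-V` option. A "candidate" verdict only means the version is not below 2.7.8; as the note says, some PostgreSQL releases fail even at that version.

```shell
#!/bin/sh
# Minimum libxml2 version taken from the note; lower versions are reported as unsupported.
MIN_LIBXML2="2.7.8"

# version_lt A B: succeeds when version A sorts strictly before version B.
# Assumes GNU sort, whose -V option performs a version-aware sort.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_libxml2 VERSION: hypothetical helper that prints a one-word verdict.
check_libxml2() {
    if version_lt "$1" "$MIN_LIBXML2"; then
        echo "unsupported"
    else
        echo "candidate"
    fi
}
```

On an RPM-based system the installed version could be passed in with something like `check_libxml2 "$(rpm -q --queryformat '%{VERSION}' libxml2)"`.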
...
Regardless of installation type, careful planning of disk resources is important for efficient data loading into and extraction out of PostgreSQL. There are six logical segments inside the CLAIMS Direct data warehouse.
Segment | Description | Approximate Size |
---|---|---|
work index | All indices pertaining to loading | 30GB (variable) |
work text | All raw table data queued for loading | 100GB (variable) |
xml index | All permanent indices for the data warehouse | 400GB |
xml text | All permanent text for the data warehouse | 1TB |
pg data | The cluster metadata, reporting, and logging directory | 5GB (variable) |
pg xlog | Log shipping for replication | 50GB (variable) |
Each of these segments can be allocated discrete disk space through the use of TABLESPACES. Although not required, the use of TABLESPACES will improve loading and extraction performance. The total logical size of the data warehouse is approximately 2TB after initial loading.
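
As a rough planning aid, the per-segment sizes listed above can be added up. The sketch below only restates those figures; the variable names are illustrative and 1TB is taken as 1024GB, which is an assumption. Several segments are marked variable, so the result is a rough figure rather than a sizing guarantee.

```shell
#!/bin/sh
# Approximate segment sizes in GB, copied from the segment list above.
WORK_INDEX=30
WORK_TEXT=100
XML_INDEX=400
XML_TEXT=1024   # 1TB, assumed here to mean 1024GB
PG_DATA=5
PG_XLOG=50

# Sum of the listed segments.
TOTAL=$((WORK_INDEX + WORK_TEXT + XML_INDEX + XML_TEXT + PG_DATA + PG_XLOG))
echo "Listed segments total: ${TOTAL}GB"
```

This prints `Listed segments total: 1609GB`, which sits below the approximate 2TB quoted for the loaded warehouse.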
...
As mentioned above, the CLAIMS Direct PostgreSQL cluster can utilize TABLESPACES to separate text, index, and work table data. An optimal (but not mandatory) layout will have each of the following paths on separate disk groups, where "disk group" is understood to be a discrete disk or set of disks exposed to the operating system as a device capable of supporting an ext4 file system.
Please note, these are only suggestions. Your environment and disk sub-system naming may be different, or you can choose not to use TABLESPACES at all. A PostgreSQL cluster running on a 2TB RAID0 sub-system exposed as one device, for example, would not benefit from TABLESPACES as noticeably as a mixed RAID environment with multiple devices.
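
For readers unfamiliar with the mechanism, a tablespace is created with PostgreSQL's `CREATE TABLESPACE name LOCATION 'directory'` statement. The sketch below only prints such statements. The helper `tablespace_sql`, the tablespace names, and the mount points are all hypothetical and chosen for illustration; the names expected by the delivered schema may differ.

```shell
#!/bin/sh
# tablespace_sql NAME DIR: print a CREATE TABLESPACE statement for one disk group.
tablespace_sql() {
    printf "CREATE TABLESPACE %s LOCATION '%s';\n" "$1" "$2"
}

# Hypothetical names and mount points, one per disk group.
tablespace_sql xml_text  /mnt/disk1/xml_text
tablespace_sql xml_index /mnt/disk2/xml_index
```

The generated statements would be run by a database superuser, for example through `psql`. PostgreSQL requires each target directory to exist already, to be empty, and to be owned by the operating system user that runs the server.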
...
Creating the Database
The PostgreSQL data warehouse portion of CLAIMS Direct is delivered in two parts:
- PostgreSQL database schema (alexandria-dwh.sql)
- <table>.gz files located in the sub-directory data
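
Because each data file is named after its table, the set of tables shipped in a delivery can be listed from the file names alone. This is a minimal sketch; `list_tables` is a hypothetical helper, and it assumes the files sit directly in the given directory.

```shell
#!/bin/sh
# list_tables DIR: print the table name implied by each <table>.gz file in DIR.
list_tables() {
    for f in "$1"/*.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern when DIR holds no .gz files
        basename "$f" .gz
    done
}
```

Running `list_tables data` from the delivery directory prints one table name per line.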
...