Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note
titlelibxml2

Some CLAIMS Direct loading and maintenance code utilizes the PostgreSQL perl extension (plperl) as well as a heavy reliance on the libxml2 XML parsing library. The following table lists some inconsistent behavior with disparate versions of PostgreSQL and libxml2.

8.4.62.7.8works
9.1.22.7.8works
9.1.42.7.8works
9.1.72.7.8fails
9.2.42.9.1yesworks
9.3.12.7.6fails
9.3.22.9.1works

No PostgreSQL version compiled with libxml2 < 2.7.8 works and additionally, PostgreSQL 9.1.7 fails even with libxml2 2.7.8

IFI Claims CLAIMS has produced a patched release of libxml2-2.9.2 as an RPM. It is recommended to locally install this package replacing the package in the distribution. The RPM can be downloaded at the URL: http://alexandria.fairviewresearch.com/software/libxml2/f20/libxml2-2.9.2-1.fc20.x86_64.rpm. For additional versions, please contact support@ificlaims.com.

PostgreSQL Installation

Regardless of installation type, careful planning of disk resources is important for efficient data loading into and extraction out of PostgreSQL. There are 6 logical segments inside the CLAIMS Direct data warehouse.

SegmentDescriptionSize
work indexAll indices pertaining to loading30GB (variable)
work textAll raw table data queued for loading100GB (variable)
xml indexAll permanent indices for the data warehouse400GB
xml textAll permanent text for the data warehouse1TB
pg dataThe cluster meta datametadata, reporting, and logging directory5GB (variable)
pg xlogLog shipping for replication50GB (variable)

...

/var/lib/pgsql/data/pg_hba.conf

 

# Allow trusted local access
local   all             all                                     trust
 
# IPv4 local connections
host    all             all             127.0.0.1/32            trust
 
# Other hosts on subnets that may require access, for example
# host    all             all             192.168.10.0/24         trust

...

/var/lib/pgsql/data/postgresql.conf 

listen_addresses = '*'                  # what IP address(es) to listen on;

...

Finally, enable and start the PostgreSQL cluster:

 

# on systemd based systems:
systemctl enable postgresql.service
 
systemctl start postgresql.service

...

Creating the Database

The PostgreSQL data warehouse portion of CLAIMS Direct is delivered in 2 two parts:

  •  PostgreSQL database schema (alexandria-dwh.sql)
  • <table>.gz files located in the sub-directory data

 

# create the database
cat alexandria-dwh.sql | psql -Ualexandria postgres

 

Tuning the Data Warehouse

Although specifying exact postgresql.conf configuration parameters may seem beneficial, in reality, every installation is different. There are many factors in tuning your cluster including system memory, resource contention from other services running on the server, available disk space, disk types et al. The primary cluster supporting your updates was tuned using the tool pgtune. It is recommended to run the tool and apply the changes before bulk loading the data.

...

Loading the Tables

The process of loading CLAIMS Direct data into PostgreSQL tables will take 3-5 days on most hardware configurations.

...

 

for each table, load ...
gunzip -c data/<table>.gz | psql -Ualexandria alexandria
  
# sample batch script
for tbldata in $(ls data/*.gz); do
  gunzip -c $tbldata | psql -Ualexandria alexandria
done
 

Creating the Credentials Database for CDWS

Coming soon: SQL package, installation, and population instructions