|Pivotal HAWQ / HAWQ Administration|
This chapter describes how to add resources to an existing HAWQ system to scale performance.
Careful planning is critical to the success of a system expansion operation. By thoroughly preparing all new hardware and carefully planning all the steps of the expansion procedure, you can minimize risk and down time for the HAWQ database. For performance-related considerations when expanding large-scale systems, see Planning Table Redistribution later in this section.
This section provides an overview and a checklist for the system expansion process.
System expansion consists of three phases:
You can refer to the general considerations for deploying new hardware described in Planning New Hardware Platforms. For more information on hardware platforms, consult the Pivotal platform engineers. After the new hardware platforms are provisioned and networked, you must run performance tests using Pivotal HAWQ utilities.
These operations are performed with the system offline. The gpexpand utility will shut down the database during initialization, if not already shut down.
Users can continue to access HAWQ after initialization is complete and the system is back online. Normal operations such as ETL jobs, user queries, and reporting can continue during this process, although users might experience slower response times, particularly on systems that rely heavily on hash distribution of tables.
To complete system expansion, you must run gpexpand to redistribute data tables across the newly added segments. Depending on the size and scale of your system, this might be accomplished in a single session during low-use hours, or it might require you to divide the process into batches over an extended period. Each table or partition will be unavailable for read or write operations during the period in which it is being redistributed. As each table is successfully redistributed across the new segments, according to its distribution key (if any), the performance of the database should incrementally improve until it equals and then exceeds pre-expansion performance levels.
In a typical operation, you will run the gpexpand utility four times, using different options, during the complete expansion process:
gpexpand -f hosts_file
gpexpand -i input_file -D database_name
gpexpand -d duration
gpexpand -c
In systems whose large scale requires multiple redistribution sessions, you may need to run gpexpand several more times to complete the expansion. For more information, see Planning Table Redistribution.
This checklist provides a quick overview of the steps required for a system expansion.
|Online Pre-Expansion Preparation (Perform these tasks when the system is up and available)|
|Offline Expansion Tasks|
|Online Expansion Tasks|
Careful preparation of new hardware for system expansion is extremely important. Deliberate and thorough deployment of compatible hardware can greatly minimize the risk of issues developing later in the system expansion process.
Pivotal recommends the following:
The steps to plan and set up new hardware platforms will vary greatly for each unique deployment. Some of the possible considerations include:
After selecting and adding new hardware to your network environment, make sure you perform the burn-in tasks described in Verifying OS Settings.
Expanding the HAWQ Database requires a limited period of system down time. During this time period, you must run gpexpand to initialize new segments into the array and create an expansion schema.
The time required will depend on the number of schema objects in the HAWQ system, as well as other factors related to hardware performance.
By default, new hosts are initialized with the same number of segments as existing hosts. Optionally, you can increase the number of segments per host, or add new segments only to existing hosts.
For example, if existing hosts currently have two segments per host, you can use gpexpand to initialize two additional segments on existing hosts (for a total of four), and four new segments on new hosts.
The interactive process for creating an expansion input file prompts for this option, and the input file format allows you to specify new segment directories manually as well. For more information, see Creating an Input File for System Expansion.
At initialization time, gpexpand creates an expansion schema. If you do not specify a particular database at initialization time (gpexpand -D), the schema is created in the database indicated by the PGDATABASE environment variable.
The expansion schema stores metadata for each table in the system, so that its status can be tracked throughout the expansion process. The expansion schema consists of two tables and a view for tracking the progress of an expansion operation:
The redistribution of tables is performed with the system online. For many HAWQ systems, table redistribution can be completed in a single gpexpand session scheduled during a low-use period. Larger systems may require you to plan multiple sessions and set the order of table redistribution, so as to minimize the performance impact.
Pivotal recommends completing the table redistribution in one session, if your database size and design permit it.
The performance impact of table redistribution depends on the size, storage type, and partitioning design of a table. Redistributing a table with gpexpand takes approximately as much time per table as a CREATE TABLE AS SELECT operation would take. When redistributing a terabyte-scale fact table, the expansion utility can use a significant portion of available system resources, with resulting impact on the performance of queries or other database workload items.
Append-only and compressed append-only tables are redistributed by gpexpand at different rates. The CPU capacity required to compress and decompress data tends to increase the impact on system performance. For tables of similar size containing similar data, you may find that zlib-compressed append-only tables redistribute at a significantly slower rate than uncompressed append-only tables, potentially up to 80% slower.
Certain sequences of alter operations on tables could render such tables unalterable from a redistribution perspective. gpexpand does not support redistribution of unalterable tables. For example, if you have a table initially created with a column of user-defined types and the column is subsequently dropped, this table may qualify as unalterable. If gpexpand reports a table as unalterable, you need to redistribute the table manually. To do this, create a new table matching the schema of the unalterable table and execute the following statement:
INSERT INTO <new_table> SELECT * FROM <unalterable table>;
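The manual workaround can be spelled out end to end. The following is a minimal sketch using a hypothetical table name sales; it assumes CREATE TABLE ... (LIKE ...) is used to create the matching table, and that you verify the copy before swapping names:

```sql
-- Hypothetical table name; substitute your unalterable table.
CREATE TABLE sales_new (LIKE sales);

-- Copy the data; the new table is written out across all segments,
-- including the newly added ones.
INSERT INTO sales_new SELECT * FROM sales;

-- After verifying the copy (row counts, spot checks), swap the names.
ALTER TABLE sales RENAME TO sales_old;
ALTER TABLE sales_new RENAME TO sales;
```

Once you confirm that the renamed copy behaves correctly, you can drop the original unalterable table.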
Because the expansion utility can process a large table partition-by-partition, an efficient partition design reduces the performance impact of table redistribution. Only the child tables of a partitioned table are set to a random distribution policy, and only one child partition table is unavailable during redistribution.
To prepare new system nodes for expansion, install the HAWQ software binaries, exchange the required SSH keys and run performance tests. Pivotal recommends running performance tests at least twice: first on the new nodes only, and then on both the new and existing nodes together. The second set of tests must be run with the system offline, to prevent user activity from distorting test results.
Beyond these general guidelines, Pivotal recommends running performance tests any time that the networking of nodes is modified, or for any special conditions in the system environment. For example, if you plan to run the expanded system on two network clusters, run the performance tests on each cluster.
The rest of this section describes how to run the HAWQ administrative utilities to verify that your new nodes are ready for integration into the existing HAWQ system.
New nodes must exchange SSH keys with the existing nodes to allow HAWQ administrative utilities to connect to all segments without a password prompt.
Pivotal recommends performing the key exchange process twice: once as root (for administration convenience) and once as the gpadmin user (required for the HAWQ management utilities). Perform the following tasks in this order:
For existing hosts, you can use the same host file that you used for the initial setup of SSH keys in the system. The files should include all hosts (master, backup master, and segment hosts) and list one host name per line. If using a multi-NIC configuration, make sure to exchange SSH keys using all of the configured host names for a given host. Make sure there are no blank lines or extra spaces. For example:
mdw
sdw1-1
sdw1-2
sdw1-3
sdw1-4
sdw2-1
sdw2-2
sdw2-3
sdw2-4
sdw3-1
sdw3-2
sdw3-3
sdw3-4

OR

masterhost
seghost1
seghost2
seghost3
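Blank lines or stray whitespace in a host file are an easy mistake to miss by eye. The following is a small sketch of a sanity check you could run before handing the file to the utilities; the file path and hostnames are hypothetical:

```shell
# A hypothetical new-hosts file, written here only for illustration.
hosts_file=/tmp/new_hosts_file
cat > "$hosts_file" <<'EOF'
sdw5-1
sdw5-2
sdw6-1
sdw6-2
EOF

# The host file should list one hostname per line with no blank lines
# or leading/trailing whitespace; flag any line that breaks that rule.
if grep -qE '^$|^[[:space:]]|[[:space:]]$' "$hosts_file"; then
    echo "hosts file has blank lines or extra spaces" >&2
    clean=no
else
    echo "hosts file looks clean"
    clean=yes
fi
```

Run the check against each host file (existing and new) before starting the key exchange.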
$ su -
# source /usr/local/hawq/greenplum_path.sh
# gpssh-exkeys -f /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
***Enter password for root@hostname: <root_password>
# gpssh -f new_hosts_file '/usr/sbin/useradd gpadmin -d /home/gpadmin -s /bin/bash'
# gpssh -f new_hosts_file 'echo gpadmin_password | passwd gpadmin --stdin'
You must log in to each segment host and set the gpadmin user’s password on each host. For example:
# ssh <segment_hostname>
# passwd gpadmin
New password: <gpadmin_password>
Retype new password: <gpadmin_password>
# gpssh -f new_hosts_file ls -l /home
Log in as gpadmin, and run the gpssh-exkeys utility, referencing the host list files. For example:
$ gpssh-exkeys -e /home/gpadmin/existing_hosts_file -x /home/gpadmin/new_hosts_file
gpssh-exkeys will check the remote hosts and perform the key exchange between all hosts. Enter the gpadmin user password when prompted. For example:
***Enter password for gpadmin@hostname: <gpadmin_password>
Use the gpcheck utility to verify that all the new hosts in your array have the correct OS settings for running the HAWQ software.
To run gpcheck:
$ su - gpadmin
$ gpcheck -f new_hosts_file
Use the gpcheckperf utility to test disk I/O and memory bandwidth.
To run gpcheckperf:
$ gpcheckperf -f new_hosts_file -d /data1 -d /data2 -v
If your network is divided into subnets, repeat this procedure with a separate host file for each subnet.
Before initializing the system with all new segments, repeat the performance tests on all nodes in the system, new and existing. Shut down the system and run these same tests using host files that include all nodes, existing and new:
Because user activity may skew the results of these tests, you must shut down HAWQ (gpstop) before running them.
This topic describes how to install the HAWQ components on the new segments created after running gpexpand.
If you have already installed the PL/R and pgcrypto packages on existing segments, use the following instructions to install these packages on the expanded segments:
mkdir plr
mv plr*.tgz plr
cd plr
tar -xzf plr*.tgz
mkdir pgcrypto
mv pgcrypto.tgz pgcrypto
cd pgcrypto
tar -xzf pgcrypto.tgz
./plr_install.sh -f ~/hostfile -x
./pgcrypto_install.sh -f ~/hostfile -x
If you have already installed PL/Java on existing segments, use the following instructions to install these packages on the expanded segments:
These instructions assume that you have a precompiled build of PL/Java from Pivotal.
./pljava_install.sh -f ~/hosts.txt
Where ~/hosts.txt is a text file containing the hostnames of the segment hosts that are currently active in the HAWQ deployment. The file must contain one hostname per line.
source $GPHOME/greenplum_path.sh
gpstop -ar
gpconfig -c custom_variable_classes -v \'pljava\'
If you have existing custom_variable_classes defined, add pljava to the existing comma-separated list of values.
psql -d <dbname> -c "CREATE LANGUAGE pljava"
Run this command for every database where you want to install PL/Java.
cd $GPHOME/lib/postgresql/java
gpscp -f ~/hosts.txt myfunc.jar =:$GPHOME/lib/postgresql/java/
gpconfig -c pljava_classpath -v \'myfunc.jar\'
The following steps assume that MADlib was installed and running before adding new nodes. If MADlib was not installed, you can install it using the instructions provided in HAWQ Installation and Upgrade.
hawq_install.sh -r <RPM_FILEPATH> -f <HOSTFILE>
Use the gpexpand utility to initialize the new segments, create the expansion schema, and set a system-wide random distribution policy for the database. The utility performs these tasks by default the first time you run it with a valid input file on the HAWQ master. Subsequently, it will detect that an expansion schema has been created, and perform table redistribution.
To begin expansion, the gpexpand utility requires an input file containing information about the new segments and hosts. If you run gpexpand without specifying an input file, the utility displays an interactive interview that collects the required information and automatically creates an input file for you.
If you choose to create the input file by using the interactive interview, you can optionally specify a file containing a list of expansion hosts. If your platform or command shell limits the length of the list of hostnames you are allowed to enter when prompted in the interview, specifying the hosts with gpexpand -f may be mandatory.
Before running gpexpand to create an input file in interactive mode, make sure you have the following required information:
The utility automatically generates an input file based on this information and on the dbid, content ID, and data directory values stored in gp_segment_configuration and pg_filespace, then saves the file in the current directory.
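Before running the interview, you can look up the existing values yourself. The following is a sketch of the kind of query involved, against the gp_segment_configuration catalog named above (the exact column set may vary by release):

```sql
-- Highest dbid, content ID, and port currently in use; values for new
-- segments are incremented from these.
SELECT max(dbid)    AS max_dbid,
       max(content) AS max_content,
       max(port)    AS max_port
FROM gp_segment_configuration;
```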
To create an input file in interactive mode:
$ gpexpand -f /home/gpadmin/new_hosts_file
> sdw5, sdw6, sdw7, sdw8
After you have entered all required information, the utility generates an input file and saves it in the current directory. For example:
If your system has shared filesystem filespaces, gpexpand expects a filespace configuration file (input_file_name.fs) to exist in the same directory as your expansion configuration file. See User-defined Filespaces and gpexpand for more information.
This topic describes two scenarios:
When you initialize a new HAWQ cluster, it has two filespaces by default: pg_system and dfs_system (see the system tables pg_filespace and pg_filespace_entry).
Since HAWQ has these two default filespaces, the expansion utility expects corresponding filespaces for the new segments. gpexpand requests local filesystem paths for the pg_system filespace, but auto-generates paths for shared filespace paths, to maintain consistency of paths across all the segment data directories.
This means that one or more filespaces, other than the default, have been defined in the existing HAWQ system.
You can use the gpfilespace utility to add filespaces to your HAWQ system. User-defined filespaces always have a shared path.
Therefore, if you have one or more user-defined filespaces in your HAWQ system, gpexpand requests local filesystem paths for the pg_system filespace, but auto-generates paths for shared filespace paths so that it can maintain consistency of paths between all the segment data directories.
You can create your own input file in the required format. Unless you have special needs for your expansion scenario, Pivotal recommends creating the input file using the interactive interview process.
The format for the expansion input.fs file is:
filespaceOrder=filespace1_name:filespace2_name: ...
dbid:/path/for/filespace1:/path/for/filespace2: ...
dbid:/path/for/filespace1:/path/for/filespace2: ...
...
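For example, a filespace configuration file for a single user-defined filespace and two new segments might look like the following (the filespace name, dbid values, and paths here are hypothetical):

```
filespaceOrder=fastdisk
11:/data1/fastdisk/gpseg10
12:/data1/fastdisk/gpseg11
```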
An expansion input file in this format requires the following information for each new segment:
|hostname||hostname||Hostname for the segment host|
|port||An available port number||Database listener port for the segment, incremented on the existing segment port base number.|
|fselocation||Directory name||The data directory (filespace) location for a segment as per the pg_filespace_entry system catalog.|
|dbid||Must not conflict with existing dbid values.||Database ID for the segment. The values you enter should be incremented sequentially from existing dbid values shown in the system catalog gp_segment_configuration. For example, to add four nodes to an existing ten-segment array with dbid values of 1-10, list new dbid values of 11, 12, 13 and 14.|
|content||Must not conflict with existing content values.||The content ID of the segment. A primary segment and its mirror should have the same content ID, incremented sequentially from existing values. For more information, see content in the reference for gp_segment_configuration.|
|preferred_role||p||"p" (primary) is the only option.|
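Putting the fields together, new segment lines can be generated mechanically once you know the highest existing values. The following is a sketch that emits input-file lines for two new segments on one new host, assuming the fields appear colon-separated in the order given in the table above (hostname:port:fselocation:dbid:content:preferred_role); all hostnames, paths, and starting values here are hypothetical, and real values should come from gp_segment_configuration:

```shell
# Hypothetical starting values; substitute values from your catalog.
host=sdw5
base_port=40000      # existing segment port base number
base_dbid=10         # highest dbid already in use
base_content=9       # highest content ID already in use
datadir=/data1/primary

# Emit one line per new segment, incrementing port, dbid, and content.
lines=$(for i in 1 2; do
    printf '%s:%d:%s/gpseg%d:%d:%d:p\n' \
        "$host" $((base_port + i)) \
        "$datadir" $((base_content + i)) \
        $((base_dbid + i)) $((base_content + i))
done)
echo "$lines"
```

The data directory name (gpsegN) conventionally follows the segment's content ID, which is why the same increment appears twice.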
After you have created an input file, run gpexpand to initialize new segments. The utility will automatically stop HAWQ for the time required to initialize the segments, then restarts the system when finished.
To run gpexpand with an input file:
$ gpexpand -i input_file -D database1
The utility detects if there is an existing expansion schema for the HAWQ system. If there is an existing schema, you must remove it with gpexpand -c before beginning a new expansion operation. See Removing the Expansion Schema. When the new segments are initialized and the expansion schema is successfully created, the utility prints a success message and exits.
When the initialization process is complete, you can connect to HAWQ and view the expansion schema. The schema resides in the database you specified with -D, or in the database specified by the PGDATABASE environment variable. For more information, see About the Expansion Schema.
You can roll back a failed expansion setup operation by using the command gpexpand -r | --rollback. However, this command is only allowed in a failure scenario. Once a setup operation has completed successfully, the expansion is committed, and you cannot roll back.
To roll back a failed expansion setup, use the following command, specifying the database that contains the expansion schema:
gpexpand --rollback -D database_name
After successfully creating an expansion schema, you can bring HAWQ back online and redistribute tables across the entire array. You can redistribute tables with gpexpand at specified intervals, targeting low-use hours when the utility’s CPU usage and table locks will have the least impact on database operations. Also, you can rank tables to ensure that the largest or most critical tables are redistributed in your preferred order.
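Ranking is done by updating the expansion schema directly. The following is a sketch, assuming gpexpand.status_detail carries a rank column and using hypothetical table names; lower rank values are redistributed first:

```sql
-- Redistribute the large fact table first, then a dimension table.
UPDATE gpexpand.status_detail SET rank = 1
WHERE fq_name = 'public.sales_fact';

UPDATE gpexpand.status_detail SET rank = 2
WHERE fq_name = 'public.customer_dim';
```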
While the redistribution of tables is underway:
The table or partition currently being redistributed will be locked and unavailable for read or write operations. When its redistribution is completed, normal operations resume.
INSERT INTO <new_table> SELECT * FROM <unalterable table>;
To redistribute tables with gpexpand:
$ gpexpand -d 60:00:00
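The -d flag takes a duration in hh:mm:ss form, so 60:00:00 above means sixty hours. A small sketch of the conversion, useful when planning session windows:

```shell
# Convert an hh:mm:ss gpexpand duration into seconds.
duration="60:00:00"
secs=$(echo "$duration" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
echo "gpexpand would redistribute tables for at most $secs seconds"
```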
The utility redistributes tables until the last table in the schema is successfully marked completed, or until the specified duration or end time is reached. Each time a session is started or finished, the utility updates the status and updated time in gpexpand.status.
At any time during the process of redistributing tables, you can query the expansion schema. The view gpexpand.expansion_progress provides a summary of the current progress, including calculations of the estimated rate of table redistribution and estimated time to completion. The table gpexpand.status_detail can be queried for per-table status information.
Because the estimates in gpexpand.expansion_progress are based on the rates achieved for each table, the view cannot calculate an accurate estimate until the first table has completed. Calculations are restarted each time you re-run gpexpand to start a new table redistribution session.
=# SELECT * FROM gpexpand.expansion_progress;
             name             |          value
------------------------------+--------------------------
 Bytes Left                   | 5534842880
 Bytes Done                   | 142475264
 Estimated Expansion Rate     | 680.75667095996092 MB/s
 Estimated Time to Completion | 00:01:01.008047
 Tables Expanded              | 4
 Tables Left                  | 4
(6 rows)
The table gpexpand.status_detail stores status, last updated time, and other useful information about each table in the schema. To monitor the status of a particular table by querying gpexpand.status_detail, connect to HAWQ using psql or another supported client. Query gpexpand.status_detail with a command similar to the following:
=> SELECT status, expansion_started, source_bytes
   FROM gpexpand.status_detail
   WHERE fq_name = 'public.sales';

  status   |     expansion_started      | source_bytes
-----------+----------------------------+--------------
 COMPLETED | 2009-02-20 10:54:10.043869 |   4929748992
(1 row)
The expansion schema can safely be removed after the expansion operation is completed and verified. To run another expansion operation on a HAWQ system, you must first remove the existing expansion schema.
To remove the expansion schema:
$ gpexpand -c