This utility function reports the number and type of GPUs attached to hosts on the database cluster.
This can be useful when determining which segments to use for training deep neural nets. For example, for economic reasons you may wish to set up a heterogeneous cluster with GPUs on only some of the hosts rather than all of them. This utility can help you identify where the GPUs are, so that you can direct the compute to those hosts only for model training.
gpu_configuration( output_table, source )
Arguments
output_table
TEXT. Name of the output table to write out the GPU information.
source (optional)
TEXT, default: 'tensorflow'. Source for determining the GPU configuration. Using 'tensorflow' returns a description based on what TensorFlow reports. Using 'nvidia' returns a description based on what the NVIDIA System Management Interface (nvidia-smi) reports [1]. Note that MADlib and Keras use the TensorFlow information; the lower-level nvidia-smi information is provided for convenience.
Output
The output table contains the following:
| Column | Description |
|---|---|
| hostname | TEXT. Name of the host machine in the cluster. Does not include the master or mirrors. For PostgreSQL this will always return 'localhost'. |
| gpu_descr | TEXT. Description string reported by TensorFlow or nvidia-smi. The two formats differ and are shown in the examples below. |
Examples
Get the GPU configuration using the default TensorFlow source:

DROP TABLE IF EXISTS host_gpu_mapping_tf;
SELECT * FROM madlib.gpu_configuration('host_gpu_mapping_tf');
SELECT * FROM host_gpu_mapping_tf ORDER BY hostname, gpu_descr;
 hostname |                                        gpu_descr
----------+------------------------------------------------------------------------------------------
 phoenix0 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
 phoenix0 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
 phoenix0 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
 phoenix0 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
 phoenix1 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
 phoenix1 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
 phoenix3 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
 phoenix3 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
 phoenix3 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
 phoenix3 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
 phoenix4 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
 phoenix4 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
 phoenix4 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
 phoenix4 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
(14 rows)

In this heterogeneous cluster there are 4 GPUs attached to each of hosts phoenix0, phoenix3, and phoenix4, 2 GPUs attached to phoenix1, and no GPUs attached to phoenix2.
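To see at a glance how many GPUs each host has, you can aggregate the mapping table. A minimal sketch, using only the host_gpu_mapping_tf table created above:

SELECT hostname, count(*) AS gpu_count
FROM host_gpu_mapping_tf
GROUP BY hostname
ORDER BY hostname;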
Get the GPU configuration as reported by nvidia-smi:

DROP TABLE IF EXISTS host_gpu_mapping_nvidia;
SELECT * FROM madlib.gpu_configuration('host_gpu_mapping_nvidia',  -- output table
                                       'nvidia'                    -- source for GPU info
                                      );
SELECT * FROM host_gpu_mapping_nvidia ORDER BY hostname, gpu_descr;
 hostname |                                   gpu_descr
----------+-------------------------------------------------------------------------------
 phoenix0 | GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-f2ccc77e-2501-f6ee-4754-069dda256fb2)
 phoenix0 | GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-b1fc40ca-c7c6-bc86-f20f-6e9a62cda3f8)
 phoenix0 | GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-d93bb21b-96f9-7c1d-3bab-cdd92b7bbc9d)
 phoenix0 | GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-2d79c4a8-479e-2f33-39f8-3ba80b63f830)
 phoenix1 | GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-0af6bb1e-5b5b-4988-ad3a-a917e9584702)
 phoenix1 | GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-d824c976-a8aa-ef26-a13c-9a9a7fe86bfd)
 phoenix3 | GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-3681d0b6-1ec6-0453-fd81-29d88e549cd9)
 phoenix3 | GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-d4b1f2e7-b238-ac9a-bbfe-918adeb69472)
 phoenix3 | GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-42a32ef1-a60c-e599-c8cf-0e669111ab6f)
 phoenix3 | GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-1cce09c4-6856-8031-be0b-8e8bbf9a10f3)
 phoenix4 | GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-a71bdc18-fdd5-ba25-617e-19b23cc8e827)
 phoenix4 | GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-f9d13688-7fe6-a029-24d1-985a5659f18f)
 phoenix4 | GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-06a7f54b-c07a-e87a-20d6-09bd99b19531)
 phoenix4 | GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-af3b32f3-8bd9-cb75-a8fb-25253b9da926)
(14 rows)
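Since both mapping tables key on hostname, you can cross-check that the two sources agree on the GPU count per host. A sketch, assuming both tables above were created:

SELECT hostname,
       tf.gpu_count AS tf_gpus,
       nv.gpu_count AS nvidia_gpus
FROM      (SELECT hostname, count(*) AS gpu_count FROM host_gpu_mapping_tf     GROUP BY hostname) tf
FULL JOIN (SELECT hostname, count(*) AS gpu_count FROM host_gpu_mapping_nvidia GROUP BY hostname) nv
USING (hostname)
ORDER BY hostname;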
To direct model training to only the segments residing on hosts with GPUs, first review the segment configuration in the Greenplum catalog table gp_segment_configuration [2]:

SELECT * FROM gp_segment_configuration WHERE role='p' AND content>=0 ORDER BY hostname, dbid;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address  | replication_port
------+---------+------+----------------+------+--------+-------+----------+----------+------------------
    2 |       0 | p    | p              | c    | u      | 40000 | phoenix0 | phoenix0 |            70000
    3 |       1 | p    | p              | c    | u      | 40001 | phoenix0 | phoenix0 |            70001
    4 |       2 | p    | p              | c    | u      | 40002 | phoenix0 | phoenix0 |            70002
    5 |       3 | p    | p              | c    | u      | 40003 | phoenix0 | phoenix0 |            70003
    6 |       4 | p    | p              | c    | u      | 40000 | phoenix1 | phoenix1 |            70000
    7 |       5 | p    | p              | c    | u      | 40001 | phoenix1 | phoenix1 |            70001
    8 |       6 | p    | p              | c    | u      | 40002 | phoenix1 | phoenix1 |            70002
    9 |       7 | p    | p              | c    | u      | 40003 | phoenix1 | phoenix1 |            70003
   10 |       8 | p    | p              | c    | u      | 40000 | phoenix2 | phoenix2 |            70000
   11 |       9 | p    | p              | c    | u      | 40001 | phoenix2 | phoenix2 |            70001
   12 |      10 | p    | p              | c    | u      | 40002 | phoenix2 | phoenix2 |            70002
   13 |      11 | p    | p              | c    | u      | 40003 | phoenix2 | phoenix2 |            70003
   14 |      12 | p    | p              | c    | u      | 40000 | phoenix3 | phoenix3 |            70000
   15 |      13 | p    | p              | c    | u      | 40001 | phoenix3 | phoenix3 |            70001
   16 |      14 | p    | p              | c    | u      | 40002 | phoenix3 | phoenix3 |            70002
   17 |      15 | p    | p              | c    | u      | 40003 | phoenix3 | phoenix3 |            70003
   18 |      16 | p    | p              | c    | u      | 40000 | phoenix4 | phoenix4 |            70000
   19 |      17 | p    | p              | c    | u      | 40001 | phoenix4 | phoenix4 |            70001
   20 |      18 | p    | p              | c    | u      | 40002 | phoenix4 | phoenix4 |            70002
   21 |      19 | p    | p              | c    | u      | 40003 | phoenix4 | phoenix4 |            70003
(20 rows)

Now join this table with the GPU resources table to create a table containing a list of all segments on hosts with GPUs attached:
DROP TABLE IF EXISTS segments_to_use;
CREATE TABLE segments_to_use AS
    SELECT DISTINCT dbid, hostname
    FROM gp_segment_configuration JOIN host_gpu_mapping_tf USING (hostname)
    WHERE role='p' AND content>=0;
SELECT * FROM segments_to_use ORDER BY hostname, dbid;
 dbid | hostname
------+----------
    2 | phoenix0
    3 | phoenix0
    4 | phoenix0
    5 | phoenix0
    6 | phoenix1
    7 | phoenix1
    8 | phoenix1
    9 | phoenix1
   14 | phoenix3
   15 | phoenix3
   16 | phoenix3
   17 | phoenix3
   18 | phoenix4
   19 | phoenix4
   20 | phoenix4
   21 | phoenix4
(16 rows)
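The segments_to_use table can then be passed to MADlib deep learning functions that accept distribution rules, so that data is packed only onto segments with GPUs. A sketch of one such call: the source table 'image_data', its columns 'x' and 'y', and the output table name are hypothetical placeholders, and the parameter list follows madlib.training_preprocessor_dl as documented for recent MADlib versions:

SELECT madlib.training_preprocessor_dl('image_data',         -- hypothetical source table
                                       'image_data_packed',  -- hypothetical output table
                                       'y',                  -- dependent variable (class label)
                                       'x',                  -- independent variable (image array)
                                       NULL,                 -- buffer_size: let MADlib choose
                                       255,                  -- normalizing_const for pixel values
                                       NULL,                 -- num_classes: inferred from data
                                       'segments_to_use'     -- distribution_rules table from above
                                      );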
References

[1] NVIDIA System Management Interface (nvidia-smi), https://developer.nvidia.com/nvidia-system-management-interface
[2] Greenplum gp_segment_configuration catalog table, https://gpdb.docs.pivotal.io/latest/ref_guide/system_catalogs/gp_segment_configuration.html