Example Parallelized Workflow

This notebook demonstrates how GDTchron uses a parallelized workflow to process large numbers of time-temperature (t-T) paths quickly. It creates a suite of pseudo-random t-T paths and predicts AHe ages from them using multiple processors. Both the number of paths and the processor counts tested are set by the number of cores available on the host system. The saved output is from a test on a 64-core node on the Tufts HPC Cluster.

[1]:
# Imports
import os
import time

import cmcrameri.cm as cmc
import matplotlib.pyplot as plt
import numpy as np

import gdtchron as gdt

plt.rcParams['pdf.fonttype'] = 42

The cell below uses the CPU count of the current system to choose the core counts for the scaling test: every power of two up to half of the reported CPU count.

[2]:
# Set up number of processors
max_cores = int(os.cpu_count() / 2)
print('Maximum cores: ', max_cores)
core_list = [int(2**x) for x in range(max_cores) if 2**x <= max_cores]
print('Cores to run for scaling test: ', core_list)
Maximum cores:  64
Cores to run for scaling test:  [1, 2, 4, 8, 16, 32, 64]
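The same power-of-two list can be built without scanning every integer below the maximum. The following is a minimal sketch, not part of GDTchron: the helper name `power_of_two_cores` is hypothetical, the fallback used when `os.cpu_count()` returns `None` is an assumption, and the halving is assumed to approximate physical cores on systems that report two hardware threads per core.

```python
import os


def power_of_two_cores(max_cores):
    """Return [1, 2, 4, ...] up to the largest power of two <= max_cores."""
    cores = []
    n = 1
    while n <= max_cores:
        cores.append(n)
        n *= 2
    return cores


# os.cpu_count() can return None; fall back to 2 logical CPUs (assumption)
logical_cpus = os.cpu_count() or 2
# Halve the logical CPU count as in the cell above, but never drop below 1
max_cores_sketch = max(1, logical_cpus // 2)
core_list_sketch = power_of_two_cores(max_cores_sketch)
print('Cores to run for scaling test: ', core_list_sketch)
```

Integer division and the `max(1, ...)` guard keep the list non-empty on single-CPU machines, where the original expression would give zero cores.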

The cell below determines the number of paths to run based on the maximum number of available cores. The number of paths scales with the core count so that the scaling test is as short as possible while still giving each run enough work for performance to keep improving at the highest number of processors. The cell then generates the requisite number of paths by pseudo-randomly picking a temperature between 0°C and 75°C for every 100,000 years from 40 Ma to the present.

[3]:
# Set number of paths based on max cores
n_paths = max_cores * 1500
print('Number of t-T paths: ', n_paths)

# Generate list of pseudo-random t-T paths between 0 and 75 C (273 to 348 K) over 40 Myr
total_time = 40  # Myr
interval = 0.1  # Myr
n_steps = int(total_time / interval) + 1
tsteps = np.arange(total_time, 0 - interval, -interval)  # Myr

rng = np.random.default_rng(17)
temp_paths = [rng.uniform(low=273, high=348, size=n_steps) for _ in range(n_paths)]
Number of t-T paths:  96000
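Before launching a long run, it can help to confirm that the time axis and the temperature paths have matching lengths and the intended bounds. The sketch below repeats the setup above on a deliberately small suite; `n_paths_small` and `temp_array` are hypothetical names, and drawing all paths at once as a 2D array is an illustrative alternative to the list comprehension, not how the notebook stores them.

```python
import numpy as np

total_time = 40  # Myr
interval = 0.1  # Myr
n_steps = int(total_time / interval) + 1
tsteps = np.arange(total_time, 0 - interval, -interval)  # Myr

n_paths_small = 10  # hypothetical small suite for a quick check
rng = np.random.default_rng(17)
# One row per path, one column per time step, temperatures in K
temp_array = rng.uniform(low=273, high=348, size=(n_paths_small, n_steps))

print('Time steps per path:', n_steps, '| length of tsteps:', tsteps.size)
print('Temperature range (K):', temp_array.min(), temp_array.max())
print('Temperature range (deg C):', temp_array.min() - 273.15,
      temp_array.max() - 273.15)
```

Because `np.arange` with a fractional step can be sensitive to floating-point rounding, comparing `tsteps.size` against `n_steps` is a cheap guard against an off-by-one mismatch.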

The cell below iterates through the core list and runs the t-T paths through GDTchron to predict AHe ages while logging the time it takes to process all paths.

[4]:
# Run through AHe
times = []
for cores in core_list:
    print(cores, ' cores')
    start = time.time()
    ages = gdt.run_tt_paths(temp_paths=temp_paths, tsteps=tsteps, system='AHe',
                            batch_size=100, processes=cores)
    end = time.time()
    elapsed = end - start
    times.append(elapsed)
    print(cores, ': ', elapsed, ' seconds\n')
1  cores
100%|█████████████████████████████████████████████████████████████████████████████| 96000/96000 [25:34<00:00, 62.57it/s]
1 :  1534.25910282135  seconds

2  cores
100%|████████████████████████████████████████████████████████████████████████████| 96000/96000 [12:43<00:00, 125.81it/s]
2 :  763.2714729309082  seconds

4  cores
100%|████████████████████████████████████████████████████████████████████████████| 96000/96000 [06:27<00:00, 247.82it/s]
4 :  388.79385447502136  seconds

8  cores
100%|████████████████████████████████████████████████████████████████████████████| 96000/96000 [03:11<00:00, 501.97it/s]
8 :  192.71459794044495  seconds

16  cores
100%|███████████████████████████████████████████████████████████████████████████| 96000/96000 [01:35<00:00, 1007.28it/s]
16 :  97.4874198436737  seconds

32  cores
100%|███████████████████████████████████████████████████████████████████████████| 96000/96000 [00:52<00:00, 1839.81it/s]
32 :  54.79542279243469  seconds

64  cores
100%|███████████████████████████████████████████████████████████████████████████| 96000/96000 [00:33<00:00, 2896.26it/s]
64 :  36.645010232925415  seconds
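The logged times can be summarized as speedup and parallel efficiency. The sketch below uses the elapsed times from the output above, rounded to two decimals; the variable names are illustrative, and speedup is defined here relative to the single-core run.

```python
import numpy as np

# Core counts and wall-clock times (s) copied from the output above, rounded
cores = np.array([1, 2, 4, 8, 16, 32, 64])
elapsed = np.array([1534.26, 763.27, 388.79, 192.71, 97.49, 54.80, 36.65])

speedup = elapsed[0] / elapsed  # relative to the single-core run
efficiency = speedup / cores    # 1.0 corresponds to ideal linear scaling

for n, s, e in zip(cores, speedup, efficiency):
    print(f'{n:3d} cores: speedup {s:6.2f}, efficiency {e:5.2f}')
```

With these timings, efficiency stays close to 1 through 16 cores, then falls to about 0.87 at 32 cores and about 0.65 at 64 cores.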

The cell below saves the core list and run times, along with the ages from the final run (the one using the highest core count), as arrays so that plotting can take place without re-running the scaling test.

[5]:
# Create array and write to disk for reuse
output = np.array([core_list, times])
np.save('scaling.npy', output)

age_array = np.array(ages)
np.save('scaling_ages.npy', age_array)

The cell below reloads the saved arrays if the scaling test above was run in a previous session.

[4]:
# Reload array
output = np.load('scaling.npy')
core_list = output[0]
times = output[1]

ages = np.load('scaling_ages.npy')
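Stacking `core_list` and `times` into a single array converts the integer core counts to floats, so the reloaded `core_list` holds values such as `1.0` and `2.0`. One hedged alternative is to store the arrays under separate names with `np.savez`, which preserves each array's dtype. In the sketch below the file name, the keyword names, and the placeholder values are all hypothetical and are not measured results.

```python
import os
import tempfile

import numpy as np

core_list = [1, 2, 4]     # placeholder values for illustration
times = [10.0, 5.2, 2.9]  # placeholder values, not measured results

# Write to a temporary directory so the sketch leaves no files behind
path = os.path.join(tempfile.mkdtemp(), 'scaling_sketch.npz')
np.savez(path, cores=np.array(core_list), times=np.array(times))

# Each array comes back under its own key with its original dtype
with np.load(path) as data:
    cores_loaded = data['cores']
    times_loaded = data['times']

print(cores_loaded.dtype.kind, times_loaded.dtype.kind)
```

Keeping the core counts as integers also means they can be used directly as tick labels when plotting.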

The cell below sets the color used for plotting the results and the name of the device on which the test was run.

[5]:
# Color and Device
ahe_color = cmc.batlowS.colors[6]
device = 'Tufts HPC Node (Dual Intel Xeon Gold 6448Y)'

The cell below plots the scaling results along with a histogram showing the ages predicted from the pseudo-random t-T paths.

[6]:
# Plot Scaling Results
fig, axs = plt.subplots(1, 2, dpi=300, figsize=(7, 3.5))

axs[0].plot(core_list, times, c=ahe_color, label=device)
axs[0].loglog()
axs[0].set_xlabel('Cores', fontsize=8)
axs[0].set_ylabel('Time (s)', fontsize=8)
axs[0].set_title('AHe Scaling Test', fontsize=10)

# Reference line for ideal (1/cores) scaling; + 1 so it reaches the largest core count
ref_x = np.arange(1, max(core_list) + 1, 1).astype(float)
ref_y = max(times) * ref_x**-1
axs[0].plot(ref_x, ref_y, c='grey', linestyle='--', label='Idealized Scaling Behavior')
axs[0].legend(fontsize=6)
axs[0].set_xticks(core_list)
axs[0].set_xticklabels(core_list)

# Plot age results
hist_range = (6, 24)
axs[1].hist(ages, color=ahe_color, edgecolor='black', range=hist_range, bins=9)
axs[1].set_title('Age Results', fontsize=10)
axs[1].set_xlabel('AHe Age (Ma)', fontsize=8)
axs[1].set_ylabel('Frequency', fontsize=8)
axs[1].set_xlim(hist_range)

for k, ax in enumerate(axs):
    ax.tick_params(axis='both', labelsize=6)
    ax.tick_params(which='minor', labelbottom=False, labelleft=False)
    ax.annotate(chr(97 + k) + ')', (-0.1, 1.1), xycoords='axes fraction', fontsize=12)

plt.tight_layout()
fig.savefig('scaling_test.pdf')

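[Figure: (a) AHe scaling test, time (s) versus cores on log-log axes with an idealized scaling reference line; (b) histogram of predicted AHe ages (Ma).]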
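The histogram settings in the plotting cell, `range=(6, 24)` with `bins=9`, imply bins 2 Myr wide, and ages outside that range are left out of the counts. The sketch below checks this with `np.histogram`, which uses the same `bins` and `range` arguments; the ages are synthetic values drawn for illustration only and are not GDTchron output.

```python
import numpy as np

hist_range = (6, 24)  # Ma, as in the plotting cell
n_bins = 9

# Evenly spaced bin edges implied by the range and bin count
edges = np.linspace(hist_range[0], hist_range[1], n_bins + 1)
print('Bin edges (Ma):', edges)
print('Bin width (Myr):', edges[1] - edges[0])

# Synthetic ages for illustration only, not GDTchron output
rng = np.random.default_rng(0)
ages_demo = rng.uniform(0, 30, size=1000)

counts, hist_edges = np.histogram(ages_demo, bins=n_bins, range=hist_range)
n_outside = ages_demo.size - counts.sum()
print('Ages outside the plotted range:', n_outside)
```

Counting the excluded values in this way shows how much of a distribution a fixed histogram range hides.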