Commit 1357d2b

Update memory profiling and scripts
1 parent d147cd0 commit 1357d2b

7 files changed
Lines changed: 71 additions & 32 deletions

File tree

.gitignore
README.md
experiments/config.yaml
scripts/profiling_memory.py
shell/load_modules.sh
shell/submit.sbatch
shell/submit.sh

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -203,3 +203,6 @@ skbuild/*
 
 # SLURM logs
 *dtsc*
+
+# Backup files
+*.bak

README.md

Lines changed: 26 additions & 13 deletions
@@ -2,11 +2,6 @@ Names: Gaspare Li Causi, Lorenzo Tomada
 
 
 
-# TODO:
-- documentation in the script folder (explain what we are doing)
-- run using ulysses
-- find a way to import functions in memory profiling
-
 # Introduction
 This repository contains the final project for the course in Development Tools in Scientific Computing.
 
@@ -36,7 +31,7 @@ In order to solve an eigenvalue problem, we considered multiple strategies.
 1. The most trivial one was to implement the power method in order to be able to compute (at least) the biggest eigenvalue. We then used `numba` to try and optimize it, but in this case just-in-time compilation was not extremely beneficial. The implementation of the power method is contained in `eigenvalues.py`.
 2. Lanczos + QR: this is an approach (tailored to the case of symmetric matrices) to compute *all* the eigenvalues and eigenvectors. Notice that, also in the case of the QR method, `numba` was not very beneficial in terms of speed-up, resulting in a pretty slow methodology. For this reason, we implemented the QR method in `C++` and used `pybind11` to expose it to `Python`. All the code written in `C++` can be found in `cxx_utils.cpp`.
 3. `CuPy` implementation of all of the above: we implemented all the above methodologies using `CuPy` to see whether using GPU could speed up computations. Since this was not the case, we commented all the lines of code involving `CuPy`, so that installation of the package is no longer required and we can use our code also on machines that do not have GPU.
-4. The core of the project is the implementation (as well as a generalization of the simplified case in which $\rho=1$ considered in our reference) of the _divide et impera_ method for the computation of eigenvalues of a symmetric matrix. Some helpers were originally written in `Python` and then translated to `C++` for efficiency reasons: their original implementation is in `zero_finder.py` and is still present in the project for testing purposes. The translated version can be found in `cxx_utils.cpp`. Instead, the implementation of the actual method to compute the eigenvalues starting from a tridiagonal matrix is contained in `parallel_tridiag_eigen.py` and makes use of `mpi4py`.
+4. The core of the project is the implementation (as well as a generalization of the simplified case in which $\rho=1$ considered in our reference) of the _divide et impera_ method for the computation of eigenvalues of a symmetric matrix. Some helpers were originally written in `Python` and then translated to `C++` for efficiency reasons: their original implementation is in `zero_finder.py` and is still present in the project for testing purposes. The translated version can be found in `cxx_utils.cpp`. Instead, the implementation of the actual method to compute the eigenvalues starting from a tridiagonal matrix is contained in `parallel_tridiag_eigen.py` and makes use of `mpi4py`. Notice that the implementation of deflation in `cxx_utils.cpp` is done using the `Eigen` library.
 
 # Results
 The results of the profiling (runtime vs matrix size, memory consumption, scalability, and so on) are discussed in detail in `Documentation.ipynb`.
@@ -54,16 +49,34 @@ It is also possible to provide paths to other configuration files by passing the
 Notice that the script is *not* called using `mpirun`, but internally it uses MPI.
 This is done by spawning a communicator inside the script.
 
-In addition, in the `shell` folder, we provide a `submit.sbatch` file to run using `SLURM`, as well as a `submit.sh` to run the same experiment locally.
-These two files perform memory profiling.
+In addition, in the `shell` folder, we provide a `submit.sbatch` file to run using `SLURM`, as well as a `submit.sh`.
+They are used to perform memory profiling.
+
+The `submit.sbatch` file is supposed to be used on Ulysses (or any other cluster using `SLURM`).
+It is supposed to show how to submit a job (in which our package is employed) using `SLURM`.
+Notice, however, that due to Ulysses's problems with `MPI`, the profiling could not actually be run there.
+As a result, we also provide `submit.sh`, which is supposed to be run on a workstation.
+It executes `mpirun -np [n_procs] python scripts/profiling_memory.py`, basically doing the same as the `submit.sbatch` script, but without using `SLURM`.
+Notice that it assumes that `shell/load_modules.sh` has already been executed (see the next section).
+
+We also remark that the memory-profiling script `scripts/profiling_memory.py` does not spawn an `MPI` communicator, but is supposed to be called using `mpirun`. The reason for that is to provide a more extensive list of examples of how our package can be used.
 
-# To install using Ulysses:
+Notice that it is possible that `scripts/mpi_running.py` will not run on systems using `SLURM`, due to the fact that we are using a specific way to spawn an `MPI` communicator.
+Nevertheless, the package still works: as done in `scripts/profiling_memory.py`, it suffices to run a file that can be used in combination with `mpirun` or `srun`.
+
+# How to install:
+If you are using Ulysses or a SISSA workstation, it is likely that you will need to load a couple of modules to be able to install the package.
+The exact modules change according to the device you are currently using, but it is sufficient that you have `CMake`, `gcc` and `OpenMPI`.
+
+To streamline the installation process, we provide the script `shell/load_modules.sh`.
+This script loads the modules that are required on Ulysses or on a SISSA workstation (according to the flag that is passed).
+To use it, run:
 ```bash
-source shell/load_modules.sh
+source shell/load_modules.sh Ulysses # or source shell/load_modules.sh workstation
 ```
-The previous line will load CMake and gcc. Both are needed to compile the project.
-In addition, it will enable the istallation of `mpi4py`.
-After that, you can just write
+The previous line will allow the installation of `mpi4py` and the automatic compilation of the `C++` source file used in the project.
+
+Once the needed modules are loaded, you can regularly install via `pip` using the following command:
 ```bash
 python -m pip install .
 ```
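For reference, the distinction the README draws above, between a script that spawns its own `MPI` communicator (as `scripts/mpi_running.py` does) and one that is meant to be launched externally with `mpirun` or `srun` (as `scripts/profiling_memory.py` is), can be sketched with `mpi4py` as follows; the worker file name and process count are placeholders, not taken from the repository.

```python
# Minimal sketch of the two launch styles described in the README (illustrative only).
import sys
from mpi4py import MPI


def spawn_workers(n_procs):
    # Style 1: a serially started script (python driver.py) spawns its own
    # communicator; "worker.py" is a hypothetical worker script.
    inter_comm = MPI.COMM_SELF.Spawn(sys.executable, args=["worker.py"], maxprocs=n_procs)
    inter_comm.Disconnect()


def launched_externally():
    # Style 2: the script is started by the launcher (mpirun -np 4 python script.py),
    # so every rank simply picks up MPI.COMM_WORLD.
    comm = MPI.COMM_WORLD
    print(f"rank {comm.Get_rank()} of {comm.Get_size()}")
```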

experiments/config.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-dim: 10
+dim: 100
 density: 0.2
-n_processes: 1
-plot: false
+n_processes: 2
+plot: false
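For context, these are the keys the profiling run reads from the configuration file. A minimal sketch of loading them with PyYAML follows; the actual loading code is not part of this diff, so this only illustrates the keys shown above.

```python
# Sketch of reading experiments/config.yaml with PyYAML; only the keys visible
# in the diff above are assumed to exist.
import yaml

with open("experiments/config.yaml") as fh:
    cfg = yaml.safe_load(fh)

dim = cfg["dim"]                  # matrix size, e.g. 100
density = cfg["density"]          # density of the generated sparse matrix, e.g. 0.2
n_processes = cfg["n_processes"]  # now overridden by the communicator size (see below)
plot = cfg["plot"]                # whether to produce the summary plot
```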

scripts/profiling_memory.py

Lines changed: 10 additions & 3 deletions
@@ -47,7 +47,7 @@
 kwargs = comm.bcast(kwargs, root=0)
 dim = kwargs["dim"]
 density = kwargs["density"]
-n_procs = kwargs["n_processes"]
+n_procs = size  # kwargs["n_processes"]
 plot = kwargs["plot"]
 
 # Now we build the matrix on rank 0
@@ -61,7 +61,7 @@
 A_np = comm.bcast(A_np, root=0)
 
 
-# On rank 0, we use the Lanczos method
+# On rank 0, we use the Lanczos method.
 # We actually call it twice: the first time to ensure that the function is JIT-compiled by Numba, the second one for memory profiling
 if rank == 0:
     print("Precompiling Lanczos...")
@@ -80,6 +80,7 @@
 else:
     diag = off_diag = None
 
+# Now we broadcast diag and off_diag to all other ranks so we can use parallel_tridiag_eigen
 diag = comm.bcast(diag, root=0)
 off_diag = comm.bcast(off_diag, root=0)
 
@@ -97,19 +98,23 @@
 
 total_mem_children = comm.reduce(delta_mem, op=MPI.SUM, root=0)
 
+# Collect the information across all ranks
 if rank == 0:
+    print(f"########################## SIZE = {size} #####################")
     total_mem_all = delta_mem_lanczos
     print("Eigenvalues computed.")
     process = psutil.Process()
 
     print(f"Total memory across all processes: {total_mem_all:.2f} MB")
 
+    # We also profile numpy and scipy memory consumption
     mem_np = profile_numpy_eigvals(A_np)
     print(f"NumPy eig memory usage: {mem_np:.2f} MB")
 
     mem_sp = profile_scipy_eigvals(A_np)
     print(f"SciPy eig memory usage: {mem_sp:.2f} MB")
 
+    # Save to the logs folder
     os.makedirs("logs", exist_ok=True)
     log_file = "logs/memory_profile.csv"
     fieldnames = [
@@ -140,6 +145,8 @@
     )
 
     if plot:
+        # We only plot if all the runs have been done already. In this way, we get a complete memory usage graph.
+
         import matplotlib.pyplot as plt
         import pandas as pd
 
@@ -194,5 +201,5 @@
         )
         plt.subplots_adjust(right=0.75)
 
-        plt.savefig("logs/mem_vs_size_all_methods.png", bbox_inches="tight")
+        plt.savefig("logs/memory_profiling.png", bbox_inches="tight")
         plt.show()
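A note on the pattern in the hunks above: each rank measures its own memory delta (after a warm-up call so that Numba's JIT compilation is not counted), and the per-rank deltas are summed onto rank 0 with an MPI reduction. A stripped-down sketch of that pattern, with the eigenvalue computation elided and variable names simplified, could look like this:

```python
# Stripped-down sketch of the per-rank memory accounting used above; run with
# e.g. `mpirun -np 4 python sketch.py`. The eigenvalue computation is elided.
from mpi4py import MPI
import psutil

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

proc = psutil.Process()
before = proc.memory_info().rss / 1e6  # resident memory in MB before the work

# ... each rank does its share of the eigenvalue computation here ...

delta_mem = proc.memory_info().rss / 1e6 - before

# Sum the per-rank deltas onto rank 0, mirroring the comm.reduce call above
total_mem = comm.reduce(delta_mem, op=MPI.SUM, root=0)
if rank == 0:
    print(f"Total memory across all processes: {total_mem:.2f} MB")
```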

shell/load_modules.sh

Lines changed: 21 additions & 3 deletions
@@ -1,4 +1,22 @@
 #!/bin/bash
-module load cmake/3.29.1
-module load intel/2021.2
-module load openmpi3/3.1.4
+
+# Usage: source load_modules.sh [env]
+# where [env] can be: Ulysses, workstation
+
+env_arg="$1"
+
+if [[ "$env_arg" == "Ulysses" ]]; then
+    echo "Loading modules for Ulysses cluster..."
+    module load cmake/3.29.1
+    module load intel/2021.2
+    module load openmpi3/3.1.4
+
+elif [[ "$env_arg" == "2" || "$env_arg" == "workstation" ]]; then
+    echo "Loading modules for local workstation..."
+    module load intel/2022.2.1
+    module load openmpi4/4.1.4
+
+else
+    echo "Usage: $0 [Ulysses|workstation]"
+    exit 1
+fi

shell/submit.sbatch

Lines changed: 6 additions & 7 deletions
@@ -3,10 +3,10 @@
 #SBATCH --partition=regular1
 #SBATCH --job-name=dtsc
 #SBATCH --nodes=1
-#SBATCH --ntasks=4
+#SBATCH --ntasks=8
 #SBATCH --cpus-per-task=1
 #SBATCH --mem=10000
-#SBATCH --time=06:00:00
+#SBATCH --time=01:00:00
 
 #SBATCH --output=%x.o%j.%N
 #SBATCH --error=%x.e%j.%N
@@ -22,7 +22,7 @@ echo '------------------------------------------------------'
 # ==== End of Info part (say things) ===== #
 #
 
-cd $SLURM_SUBMIT_DIR
+cd $SLURM_SUBMIT_DIR
 
 module load cmake/3.29.1
 module load intel/2021.2
@@ -32,8 +32,8 @@ conda init
 conda activate devtools_scicomp
 
 # Ranges over which we iterate
-n_processes=(1 2 4)
-matrix_sizes=(10 15 20)
+n_processes=(1 2 4 8)
+matrix_sizes=(10 50 100 500 1000)
 
 last_dim="${matrix_sizes[-1]}"
 last_nproc="${n_processes[-1]}"
@@ -51,7 +51,6 @@ for dim in "${matrix_sizes[@]}"; do
 echo "------------------"
 
 sed -i "s/^dim: .*/dim: $dim/" $CONFIG_FILE
-sed -i "s/^n_processes: .*/n_processes: $n_p/" $CONFIG_FILE
 sed -i "s/^plot: .*/plot: false/" $CONFIG_FILE
 echo "Running with size=$dim and n_processes=$n_p"
 
@@ -62,7 +61,7 @@ for dim in "${matrix_sizes[@]}"; do
 sed -i "s/^plot: .*/plot: true/" $CONFIG_FILE
 fi
 
-python scripts/profiling_memory.py
+srun --mpi=openmpi -n ${n_p} python scripts/profiling_memory.py
 done
 done
 
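The sweep performed by `submit.sbatch` above (and by `submit.sh` below) iterates over matrix sizes and process counts, rewrites `experiments/config.yaml`, enables plotting only on the last run, and launches the profiler. The same loop could be driven from Python; the following is a hypothetical sketch of that logic, with `mpirun` standing in for `srun` when running on a workstation.

```python
# Hypothetical Python equivalent of the sweep in shell/submit.sbatch /
# shell/submit.sh: rewrite the config for each run, then launch the profiler.
import subprocess
import yaml

CONFIG_FILE = "experiments/config.yaml"
n_processes = [1, 2, 4, 8]
matrix_sizes = [10, 50, 100, 500, 1000]

for dim in matrix_sizes:
    for n_p in n_processes:
        with open(CONFIG_FILE) as fh:
            cfg = yaml.safe_load(fh)
        cfg["dim"] = dim
        # Plot only on the very last run so the graph covers every configuration
        cfg["plot"] = (dim == matrix_sizes[-1] and n_p == n_processes[-1])
        with open(CONFIG_FILE, "w") as fh:
            yaml.safe_dump(cfg, fh)

        print(f"Running with size={dim} and n_processes={n_p}")
        subprocess.run(
            ["mpirun", "-np", str(n_p), "python", "scripts/profiling_memory.py"],
            check=True,
        )
```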
shell/submit.sh

Lines changed: 2 additions & 3 deletions
@@ -1,8 +1,8 @@
 #!/bin/bash
 
 # Ranges over which we iterate
-n_processes=(1 2)
-matrix_sizes=(10 15 20)
+n_processes=(1 2 4 8)
+matrix_sizes=(10 50 100 500 1000)
 
 last_dim="${matrix_sizes[-1]}"
 last_nproc="${n_processes[-1]}"
@@ -20,7 +20,6 @@ for dim in "${matrix_sizes[@]}"; do
 echo "------------------"
 
 sed -i "s/^dim: .*/dim: $dim/" $CONFIG_FILE
-sed -i "s/^n_processes: .*/n_processes: $n_p/" $CONFIG_FILE
 sed -i "s/^plot: .*/plot: false/" $CONFIG_FILE
 echo "Running with size=$dim and n_processes=$n_p"
 
