|
743 | 743 | "source": [ |
744 | 744 | "# Profiling of Divide and Conquer (memory and time)\n", |
745 | 745 | "We discuss the results of the profiling of the Divide et Impera algorithm.\n", |
746 | | - "Profiling is performed using the `submit.sh` file in the `shell` folder, which internally \n", |
| 746 | + "Profiling is performed using the `submit.sh` file in the `shell` folder, which internally calls `scripts/profiling_memory_and_time.py`. \n", |
747 | 747 | "We begin by discussing the memory consumption of the method, studying how it varies with respect to the matrix size and number of processes, and comparing it to `numpy`'s and `scipy`'s `eig` built-in function.\n", |
748 | 748 | "\n", |
749 | 749 | "*IMPORTANT*: please notice that we did not use `scipy.sparse`'s solver as it cannot be used to retrieve all the eigenvalues, which would have make the comparison unfair.\n", |
750 | 750 | "\n", |
751 | 751 | "\n", |
752 | 752 | "\n", |
753 | | - "It is possible to see that cumulative memory consumption does not really depend on the number of processes, and that for low values of $n$ it behaves better than `numpy` and `scipy`, while performance degradates for high values of $n$.\n", |
| 753 | + "It is possible to see that cumulative memory consumption increases as the number of processes does.\n", |
754 | 754 | "\n", |
755 | 755 | "Now we do the same for runtime vs matrix size and number of processes.\n", |
756 | 756 | "\n", |
757 | | - "\n", |
| 757 | + "\n", |
758 | 758 | "\n", |
759 | | - "Based on this plot, we would be tempted to say that not only the execution time is much bigger that it is for `numpy` and `scipy`, but it might also seem that our method does not scale with respect to the number of processes.\n", |
| 759 | + "Based on the previous plot, we would be tempted to say that not only the execution time is much bigger that it is for `numpy` and `scipy`, but also that our method does not scale with respect to the number of processes.\n", |
760 | 760 | "However, running a single time the file `shell/time_profile.sh`, we notice that this is likely a problem related to how `time.time()` saves the results.\n", |
761 | 761 | "\n", |
762 | 762 | "Running, for instance,\n", |
|
765 | 765 | "```\n", |
766 | 766 | "we get the following results:\n", |
767 | 767 | "```text\n", |
768 | | - "Some results\n", |
| 768 | + "[D&I] Total execution time: 0.3199 s\n", |
| 769 | + "[NumPy] Total execution time: 0.0388 s\n", |
| 770 | + "[SciPy] Total execution time: 0.0690 s\n", |
769 | 771 | "```\n", |
770 | 772 | "Re-running with `n_procs=2`, we obtain\n", |
771 | 773 | "```text\n", |
772 | | - "Even more results\n", |
| 774 | + "[D&I] Total execution time: 0.2230 s\n", |
| 775 | + "[NumPy] Total execution time: 0.8741 s\n", |
| 776 | + "[SciPy] Total execution time: 0.0364 s\n", |
773 | 777 | "```\n", |
774 | 778 | "Finally, for `n_procs=4`, we obtain\n", |
775 | 779 | "```text\n", |
776 | | - "Final results\n", |
| 780 | + "[D&I] Total execution time: 0.1842 s\n", |
| 781 | + "[NumPy] Total execution time: 0.0768 s\n", |
| 782 | + "[SciPy] Total execution time: 0.0362 s\n", |
777 | 783 | "```\n", |
| 784 | + "(notice that there is some variance in the times taken by the other two methods as a result of the fact that `time.time()` is not extremely robust).\n", |
| 785 | + "\n", |
778 | 786 | "The previous results suggest that the method scales well with the number of processes, and that the performance (while worse than `numpy` and `scipy`) is such that the comparison goes much better than it seemed to do earlier.\n", |
779 | 787 | "We believe that the reason for such a behavior is related to the execution of multiple scripts, which can have an impact on execution times as measured with `time.time()`.\n", |
780 | 788 | "\n", |
781 | 789 | "Notice that we parallelized everything that could be parallelized (except for the secular solver, which usually takes no more than $5\\%$ of the total time): the bottleneck is given by the Lanczos method, which cannot be parallelized.\n", |
782 | | - "If the Lanczos method is not needed (that is, if the matrix $A$ of which we want to compute the eigenvalues and eigenvectors is already tridiagonal), then the execution time of our solver becomes comparable to the one of `numpy` and `scipy`." |
| 790 | + "If the Lanczos method is not needed (that is, if the matrix $A$ of which we want to compute the eigenvalues and eigenvectors is already tridiagonal), then the execution time of our solver becomes comparable to the one of `numpy` and `scipy`.\n", |
| 791 | + "\n", |
| 792 | + "*Remark*: in the plot used to profile execution times, the Lanczos method takes bigger values than D&I when just one process is used.\n", |
| 793 | + "Of course this is not possible, since D&I includes Lanczos.\n", |
| 794 | + "However, the value that we plot for all the functions not depending on `n_procs` (including the ones of `numpy` and `scipy` and Lanczos) is the average across all the runs with different numbers of processes.\n", |
| 795 | + "As a result, similar to what was remarked earlier for `numpy`'s eigenvalues solver, the execution time for large values of `n_procs` seems to increase, causing the average to become bigger, eventually getting bigger than D&I. \n", |
| 796 | + "However, notice that also this time running a single simulation with `shell/time_profile.sh` tells us that this is not truly the case, and that the execution time of the Lanczos algorithm remains pretty much the same as the number of processes increases. " |
783 | 797 | ] |
784 | 798 | }, |
785 | 799 | { |
|