Force use of fork in multiprocessing

From Tomasz Balawajder:
"Since we are using a Java service to launch the Python process, its behavior differs from running the script directly on the cluster.

By default, Dask uses fork() to create worker processes. However, when running under the JVM, the start method defaults to spawn, which does not share memory between processes. This caused the slowdown and unexpected behavior.

I’ve forced Python to use fork() in the configuration, and now the application completes in the same time as when executed with sbatch."
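
A minimal standalone sketch of the same fix (illustrative only, not part of this commit): the standard-library call below forces the fork start method, and force=True overrides whatever start method the host environment has already installed. Note that fork is POSIX-only, so this sketch will not run on Windows.

import multiprocessing as mp

def work():
    # Under "fork" the child inherits the parent's memory via
    # copy-on-write; under "spawn" a fresh interpreter re-imports the
    # module and unpickles its arguments instead, which is why large
    # in-memory state is not shared.
    print("running in", mp.current_process().name)

if __name__ == "__main__":
    # force=True overrides a start method already chosen elsewhere,
    # e.g. by an embedding runtime such as a JVM-based launcher.
    mp.set_start_method("fork", force=True)
    p = mp.Process(target=work)
    p.start()
    p.join()
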
2025-09-23 11:41:08 +02:00
parent fe9d886499
commit deb7005604

@@ -69,6 +69,7 @@ def main(catalog_file, mc_file, pdf_file, m_file, m_select, mag_label, mc, m_max
 from matplotlib.contour import ContourSet
 import xml.etree.ElementTree as ET
 import json
+import multiprocessing as mp
 logger = getDefaultLogger('igfash')
@@ -448,9 +449,10 @@ verbose: {verbose}")
     start = timer()
-    use_pp = False
+    use_pp = True
     if use_pp: # use dask parallel computing
+        mp.set_start_method("fork", force=True)
         pbar = ProgressBar()
         pbar.register()
         iter = indices
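
For context, a self-contained sketch of how the forced start method combines with dask's local process scheduler. The task function square and the scheduler="processes" call are illustrative assumptions, not taken from the igfash code, and the dask.config line is an assumption about dask's config layer that can be dropped if the interpreter-wide default suffices.

import multiprocessing as mp
import dask
from dask.diagnostics import ProgressBar

def square(x):
    # Stand-in for a real per-task computation.
    return x * x

if __name__ == "__main__":
    # Interpreter-wide default, as in the commit above.
    mp.set_start_method("fork", force=True)
    # Assumption: dask's local multiprocessing scheduler also honours
    # this config key, pinning the context explicitly rather than
    # relying on the global default.
    dask.config.set({"multiprocessing.context": "fork"})

    tasks = [dask.delayed(square)(i) for i in range(8)]
    with ProgressBar():
        # "processes" selects dask's process-based local scheduler;
        # fork-started workers inherit the parent's loaded data.
        results = dask.compute(*tasks, scheduler="processes")
    print(results)
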