
Example: Multiprocessing with a shared large NumPy array in Jupyter on Windows 10

In mmd.py:

import numpy as np

def pool_init(image_base, image_shape):
    # Runs once in each worker process: rebuild a numpy view of the
    # shared buffer and keep it as a module-level global.
    global image
    image = np.ctypeslib.as_array(image_base.get_obj())
    image = image.reshape(*image_shape)

def pool_job(row_number):
    # Each job writes one row of the shared array.
    image[row_number, :] = row_number

 

In notebook:

import ctypes
import numpy as np
from mmd import pool_init, pool_job
from multiprocessing import Array, cpu_count, Pool, freeze_support


if __name__ == "__main__":
    freeze_support()
    
    image = np.random.randint(0, 5, size=(35000, 49000, 3), dtype=np.uint8)
    image_shape = image.shape

    image_base = Array(ctypes.c_uint, 35000 * 49000 * 3)
    shared_image = np.ctypeslib.as_array(image_base.get_obj())
    shared_image = shared_image.reshape(*image.shape)
    shared_image[:] = image[:]
    print("Complete making `image_base`.")
    
    
    # Hand the shared buffer to each worker via `initializer`/`initargs`
    # (note: the keyword is `initializer`, not `init`).
    with Pool(cpu_count(), initializer=pool_init,
              initargs=(image_base, image_shape)) as p:
        r = p.map_async(pool_job, list(range(image_shape[0])))
        r.wait()
        pool_result = r.get()

        print("")
        print("Multiprocessing job done.")



- For multiprocessing, Unix uses the fork start method by default, while Windows uses spawn.
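The current start method can be inspected, and a spawn context requested explicitly, from the standard library alone (handy for testing Windows-like behavior on Unix):

```python
import multiprocessing as mp

# Default start method: "fork" on Linux, "spawn" on Windows
# (and on macOS since Python 3.8).
default_method = mp.get_start_method()
print(default_method)

# A spawn context can be requested explicitly on any platform.
ctx = mp.get_context("spawn")
```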

 

 

Some conditions for Windows:

  •  Unlike on Unix, all code that uses multiprocessing must be guarded by `if __name__ == "__main__":`; otherwise the spawned children re-import the main module and call `Pool` recursively.

  •  On Windows, if you want to use shared variables, pass them through the `initializer` and `initargs` arguments when calling `Pool` (globals are not inherited, since the children do not fork).

 

Some conditions for Jupyter (IPython):

  •  The `pool_init` and `pool_job` functions must be defined in a separate script (e.g. mmd.py): spawned workers look up job functions by module and name, and functions defined in a notebook cell live in the kernel's `__main__`, which the worker processes cannot import.
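The underlying reason is that functions are pickled by reference, not by value: the pickle records only the module and qualified name, so the worker must be able to import the function from a real module. A quick illustration using the standard library:

```python
import pickle
import json

# Pickling a function serializes its location (module + name),
# not its code object.
payload = pickle.dumps(json.loads)
print(b"json" in payload and b"loads" in payload)
```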