Python multiprocessing WHY and HOW

Introduction:

I am working on a Python script that migrates data from one database to another. Put simply, I need to select from one database and then insert into another.
In the first version I chose multithreading, simply because I was more familiar with it than with multiprocessing. But after a few months I found several problems with this approach.

  • it can only use one of my 24 CPUs, because of the GIL
  • it cannot handle signals per thread. I wanted a simple timeout decorator that sets a signal.SIGALRM for a specified function, but with multithreading the signal gets caught by an arbitrary thread.
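For reference, here is a minimal sketch of the kind of SIGALRM-based timeout decorator described above (all names are illustrative, not the original code). It only works reliably in the main thread of the main process, which is exactly why it breaks under multithreading:

```python
import signal
import time
from functools import wraps

class TimeoutError(Exception):
    # Python 2 had no built-in TimeoutError; on Python 3 this shadows the builtin
    pass

def timeout(seconds):
    """Raise TimeoutError if the wrapped function runs longer than `seconds`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutError("timed out after %d seconds" % seconds)
            old_handler = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds)          # schedule SIGALRM
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)            # cancel any pending alarm
                signal.signal(signal.SIGALRM, old_handler)
        return wrapper
    return decorator

@timeout(1)
def slow():
    time.sleep(5)

try:
    slow()
    timed_out = False
except TimeoutError:
    timed_out = True

print(timed_out)  # True: the alarm fired before sleep(5) finished
```

Note that `signal.alarm` and SIGALRM are Unix-only, and CPython always delivers signals to the main thread, so this decorator cannot time out code running in a worker thread.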

So I started refactoring the script to use multiprocessing.

multiprocessing

multiprocessing is a package that supports spawning processes using an API similar to the threading module.

But it is not as elegant and sweet as described.

multiprocessing.pool

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

It looks like a good solution, but we cannot pass a bound method as the target function, because bound methods cannot be serialized with pickle, and multiprocessing.Pool uses pickle to serialize objects and send them to the worker processes. pathos.multiprocessing is a good alternative: it uses dill instead of pickle, which gives much better serialization coverage.
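A quick way to see this class of limitation (demonstrated here with a lambda, which fails the same way bound methods did on Python 2; dill, which pathos uses, can serialize both):

```python
import pickle

square = lambda x: x * x  # lambdas, like Python 2 bound methods, defeat pickle

try:
    pickle.dumps(square)
    picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    picklable = False

print(picklable)  # False: pickle cannot serialize a lambda
```

This is why handing such a callable to Pool.map fails before any worker even runs it.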

Shared memory

Memory is shared naturally between threads. In a multiprocessing environment, the module provides several wrappers for sharing objects between processes.

  • multiprocessing.Value and multiprocessing.Array are the simplest ways to share objects between two processes, but they can only hold ctypes objects.
  • multiprocessing.Queue is very useful and has an API similar to Queue.Queue.

Python and GCC version

I didn't know that even the GCC version could affect the behavior of my code. On my CentOS 5 machine, the same Python version built with different GCC versions behaves differently.

Python 2.7.2 (default, Jan 10 2012, 11:17:45)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/queues.py", line 48, in <module>
    from multiprocessing.synchronize import Lock, BoundedSemaphore, Semaphore, Condition
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/synchronize.py", line 59, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.


Python 2.7.2 (default, Oct 15 2013, 13:15:26)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
>>>
