Python Data Analysis, 1.5 A Simple Application - Alibaba Cloud Developer Community


2017-05-02


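Introduction:

This section is excerpted from Chapter 1, Section 1.5 of the book Python Data Analysis by Ivan Idris (Indonesia), as published by 异步社区. More chapters are available through the "异步社区" official account on 云栖社区.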

1.5 A Simple Application

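Suppose we want to add two vectors, a and b. Here "vector" is meant in the mathematical sense, that is, a one-dimensional array. In Chapter 3, "Statistics and Linear Algebra", we will meet a special kind of NumPy array that represents matrices. Vector a holds the squares of the integers 0 to n-1; if n is 3, a contains 0, 1, and 4. Vector b holds the cubes of the integers 0 to n-1; if n is 3, b contains 0, 1, and 8. How would we do this in plain Python?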
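To make the definitions concrete, here is a small check of the n = 3 case. It is only an illustrative sketch: the list comprehensions are a shorthand chosen here for brevity, not the solution the book develops.

```python
n = 3

# Squares and cubes of the integers 0 .. n-1.
a = [i ** 2 for i in range(n)]
b = [i ** 3 for i in range(n)]

# Element-wise sum of the two vectors.
c = [x + y for x, y in zip(a, b)]

print(a)  # [0, 1, 4]
print(b)  # [0, 1, 8]
print(c)  # [0, 2, 12]
```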

Once we have a solution, we can compare it with the equivalent NumPy version.

The following function solves the vector addition problem in pure Python, without NumPy:

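def pythonsum(n):
    # list() keeps the sequences mutable on Python 3, where range() is immutable.
    a = list(range(n))
    b = list(range(n))
    c = []

    for i in range(len(a)):
        a[i] = i ** 2  # square of i
        b[i] = i ** 3  # cube of i
        c.append(a[i] + b[i])

    return c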

The following function solves the same problem with NumPy:

import numpy

def numpysum(n):
    a = numpy.arange(n) ** 2  # squares of 0 .. n-1
    b = numpy.arange(n) ** 3  # cubes of 0 .. n-1
    c = a + b  # element-wise addition
    return c

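Note that numpysum() needs no for loop. We also used NumPy's arange() function, which creates a NumPy array holding the integers 0 to n-1. Because arange() comes from the numpy module, it is called with the numpy prefix.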
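As a small illustration of why no loop is needed, the sketch below applies the same operations to a three-element array. The variable names are chosen here for illustration, and tolist() is used only so that the printed form is an ordinary Python list.

```python
import numpy as np

n = 3
indices = np.arange(n)   # integers 0 .. n-1
squares = indices ** 2   # the power is applied to every element
cubes = indices ** 3
total = squares + cubes  # element-wise addition, no explicit loop

print(indices.tolist())  # [0, 1, 2]
print(squares.tolist())  # [0, 1, 4]
print(cubes.tolist())    # [0, 1, 8]
print(total.tolist())    # [0, 2, 12]
```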

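Now comes the interesting part. We mentioned in the preface that NumPy is quite fast at array operations. But how fast, exactly? The following program reports the elapsed time, in microseconds, of the numpysum() and pythonsum() functions. It also prints the last two elements of the vector sum. Let's see whether Python and NumPy give the same answer: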

#!/usr/bin/env python
"""
This program demonstrates vector addition the Python way.
Run from the command line as follows

    python vectorsum.py n

where n is an integer that specifies the size of the vectors.

The first vector to be added contains the squares of 0 up to n - 1.
The second vector contains the cubes of 0 up to n - 1.
The program prints the last 2 elements of the sum and the elapsed time.
"""

import sys
from datetime import datetime
import numpy as np

def numpysum(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    c = a + b

    return c

def pythonsum(n):
    # list() keeps the sequences mutable on Python 3.
    a = list(range(n))
    b = list(range(n))
    c = []

    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])

    return c

size = int(sys.argv[1])

# Time the pure-Python version.
# Note: .microseconds is only the sub-second part of the interval,
# which is sufficient for runs shorter than one second.
start = datetime.now()
c = pythonsum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("PythonSum elapsed time in microseconds", delta.microseconds)

# Time the NumPy version.
start = datetime.now()
c = numpysum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("NumPySum elapsed time in microseconds", delta.microseconds)

For vectors of 1000, 2000, and 4000 elements, the program produces the following results:

$ python vectorsum.py 1000
The last 2 elements of the sum [995007996, 998001000]
PythonSum elapsed time in microseconds 707
The last 2 elements of the sum [995007996 998001000]
NumPySum elapsed time in microseconds 171

$ python vectorsum.py 2000
The last 2 elements of the sum [7980015996, 7992002000]
PythonSum elapsed time in microseconds 1420
The last 2 elements of the sum [7980015996 7992002000]
NumPySum elapsed time in microseconds 168

$ python vectorsum.py 4000
The last 2 elements of the sum [63920031996, 63968004000]
PythonSum elapsed time in microseconds 2829
The last 2 elements of the sum [63920031996 63968004000]
NumPySum elapsed time in microseconds 274

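Clearly, NumPy is much faster than the equivalent plain Python code: in the runs above it is roughly 4 times faster for 1000 elements and roughly 10 times faster for 4000. One thing is certain: the results are the same whether or not we use NumPy. The way they are displayed does differ, however: the result from numpysum() contains no commas. Why is that? Remember that we are not dealing with a Python list but with a NumPy array. NumPy arrays are covered in detail in Chapter 2, "NumPy Arrays".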
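The comparison can also be reproduced in a single script. The sketch below is not the book's program: it assumes a condensed pythonsum() written as a list comprehension, and it swaps the datetime measurement for the standard library's timeit module, which averages over repeated calls. Absolute timings depend on the machine, so none are shown.

```python
import timeit

import numpy as np


def pythonsum(n):
    # Condensed pure-Python version (an assumption of this sketch).
    return [i ** 2 + i ** 3 for i in range(n)]


def numpysum(n):
    # NumPy version: whole-array arithmetic, no explicit loop.
    return np.arange(n) ** 2 + np.arange(n) ** 3


n = 1000
py_result = pythonsum(n)
np_result = numpysum(n)

# Same values, different container types.
same_values = py_result == np_result.tolist()
print(type(py_result).__name__, type(np_result).__name__)  # list ndarray
print(same_values)  # True

# The list prints with commas, the array without.
print(py_result[-2:])  # [995007996, 998001000]
print(np_result[-2:])  # [995007996 998001000]

# Average time per call over repeated runs.
repeats = 100
py_seconds = timeit.timeit(lambda: pythonsum(n), number=repeats) / repeats
np_seconds = timeit.timeit(lambda: numpysum(n), number=repeats) / repeats
print("pythonsum: %.1f microseconds per call" % (py_seconds * 1e6))
print("numpysum:  %.1f microseconds per call" % (np_seconds * 1e6))
```

Averaging over many calls smooths out the run-to-run noise that a single datetime measurement is exposed to.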
