Python Web Scraping: Fixing Incomplete urlretrieve Downloads While Avoiding Long Wait Times - Alibaba Cloud Developer Community


2017-08-23



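In the blog post at http://blog.csdn.net/Innovation_Z/article/details/51106601, the author works around incomplete urlretrieve downloads with recursion: whenever urllib.ContentTooShortError is raised, the function simply calls itself to download the file again. The code is as follows: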

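# Python 2 code from the cited post
import urllib

def auto_down(url,filename):
    try:
        urllib.urlretrieve(url,filename)
    except urllib.ContentTooShortError:
        print 'Network conditions are not good. Reloading.'
        # Retry by calling itself again; there is no limit on the number of attempts
        auto_down(url,filename)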
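To make "incomplete" concrete: as far as I can tell from the standard-library source, urlretrieve counts the bytes it receives and raises ContentTooShortError when that count is smaller than the Content-Length header announced by the server. The Python 3 sketch below reproduces that check by hand (in Python 3 the exception lives in urllib.error). copy_with_length_check and its parameters are hypothetical names chosen for illustration, and io.BytesIO objects stand in for the HTTP response and the output file.

```python
import io
import urllib.error


def copy_with_length_check(src, dst, expected_size, block_size=8192):
    """Copy src to dst block by block (hypothetical helper).

    expected_size plays the role of the Content-Length header;
    a negative value means the size is unknown and the check is skipped.
    """
    read = 0
    while True:
        block = src.read(block_size)
        if not block:  # end of stream (or the connection closed early)
            break
        dst.write(block)
        read += len(block)
    if expected_size >= 0 and read < expected_size:
        # Same exception type urlretrieve raises for a short body
        raise urllib.error.ContentTooShortError(
            'retrieval incomplete: got only %i out of %i bytes' % (read, expected_size),
            None)
    return read


# Complete body: all 5 announced bytes arrive
print(copy_with_length_check(io.BytesIO(b'hello'), io.BytesIO(), 5))  # 5

# Truncated body: only 3 of the 5 announced bytes arrive
try:
    copy_with_length_check(io.BytesIO(b'hel'), io.BytesIO(), 5)
except urllib.error.ContentTooShortError as err:
    print('incomplete:', err.reason)
```

Note that when the server sends no Content-Length at all, there is nothing to compare against, so a truncated download cannot be detected this way.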

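However, in my tests, once urllib.ContentTooShortError occurs, re-downloading the file can take a very long time: it often needs several or even a dozen or more attempts, and occasionally it never finishes at all, because the recursion places no limit on the number of retries. This is far from ideal. To fix this, I used the socket module to put a time limit on each download attempt, and capped the number of retries so the program cannot get stuck in an endless loop. This makes the download considerably more efficient.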
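The key mechanism is socket.setdefaulttimeout(): it sets a process-wide timeout for sockets created afterwards, so a blocking read that receives no data raises socket.timeout instead of waiting forever. The sketch below demonstrates this locally, without network access, by connecting to a listening socket on 127.0.0.1 that never sends anything. default_timeout_demo is a hypothetical name, and 0.2 s is chosen only to keep the demo fast (the code in this post uses 30 s).

```python
import socket


def default_timeout_demo(seconds=0.2):
    """Return (timeout of a newly created socket, whether a stalled recv() timed out)."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)  # applies to sockets created from now on
    try:
        with socket.socket() as server, socket.socket() as client:
            server.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
            server.listen(1)
            client.connect(server.getsockname())  # handshake completes via the listen backlog
            try:
                client.recv(1)  # the server never sends anything, so this would block
                timed_out = False
            except socket.timeout:
                timed_out = True
            return client.gettimeout(), timed_out
    finally:
        socket.setdefaulttimeout(previous)  # restore the old default for other code


print(default_timeout_demo())  # (0.2, True)
```

Because the default is process-wide, it also affects sockets opened by other libraries in the same program; restoring the previous value, as the sketch does, keeps the change contained.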
My code is as follows:

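import socket
import urllib.error
import urllib.request

# Hypothetical placeholder values: replace with the real image URL and local file name
url = 'http://example.com/picture.jpg'
image_name = 'picture.jpg'

# Set the default timeout for new sockets to 30 s, so a stalled download
# raises socket.timeout instead of hanging indefinitely
socket.setdefaulttimeout(30)
# Retry timed-out or incomplete downloads, at most 5 times, to avoid an endless loop
try:
    urllib.request.urlretrieve(url, image_name)
except (socket.timeout, urllib.error.ContentTooShortError):
    count = 1
    while count <= 5:
        try:
            urllib.request.urlretrieve(url, image_name)
            break
        except (socket.timeout, urllib.error.ContentTooShortError):
            err_info = 'Reloading for %d time' % count if count == 1 else 'Reloading for %d times' % count
            print(err_info)
            count += 1
    if count > 5:
        print("downloading picture failed!")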
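If you need this logic in more than one place, it can be wrapped in a function. The sketch below is my own refactoring, not the code above verbatim: download_with_retry, is_retryable, and the fetch and delay parameters are hypothetical names. It keeps the same idea (one attempt plus at most five retries, with socket.setdefaulttimeout() providing the per-attempt time limit) and retries on both socket.timeout and ContentTooShortError. Based on my reading of the urllib.request source, a timeout while connecting may arrive wrapped in urllib.error.URLError rather than as a bare socket.timeout, so the sketch checks the wrapped reason too; treat that as an assumption worth verifying on your Python version. fetch defaults to urllib.request.urlretrieve and can be replaced by a fake function to test the retry logic offline.

```python
import socket
import time
import urllib.error
import urllib.request


def is_retryable(exc):
    """Return True for errors worth retrying: timeouts and truncated downloads."""
    if isinstance(exc, (socket.timeout, urllib.error.ContentTooShortError)):
        return True
    # A timeout during connection setup may arrive wrapped in URLError
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.timeout)


def download_with_retry(url, filename, max_retries=5, delay=0.0,
                        fetch=urllib.request.urlretrieve):
    """Try fetch(url, filename) once, then retry at most max_retries times.

    Returns True on success and False once all attempts have failed.
    Non-retryable errors (e.g. an unknown host or HTTP 404) are re-raised immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            fetch(url, filename)
            return True
        except OSError as exc:  # socket.timeout and urllib.error classes subclass OSError
            if not is_retryable(exc):
                raise
            if attempt < max_retries:
                print('Reloading (retry %d of %d)' % (attempt + 1, max_retries))
                time.sleep(delay)  # optional pause before the next attempt
    return False
```

In a real script you would call socket.setdefaulttimeout(30) once at start-up, then call download_with_retry(url, image_name) and report a failure when it returns False. Re-raising non-retryable errors, rather than retrying everything, avoids wasting five attempts on a URL that can never succeed.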

That's all for this post. If you spot any shortcomings, corrections are welcome, and feel free to share your thoughts~~

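Note: I now run two WeChat official accounts, 因为Python ("Because of Python", WeChat ID: python_math) and 轻松学会Python爬虫 ("Learn Python Web Scraping the Easy Way", WeChat ID: easy_web_scrape). You are welcome to follow them~~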
