DC学院数据分析学习笔记(三):基于HTML的网页爬虫

简介: 基于HTML,用BeautifulSoup实现的简单网页爬虫

终于可以用python实践一下html的爬虫了,之前零散的也学过一些,这次希望能通过在DC学院的学习慢慢深入的了解爬虫的理论知识。
OK,来看今天的数据分析学习笔记!

希望能有所收获( ̄︶ ̄)↗ 

from bs4 import BeautifulSoup

html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """

使用BeautifulSoup解析HTML文档示例

soup = BeautifulSoup(html_doc,'html.parser') 

“html_doc”表示这个文档名称,在上面的代码中已经定义,“html_parser”是解析网页所需的解析器,所以使用BeautifulSoup解析HTML文档的一般格式为soup=BeautifulSoup(网页名称,'html.parser')

用 soup.prettify 打印网页

print(soup.prettify()) 
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ; and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>

BeautifulSoup 解析网页的一些基本操作

soup.title
<title>The Dormouse's story</title>
soup.title.name
'title'
soup.title.string
"The Dormouse's story"
soup.find_all("a")
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

爬取“NATIONAL WEATHER”的天气数据

DC学院中提供的示例时旧金山天气页面地址:
http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972#.WUnSFhN95E4

小技巧:可以使用浏览其中的开发者工具查看代码

如图:

image

1.通过url.request返回网页内容

import urllib.request as urlrequest
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196'
web_page=urlrequest.urlopen(weather_url).read()
## print(web_page) 这个太多了。。。此处省略一万字

2.通过BeautifulSoup抓取网页中的天气信息

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body').get_text())



Today
SunnyHigh: 74 °F

Tonight
ClearLow: 52 °F

Thursday
SunnyHigh: 73 °F

ThursdayNight
ClearLow: 51 °F

Friday
SunnyHigh: 68 °F

FridayNight
Mostly ClearLow: 50 °F

Saturday
SunnyHigh: 64 °F

SaturdayNight
Mostly ClearLow: 50 °F

Sunday
SunnyHigh: 66 °F

// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
 

发现上面打印出来的前面部分很完美,但是后面却多了js的代码,那好,怎么去掉呢?

重新打印一下整个的div

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body'))
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
</script> </div>

我们发现在上面的代码最后面,之前多余的js代码是在最外层的div里面的,也就是在div class="panel-body" id="seven-day-forecast-body"这个里面的,而div id="seven-day-forecast-container"之中并没有包含我们不需要的这一段js代码。那就好办了:把id="seven-day-forecast-body"改为id="seven-day-forecast-container"

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body'))
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
</script> </div>

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container'))
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>

这样看着就舒服多了,好了,js代码终于没有了,执行一下之前的操作看看

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').get_text())


Today
SunnyHigh: 74 °F

Tonight
ClearLow: 52 °F

Thursday
SunnyHigh: 73 °F

ThursdayNight
ClearLow: 51 °F

Friday
SunnyHigh: 68 °F

FridayNight
Mostly ClearLow: 50 °F

Saturday
SunnyHigh: 64 °F

SaturdayNight
Mostly ClearLow: 50 °F

Sunday
SunnyHigh: 66 °F

但这样我们也不太好提取,通过prettify美化一下,再看看怎么提取我们需要的信息

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').prettify())
<div id="seven-day-forecast-container">
 <ul class="list-unstyled" id="seven-day-forecast-list">
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Today
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 74 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Tonight
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/>
    </p>
    <p class="short-desc">
     Clear
    </p>
    <p class="temp temp-low">
     Low: 52 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Thursday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 73 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Thursday
     <br/>
     Night
    </p>
    <p>
     <img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/>
    </p>
    <p class="short-desc">
     Clear
    </p>
    <p class="temp temp-low">
     Low: 51 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Friday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 68 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Friday
     <br/>
     Night
    </p>
    <p>
     <img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/>
    </p>
    <p class="short-desc">
     Mostly Clear
    </p>
    <p class="temp temp-low">
     Low: 50 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Saturday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 64 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Saturday
     <br/>
     Night
    </p>
    <p>
     <img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/>
    </p>
    <p class="short-desc">
     Mostly Clear
    </p>
    <p class="temp temp-low">
     Low: 50 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Sunday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 66 °F
    </p>
   </div>
  </li>
 </ul>
</div>


从上面的HTML代码来看,我们发现我们需要的信息分别对应三个classperiod-name,short-desc,temp

soup_forecast = soup.find(id='seven-day-forecast-container')
soup_forecast.find_all(class_='period-name')
[<p class="period-name">Today<br/><br/></p>,
 <p class="period-name">Tonight<br/><br/></p>,
 <p class="period-name">Thursday<br/><br/></p>,
 <p class="period-name">Thursday<br/>Night</p>,
 <p class="period-name">Friday<br/><br/></p>,
 <p class="period-name">Friday<br/>Night</p>,
 <p class="period-name">Saturday<br/><br/></p>,
 <p class="period-name">Saturday<br/>Night</p>,
 <p class="period-name">Sunday<br/><br/></p>]

3.最后,将我们需要的信息完整的输出

soup_forecast=soup.find(id='seven-day-forecast-container')
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
for i in range(9):
    date=date_list[i].get_text()
    desc=desc_list[i].get_text()
    temp=temp_list[i].get_text()
    print("{} {} {}".format(date,desc,temp))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F

完整代码:

#导入需要的包和模块,这里需要的是 urllib.request 和 Beautifulsoup
import urllib.request as urlrequest
from bs4 import BeautifulSoup

#通过urllib来获取我们需要爬取的网页
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972'
web_page=urlrequest.urlopen(weather_url).read()

#用 BeautifulSoup 来解析和获取我们想要的内容块
soup=BeautifulSoup(web_page,'html.parser')
soup_forecast=soup.find(id='seven-day-forecast-container')

#找到我们想要的那一部分内容
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')

#将获取的内容更好地展示出来,用for循环来实现
for i in range(9):
    date=date_list[i].get_text()
    desc=desc_list[i].get_text()
    temp=temp_list[i].get_text()
    print("{}{}{}".format(date,desc,temp))
TodaySunnyHigh: 74 °F
TonightClearLow: 52 °F
ThursdaySunnyHigh: 73 °F
ThursdayNightClearLow: 51 °F
FridaySunnyHigh: 68 °F
FridayNightMostly ClearLow: 50 °F
SaturdaySunnyHigh: 64 °F
SaturdayNightMostly ClearLow: 50 °F
SundaySunnyHigh: 66 °F

目录
相关文章
|
2月前
|
数据采集 存储 数据挖掘
Python 爬虫实战之爬拼多多商品并做数据分析
Python爬虫可以用来抓取拼多多商品数据,并对这些数据进行数据分析。以下是一个简单的示例,演示如何使用Python爬取拼多多商品数据并进行数据分析。
|
2月前
|
数据采集 数据挖掘 API
主流电商平台数据采集API接口|【Python爬虫+数据分析】采集电商平台数据信息采集
随着电商平台的兴起,越来越多的人开始在网上购物。而对于电商平台来说,商品信息、价格、评论等数据是非常重要的。因此,抓取电商平台的商品信息、价格、评论等数据成为了一项非常有价值的工作。本文将介绍如何使用Python编写爬虫程序,抓取电商平台的商品信息、价格、评论等数据。 当然,如果是电商企业,跨境电商企业,ERP系统搭建,我们经常需要采集的平台多,数据量大,要求数据稳定供应,有并发需求,那就需要通过接入电商API数据采集接口,封装好的数据采集接口更方便稳定高效数据采集。
|
2月前
|
数据采集 存储 数据挖掘
Python 爬虫实战之爬拼多多商品并做数据分析
在上面的代码中,我们使用pandas库创建DataFrame存储商品数据,并计算平均价格和平均销量。最后,我们将计算结果打印出来。此外,我们还可以使用pandas库提供的其他函数和方法来进行更复杂的数据分析和处理。 需要注意的是,爬取拼多多商品数据需要遵守拼多多的使用协议和规定,避免过度请求和滥用数据。
|
2月前
|
Web App开发
某教程学习笔记(一):04、HTML基础
某教程学习笔记(一):04、HTML基础
13 0
|
6月前
|
数据采集 存储 数据挖掘
Python 爬虫实战之爬拼多多商品并做数据分析
在上面的代码中,我们使用pandas库创建DataFrame存储商品数据,并计算平均价格和平均销量。最后,我们将计算结果打印出来。此外,我们还可以使用pandas库提供的其他函数和方法来进行更复杂的数据分析和处理。 需要注意的是,爬取拼多多商品数据需要遵守拼多多的使用协议和规定,避免过度请求和滥用数据。
|
8月前
|
前端开发 UED 容器
|
5月前
|
数据采集 JSON JavaScript
网络爬虫的实战项目:使用JavaScript和Axios爬取Reddit视频并进行数据分析
网络爬虫是一种程序或脚本,用于自动从网页中提取数据。网络爬虫的应用场景非常广泛,例如搜索引擎、数据挖掘、舆情分析等。本文将介绍如何使用JavaScript和Axios这两个工具,实现一个网络爬虫的实战项目,即从Reddit这个社交媒体平台上爬取视频,并进行数据分析。本文的目的是帮助读者了解网络爬虫的基本原理和步骤,以及如何使用代理IP技术,避免被目标网站封禁。
网络爬虫的实战项目:使用JavaScript和Axios爬取Reddit视频并进行数据分析
|
5月前
|
机器学习/深度学习 自然语言处理 算法
Python预测 数据分析与算法 学习笔记(特征工程、时间序列)2
Python预测 数据分析与算法 学习笔记(特征工程、时间序列)
107 0
|
5月前
|
机器学习/深度学习 算法 数据可视化
Python预测 数据分析与算法 学习笔记(特征工程、时间序列)1
Python预测 数据分析与算法 学习笔记(特征工程、时间序列)
71 0
|
7月前
|
前端开发 JavaScript 算法
网络结构与HTML学习笔记
网络结构与HTML学习笔记
148 0
网络结构与HTML学习笔记