
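How I Scrape Second-Hand Housing Price Data

Last time I showed you how to scrape new-home (new development) prices with Python, and many of you asked how to scrape the latest prices for second-hand homes. So today, let's walk through how to scrape second-hand housing price data.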

Installing the modules

As with the new-home scraper, the following modules need to be installed (skip any you already have):

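  # Install the required modules
  # (the "bs4" package on PyPI only pulls in beautifulsoup4, so install that directly)
  pip3 install beautifulsoup4
  pip3 install requests
  pip3 install lxml
  pip3 install numpy
  pip3 install pandas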
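
To confirm the installs worked, a small check like the following can help. It is only a sketch: it tests whether each module can be found on the import path, without importing it. Note that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util

# Import names (not pip package names) of the modules installed above.
REQUIRED = ["bs4", "requests", "lxml", "numpy", "pandas"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All modules are installed.")
```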

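Once everything is installed, we can start writing code. The code for configuring request headers and proxy IP addresses was covered in the previous article on new homes, so I won't repeat it here. Let's go straight to the scraping code.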
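
The functions below call `create_headers()`, which comes from the previous article and is not reproduced here. If you don't have it, the following hypothetical stand-in is enough to run the code: it just returns a header dict with a randomly chosen User-Agent. The User-Agent strings are illustrative examples, not values from the original article. If you need a proxy, `requests.get` also accepts a `proxies` dict, which this sketch does not cover.

```python
import random

# Hypothetical stand-in for create_headers() from the previous article.
# The User-Agent strings are illustrative examples only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def create_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "zh-CN,zh;q=0.9",
    }
```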

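The second-hand housing data object

Here we represent each second-hand listing as an object. Once the scraped data is stored as objects, later processing becomes much easier. The SecHouse class is shown below: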

  # Second-hand listing object
  class SecHouse(object):

      def __init__(self, district, area, name, price, desc, pic):
          self.district = district  # district (Chinese name)
          self.area = area          # sub-area (Chinese name)
          self.name = name          # listing title
          self.price = price        # total price text
          self.desc = desc          # description (layout, size, etc.)
          self.pic = pic            # image URL

      def text(self):
          # Join all fields into one comma-separated line
          return ",".join([self.district, self.area, self.name,
                           self.price, self.desc, self.pic])
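
Because `text()` joins the fields with plain commas, a title or description that itself contains a comma will shift the columns when the file is read back. One way around this, sketched here as an alternative rather than the article's approach, is a dataclass with the same six fields serialized through Python's `csv` module, which quotes such fields automatically. The class name `SecHouseRecord`, the helper `to_csv_line`, and the sample values are hypothetical.

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass
class SecHouseRecord:
    """Hypothetical dataclass variant of SecHouse with the same six fields."""
    district: str
    area: str
    name: str
    price: str
    desc: str
    pic: str


def to_csv_line(house):
    """Serialize one record; csv quotes any field that contains a comma."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(astuple(house))
    return buf.getvalue()


house = SecHouseRecord("东城", "安定门", "南北通透, 满五唯一", "520万",
                       "2室1厅 | 60平米", "https://example.com/1.jpg")
print(to_csv_line(house), end="")
# 东城,安定门,"南北通透, 满五唯一",520万,2室1厅 | 60平米,https://example.com/1.jpg
```

Reading the file back with `csv.reader` then yields exactly six fields per line, even when a title contains commas.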

Scraping and saving second-hand housing prices

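With that ready, we'll again use Beike (ke.com) as the example, batch-scraping its second-hand listings for Beijing and saving them locally. Since my main point here is the scraping process itself, the data is once again saved in the simplest format, a plain .txt file. If you would rather store it in a database, you can adapt the code to do so.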
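
As a starting point for the database option, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `sec_house`, its column names, and the helper `save_to_sqlite` are assumptions; the `desc` field is stored as `description` because `DESC` is an SQL keyword. With `SecHouse` objects you would pass tuples such as `(h.district, h.area, h.name, h.price, h.desc, h.pic)`.

```python
import sqlite3


def save_to_sqlite(conn, houses):
    """Insert (district, area, name, price, desc, pic) tuples into sec_house.

    The table and column names are illustrative choices.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sec_house (
               district TEXT, area TEXT, name TEXT,
               price TEXT, description TEXT, pic TEXT)"""
    )
    # Parameter placeholders avoid quoting problems and SQL injection.
    conn.executemany("INSERT INTO sec_house VALUES (?, ?, ?, ?, ?, ?)", houses)
    conn.commit()


# ":memory:" keeps the demo self-contained; use a path such as "sechouse.db" to persist.
conn = sqlite3.connect(":memory:")
save_to_sqlite(conn, [
    ("东城", "安定门", "南北通透", "520万", "2室1厅 | 60平米", "https://example.com/1.jpg"),
])
print(conn.execute("SELECT COUNT(*) FROM sec_house").fetchone()[0])
```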

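Getting the district list

When scraping listings, we naturally want to know which district each one is in, so I wrote a function that scrapes all of Beijing's districts and keeps them in a list (plus a lookup table of their Chinese names) for use later in the program. The code is as follows: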

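  # Modules used throughout the script
  import math
  import re

  import requests
  from bs4 import BeautifulSoup
  from lxml import etree

  # create_headers() comes from the previous article (request headers and proxy setup)

  # Lookup tables: English slug -> Chinese name
  chinese_city_district_dict = dict()
  chinese_area_dict = dict()


  # Get district information
  def get_districts():
      # Request URL
      url = 'https://bj.ke.com/xiaoqu/'
      headers = create_headers()
      # Fetch the page
      response = requests.get(url, timeout=10, headers=headers)
      html = response.content
      root = etree.HTML(html)
      # Parse: one <a> link per district
      elements = root.xpath('//div[3]/div[1]/dl[2]/dd/div/div/a')
      en_names = list()
      ch_names = list()
      # Loop over the links
      for element in elements:
          link = element.attrib['href']
          # An href ending in "/dongcheng/" yields "dongcheng"
          en_names.append(link.split('/')[-2])
          ch_names.append(element.text)

      # Record each district's English slug and Chinese name
      for en_name, ch_name in zip(en_names, ch_names):
          chinese_city_district_dict[en_name] = ch_name
      return en_names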
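
To see what the parsing step does in isolation, the sketch below runs the same slug-and-name extraction over a tiny, made-up HTML fragment. It uses the standard library's `xml.etree.ElementTree` only so that it runs without network access or lxml; the real page is much larger and is not well-formed XML, which is why `get_districts()` uses lxml. The hrefs, names, and the helper `parse_districts` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real ke.com page is far larger.
SAMPLE = """
<dl>
  <dd>
    <a href="/xiaoqu/dongcheng/">东城</a>
    <a href="/xiaoqu/xicheng/">西城</a>
  </dd>
</dl>
"""


def parse_districts(markup):
    """Return (slugs, {slug: Chinese name}) from the district links."""
    root = ET.fromstring(markup.strip())
    slugs = []
    names = {}
    for a in root.iter("a"):
        # "/xiaoqu/dongcheng/".split("/") -> ['', 'xiaoqu', 'dongcheng', '']
        slug = a.get("href").split("/")[-2]
        slugs.append(slug)
        names[slug] = a.text
    return slugs, names


slugs, names = parse_districts(SAMPLE)
print(slugs)  # ['dongcheng', 'xicheng']
```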

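Getting the sub-areas

Besides districts, we should also collect the sub-areas (bankuai), which are smaller units inside each district. Within the same district, second-hand prices and other details vary from one sub-area to another, so sub-areas are also an important reference. The code to fetch them is as follows: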

  # Get all sub-areas of one district
  def get_areas(district):
      # Request URL
      page = "http://bj.ke.com/xiaoqu/{0}".format(district)
      # List of sub-areas
      areas = list()
      try:
          headers = create_headers()
          response = requests.get(page, timeout=10, headers=headers)
          html = response.content
          root = etree.HTML(html)
          # Sub-area links
          links = root.xpath('//div[3]/div[1]/dl[2]/dd/div/div[2]/a')

          # Process each link
          for link in links:
              relative_link = link.attrib['href']
              # Drop the trailing "/" (if present) and keep the last path segment
              area = relative_link.rstrip("/").split("/")[-1]
              # Skip the link that points back at the district itself to avoid duplicates
              if area != district:
                  chinese_area_dict[area] = link.text
                  # Add to the sub-area list
                  areas.append(area)
      except Exception as e:
          print(e)
      # Always return a list so the caller can iterate over it, even after an error
      return areas
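
The link handling in `get_areas()` can likewise be checked offline. This sketch takes `(href, text)` pairs instead of lxml elements; the helper name `extract_areas` and the sample links are hypothetical, and skipping repeated slugs is an extra safeguard that the original function does not have.

```python
def extract_areas(district, links):
    """Turn (href, text) pairs into sub-area slugs plus a {slug: name} map.

    Skips the link that points back at the district itself, tolerates hrefs
    with or without a trailing slash, and ignores repeated slugs.
    """
    areas = []
    names = {}
    for href, text in links:
        slug = href.rstrip("/").split("/")[-1]
        if slug != district and slug not in names:
            names[slug] = text
            areas.append(slug)
    return areas, names


links = [
    ("/xiaoqu/dongcheng/", "东城"),    # the district itself, skipped
    ("/xiaoqu/andingmen/", "安定门"),
    ("/xiaoqu/dongzhimen", "东直门"),  # no trailing slash
]
print(extract_areas("dongcheng", links)[0])  # ['andingmen', 'dongzhimen']
```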

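Scraping the listings and saving them

With districts and sub-areas in hand, the main script loops over every district and sub-area, reads the total page count for each sub-area, visits every results page, turns each listing into a SecHouse object, and writes the records to sechouse.txt: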

  # Create the output file
  with open("sechouse.txt", "w", encoding='utf-8') as f:
      # Initialize the list
      sec_house_list = list()
      # Get all districts
      districts = get_districts()
      # Loop over districts
      for district in districts:
          # Get all sub-areas in this district
          arealist = get_areas(district)
          # Chinese district name
          chinese_district = chinese_city_district_dict.get(district, "")
          # Walk through the second-hand listings in every sub-area
          for area in arealist:
              # Chinese sub-area name
              chinese_area = chinese_area_dict.get(area, "")
              # Request URL
              page = 'http://bj.ke.com/ershoufang/{0}/'.format(area)
              headers = create_headers()
              response = requests.get(page, timeout=10, headers=headers)
              html = response.content
              # Parse the HTML
              soup = BeautifulSoup(html, "lxml")

              # Get the total number of pages; reset for every sub-area so a
              # parse failure does not reuse the previous sub-area's count
              total_page = 1
              try:
                  page_box = soup.find_all('div', class_='page-box')[0]
                  matches = re.search(r'data-total-count="(\d+)"', str(page_box))
                  # Total pages = listing count / 10, rounded up
                  total_page = math.ceil(int(matches.group(1)) / 10)
              except Exception as e:
                  print(e)

              print(total_page)
              # Set the request headers
              headers = create_headers()
              # Walk from the first page to the last
              for i in range(1, total_page + 1):
                  # Request URL
                  page = 'http://bj.ke.com/ershoufang/{0}/pg{1}'.format(area, i)
                  print(page)
                  # Fetch the page
                  response = requests.get(page, timeout=10, headers=headers)
                  html = response.content
                  soup = BeautifulSoup(html, "lxml")

                  # Listing entries
                  house_elements = soup.find_all('li', class_="clear")
                  # Process each entry
                  for house_elem in house_elements:
                      # Price
                      price = house_elem.find('div', class_="totalPrice")
                      # Title
                      name = house_elem.find('div', class_='title')
                      # Description
                      desc = house_elem.find('div', class_="houseInfo")
                      # Image URL
                      img_link = house_elem.find('a', class_="img")
                      pic = img_link.find('img', class_="lj-lazy") if img_link is not None else None
                      # Skip entries that lack any of the expected fields
                      if any(x is None for x in (price, name, desc, pic)):
                          continue

                      # Clean the data
                      price = price.text.strip()
                      name = name.text.replace("\n", "")
                      desc = desc.text.replace("\n", "").strip()
                      pic = (pic.get('data-original') or "").strip()

                      # Store as a SecHouse object
                      sec_house = SecHouse(chinese_district, chinese_area, name, price, desc, pic)
                      print(sec_house.text())
                      sec_house_list.append(sec_house)

      # Write every collected record to the txt file once all districts are done
      for sec_house in sec_house_list:
          f.write(sec_house.text() + "\n")
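
The page-count step is the most fragile part of the loop, so it can help to test it on its own. This sketch pulls `data-total-count` out of the page-box markup and rounds up; `per_page=10` simply mirrors the divisor used in the code above, and if the site lists a different number of homes per page, that value would need to change. The helper name `total_pages` and the sample markup are hypothetical.

```python
import math
import re


def total_pages(page_box_html, per_page=10):
    """Convert the page-box's data-total-count into a page count.

    Returns 1 when no count is found, matching the loop's fallback.
    """
    match = re.search(r'data-total-count="(\d+)"', page_box_html)
    if match is None:
        return 1
    return math.ceil(int(match.group(1)) / per_page)


print(total_pages('<div class="page-box" data-total-count="95"></div>'))  # 10
print(total_pages('<div class="page-box"></div>'))                        # 1
```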

The code is now complete. Run it with python sechouse.py to start scraping. When it finishes, open sechouse.txt in the current directory to see the results, as shown in the figure below:

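Summary

This article showed how to batch-scrape second-hand housing data from a real-estate website with Python. After collecting data for a while, we can compare the results across runs to see whether second-hand prices have recently gone up or down. If you enjoyed this article, please follow and bookmark it.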
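
As a sketch of that comparison, assuming you keep the (sub-area, price) pairs from two runs, the code below computes the median total price per sub-area in each snapshot and the percentage change between them. It assumes price strings like `520万` (units of 10,000 CNY) with the number first; the function names and sample values are made up for illustration. The median is used rather than the mean so that a few very expensive listings do not dominate; pandas, installed earlier, would work equally well for larger datasets.

```python
import re
import statistics


def parse_price(text):
    """Turn a total-price string such as '520万' into a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def median_by_area(rows):
    """rows: iterable of (area, price_text) -> {area: median price}."""
    buckets = {}
    for area, price_text in rows:
        price = parse_price(price_text)
        if price is not None:
            buckets.setdefault(area, []).append(price)
    return {area: statistics.median(prices) for area, prices in buckets.items()}


def compare(old_rows, new_rows):
    """Percent change of each sub-area's median price between two snapshots."""
    old, new = median_by_area(old_rows), median_by_area(new_rows)
    return {
        area: round((new[area] - old[area]) / old[area] * 100, 1)
        for area in old.keys() & new.keys()  # only areas present in both runs
    }


# Hypothetical sample snapshots, not real data.
old = [("安定门", "500万"), ("安定门", "600万")]
new = [("安定门", "561万"), ("安定门", "594万"), ("东直门", "700万")]
print(compare(old, new))  # {'安定门': 5.0}
```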

Article title: How I Scrape Second-Hand Housing Price Data
When reposting, please credit: http://www.5511xx.com/article/cdddhji.html
