We often end up writing Python code that makes remote requests, reads multiple files, or processes some data. In many of these cases I see programmers use a simple for loop that takes forever to finish. For example:

import requests
from time import time

url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]


def download_file(url):
    # stream=True defers downloading the body; only the status line
    # and headers are read before the status code is returned.
    html = requests.get(url, stream=True)
    return html.status_code


start = time()

# Request the URLs one at a time; each call blocks until the server responds.
for url in url_list:
    print(download_file(url))

print(f'Time taken: {time() - start}')
Output:
<--truncated--> Time taken: 4.128157138824463
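This is a straightforward example: the code opens each URL, waits for the response, prints its status code, and only then moves on to the next URL. Code like this, which spends most of its time waiting on the network, is a very good candidate for multithreading.
Modern systems can run a large number of threads, which means you can do multiple tasks at once with very low overhead. Why don't we use that to make the code above process these URLs faster?
We will use ThreadPoolExecutor from the concurrent.futures library. It is very easy to use. Let me show you some code and then explain how it works.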

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import time

url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]


def download_file(url):
    # stream=True defers downloading the body; only the status line
    # and headers are read before the status code is returned.
    html = requests.get(url, stream=True)
    return html.status_code


start = time()

# Despite the name, this list holds Future objects for tasks run in threads.
processes = []
with ThreadPoolExecutor(max_workers=10) as executor:
    # Queue one task per URL; up to 10 run at the same time.
    for url in url_list:
        processes.append(executor.submit(download_file, url))

# Print each status code as soon as its task finishes (not in submission order).
for task in as_completed(processes):
    print(task.result())

print(f'Time taken: {time() - start}')
Output:
<--truncated--> Time taken: 0.4583399295806885
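Our code ran almost 9 times faster, and we did not do anything particularly involved. With more URLs the gain can be larger still, up to the point where the number of workers, the network, or the remote server becomes the bottleneck.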
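If you want to reproduce the effect without touching the network, here is a minimal sketch. It assumes a hypothetical fake_request function that uses time.sleep as a stand-in for a slow request, and it uses executor.map (a different ThreadPoolExecutor method from the submit call shown above) simply because it is the shortest way to collect results in input order. The exact timings will vary from machine to machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay):
    """Hypothetical stand-in for a network call: sleep, then return the delay."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    # One call after another, like the plain for loop.
    start = time.perf_counter()
    results = [fake_request(d) for d in delays]
    return results, time.perf_counter() - start


def run_threaded(delays, max_workers):
    # All calls handed to a thread pool; map returns results in input order.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_request, delays))
    return results, time.perf_counter() - start


delays = [0.05] * 10
seq_results, seq_time = run_sequential(delays)
thr_results, thr_time = run_threaded(delays, max_workers=10)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Because the ten sleeps overlap instead of running back to back, the threaded run should take roughly as long as one sleep rather than the sum of all ten. Lowering max_workers below the number of tasks shrinks the gain accordingly.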
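So what is happening? When we call executor.submit we add a new task to the thread pool, and the call immediately returns a Future object representing that task. We store it in the processes list. Later we iterate over processes and print out each result.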
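Since tasks finish in an arbitrary order, it often helps to remember which input each Future belongs to. The following is a small sketch of one common way to do that, not the only one. The checked_square function is a hypothetical stand-in for download_file so that the example runs offline; it also raises for negative input to show that Future.result() re-raises whatever exception the task raised.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def checked_square(n):
    # Hypothetical stand-in for download_file; fails on negative input.
    if n < 0:
        raise ValueError(f"negative input: {n}")
    return n * n


results = {}
with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each Future back to the input that produced it.
    future_to_input = {executor.submit(checked_square, n): n for n in (1, 2, -3)}

    for future in as_completed(future_to_input):
        n = future_to_input[future]
        try:
            # result() returns the task's return value, or re-raises its exception.
            results[n] = future.result()
        except ValueError as exc:
            results[n] = exc

for n, outcome in results.items():
    print(n, repr(outcome))
```

In the download example, the same pattern would let you print the URL next to its status code and keep going when one request fails.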
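The as_completed function yields the items (tasks) from the processes list as soon as they complete. A task can reach the completed state for two reasons: it has finished executing or it was cancelled. We could also have passed a timeout argument to as_completed. In that case, if some tasks are still unfinished that many seconds after the call, the iterator raises a TimeoutError instead of yielding them.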
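To see how a timeout behaves in practice, here is a minimal sketch based on the documented behavior of as_completed, which raises TimeoutError when results are still pending after the given number of seconds. The slow_task function and the specific delays are assumptions chosen only to make one task finish in time and the other not.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed


def slow_task(delay):
    # Hypothetical stand-in for a request that takes `delay` seconds.
    time.sleep(delay)
    return delay


finished = []
timed_out = False
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(slow_task, d) for d in (0.05, 1.0)]
    try:
        # The timeout counts from the as_completed call, not per task.
        for future in as_completed(futures, timeout=0.4):
            finished.append(future.result())
    except TimeoutError:
        # Raised once 0.4 s have passed with a task still unfinished.
        timed_out = True
# Leaving the with block still waits for the slow task to finish.

print(finished, timed_out)
```

Note that the timeout only stops the waiting. The slow task keeps running in its thread, and the with block does not exit until it is done.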
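You should explore multithreading a bit more. For small, I/O-bound projects it is often the quickest way to speed up your code. If you want to learn more, read the official documentation at https://docs.python.org/3/library/concurrent.futures.html, which is very helpful.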