46.6 Rate Limiting
Call time.sleep(1) between requests to space them out and avoid putting excessive load on the server.
For large-scale crawling, consider asynchronous requests, task queues, and distributed crawling (advanced topics).
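As a rough illustration of combining asynchrony with a queue while staying polite, here is a minimal sketch using only the standard library's asyncio. The names `crawl`, `worker` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real HTTP call so the sketch runs offline. Each worker pauses `delay` seconds after every request, so with N workers the overall rate is roughly N requests per `delay` seconds.

```python
import asyncio
import time


async def worker(queue, fetch, delay, results):
    # Pull URLs from the shared queue until it is empty.
    while True:
        try:
            url = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        results.append((url, await fetch(url)))
        # Per-worker polite pause; N workers give about N requests per `delay` s.
        await asyncio.sleep(delay)


async def crawl(urls, fetch, workers=2, delay=1.0):
    # Hypothetical helper: fill a queue, then let a fixed pool of workers drain it.
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = []
    await asyncio.gather(*(worker(queue, fetch, delay, results) for _ in range(workers)))
    return results


async def fake_fetch(url):
    # Hypothetical stand-in for a real request. With requests one could use:
    #   r = await asyncio.to_thread(requests.get, url, timeout=10); return r.status_code
    await asyncio.sleep(0.05)
    return 200


if __name__ == "__main__":
    urls = [f"https://httpbin.org/get?page={i}" for i in range(1, 5)]
    start = time.monotonic()
    for url, status in asyncio.run(crawl(urls, fake_fetch, workers=2, delay=0.2)):
        print(status, url)
    print(f"elapsed: {time.monotonic() - start:.2f}s")
```

Keeping the worker count small is what bounds concurrency here. Raising `workers` speeds up the crawl but also multiplies the load on the target server.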
Polite crawling
# ========================================
# Example: spacing out requests
# ========================================
import requests
import time
urls = ['https://httpbin.org/get?page=1', 'https://httpbin.org/get?page=2']
for url in urls:
    r = requests.get(url, timeout=10)
    print(r.status_code, url)
    time.sleep(1)  # polite wait: pause 1 second before the next request
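A fixed time.sleep(1) after each request adds the pause on top of however long the request itself took, and it also waits needlessly after the last URL. One possible refinement, sketched below under the assumption that a minimum gap between request starts is what you want, is a small throttle that sleeps only for the remaining part of the interval. The `Throttle` class is a hypothetical helper, not part of requests; in real use you would call `throttle.wait()` right before each `requests.get(url, timeout=10)`.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between request starts."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None  # time.monotonic() of the previous request

    def wait(self):
        """Block until min_interval has passed since the last call; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.5)
    for page in range(1, 4):
        throttle.wait()  # call right before each requests.get(...)
        time.sleep(0.2)  # simulate a request that takes 0.2 s
        print(f"page {page} done at {time.monotonic():.2f}")
```

Because time.monotonic() is unaffected by system clock changes, it is a safer basis for measuring intervals than time.time().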