Rate Limiting

46.6 Rate Limiting

Call time.sleep(1) between requests to pause for one second each time, so the crawler does not put undue load on the server.

Large-scale crawling calls for asynchronous requests, task queues, or distributed crawlers (advanced topics).
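As a rough illustration of the asynchronous and queue-based approach, the sketch below uses only the standard library's asyncio. The names fetch, worker, and crawl are hypothetical, and fetch is a placeholder that simulates network I/O instead of calling a real HTTP client; error handling is omitted. Each worker still pauses after every request, so concurrency does not remove the polite delay.

```python
import asyncio


async def fetch(url: str) -> str:
    # Hypothetical placeholder: a real crawler would call an async HTTP client here.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"fetched {url}"


async def worker(queue: asyncio.Queue, results: list, delay: float) -> None:
    # Pull URLs from the shared queue until the task is cancelled.
    while True:
        url = await queue.get()
        try:
            results.append(await fetch(url))
            await asyncio.sleep(delay)  # polite pause before this worker's next request
        finally:
            queue.task_done()


async def crawl(urls, concurrency: int = 2, delay: float = 1.0) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: list = []
    # A small, fixed number of workers caps how many requests run at once.
    workers = [
        asyncio.create_task(worker(queue, results, delay))
        for _ in range(concurrency)
    ]
    await queue.join()  # wait until every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

A call such as asyncio.run(crawl(urls, concurrency=2, delay=1.0)) would process the list with at most two requests in flight. The concurrency and delay values here are arbitrary and should be tuned to what the target site tolerates.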

Polite crawling

# ========================================
# Example: delay between requests
# ========================================
import requests
import time

urls = ['https://httpbin.org/get?page=1', 'https://httpbin.org/get?page=2']
for url in urls:
    r = requests.get(url, timeout=10)
    print(r.status_code, url)
    time.sleep(1)  # wait politely for 1 second
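One possible refinement of the fixed one-second sleep is a small helper that enforces a minimum interval between requests. The Throttle class below is a hypothetical sketch, not part of requests or any other library. It subtracts the time already spent on the previous request, skips the wait before the first request, and can add random jitter. The clock and sleep parameters exist only so the behaviour can be checked without real waiting.

```python
import random
import time


class Throttle:
    """Enforce a minimum interval (plus optional random jitter) between calls."""

    def __init__(self, min_interval=1.0, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self):
        """Sleep just long enough to respect the interval; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0.0, self.jitter)
            remaining = target - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

In the loop above, this would mean creating throttle = Throttle(min_interval=1.0) once and calling throttle.wait() immediately before each requests.get(url, timeout=10) in place of the trailing time.sleep(1). That way no time is wasted sleeping after the final request.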