How to Monitor Proxy Quality & Performance
A complete guide to monitoring proxy quality and performance effectively: measurement techniques for response time, success rate, and IP quality, with implementation examples for automated tooling.
Proxy quality monitoring comes down to tracking three key metrics: success rate, response time, and IP quality. Use automated tools to monitor continuously and remove underperforming proxies automatically, and implement efficient quality management with Python scripts and Bright Data's monitoring API.
The Critical Importance of Proxy Quality Monitoring
Proxy service quality directly impacts web scraping project success. As explained in our Ultimate Guide to Proxy Services & Web Scraping, without proper monitoring systems, poor-quality proxies can significantly reduce data collection efficiency.
Key Metrics to Monitor
1. Success Rate
Definition: The percentage of requests that succeed out of the total requests sent
Calculation Formula:
Success Rate = (Successful Requests / Total Requests) × 100
Recommended Standards:
- Residential Proxies: 95%+
- Datacenter Proxies: 98%+
- Mobile Proxies: 90%+
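As a minimal sketch of the formula above, success rate can be computed from a log of request outcomes (the `outcomes` list here is a hypothetical stand-in for your own request records):

from typing import List

def success_rate(outcomes: List[bool]) -> float:
    """Success Rate = (Successful Requests / Total Requests) × 100"""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes) * 100

print(success_rate([True, True, False, True]))  # 75.0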
2. Response Time
Measurement Items:
- Average response time
- 95th percentile time
- Maximum response time
Recommended Standards:
- Residential Proxies: Under 3 seconds
- Datacenter Proxies: Under 1 second
- Mobile Proxies: Under 5 seconds
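All three measurements can be derived from a list of observed response times using only the standard library; a minimal sketch (the sample values are illustrative only):

import statistics

# Hypothetical sample of observed response times in seconds
response_times = [0.8, 1.2, 0.9, 3.5, 1.1, 0.7, 2.4, 1.0]

avg = statistics.mean(response_times)
# quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile
p95 = statistics.quantiles(response_times, n=20)[18]
worst = max(response_times)
print(f"avg={avg:.2f}s p95={p95:.2f}s max={worst:.2f}s")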
3. IP Quality
Evaluation Elements:
- IP reputation
- Blacklist status (see the DNSBL sketch below)
- Geographic accuracy
- Anonymity level
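Blacklist status, for example, can be checked against a DNS blacklist (DNSBL): the IP's octets are reversed and queried as a subdomain of the blacklist zone, and any DNS answer means the IP is listed. A minimal sketch using Spamhaus ZEN (listing policies vary by blacklist, and Spamhaus may not answer queries routed through some public DNS resolvers):

import socket

def is_blacklisted(ip: str, dnsbl: str = 'zen.spamhaus.org') -> bool:
    """Return True if the IP is listed on the given DNSBL"""
    reversed_ip = '.'.join(reversed(ip.split('.')))
    try:
        socket.gethostbyname(f'{reversed_ip}.{dnsbl}')
        return True   # the query resolved, so the IP is listed
    except socket.gaierror:
        return False  # NXDOMAIN: the IP is not listed

print(is_blacklisted('127.0.0.2'))  # standard DNSBL test address, always listed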
Building a Monitoring System
Basic Monitoring System
import time
import requests
import statistics
from datetime import datetime
from dataclasses import dataclass
from typing import List, Dict

@dataclass
class ProxyMetrics:
    proxy_id: str
    success_rate: float
    avg_response_time: float
    p95_response_time: float
    error_count: int
    last_check: datetime

class ProxyMonitor:
    def __init__(self, proxy_list: List[str]):
        self.proxy_list = proxy_list
        self.metrics: Dict[str, ProxyMetrics] = {}
        self.test_urls = [
            'https://httpbin.org/ip',
            'https://httpbin.org/headers',
            'https://httpbin.org/user-agent'
        ]

    def test_proxy(self, proxy: str, timeout: int = 10) -> Dict:
        """Test an individual proxy against each test URL"""
        results = []
        for url in self.test_urls:
            start_time = time.time()
            try:
                response = requests.get(
                    url,
                    proxies={'http': proxy, 'https': proxy},
                    timeout=timeout
                )
                end_time = time.time()
                results.append({
                    'success': response.status_code == 200,
                    'response_time': end_time - start_time,
                    'status_code': response.status_code
                })
            except Exception as e:
                # Count a failed request as taking the full timeout
                results.append({
                    'success': False,
                    'response_time': timeout,
                    'error': str(e)
                })
        return self.calculate_metrics(results)

    def calculate_metrics(self, results: List[Dict]) -> Dict:
        """Calculate aggregate metrics from test results"""
        successful = [r for r in results if r['success']]
        response_times = [r['response_time'] for r in results]
        return {
            'success_rate': len(successful) / len(results) * 100,
            'avg_response_time': statistics.mean(response_times),
            # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile
            'p95_response_time': statistics.quantiles(response_times, n=20)[18] if len(response_times) > 1 else 0,
            'error_count': len(results) - len(successful)
        }

    def monitor_all_proxies(self) -> Dict[str, ProxyMetrics]:
        """Run the test suite against every proxy in the pool"""
        for proxy in self.proxy_list:
            metrics = self.test_proxy(proxy)
            self.metrics[proxy] = ProxyMetrics(
                proxy_id=proxy,
                success_rate=metrics['success_rate'],
                avg_response_time=metrics['avg_response_time'],
                p95_response_time=metrics['p95_response_time'],
                error_count=metrics['error_count'],
                last_check=datetime.now()
            )
        return self.metrics
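A usage sketch for the monitor above (the proxy URLs are placeholders, not real endpoints):

monitor = ProxyMonitor([
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
])
for proxy, m in monitor.monitor_all_proxies().items():
    print(f"{proxy}: {m.success_rate:.1f}% success, "
          f"{m.avg_response_time:.2f}s avg, {m.error_count} errors")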
Advanced Monitoring System
import asyncio
import aiohttp
import sqlite3

class AdvancedProxyMonitor(ProxyMonitor):
    """Extends ProxyMonitor, reusing its test_urls and calculate_metrics"""

    def __init__(self, proxy_list: List[str], db_path: str = 'proxy_metrics.db'):
        super().__init__(proxy_list)
        self.db_path = db_path
        self.setup_database()

    def setup_database(self):
        """Initialize the SQLite database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS proxy_metrics (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                proxy_id TEXT,
                success_rate REAL,
                avg_response_time REAL,
                p95_response_time REAL,
                error_count INTEGER,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        ''')
        conn.commit()
        conn.close()

    async def test_proxy_async(self, session: aiohttp.ClientSession, proxy: str) -> Dict:
        """Asynchronously test an individual proxy"""
        results = []
        for url in self.test_urls:
            start_time = time.time()
            try:
                async with session.get(
                    url,
                    proxy=proxy,
                    timeout=aiohttp.ClientTimeout(total=10)
                ) as response:
                    end_time = time.time()
                    results.append({
                        'success': response.status == 200,
                        'response_time': end_time - start_time,
                        'status_code': response.status
                    })
            except Exception as e:
                results.append({
                    'success': False,
                    'response_time': 10,
                    'error': str(e)
                })
        return self.calculate_metrics(results)

    async def monitor_proxies_async(self) -> List[Dict]:
        """Asynchronously monitor all proxies in parallel"""
        async with aiohttp.ClientSession() as session:
            tasks = [
                self.test_proxy_async(session, proxy)
                for proxy in self.proxy_list
            ]
            results = await asyncio.gather(*tasks)
            # Persist the results before returning them
            self.save_metrics(results)
            return results

    def save_metrics(self, metrics: List[Dict]):
        """Save metrics to the database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        for i, metric in enumerate(metrics):
            cursor.execute('''
                INSERT INTO proxy_metrics (
                    proxy_id, success_rate, avg_response_time,
                    p95_response_time, error_count
                ) VALUES (?, ?, ?, ?, ?)
            ''', (
                self.proxy_list[i],
                metric['success_rate'],
                metric['avg_response_time'],
                metric['p95_response_time'],
                metric['error_count']
            ))
        conn.commit()
        conn.close()
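A usage sketch (placeholder proxy URLs; `asyncio.run` requires Python 3.7+):

async def main():
    monitor = AdvancedProxyMonitor([
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
    ])
    results = await monitor.monitor_proxies_async()
    for proxy, metrics in zip(monitor.proxy_list, results):
        print(proxy, metrics)

asyncio.run(main())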
Automation and Alert Features
Threshold-Based Alerts
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

class ProxyAlertSystem:
    def __init__(self, smtp_config: Dict):
        self.smtp_config = smtp_config
        self.thresholds = {
            'success_rate': 90.0,       # percent
            'avg_response_time': 5.0,   # seconds
            'p95_response_time': 10.0   # seconds
        }

    def check_thresholds(self, metrics: ProxyMetrics) -> List[str]:
        """Compare metrics against the alert thresholds"""
        alerts = []
        if metrics.success_rate < self.thresholds['success_rate']:
            alerts.append(f"Success rate declined: {metrics.success_rate:.2f}%")
        if metrics.avg_response_time > self.thresholds['avg_response_time']:
            alerts.append(f"Average response time increased: {metrics.avg_response_time:.2f}s")
        if metrics.p95_response_time > self.thresholds['p95_response_time']:
            alerts.append(f"95th percentile response time increased: {metrics.p95_response_time:.2f}s")
        return alerts

    def send_alert(self, proxy_id: str, alerts: List[str]):
        """Send an alert email"""
        msg = MIMEMultipart()
        msg['From'] = self.smtp_config['from_email']
        msg['To'] = self.smtp_config['to_email']
        msg['Subject'] = f"Proxy Quality Alert: {proxy_id}"
        body = f"""
The following issues were detected with proxy {proxy_id}:

{chr(10).join(alerts)}

Please investigate.
"""
        msg.attach(MIMEText(body, 'plain'))
        try:
            server = smtplib.SMTP(self.smtp_config['smtp_server'], self.smtp_config['port'])
            server.starttls()
            server.login(self.smtp_config['username'], self.smtp_config['password'])
            server.send_message(msg)
            server.quit()
        except Exception as e:
            print(f"Email sending error: {e}")
Automatic Proxy Removal System
class ProxyPoolManager:
    def __init__(self, initial_proxies: List[str]):
        self.active_proxies = set(initial_proxies)
        self.quarantine_proxies = set()
        # The monitor keeps testing every proxy, including quarantined ones,
        # so recovered proxies can be restored to the active pool
        self.monitor = AdvancedProxyMonitor(list(self.active_proxies))

    def evaluate_proxy_health(self, proxy: str, metrics: ProxyMetrics) -> bool:
        """Evaluate proxy health against fixed cut-offs"""
        if metrics.success_rate < 80:
            return False
        if metrics.avg_response_time > 10:
            return False
        if metrics.error_count > 5:
            return False
        return True

    def update_proxy_pool(self):
        """Move proxies between the active pool and quarantine"""
        current_metrics = self.monitor.monitor_all_proxies()
        for proxy, metrics in current_metrics.items():
            if not self.evaluate_proxy_health(proxy, metrics):
                if proxy in self.active_proxies:
                    self.active_proxies.remove(proxy)
                    self.quarantine_proxies.add(proxy)
                    print(f"Quarantined proxy: {proxy}")
            else:
                if proxy in self.quarantine_proxies:
                    self.quarantine_proxies.remove(proxy)
                    self.active_proxies.add(proxy)
                    print(f"Restored proxy: {proxy}")

    def get_healthy_proxies(self) -> List[str]:
        """Get the list of currently healthy proxies"""
        return list(self.active_proxies)
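A usage sketch that re-evaluates the pool on a fixed interval (placeholder proxies; a blocking loop is used purely for illustration):

pool = ProxyPoolManager([
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
])
while True:
    pool.update_proxy_pool()
    print(f"Healthy proxies: {len(pool.get_healthy_proxies())}")
    time.sleep(600)  # re-evaluate every 10 minutes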
Practical Monitoring Recipes
Regional Quality Monitoring
class RegionalProxyMonitor:
    def __init__(self, proxy_config: Dict[str, List[str]]):
        # e.g. {'US': [proxy1, proxy2], 'JP': [proxy3]}
        self.proxy_config = proxy_config
        self.regional_metrics = {}

    def test_regional_access(self, region: str, proxies: List[str]) -> Dict:
        """Test regional access through each proxy"""
        test_sites = {
            'US': ['https://www.google.com', 'https://www.amazon.com'],
            'JP': ['https://www.google.co.jp', 'https://www.amazon.co.jp'],
            'EU': ['https://www.google.de', 'https://www.amazon.de']
        }
        results = []
        for proxy in proxies:
            for site in test_sites.get(region, []):
                try:
                    response = requests.get(
                        site,
                        proxies={'http': proxy, 'https': proxy},
                        timeout=10
                    )
                    # Check the response body for regional restriction markers
                    is_region_accessible = self.check_region_access(response, region)
                    results.append({
                        'proxy': proxy,
                        'site': site,
                        'accessible': is_region_accessible,
                        'response_time': response.elapsed.total_seconds()
                    })
                except Exception as e:
                    results.append({
                        'proxy': proxy,
                        'site': site,
                        'accessible': False,
                        'error': str(e)
                    })
        return self.analyze_regional_results(results)

    def check_region_access(self, response: requests.Response, region: str) -> bool:
        """Check whether the response indicates a regional restriction"""
        # Common phrasing used by geo-blocked pages
        error_indicators = [
            'not available in your country',
            'access denied',
            'geographic restriction',
            'region not supported'
        ]
        content = response.text.lower()
        return not any(indicator in content for indicator in error_indicators)

    def analyze_regional_results(self, results: List[Dict]) -> Dict:
        """Summarize accessibility per proxy"""
        summary: Dict[str, Dict] = {}
        for result in results:
            stats = summary.setdefault(result['proxy'], {'total': 0, 'accessible': 0})
            stats['total'] += 1
            if result['accessible']:
                stats['accessible'] += 1
        for stats in summary.values():
            stats['accessibility_rate'] = stats['accessible'] / stats['total'] * 100
        return summary
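A usage sketch (placeholder proxies grouped by region):

regional = RegionalProxyMonitor({
    'US': ['http://user:pass@us-proxy.example.com:8080'],
    'JP': ['http://user:pass@jp-proxy.example.com:8080'],
})
us_summary = regional.test_regional_access('US', regional.proxy_config['US'])
print(us_summary)  # per-proxy accessibility rates for the US test sites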
Performance Optimization
class PerformanceOptimizer:
    def __init__(self, proxy_list: List[str]):
        self.proxy_list = proxy_list
        self.performance_history = {}

    def benchmark_proxies(self, test_duration: int = 300) -> Dict:
        """Benchmark each proxy for test_duration seconds"""
        results = {}
        for proxy in self.proxy_list:
            start_time = time.time()
            request_count = 0
            successful_requests = 0
            total_response_time = 0
            while time.time() - start_time < test_duration:
                try:
                    response = requests.get(
                        'https://httpbin.org/ip',
                        proxies={'http': proxy, 'https': proxy},
                        timeout=10
                    )
                    request_count += 1
                    if response.status_code == 200:
                        successful_requests += 1
                        total_response_time += response.elapsed.total_seconds()
                except Exception:
                    request_count += 1
                # Pace requests at roughly 100ms intervals, even after failures
                time.sleep(0.1)
            results[proxy] = {
                'requests_per_second': request_count / test_duration,
                'success_rate': successful_requests / request_count * 100 if request_count > 0 else 0,
                'avg_response_time': total_response_time / successful_requests if successful_requests > 0 else 0
            }
        return results

    def rank_proxies(self, benchmark_results: Dict) -> List[str]:
        """Rank proxies by a weighted composite score"""
        def calculate_score(metrics: Dict) -> float:
            # Weights: 40% success rate, 30% speed (inverse latency), 30% throughput
            return (
                metrics['success_rate'] * 0.4 +
                (1 / metrics['avg_response_time'] if metrics['avg_response_time'] > 0 else 0) * 0.3 +
                metrics['requests_per_second'] * 0.3
            )

        ranked = sorted(
            benchmark_results.items(),
            key=lambda x: calculate_score(x[1]),
            reverse=True
        )
        return [proxy for proxy, _ in ranked]
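A usage sketch running a short 60-second benchmark (placeholder proxies):

optimizer = PerformanceOptimizer([
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
])
results = optimizer.benchmark_proxies(test_duration=60)
for proxy in optimizer.rank_proxies(results):
    print(proxy, results[proxy])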
Leveraging Bright Data Monitoring API
class BrightDataMonitor:
    def __init__(self, api_token: str):
        self.api_token = api_token
        # NOTE: verify endpoint paths and response fields against the
        # current Bright Data API documentation
        self.base_url = 'https://brightdata.com/api/v1'

    def get_zone_statistics(self, zone_id: str) -> Dict:
        """Get zone statistics"""
        headers = {
            'Authorization': f'Bearer {self.api_token}',
            'Content-Type': 'application/json'
        }
        response = requests.get(
            f'{self.base_url}/zones/{zone_id}/statistics',
            headers=headers
        )
        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"API Error: {response.status_code}")

    def get_usage_metrics(self, zone_id: str, period: str = '24h') -> Dict:
        """Get usage metrics for the given period"""
        headers = {
            'Authorization': f'Bearer {self.api_token}',
            'Content-Type': 'application/json'
        }
        params = {'period': period}
        response = requests.get(
            f'{self.base_url}/zones/{zone_id}/usage',
            headers=headers,
            params=params
        )
        return response.json() if response.status_code == 200 else {}

    def analyze_bright_data_metrics(self, zone_id: str) -> Dict:
        """Combine zone statistics and usage into one summary"""
        stats = self.get_zone_statistics(zone_id)
        usage = self.get_usage_metrics(zone_id)
        return {
            'success_rate': stats.get('success_rate', 0),
            'avg_response_time': stats.get('avg_response_time', 0),
            'bandwidth_usage': usage.get('bandwidth_mb', 0),
            'request_count': usage.get('requests', 0),
            'cost_per_gb': usage.get('cost_per_gb', 0)
        }
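A usage sketch (the token and zone ID are placeholders; verify endpoint paths and response fields against the current Bright Data API documentation):

bd = BrightDataMonitor(api_token='YOUR_API_TOKEN')
summary = bd.analyze_bright_data_metrics(zone_id='your_zone_id')
print(summary)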
Frequently Asked Questions
Q: What's the appropriate monitoring frequency?
A: For critical projects, 5-10 minute intervals are recommended; for general use, intervals of 30 minutes to 1 hour are sufficient.
Q: How should temporarily underperforming proxies be handled?
A: An effective approach is to quarantine a proxy after 3 consecutive failures, then retest it after 1 hour to decide whether to restore it.
Q: How long should monitoring data be retained?
A: We recommend keeping detailed data for 1 month and summary data for 1 year.
Q: What are good baseline alert thresholds?
A: Start with a 90% success rate, a 5-second average response time, and a 10-second 95th percentile response time, then adjust for your specific use case.
Q: Can multiple proxy providers be monitored simultaneously?
A: Yes, the same monitoring system can manage multiple providers in a single integrated view.
Monitoring System Operation Tips
Continuous Improvement
- Regular threshold review: Set optimal thresholds based on historical data
- New feature testing: Expand monitoring scope and add new metrics
- Cost optimization: Apply techniques from Bright Data Cost Optimization
Troubleshooting
- Sudden response time degradation: Check for proxy provider outages
- Declining success rates: Investigate target site changes or access restrictions
- Regional access failures: Verify changes in geographic restrictions
Conclusion
Effective proxy quality monitoring systems are essential for building stable web scraping environments. By implementing the techniques presented in this article and customizing them for your project requirements, you can achieve high-quality data collection systems.
Maintain proxy quality through continuous monitoring and improvement to keep data collection efficient.