Performance Tuning
Optimize your Alactic AGI deployment for maximum speed, efficiency, and throughput. This guide covers performance optimization strategies across infrastructure, application settings, and operational practices.
Performance Metrics
Key Performance Indicators
Processing Speed:
- Target: 10-20 seconds per 10-page PDF
- Measurement: Time from job submission to completion
- Factors: VM SKU, model choice, analysis depth, document complexity
Throughput:
- Target: Process 80-90% of plan quota per month
- Measurement: Documents processed per hour/day/month
- Factors: Concurrent processing capacity, batch sizes
Resource Utilization:
- Target: CPU 60-75%, Memory 65-80% during processing
- Measurement: Azure Monitor metrics
- Factors: Workload patterns, VM sizing
Success Rate:
- Target: More than 95% successful processing
- Measurement: Completed jobs / total jobs
- Factors: Document quality, configuration settings
Infrastructure Optimization
VM Performance
Understand VM Capabilities
Free Plan (Standard_B2s):
- Burstable Performance: Uses CPU credits
- Sustained Load: Performance degrades when credits exhausted
- Best For: Intermittent workloads
- Limitation: Not suitable for consistent heavy processing
Credit exhaustion symptoms:
- First 10 docs: 12-15 seconds each
- After 50 docs: 25-35 seconds each (2-3x slower)
- Recovery: Credits regenerate during idle time
Recommendation: Spread processing throughout month, avoid large batches.
Pro Plan (Standard_D2s_v3):
- Dedicated Performance: No CPU credits
- Sustained Load: Consistent performance
- Best For: Regular production workloads
- Concurrent Processing: 5 jobs simultaneously
Pro+ Plan (Standard_D4s_v3):
- 2x vCPUs: Double compute power
- 2x RAM: Better for large documents
- Concurrent Processing: 10-12 jobs simultaneously
- Best For: High-volume operations
Optimize VM Configuration
Enable Accelerated Networking (Pro+ and Enterprise):
# Check if enabled
az network nic show --resource-group alactic-rg --name alactic-nic \
--query "enableAcceleratedNetworking"
# Enable if false
az network nic update --resource-group alactic-rg --name alactic-nic \
--accelerated-networking true
Benefits:
- Lower network latency (50% reduction)
- Higher throughput
- Better CPU utilization
- Impact: 5-10% faster document processing
Configure Premium SSD:
Already included in all plans, but verify:
az disk show --resource-group alactic-rg --name alactic-vm-disk \
--query "sku.name"
# Should return: Premium_LRS
Benefits:
- Faster read/write operations
- Lower latency for database access
- Better for large documents
Database Performance
Cosmos DB Optimization
Query Optimization:
Alactic uses optimized queries, but you can monitor:
Azure Portal → Cosmos DB → Metrics → Request Units
Target: Less than 2,000 RU/s average (serverless limit: 5,000 RU/s)
If approaching limit:
- Archive old documents (reduce database size)
- Disable vector storage if not using search
- Consider dedicated throughput tier (Enterprise)
Indexing:
Default indexing is optimized for Alactic workloads. No changes needed.
Connection Pooling:
Alactic maintains connection pools for efficiency:
- Default: 50 connections
- Under load: Up to 100 connections
- Automatically managed
Storage Performance
Premium SSD Configuration:
Already optimal, but you can verify IOPS:
az disk show --resource-group alactic-rg --name alactic-vm-disk \
--query "{IOPS: diskIOPSReadWrite, Throughput: diskMBpsReadWrite}"
Pro Plan (128 GB Premium SSD):
- IOPS: 500
- Throughput: 100 MB/s
- Sufficient for moderate workloads
Pro+ Plan (256 GB Premium SSD):
- IOPS: 1,100
- Throughput: 125 MB/s
- Better for high-volume processing
Optimize Storage Access:
# Check disk cache settings
az vm show --resource-group alactic-rg --name alactic-vm \
--query "storageProfile.osDisk.caching"
# Should return: ReadWrite (optimal for Alactic)
Application-Level Optimization
Processing Settings
Choose Optimal Analysis Depth
Three levels with different performance profiles:
| Depth | Processing Time | Use Case | Output Quality |
|---|---|---|---|
| Quick Extract | 8-12s | Text extraction only | Text only |
| Standard Analysis | 15-20s | Summary + key points | Balanced |
| Deep Analysis | 25-35s | Full analysis + entities | Comprehensive |
Optimization Strategy:
Default to Standard, use others selectively:
def choose_analysis_depth(document_type):
    if document_type == "simple_invoice":
        return "quick"  # Just need text
    elif document_type == "technical_report":
        return "deep"  # Need detailed analysis
    else:
        return "standard"  # Balanced default
Performance Impact:
Processing 100 documents:
- All Deep: 2,500 seconds (~42 minutes)
- All Standard: 1,800 seconds (~30 minutes)
- All Quick: 1,000 seconds (~17 minutes)
Savings: 25-40% faster with appropriate depth selection
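Using midpoint per-depth timings assumed from the table above (Quick ≈ 10s, Standard ≈ 18s, Deep ≈ 30s), you can estimate a mixed workload's batch time before submitting it:

```python
# Midpoint per-document timings assumed from the analysis-depth table.
DEPTH_SECONDS = {"quick": 10, "standard": 18, "deep": 30}

def estimated_batch_seconds(depth_counts):
    # depth_counts maps analysis depth -> number of documents
    return sum(DEPTH_SECONDS[d] * n for d, n in depth_counts.items())

# 100 documents: mostly Standard, with Quick/Deep used selectively
print(estimated_batch_seconds({"quick": 30, "standard": 60, "deep": 10}))  # 1680
```

That mix comes in at 1,680s, versus 1,800s for all-Standard, consistent with the savings estimated above.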
Model Selection for Speed
Processing time comparison (10-page PDF):
| Model | Avg Time | 95th Percentile | Use When |
|---|---|---|---|
| GPT-4o mini | 15-18s | 22s | Speed priority |
| GPT-4o | 18-22s | 28s | Quality priority |
GPT-4o mini is 15-20% faster.
Speed Optimization:
- Use mini for time-sensitive processing
- Use GPT-4o when quality is critical
- Consider cascade strategy (mini first, GPT-4o if needed)
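The cascade strategy can be sketched as a two-step fallback. Here `call_model` and `looks_complete` are hypothetical stand-ins for your API wrapper and your own acceptance check, not Alactic functions:

```python
# Two-step cascade: try the faster model first, retry with GPT-4o only
# when the result fails a quality check.
# `call_model` and `looks_complete` are hypothetical stand-ins for your
# API wrapper and your own acceptance test.
def process_with_cascade(document, call_model, looks_complete):
    result = call_model(document, model="gpt-4o-mini")  # fast path
    if looks_complete(result):
        return result
    return call_model(document, model="gpt-4o")  # quality fallback
```

If most documents pass the check, average latency stays close to the mini model's while hard cases still get GPT-4o quality.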
Vector Storage Settings
Impact on performance:
With vectors enabled:
- Additional 2-3 seconds per document
- Embedding API calls required
- Extra database writes
Without vectors:
- 2-3 seconds faster
- No embedding costs
- Smaller database footprint
Recommendation:
Disable if not using semantic search:
curl -X POST https://your-vm-ip/api/v1/process \
-H "X-Deployment-Key: ak-xxxxx" \
-F "file=@document.pdf" \
-F "enable_vectors=false"
Or set as default:
Dashboard → Settings → Processing → Enable Vectors → Off
Performance gain: 15-20% faster processing
Concurrent Processing
Optimize Concurrency Levels
Concurrent job limits by plan:
| Plan | Max Concurrent | Optimal Concurrent | Queue Limit |
|---|---|---|---|
| Free | 2 | 1-2 | 50 |
| Pro | 5 | 3-4 | 200 |
| Pro+ | 12 | 8-10 | 500 |
| Enterprise | 50+ | 30-40 | 5,000 |
Why not max out concurrency?
- Leave headroom for system overhead
- Better stability and responsiveness
- Prevents resource contention
Optimal concurrency formula:
Optimal = (vCPUs * 2) - 1
Pro (2 vCPUs): 3 concurrent jobs
Pro+ (4 vCPUs): 7 concurrent jobs
Implementation:
from concurrent.futures import ThreadPoolExecutor

def process_batch(documents, max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process_doc, doc) for doc in documents]
        results = [f.result() for f in futures]
    return results
Performance Impact:
Processing 100 documents on Pro Plan:
- Sequential: 100 × 18s = 1,800s (30 min)
- Concurrent (3 workers): 100 × 18s / 3 = 600s (10 min)
- 3x faster with optimal concurrency
Batch Size Optimization
Recommended batch sizes:
| Plan | Optimal Batch | Max Batch | Processing Time |
|---|---|---|---|
| Free | 10 PDFs | 10 PDFs | ~3 minutes |
| Pro | 25 PDFs | 25 PDFs | ~8 minutes |
| Pro+ | 50 PDFs | 50 PDFs | ~10 minutes |
| Enterprise | 100-200 PDFs | 500 PDFs | ~20-40 minutes |
Why these sizes?
- Balance between efficiency and manageability
- Fit in typical HTTP timeout windows
- Allow for retries on failure
- Don't overwhelm queue
Example:
Process 500 PDFs on Pro+ Plan:
- Option 1: 10 batches of 50 (recommended)
- Option 2: 20 batches of 25 (slower, more overhead)
- Option 3: 1 batch of 500 (risky, long timeout)
Best practice: Stick to recommended batch sizes.
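A minimal batch-splitting helper makes it easy to stick to these sizes (50 shown, the Pro+ optimum), so each batch fits typical HTTP timeout windows and can be retried independently:

```python
# Split a large job into plan-sized batches.
def make_batches(items, batch_size=50):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

pdfs = [f"doc_{i}.pdf" for i in range(500)]
batches = make_batches(pdfs)  # 10 batches of 50 on Pro+
```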
Network Optimization
Reduce Upload Times
For large PDFs:
Strategy 1: Upload during off-peak hours
- Less network congestion
- Faster uploads
- Same processing speed
Strategy 2: Compress PDFs before upload
# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf original.pdf
Results:
- Original: 50 MB (5-second upload)
- Compressed: 10 MB (1-second upload)
- 4 seconds saved per document
Strategy 3: Use Azure VM in same region
If uploading from Azure VM:
- Choose same region as Alactic deployment
- Use Azure internal network
- Much faster than public internet
Impact:
- Same region: Less than 1ms latency
- Different region: 50-200ms latency
- 50-200ms saved per API call
Optimize API Calls
Use batch endpoints:
Bad:
for doc in documents:
    result = api.process(doc)  # 100 separate API calls
Good:
results = api.batch_process(documents) # 1 API call
Savings:
- 100 separate calls: 100 × 100ms = 10 seconds overhead
- 1 batch call: 100ms overhead
- 9.9 seconds saved
Reuse HTTP connections (keep-alive):
import requests
from requests.adapters import HTTPAdapter

# Reuse pooled TCP connections across requests
# (requests uses HTTP/1.1 keep-alive, not HTTP/2)
session = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount('https://', adapter)

# All requests share the pooled connections
for doc in documents:
    session.post(url, ...)  # Faster: no TCP/TLS handshake per request
Impact: 20-30% faster API calls
Monitoring and Diagnosis
Identify Performance Bottlenecks
Processing Time Breakdown:
Typical 10-page PDF (18.5 seconds total):
- Upload and validation: 1.2s (7%)
- PDF parsing: 3.8s (21%)
- Text extraction: 2.1s (12%)
- Model inference (GPT-4o mini): 0.4s (2%)
- Summary generation: 8.5s (47%)
- Vector embedding: 1.8s (10%)
- Database write: 0.7s (4%)
Bottleneck: Summary generation (47% of time)
Optimization:
- Use Quick Extract if summary not needed (save 8.5s)
- Use Standard vs Deep to reduce summary complexity
Performance Monitoring
Track key metrics:
Dashboard → Settings → Performance Analytics
Average Processing Times (Last 7 Days):
By Document Size:
1-5 pages: 12.3s avg (target: 10-15s) ✓
6-10 pages: 18.7s avg (target: 15-20s) ✓
11-20 pages: 35.2s avg (target: 30-40s) ✓
21-50 pages: 2m 18s avg (target: 2-3 min) ✓
By Model:
GPT-4o mini: 16.1s avg
GPT-4o: 20.3s avg
By Analysis Depth:
Quick: 10.2s avg
Standard: 18.5s avg
Deep: 28.9s avg
Throughput:
Documents/hour: 180 (peak), 120 (avg)
Success rate: 97.2%
What to watch:
- Sudden increases in processing time (more than 20%)
- Degrading success rate (less than 95%)
- Decreasing throughput
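The watch-list above can be turned into a simple automated check. This is a minimal sketch; the thresholds follow the targets stated in this guide (>20% slowdown, <95% success rate):

```python
# Flag a run when the recent average is >20% above baseline, or when the
# success rate dips below the 95% target.
def performance_alerts(avg_time_s, baseline_s, success_rate):
    alerts = []
    if avg_time_s > baseline_s * 1.20:
        alerts.append("processing time regression")
    if success_rate < 0.95:
        alerts.append("success rate below target")
    return alerts
```

Feed it the 7-day averages from Performance Analytics and alert on any non-empty result.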
Azure Monitor Metrics:
// Average processing time over time
Syslog
| where ProcessName == "alactic-worker"
| where SyslogMessage contains "completed"
| extend ProcessingTime = extract("time: ([0-9.]+)s", 1, SyslogMessage)
| summarize avg(todouble(ProcessingTime)) by bin(TimeGenerated, 1h)
| render timechart
Performance Testing
Benchmark your deployment:
import time

# process_document is assumed to be your existing wrapper around the
# Alactic processing API.
def benchmark_processing(num_documents=10):
    start_time = time.time()
    results = []
    for i in range(num_documents):
        doc_start = time.time()
        result = process_document(f"test_doc_{i}.pdf")
        doc_time = time.time() - doc_start
        results.append(doc_time)
    total_time = time.time() - start_time
    avg_time = sum(results) / len(results)
    print(f"Total time: {total_time:.2f}s")
    print(f"Average per doc: {avg_time:.2f}s")
    print(f"Throughput: {num_documents / total_time * 3600:.0f} docs/hour")
    return results
Expected results:
Pro Plan (10 documents):
Total time: 185.3s
Average per doc: 18.5s
Throughput: 194 docs/hour
If significantly slower:
- Check VM CPU usage (may be throttled)
- Verify network connectivity
- Check Azure OpenAI status
- Review recent configuration changes
Advanced Optimization Techniques
Caching Strategies
Cache frequently processed document types:
import hashlib

# Calculate document hash
def get_document_hash(file_content):
    return hashlib.sha256(file_content).hexdigest()

# Check cache before processing
# (`cache` is any key-value store with TTL support, e.g. Redis)
def process_with_cache(document):
    doc_hash = get_document_hash(document)
    cached_result = cache.get(doc_hash)  # Check if already processed
    if cached_result:
        return cached_result  # Instant return
    result = process_document(document)  # Process and cache
    cache.set(doc_hash, result, ttl=86400)  # Cache for 24 hours
    return result
Use cases:
- Template documents processed repeatedly
- Standard forms with minor variations
- Recurring reports with same format
Performance gain:
- Cache hit: Less than 100ms (vs 18 seconds)
- 99.4% faster for cached documents
Preprocessing Documents
Optimize documents before processing:
Strategy 1: Remove unnecessary pages
from PyPDF2 import PdfReader, PdfWriter

def extract_relevant_pages(input_pdf, pages_to_keep):
    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    for page_num in pages_to_keep:
        writer.add_page(reader.pages[page_num])
    output_pdf = "processed.pdf"
    with open(output_pdf, "wb") as f:
        writer.write(f)
    return output_pdf
Example:
- Original: 50-page PDF with 5 relevant pages
- Processed: 5-page PDF
- Processing time: 3 minutes → 20 seconds
- 89% faster
Strategy 2: OCR enhancement
For scanned PDFs with poor OCR:
from pdf2image import convert_from_path
import pytesseract

def enhance_ocr(input_pdf):
    # Convert pages to images at print resolution
    images = convert_from_path(input_pdf, dpi=300)
    # OCR with Tesseract (better quality than a poor embedded text layer)
    text = ""
    for img in images:
        text += pytesseract.image_to_string(img)
    # Create new PDF with enhanced text
    # ... (save as new PDF; construction of enhanced_pdf elided)
    return enhanced_pdf
Benefit: Better extraction quality, potentially faster processing
Load Balancing (Enterprise)
For very high volumes:
Deploy multiple Alactic instances with load balancer:
┌─────────────┐
Internet ──→│ Load Balancer│
└─────────────┘
│
┌───────────────┼───────────────┐
↓ ↓ ↓
┌─────────┐ ┌─────────┐ ┌─────────┐
│Alactic 1│ │Alactic 2│ │Alactic 3│
└─────────┘ └─────────┘ └─────────┘
Benefits:
- 3x throughput
- High availability
- No single point of failure
- Horizontal scaling
Cost:
- 3x infrastructure costs
- Worth it for more than 5,000 docs/month
Plan-Specific Optimization
Free Plan Optimization
Challenge: Burstable VM with CPU credits
Strategy:
1. Spread processing throughout the month:
   - 70 docs over 30 days = 2-3 docs/day
   - Avoid processing all at once
2. Process during idle recovery:
   - Process in the morning (credits regenerated overnight)
   - Avoid back-to-back processing
3. Use Quick Extract aggressively:
   - Reduces CPU load
   - Conserves CPU credits
   - Still gets text extraction
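The spread-out pacing above is just ceiling division over the month (70 docs/month is the assumed Free Plan quota):

```python
# Ceiling division spreads the monthly quota evenly so CPU credits can
# regenerate between runs (assumed Free Plan quota: 70 docs/month).
def docs_per_day(monthly_quota=70, days=30):
    return -(-monthly_quota // days)  # ceil without importing math

print(docs_per_day())  # 3 docs/day, in line with the 2-3/day guidance
```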
Expected performance:
- First 5 docs/day: 12-15s each
- Additional docs: 20-30s each (credits depleting)
Pro Plan Optimization
Challenge: Balance throughput with limited concurrency
Strategy:
1. Optimal concurrency: 3-4 jobs
   - Don't max out at 5
   - Leave headroom for stability
2. Batch size: 25 PDFs
   - Optimal for Pro Plan
   - ~8-10 minute batches
3. Model distribution:
   - 85% GPT-4o mini (fast)
   - 15% GPT-4o (quality)
Expected performance:
- 300 docs/month: 12 batches of 25
- Total processing time: 2-3 hours/month
- Average: 18-20s per document
Pro+ Plan Optimization
Challenge: Maximize high-capacity infrastructure
Strategy:
1. Optimal concurrency: 8-10 jobs
   - Leverage 4 vCPUs
   - Significantly faster than Pro
2. Batch size: 50 PDFs
   - Take advantage of higher limits
   - ~10-12 minute batches
3. Enable all features:
   - Vector storage (plenty of resources)
   - Deep analysis when needed
   - Advanced processing
Expected performance:
- 1,500 docs/month: 30 batches of 50
- Total processing time: 5-6 hours/month
- Average: 14-16s per document
- Throughput: 300-350 docs/hour (peak)
Troubleshooting Slow Performance
Diagnosis Checklist
Step 1: Check VM health
# SSH into VM
ssh -i ~/.ssh/id_rsa appuser@your-vm-ip
# Check CPU usage
top
# Look for processes using more than 90% CPU
# Check memory
free -h
# Should have at least 1 GB available
# Check disk
df -h
# Should have at least 10% free
Step 2: Check service status
sudo systemctl status alactic-api
sudo systemctl status alactic-worker
# Look for recent errors
sudo journalctl -u alactic-worker -n 100
Step 3: Check Azure OpenAI connectivity
curl -H "X-Deployment-Key: ak-xxxxx" \
https://your-vm-ip/api/v1/health
# Check "azure_openai" status
# Should show "connected" with low latency (less than 500ms)
Step 4: Review recent changes
- Did you change analysis depth settings?
- Did you enable vector storage recently?
- Are you processing larger documents than before?
- Did Azure region have issues?
Common Performance Issues
Issue 1: Slow processing (more than 30s per document)
Causes:
- Azure OpenAI throttling (HTTP 429 errors)
- VM resource exhaustion
- Network connectivity issues
- Corrupted/complex PDFs
Solutions:
- Check Azure OpenAI quota and request increase
- Review VM metrics for CPU/memory exhaustion
- Test network connectivity to Azure OpenAI
- Try processing simpler test document
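When throttling (HTTP 429) is the cause, retrying with exponential backoff usually recovers without manual intervention. A minimal sketch, where `submit` and `ThrottledError` are hypothetical stand-ins for your job-submission call and whatever exception your client raises on 429:

```python
import random
import time

# Hypothetical throttling exception; in practice, catch whatever your
# HTTP client raises for a 429 response.
class ThrottledError(Exception):
    pass

def submit_with_backoff(submit, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return submit()
        except ThrottledError:
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("still throttled after retries")
```

Backoff smooths out quota spikes; a sustained 429 rate still means the Azure OpenAI quota itself needs increasing.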
Issue 2: Degrading performance over time
Causes:
- CPU credit depletion (Free Plan)
- Database bloat (too many documents)
- Disk space exhaustion
- Memory leaks (rare)
Solutions:
- For Free Plan: Space out processing
- Delete old documents to reduce database size
- Clean up disk space
- Restart services:
sudo systemctl restart alactic-worker
Issue 3: Inconsistent processing times
Causes:
- Variable document complexity
- Azure OpenAI variable latency
- Other workloads on VM
- Network congestion
Solutions:
- Group similar documents together
- Process during off-peak hours
- Ensure no other intensive processes running
- Consider upgrading to higher plan
Performance Checklist
Weekly Tasks
- Monitor average processing time
- Check resource utilization (CPU, memory, disk)
- Review success rate
- Identify any performance degradation
Monthly Tasks
- Analyze performance trends
- Benchmark against baselines
- Clean up old documents
- Review and optimize settings
Quarterly Tasks
- Comprehensive performance audit
- Consider plan upgrade if needed
- Implement new optimization strategies
- Update performance baselines