Performance Tuning
Optimize your Alactic AGI deployment for maximum speed, efficiency, and throughput. This guide covers performance optimization strategies across infrastructure, application settings, and operational practices.
Performance Metrics
Key Performance Indicators
Processing Speed:
- Target: 10-20 seconds per 10-page PDF
- Measurement: Time from job submission to completion
- Factors: VM SKU, model choice, analysis depth, document complexity
Throughput:
- Target: Process 80-90% of plan quota per month
- Measurement: Documents processed per hour/day/month
- Factors: Concurrent processing capacity, batch sizes
Resource Utilization:
- Target: CPU 60-75%, Memory 65-80% during processing
- Measurement: Azure Monitor metrics
- Factors: Workload patterns, VM sizing
Success Rate:
- Target: More than 95% successful processing
- Measurement: Completed jobs / total jobs
- Factors: Document quality, configuration settings
Infrastructure Optimization
VM Performance
Understand VM Capabilities
Free Plan (Standard_B2s):
- Burstable Performance: Uses CPU credits
- Sustained Load: Performance degrades when credits exhausted
- Best For: Intermittent workloads
- Limitation: Not suitable for consistent heavy processing
Credit exhaustion symptoms:
- First 10 docs: 12-15 seconds each
- After 50 docs: 25-35 seconds each (2-3x slower)
- Recovery: Credits regenerate during idle time
Recommendation: Spread processing throughout month, avoid large batches.
Pro Plan (Standard_D2s_v3):
- Dedicated Performance: No CPU credits
- Sustained Load: Consistent performance
- Best For: Regular production workloads
- Concurrent Processing: 5 jobs simultaneously
Pro+ Plan (Standard_D4s_v3):
- 2x vCPUs: Double compute power
- 2x RAM: Better for large documents
- Concurrent Processing: 10-12 jobs simultaneously
- Best For: High-volume operations
Optimize VM Configuration
Enable Accelerated Networking (Pro+ and Enterprise):
# Check if enabled
az network nic show --resource-group alactic-rg --name alactic-nic \
--query "enableAcceleratedNetworking"
# Enable if false
az network nic update --resource-group alactic-rg --name alactic-nic \
--accelerated-networking true
Benefits:
- Lower network latency (50% reduction)
- Higher throughput
- Better CPU utilization
- Impact: 5-10% faster document processing
Configure Premium SSD:
Already included in all plans, but verify:
az disk show --resource-group alactic-rg --name alactic-vm-disk \
--query "sku.name"
# Should return: Premium_LRS
Benefits:
- Faster read/write operations
- Lower latency for database access
- Better for large documents
Database Performance
Cosmos DB Optimization
Query Optimization:
Alactic uses optimized queries, but you can monitor:
Azure Portal → Cosmos DB → Metrics → Request Units
Target: Less than 2,000 RU/s average (serverless limit: 5,000 RU/s)
If approaching limit:
- Archive old documents (reduce database size)
- Disable vector storage if not using search
- Consider dedicated throughput tier (Enterprise)
Indexing:
Default indexing is optimized for Alactic workloads. No changes needed.
Connection Pooling:
Alactic maintains connection pools for efficiency:
- Default: 50 connections
- Under load: Up to 100 connections
- Automatically managed
Storage Performance
Premium SSD Configuration:
Already optimal, but you can verify IOPS:
az disk show --resource-group alactic-rg --name alactic-vm-disk \
--query "{IOPS: diskIOPSReadWrite, Throughput: diskMBpsReadWrite}"
Pro Plan (128 GB Premium SSD):
- IOPS: 500
- Throughput: 100 MB/s
- Sufficient for moderate workloads
Pro+ Plan (256 GB Premium SSD):
- IOPS: 1,100
- Throughput: 125 MB/s
- Better for high-volume processing
Optimize Storage Access:
# Check disk cache settings
az vm show --resource-group alactic-rg --name alactic-vm \
--query "storageProfile.osDisk.caching"
# Should return: ReadWrite (optimal for Alactic)
Application-Level Optimization
Processing Settings
Choose Optimal Analysis Depth
Three levels with different performance profiles:
| Depth | Processing Time | Use Case | Output Quality |
|---|---|---|---|
| Quick Extract | 8-12s | Text extraction only | Text only |
| Standard Analysis | 15-20s | Summary + key points | Balanced |
| Deep Analysis | 25-35s | Full analysis + entities | Comprehensive |
Optimization Strategy:
Default to Standard, use others selectively:
def choose_analysis_depth(document_type):
    if document_type == "simple_invoice":
        return "quick"  # Just need text
    elif document_type == "technical_report":
        return "deep"  # Need detailed analysis
    else:
        return "standard"  # Balanced default
Performance Impact:
Processing 100 documents:
- All Deep: 2,500 seconds (~42 minutes)
- All Standard: 1,800 seconds (~30 minutes)
- All Quick: 1,000 seconds (~17 minutes)
Savings: 25-40% faster with appropriate depth selection
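Using midpoint per-depth timings assumed from the table above (Quick ≈ 10s, Standard ≈ 18s, Deep ≈ 30s), you can estimate a mixed workload's batch time before submitting it:

```python
# Midpoint per-document timings assumed from the analysis-depth table.
DEPTH_SECONDS = {"quick": 10, "standard": 18, "deep": 30}

def estimated_batch_seconds(depth_counts):
    # depth_counts maps analysis depth -> number of documents
    return sum(DEPTH_SECONDS[d] * n for d, n in depth_counts.items())

# 100 documents: mostly Standard, with Quick/Deep used selectively
print(estimated_batch_seconds({"quick": 30, "standard": 60, "deep": 10}))  # 1680
```

That mix comes in at 1,680s, versus 1,800s for all-Standard, consistent with the savings estimated above.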
Model Selection for Speed
Processing time comparison (10-page PDF):
| Model | Avg Time | 95th Percentile | Use When |
|---|---|---|---|
| GPT-4o mini | 15-18s | 22s | Speed priority |
| GPT-4o | 18-22s | 28s | Quality priority |
GPT-4o mini is 15-20% faster.
Speed Optimization:
- Use mini for time-sensitive processing
- Use GPT-4o when quality is critical
- Consider cascade strategy (mini first, GPT-4o if needed)
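The cascade strategy can be sketched as a two-step fallback. Here `call_model` and `looks_complete` are hypothetical stand-ins for your API wrapper and your own acceptance check, not Alactic functions:

```python
# Two-step cascade: try the faster model first, retry with GPT-4o only
# when the result fails a quality check.
# `call_model` and `looks_complete` are hypothetical stand-ins for your
# API wrapper and your own acceptance test.
def process_with_cascade(document, call_model, looks_complete):
    result = call_model(document, model="gpt-4o-mini")  # fast path
    if looks_complete(result):
        return result
    return call_model(document, model="gpt-4o")  # quality fallback
```

If most documents pass the check, average latency stays close to the mini model's while hard cases still get GPT-4o quality.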
Vector Storage Settings
Impact on performance:
With vectors enabled:
- Additional 2-3 seconds per document
- Embedding API calls required
- Extra database writes
Without vectors:
- 2-3 seconds faster
- No embedding costs
- Smaller database footprint
Recommendation:
Disable if not using semantic search:
curl -X POST https://your-vm-ip/api/v1/process \
-H "X-Deployment-Key: ak-xxxxx" \
-F "file=@document.pdf" \
-F "enable_vectors=false"
Or set as default:
Dashboard → Settings → Processing → Enable Vectors → Off
Performance gain: 15-20% faster processing
Concurrent Processing
Optimize Concurrency Levels
Concurrent job limits by plan:
| Plan | Max Concurrent | Optimal Concurrent | Queue Limit |
|---|---|---|---|
| Free | 2 | 1-2 | 50 |
| Pro | 5 | 3-4 | 200 |
| Pro+ | 12 | 8-10 | 500 |
| Enterprise | 50+ | 30-40 | 5,000 |
Why not max out concurrency?
- Leave headroom for system overhead
- Better stability and responsiveness
- Prevents resource contention
Optimal concurrency formula:
Optimal = (vCPUs * 2) - 1
Pro (2 vCPUs): 3 concurrent jobs
Pro+ (4 vCPUs): 7 concurrent jobs
Implementation:
from concurrent.futures import ThreadPoolExecutor

def process_batch(documents, max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process_doc, doc) for doc in documents]
        results = [f.result() for f in futures]
    return results
Performance Impact:
Processing 100 documents on Pro Plan:
- Sequential: 100 × 18s = 1,800s (30 min)
- Concurrent (3 workers): 100 × 18s / 3 = 600s (10 min)
- 3x faster with optimal concurrency
Batch Size Optimization
Recommended batch sizes:
| Plan | Optimal Batch | Max Batch | Processing Time |
|---|---|---|---|
| Free | 10 PDFs | 10 PDFs | ~3 minutes |
| Pro | 25 PDFs | 25 PDFs | ~8 minutes |
| Pro+ | 50 PDFs | 50 PDFs | ~10 minutes |
| Enterprise | 100-200 PDFs | 500 PDFs | ~20-40 minutes |
Why these sizes?
- Balance between efficiency and manageability
- Fit in typical HTTP timeout windows
- Allow for retries on failure
- Don't overwhelm queue
Example:
Process 500 PDFs on Pro+ Plan:
- Option 1: 10 batches of 50 (recommended)
- Option 2: 20 batches of 25 (slower, more overhead)
- Option 3: 1 batch of 500 (risky, long timeout)
Best practice: Stick to recommended batch sizes.
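A minimal batch-splitting helper makes it easy to stick to these sizes (50 shown, the Pro+ optimum), so each batch fits typical HTTP timeout windows and can be retried independently:

```python
# Split a large job into plan-sized batches.
def make_batches(items, batch_size=50):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

pdfs = [f"doc_{i}.pdf" for i in range(500)]
batches = make_batches(pdfs)  # 10 batches of 50 on Pro+
```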
Network Optimization
Reduce Upload Times
For large PDFs:
Strategy 1: Upload during off-peak hours
- Less network congestion
- Faster uploads
- Same processing speed
Strategy 2: Compress PDFs before upload
# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf original.pdf
Results:
- Original: 50 MB (5-second upload)
- Compressed: 10 MB (1-second upload)
- 4 seconds saved per document
Strategy 3: Use Azure VM in same region
If uploading from Azure VM:
- Choose same region as Alactic deployment
- Use Azure internal network
- Much faster than public internet
Impact:
- Same region: Less than 1ms latency
- Different region: 50-200ms latency
- 50-200ms saved per API call
Optimize API Calls
Use batch endpoints:
Bad:
for doc in documents:
    result = api.process(doc)  # 100 separate API calls
Good:
results = api.batch_process(documents) # 1 API call
Savings:
- 100 separate calls: 100 × 100ms = 10 seconds overhead
- 1 batch call: 100ms overhead
- 9.9 seconds saved
Reuse HTTP connections (keep-alive):
import requests
from requests.adapters import HTTPAdapter

# Reuse pooled TCP connections across requests
# (requests uses HTTP/1.1 keep-alive, not HTTP/2)
session = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount('https://', adapter)

# All requests share the pooled connections
for doc in documents:
    session.post(url, ...)  # Faster: no TCP/TLS handshake per request
Impact: 20-30% faster API calls
Monitoring and Diagnosis
Identify Performance Bottlenecks
Processing Time Breakdown:
Typical 10-page PDF (18.5 seconds total):
- Upload and validation: 1.2s (7%)
- PDF parsing: 3.8s (21%)
- Text extraction: 2.1s (12%)
- Model inference (GPT-4o mini): 0.4s (2%)
- Summary generation: 8.5s (47%)
- Vector embedding: 1.8s (10%)
- Database write: 0.7s (4%)
Bottleneck: Summary generation (47% of time)
Optimization:
- Use Quick Extract if summary not needed (save 8.5s)
- Use Standard vs Deep to reduce summary complexity
Performance Monitoring
Track key metrics:
Dashboard → Settings → Performance Analytics
Average Processing Times (Last 7 Days):
By Document Size:
1-5 pages: 12.3s avg (target: 10-15s) ✓
6-10 pages: 18.7s avg (target: 15-20s) ✓
11-20 pages: 35.2s avg (target: 30-40s) ✓
21-50 pages: 2m 18s avg (target: 2-3 min) ✓
By Model:
GPT-4o mini: 16.1s avg
GPT-4o: 20.3s avg
By Analysis Depth:
Quick: 10.2s avg
Standard: 18.5s avg
Deep: 28.9s avg
Throughput:
Documents/hour: 180 (peak), 120 (avg)
Success rate: 97.2%
What to watch:
- Sudden increases in processing time (more than 20%)
- Degrading success rate (less than 95%)
- Decreasing throughput
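The watch-list above can be turned into a simple automated check. This is a minimal sketch; the thresholds follow the targets stated in this guide (>20% slowdown, <95% success rate):

```python
# Flag a run when the recent average is >20% above baseline, or when the
# success rate dips below the 95% target.
def performance_alerts(avg_time_s, baseline_s, success_rate):
    alerts = []
    if avg_time_s > baseline_s * 1.20:
        alerts.append("processing time regression")
    if success_rate < 0.95:
        alerts.append("success rate below target")
    return alerts
```

Feed it the 7-day averages from Performance Analytics and alert on any non-empty result.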
Azure Monitor Metrics:
// Average processing time over time
Syslog
| where ProcessName == "alactic-worker"
| where SyslogMessage contains "completed"
| extend ProcessingTime = extract("time: ([0-9.]+)s", 1, SyslogMessage)
| summarize avg(todouble(ProcessingTime)) by bin(TimeGenerated, 1h)
| render timechart
Performance Testing
Benchmark your deployment:
import time

# process_document is assumed to be your existing wrapper around the
# Alactic processing API.
def benchmark_processing(num_documents=10):
    start_time = time.time()
    results = []
    for i in range(num_documents):
        doc_start = time.time()
        result = process_document(f"test_doc_{i}.pdf")
        doc_time = time.time() - doc_start
        results.append(doc_time)
    total_time = time.time() - start_time
    avg_time = sum(results) / len(results)
    print(f"Total time: {total_time:.2f}s")
    print(f"Average per doc: {avg_time:.2f}s")
    print(f"Throughput: {num_documents / total_time * 3600:.0f} docs/hour")
    return results
Expected results:
Pro Plan (10 documents):
Total time: 185.3s
Average per doc: 18.5s
Throughput: 194 docs/hour
If significantly slower:
- Check VM CPU usage (may be throttled)
- Verify network connectivity
- Check Azure OpenAI status
- Review recent configuration changes
Advanced Optimization Techniques
Caching Strategies
Cache frequently processed document types:
import hashlib

# Calculate document hash
def get_document_hash(file_content):
    return hashlib.sha256(file_content).hexdigest()

# Check cache before processing
# (`cache` is any key-value store with TTL support, e.g. Redis)
def process_with_cache(document):
    doc_hash = get_document_hash(document)
    cached_result = cache.get(doc_hash)  # Check if already processed
    if cached_result:
        return cached_result  # Instant return
    result = process_document(document)  # Process and cache
    cache.set(doc_hash, result, ttl=86400)  # Cache for 24 hours
    return result
Use cases:
- Template documents processed repeatedly
- Standard forms with minor variations
- Recurring reports with same format
Performance gain:
- Cache hit: Less than 100ms (vs 18 seconds)
- 99.4% faster for cached documents
Preprocessing Documents
Optimize documents before processing:
Strategy 1: Remove unnecessary pages
from PyPDF2 import PdfReader, PdfWriter

def extract_relevant_pages(input_pdf, pages_to_keep):
    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    for page_num in pages_to_keep:
        writer.add_page(reader.pages[page_num])
    output_pdf = "processed.pdf"
    with open(output_pdf, "wb") as f:
        writer.write(f)
    return output_pdf
Example:
- Original: 50-page PDF with 5 relevant pages
- Processed: 5-page PDF
- Processing time: 3 minutes → 20 seconds
- 89% faster
Strategy 2: OCR enhancement
For scanned PDFs with poor OCR:
from pdf2image import convert_from_path
import pytesseract

def enhance_ocr(input_pdf):
    # Convert pages to images at print resolution
    images = convert_from_path(input_pdf, dpi=300)
    # OCR with Tesseract (better quality than a poor embedded text layer)
    text = ""
    for img in images:
        text += pytesseract.image_to_string(img)
    # Create new PDF with enhanced text
    # ... (save as new PDF; construction of enhanced_pdf elided)
    return enhanced_pdf
Benefit: Better extraction quality, potentially faster processing
Load Balancing (Enterprise)
For very high volumes:
Deploy multiple Alactic instances with load balancer:
┌─────────────┐
Internet ──→│ Load Balancer│
└─────────────┘
│
┌───────────────┼───────────────┐
↓ ↓ ↓
┌─────────┐ ┌─────────┐ ┌─────────┐
│Alactic 1│ │Alactic 2│ │Alactic 3│
└─────────┘ └─────────┘ └─────────┘
Benefits:
- 3x throughput
- High availability
- No single point of failure
- Horizontal scaling
Cost:
- 3x infrastructure costs
- Worth it for more than 5,000 docs/month
Plan-Specific Optimization
Free Plan Optimization
Challenge: Burstable VM with CPU credits
Strategy:
1. Spread processing throughout the month:
   - 70 docs over 30 days = 2-3 docs/day
   - Avoid processing all at once
2. Process during idle recovery:
   - Process in the morning (credits regenerated overnight)
   - Avoid back-to-back processing
3. Use Quick Extract aggressively:
   - Reduces CPU load
   - Conserves CPU credits
   - Still gets text extraction
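The spread-out pacing above is just ceiling division over the month (70 docs/month is the assumed Free Plan quota):

```python
# Ceiling division spreads the monthly quota evenly so CPU credits can
# regenerate between runs (assumed Free Plan quota: 70 docs/month).
def docs_per_day(monthly_quota=70, days=30):
    return -(-monthly_quota // days)  # ceil without importing math

print(docs_per_day())  # 3 docs/day, in line with the 2-3/day guidance
```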
Expected performance:
- First 5 docs/day: 12-15s each
- Additional docs: 20-30s each (credits depleting)
Pro Plan Optimization
Challenge: Balance throughput with limited concurrency
Strategy:
1. Optimal concurrency: 3-4 jobs
   - Don't max out at 5
   - Leave headroom for stability
2. Batch size: 25 PDFs
   - Optimal for Pro Plan
   - ~8-10 minute batches
3. Model distribution:
   - 85% GPT-4o mini (fast)
   - 15% GPT-4o (quality)
Expected performance:
- 300 docs/month: 12 batches of 25
- Total processing time: 2-3 hours/month
- Average: 18-20s per document
Pro+ Plan Optimization
Challenge: Maximize high-capacity infrastructure
Strategy:
1. Optimal concurrency: 8-10 jobs
   - Leverage 4 vCPUs
   - Significantly faster than Pro
2. Batch size: 50 PDFs
   - Take advantage of higher limits
   - ~10-12 minute batches
3. Enable all features:
   - Vector storage (plenty of resources)
   - Deep analysis when needed
   - Advanced processing
Expected performance:
- 1,500 docs/month: 30 batches of 50
- Total processing time: 5-6 hours/month
- Average: 14-16s per document
- Throughput: 300-350 docs/hour (peak)
Troubleshooting Slow Performance
Diagnosis Checklist
Step 1: Check VM health
# SSH into VM
ssh -i ~/.ssh/id_rsa appuser@your-vm-ip
# Check CPU usage
top
# Look for processes using more than 90% CPU
# Check memory
free -h
# Should have at least 1 GB available
# Check disk
df -h
# Should have at least 10% free
Step 2: Check service status
sudo systemctl status alactic-api
sudo systemctl status alactic-worker
# Look for recent errors
sudo journalctl -u alactic-worker -n 100
Step 3: Check Azure OpenAI connectivity
curl -H "X-Deployment-Key: ak-xxxxx" \
https://your-vm-ip/api/v1/health
# Check "azure_openai" status
# Should show "connected" with low latency (less than 500ms)
Step 4: Review recent changes
- Did you change analysis depth settings?
- Did you enable vector storage recently?
- Are you processing larger documents than before?
- Did Azure region have issues?
Common Performance Issues
Issue 1: Slow processing (more than 30s per document)
Causes:
- Azure OpenAI throttling (HTTP 429 errors)
- VM resource exhaustion
- Network connectivity issues
- Corrupted/complex PDFs
Solutions:
- Check Azure OpenAI quota and request increase
- Review VM metrics for CPU/memory exhaustion
- Test network connectivity to Azure OpenAI
- Try processing simpler test document
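When throttling (HTTP 429) is the cause, retrying with exponential backoff usually recovers without manual intervention. A minimal sketch, where `submit` and `ThrottledError` are hypothetical stand-ins for your job-submission call and whatever exception your client raises on 429:

```python
import random
import time

# Hypothetical throttling exception; in practice, catch whatever your
# HTTP client raises for a 429 response.
class ThrottledError(Exception):
    pass

def submit_with_backoff(submit, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return submit()
        except ThrottledError:
            # Exponential backoff with jitter: ~1s, 2s, 4s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("still throttled after retries")
```

Backoff smooths out quota spikes; a sustained 429 rate still means the Azure OpenAI quota itself needs increasing.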
Issue 2: Degrading performance over time
Causes:
- CPU credit depletion (Free Plan)
- Database bloat (too many documents)
- Disk space exhaustion
- Memory leaks (rare)
Solutions:
- For Free Plan: Space out processing
- Delete old documents to reduce database size
- Clean up disk space
- Restart services:
sudo systemctl restart alactic-worker
Issue 3: Inconsistent processing times
Causes:
- Variable document complexity
- Azure OpenAI variable latency
- Other workloads on VM
- Network congestion
Solutions:
- Group similar documents together
- Process during off-peak hours
- Ensure no other intensive processes running
- Consider upgrading to higher plan
Performance Checklist
Weekly Tasks
- Monitor average processing time
- Check resource utilization (CPU, memory, disk)
- Review success rate
- Identify any performance degradation
Monthly Tasks
- Analyze performance trends
- Benchmark against baselines
- Clean up old documents
- Review and optimize settings
Quarterly Tasks
- Comprehensive performance audit
- Consider plan upgrade if needed
- Implement new optimization strategies
- Update performance baselines