Performance Tuning

Optimize your Alactic AGI deployment for maximum speed, efficiency, and throughput. This guide covers performance optimization strategies across infrastructure, application settings, and operational practices.

Performance Metrics

Key Performance Indicators

Processing Speed:

  • Target: 10-20 seconds per 10-page PDF
  • Measurement: Time from job submission to completion
  • Factors: VM SKU, model choice, analysis depth, document complexity

Throughput:

  • Target: Process 80-90% of plan quota per month
  • Measurement: Documents processed per hour/day/month
  • Factors: Concurrent processing capacity, batch sizes

Resource Utilization:

  • Target: CPU 60-75%, Memory 65-80% during processing
  • Measurement: Azure Monitor metrics
  • Factors: Workload patterns, VM sizing

Success Rate:

  • Target: More than 95% successful processing
  • Measurement: Completed jobs / total jobs
  • Factors: Document quality, configuration settings
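The KPIs above can be computed directly from job records. A minimal sketch, assuming a simple job-record shape ('status', 'submitted', 'finished') rather than the actual Alactic log format:

```python
from datetime import datetime, timedelta

def compute_kpis(jobs):
    """Compute success rate and average processing time from job records."""
    total = len(jobs)
    completed = [j for j in jobs if j["status"] == "completed"]
    success_rate = len(completed) / total if total else 0.0

    # Processing time: submission to completion, per the metric definition above
    durations = [(j["finished"] - j["submitted"]).total_seconds() for j in completed]
    avg_time = sum(durations) / len(durations) if durations else 0.0

    return {"success_rate": success_rate, "avg_processing_s": avg_time}

t0 = datetime(2025, 1, 1, 9, 0, 0)
jobs = [
    {"status": "completed", "submitted": t0, "finished": t0 + timedelta(seconds=15)},
    {"status": "completed", "submitted": t0, "finished": t0 + timedelta(seconds=21)},
    {"status": "failed", "submitted": t0, "finished": t0 + timedelta(seconds=5)},
]
kpis = compute_kpis(jobs)
print(kpis)  # success_rate ≈ 0.67, avg_processing_s = 18.0
```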

Infrastructure Optimization

VM Performance

Understand VM Capabilities

Free Plan (Standard_B2s):

  • Burstable Performance: Uses CPU credits
  • Sustained Load: Performance degrades when credits exhausted
  • Best For: Intermittent workloads
  • Limitation: Not suitable for consistent heavy processing

Credits exhaustion symptoms:

  • First 10 docs: 12-15 seconds each
  • After 50 docs: 25-35 seconds each (2-3x slower)
  • Recovery: Credits regenerate during idle time

Recommendation: Spread processing throughout month, avoid large batches.
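The "spread it out" recommendation can be turned into a simple daily cap. A sketch, where the 70-document quota and the burst factor are illustrative values, not official limits:

```python
import math

def daily_processing_cap(monthly_quota, days_in_month=30, burst_factor=1.5):
    """Spread a monthly document quota across the month.

    burst_factor allows modest daily spikes while still avoiding the
    large batches that exhaust B-series CPU credits.
    """
    base = monthly_quota / days_in_month
    return math.ceil(base * burst_factor)

print(daily_processing_cap(70))  # 4 docs/day keeps credits topped up
```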

Pro Plan (Standard_D2s_v3):

  • Dedicated Performance: No CPU credits
  • Sustained Load: Consistent performance
  • Best For: Regular production workloads
  • Concurrent Processing: 5 jobs simultaneously

Pro+ Plan (Standard_D4s_v3):

  • 2x vCPUs: Double compute power
  • 2x RAM: Better for large documents
  • Concurrent Processing: 10-12 jobs simultaneously
  • Best For: High-volume operations

Optimize VM Configuration

Enable Accelerated Networking (Pro+ and Enterprise):

# Check if enabled
az network nic show --resource-group alactic-rg --name alactic-nic \
--query "enableAcceleratedNetworking"

# Enable if false
az network nic update --resource-group alactic-rg --name alactic-nic \
--accelerated-networking true

Benefits:

  • Lower network latency (50% reduction)
  • Higher throughput
  • Better CPU utilization
  • Impact: 5-10% faster document processing

Configure Premium SSD:

Already included in all plans, but verify:

az disk show --resource-group alactic-rg --name alactic-vm-disk \
--query "sku.name"
# Should return: Premium_LRS

Benefits:

  • Faster read/write operations
  • Lower latency for database access
  • Better for large documents

Database Performance

Cosmos DB Optimization

Query Optimization:

Alactic uses optimized queries, but you can monitor:

Azure Portal → Cosmos DB → Metrics → Request Units

Target: Less than 2,000 RU/s average (serverless limit: 5,000 RU/s)

If approaching limit:

  1. Archive old documents (reduce database size)
  2. Disable vector storage if not using search
  3. Consider dedicated throughput tier (Enterprise)

Indexing:

Default indexing is optimized for Alactic workloads. No changes needed.

Connection Pooling:

Alactic maintains connection pools for efficiency:

  • Default: 50 connections
  • Under load: Up to 100 connections
  • Automatically managed

Storage Performance

Premium SSD Configuration:

Already optimal, but you can verify IOPS:

az disk show --resource-group alactic-rg --name alactic-vm-disk \
--query "{IOPS: diskIOPSReadWrite, Throughput: diskMBpsReadWrite}"

Pro Plan (128 GB Premium SSD):

  • IOPS: 500
  • Throughput: 100 MB/s
  • Sufficient for moderate workloads

Pro+ Plan (256 GB Premium SSD):

  • IOPS: 1,100
  • Throughput: 125 MB/s
  • Better for high-volume processing

Optimize Storage Access:

# Check disk cache settings
az vm show --resource-group alactic-rg --name alactic-vm \
--query "storageProfile.osDisk.caching"
# Should return: ReadWrite (optimal for Alactic)

Application-Level Optimization

Processing Settings

Choose Optimal Analysis Depth

Three levels with different performance profiles:

Depth               Processing Time   Use Case                   Output Quality
Quick Extract       8-12s             Text extraction only       Text only
Standard Analysis   15-20s            Summary + key points       Balanced
Deep Analysis       25-35s            Full analysis + entities   Comprehensive

Optimization Strategy:

Default to Standard, use others selectively:

def choose_analysis_depth(document_type):
    if document_type == "simple_invoice":
        return "quick"      # Just need text
    elif document_type == "technical_report":
        return "deep"       # Need detailed analysis
    else:
        return "standard"   # Balanced default

Performance Impact:

Processing 100 documents:

  • All Deep: 2,500 seconds (~42 minutes)
  • All Standard: 1,800 seconds (~30 minutes)
  • All Quick: 1,000 seconds (~17 minutes)

Savings: 25-40% faster with appropriate depth selection
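The same estimate can be run for any depth mix. A sketch using the per-document times from the calculation above (10s quick, 18s standard, 25s deep); the 20/70/10 mix is illustrative:

```python
# Per-depth processing times (seconds), matching the figures above
DEPTH_TIMES = {"quick": 10, "standard": 18, "deep": 25}

def estimate_batch_seconds(depth_counts):
    """Estimate sequential processing time for a mix of analysis depths."""
    return sum(DEPTH_TIMES[d] * n for d, n in depth_counts.items())

# 100 documents: 20 quick, 70 standard, 10 deep
mixed = estimate_batch_seconds({"quick": 20, "standard": 70, "deep": 10})
all_deep = estimate_batch_seconds({"deep": 100})
print(mixed, all_deep)  # 1710 vs 2500 seconds, ~32% faster
```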

Model Selection for Speed

Processing time comparison (10-page PDF):

Model         Avg Time   95th Percentile   Use When
GPT-4o mini   15-18s     22s               Speed priority
GPT-4o        18-22s     28s               Quality priority

GPT-4o mini is 15-20% faster.

Speed Optimization:

  • Use mini for time-sensitive processing
  • Use GPT-4o when quality is critical
  • Consider cascade strategy (mini first, GPT-4o if needed)
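The cascade strategy can be sketched as follows. Both `process_with_model` and the quality score are placeholders for whatever your pipeline exposes, not part of the Alactic API:

```python
def cascade_process(document, process_with_model, score_quality,
                    quality_threshold=0.8):
    """Try the fast model first; escalate to GPT-4o only when the
    mini result looks weak. (Callables and threshold are assumptions.)"""
    result = process_with_model(document, model="gpt-4o-mini")
    if score_quality(result) >= quality_threshold:
        return result  # fast path: the mini result was good enough
    return process_with_model(document, model="gpt-4o")

# Stub pipeline for illustration only
def fake_process(doc, model):
    return {"model": model, "score": 0.9 if model == "gpt-4o" else doc["mini_score"]}

easy = cascade_process({"mini_score": 0.85}, fake_process, lambda r: r["score"])
hard = cascade_process({"mini_score": 0.50}, fake_process, lambda r: r["score"])
print(easy["model"], hard["model"])  # gpt-4o-mini gpt-4o
```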

Vector Storage Settings

Impact on performance:

With vectors enabled:

  • Additional 2-3 seconds per document
  • Embedding API calls required
  • Extra database writes

Without vectors:

  • 2-3 seconds faster
  • No embedding costs
  • Smaller database footprint

Recommendation:

Disable if not using semantic search:

curl -X POST https://your-vm-ip/api/v1/process \
-H "X-Deployment-Key: ak-xxxxx" \
-F "file=@document.pdf" \
-F "enable_vectors=false"

Or set as default:

Dashboard → Settings → Processing → Enable Vectors → Off

Performance gain: 15-20% faster processing

Concurrent Processing

Optimize Concurrency Levels

Concurrent job limits by plan:

Plan         Max Concurrent   Optimal Concurrent   Queue Limit
Free         2                1-2                  50
Pro          5                3-4                  200
Pro+         12               8-10                 500
Enterprise   50+              30-40                5,000

Why not max out concurrency?

  • Leave headroom for system overhead
  • Better stability and responsiveness
  • Prevents resource contention

Optimal concurrency formula:

Optimal = (vCPUs * 2) - 1

Pro (2 vCPUs): 3 concurrent jobs
Pro+ (4 vCPUs): 7 concurrent jobs
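The formula can be wrapped in a small helper that also respects the plan's hard limit:

```python
def optimal_concurrency(vcpus, plan_max):
    """Apply the (vCPUs * 2) - 1 rule of thumb, capped at the plan limit."""
    return min(vcpus * 2 - 1, plan_max)

print(optimal_concurrency(2, 5))   # Pro: 3
print(optimal_concurrency(4, 12))  # Pro+: 7
```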

Implementation:

from concurrent.futures import ThreadPoolExecutor

def process_batch(documents, max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process_doc, doc) for doc in documents]
        results = [f.result() for f in futures]
    return results

Performance Impact:

Processing 100 documents on Pro Plan:

  • Sequential: 100 × 18s = 1,800s (30 min)
  • Concurrent (3 workers): 100 × 18s / 3 = 600s (10 min)
  • 3x faster with optimal concurrency

Batch Size Optimization

Recommended batch sizes:

Plan         Optimal Batch   Max Batch   Processing Time
Free         10 PDFs         10 PDFs     ~3 minutes
Pro          25 PDFs         25 PDFs     ~8 minutes
Pro+         50 PDFs         50 PDFs     ~10 minutes
Enterprise   100-200 PDFs    500 PDFs    ~20-40 minutes

Why these sizes?

  • Balance between efficiency and manageability
  • Fit in typical HTTP timeout windows
  • Allow for retries on failure
  • Don't overwhelm queue

Example:

Process 500 PDFs on Pro+ Plan:

  • Option 1: 10 batches of 50 (recommended)
  • Option 2: 20 batches of 25 (slower, more overhead)
  • Option 3: 1 batch of 500 (risky, long timeout)

Best practice: Stick to recommended batch sizes.
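Splitting a document list into plan-appropriate batches is a one-liner. A sketch; the Enterprise size below is a midpoint of the 100-200 range, not a fixed value:

```python
# Recommended batch sizes per plan (from the table above; Enterprise midpoint)
OPTIMAL_BATCH = {"free": 10, "pro": 25, "pro+": 50, "enterprise": 150}

def split_into_batches(documents, plan):
    """Split a document list into plan-appropriate batches."""
    size = OPTIMAL_BATCH[plan]
    return [documents[i:i + size] for i in range(0, len(documents), size)]

batches = split_into_batches(list(range(500)), "pro+")
print(len(batches), len(batches[0]))  # 10 batches of 50
```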

Network Optimization

Reduce Upload Times

For large PDFs:

Strategy 1: Upload during off-peak hours

  • Less network congestion
  • Faster uploads
  • Same processing speed

Strategy 2: Compress PDFs before upload

# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf original.pdf

Results:

  • Original: 50 MB (5-second upload)
  • Compressed: 10 MB (1-second upload)
  • 4 seconds saved per document

Strategy 3: Use Azure VM in same region

If uploading from Azure VM:

  • Choose same region as Alactic deployment
  • Use Azure internal network
  • Much faster than public internet

Impact:

  • Same region: Less than 1ms latency
  • Different region: 50-200ms latency
  • 50-200ms saved per API call

Optimize API Calls

Use batch endpoints:

Bad:

for doc in documents:
    result = api.process(doc)  # 100 separate API calls

Good:

results = api.batch_process(documents)  # 1 API call

Savings:

  • 100 separate calls: 100 × 100ms = 10 seconds overhead
  • 1 batch call: 100ms overhead
  • 9.9 seconds saved

Use HTTP/2 connection reuse:

import requests
from requests.adapters import HTTPAdapter

# Reuse connections
session = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount('https://', adapter)

# All requests use same connection
for doc in documents:
    session.post(url, ...)  # Faster due to connection reuse

Impact: 20-30% faster API calls

Monitoring and Diagnosis

Identify Performance Bottlenecks

Processing Time Breakdown:

Typical 10-page PDF (18.5 seconds total):

  1. Upload and validation: 1.2s (6%)
  2. PDF parsing: 3.8s (21%)
  3. Text extraction: 2.1s (11%)
  4. Model inference (GPT-4o mini): 0.4s (2%)
  5. Summary generation: 8.5s (46%)
  6. Vector embedding: 1.8s (10%)
  7. Database write: 0.7s (4%)

Bottleneck: Summary generation (46% of time)

Optimization:

  • Use Quick Extract if summary not needed (save 8.5s)
  • Use Standard vs Deep to reduce summary complexity

Performance Monitoring

Track key metrics:

Dashboard → Settings → Performance Analytics

Average Processing Times (Last 7 Days):

By Document Size:
1-5 pages: 12.3s avg (target: 10-15s) ✓
6-10 pages: 18.7s avg (target: 15-20s) ✓
11-20 pages: 35.2s avg (target: 30-40s) ✓
21-50 pages: 2m 18s avg (target: 2-3 min) ✓

By Model:
GPT-4o mini: 16.1s avg
GPT-4o: 20.3s avg

By Analysis Depth:
Quick: 10.2s avg
Standard: 18.5s avg
Deep: 28.9s avg

Throughput:
Documents/hour: 180 (peak), 120 (avg)
Success rate: 97.2%
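The per-model and per-depth averages above are simple group-by computations. A sketch over an assumed job-record shape (the dashboard computes these for you):

```python
from collections import defaultdict

def average_by(jobs, key):
    """Average processing time grouped by a job attribute."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job[key]].append(job["seconds"])
    return {k: sum(v) / len(v) for k, v in groups.items()}

jobs = [
    {"model": "gpt-4o-mini", "seconds": 15.0},
    {"model": "gpt-4o-mini", "seconds": 17.0},
    {"model": "gpt-4o", "seconds": 20.0},
]
print(average_by(jobs, "model"))  # {'gpt-4o-mini': 16.0, 'gpt-4o': 20.0}
```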

What to watch:

  • Sudden increases in processing time (more than 20%)
  • Degrading success rate (less than 95%)
  • Decreasing throughput

Azure Monitor Metrics:

// Average processing time over time
Syslog
| where ProcessName == "alactic-worker"
| where SyslogMessage contains "completed"
| extend ProcessingTime = extract("time: ([0-9.]+)s", 1, SyslogMessage)
| summarize avg(todouble(ProcessingTime)) by bin(TimeGenerated, 1h)
| render timechart

Performance Testing

Benchmark your deployment:

import time
import requests

def benchmark_processing(num_documents=10):
    start_time = time.time()

    results = []
    for i in range(num_documents):
        doc_start = time.time()
        result = process_document(f"test_doc_{i}.pdf")
        doc_time = time.time() - doc_start
        results.append(doc_time)

    total_time = time.time() - start_time
    avg_time = sum(results) / len(results)

    print(f"Total time: {total_time:.2f}s")
    print(f"Average per doc: {avg_time:.2f}s")
    print(f"Throughput: {num_documents / total_time * 3600:.0f} docs/hour")

    return results

Expected results:

Pro Plan (10 documents):

Total time: 185.3s
Average per doc: 18.5s
Throughput: 194 docs/hour

If significantly slower:

  • Check VM CPU usage (may be throttled)
  • Verify network connectivity
  • Check Azure OpenAI status
  • Review recent configuration changes

Advanced Optimization Techniques

Caching Strategies

Cache frequently processed document types:

import hashlib

# Calculate document hash
def get_document_hash(file_content):
    return hashlib.sha256(file_content).hexdigest()

# Check cache before processing
def process_with_cache(document):
    doc_hash = get_document_hash(document)

    # Check if already processed
    cached_result = cache.get(doc_hash)
    if cached_result:
        return cached_result  # Instant return

    # Process and cache
    result = process_document(document)
    cache.set(doc_hash, result, ttl=86400)  # Cache for 24 hours
    return result

Use cases:

  • Template documents processed repeatedly
  • Standard forms with minor variations
  • Recurring reports with same format

Performance gain:

  • Cache hit: Less than 100ms (vs 18 seconds)
  • 99.4% faster for cached documents

Preprocessing Documents

Optimize documents before processing:

Strategy 1: Remove unnecessary pages

from PyPDF2 import PdfReader, PdfWriter

def extract_relevant_pages(input_pdf, pages_to_keep):
    reader = PdfReader(input_pdf)
    writer = PdfWriter()

    for page_num in pages_to_keep:
        writer.add_page(reader.pages[page_num])

    output_pdf = "processed.pdf"
    with open(output_pdf, "wb") as f:
        writer.write(f)

    return output_pdf

Example:

  • Original: 50-page PDF with 5 relevant pages
  • Processed: 5-page PDF
  • Processing time: 3 minutes → 20 seconds
  • 89% faster

Strategy 2: OCR enhancement

For scanned PDFs with poor OCR:

from pdf2image import convert_from_path
import pytesseract

def enhance_ocr(input_pdf):
    # Convert to images
    images = convert_from_path(input_pdf, dpi=300)

    # OCR with Tesseract (better quality)
    text = ""
    for img in images:
        text += pytesseract.image_to_string(img)

    # Create new PDF with enhanced text
    # ... (save as new PDF)

    return enhanced_pdf

Benefit: Better extraction quality, potentially faster processing

Load Balancing (Enterprise)

For very high volumes:

Deploy multiple Alactic instances with load balancer:

              ┌───────────────┐
Internet ──→  │ Load Balancer │
              └───────┬───────┘
      ┌───────────────┼───────────────┐
      ↓               ↓               ↓
┌─────────┐     ┌─────────┐     ┌─────────┐
│Alactic 1│     │Alactic 2│     │Alactic 3│
└─────────┘     └─────────┘     └─────────┘

Benefits:

  • 3x throughput
  • High availability
  • No single point of failure
  • Horizontal scaling

Cost:

  • 3x infrastructure costs
  • Worth it for more than 5,000 docs/month
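For intuition, distributing work across instances round-robin looks like the sketch below. The instance URLs are placeholders, and in a real deployment the Azure Load Balancer does this dispatch, not client code:

```python
from itertools import cycle

def round_robin_assign(documents, instances):
    """Distribute documents across instances round-robin (illustrative)."""
    assignment = {url: [] for url in instances}
    for doc, url in zip(documents, cycle(instances)):
        assignment[url].append(doc)
    return assignment

instances = ["https://alactic-1", "https://alactic-2", "https://alactic-3"]
plan = round_robin_assign([f"doc{i}" for i in range(7)], instances)
print({u: len(d) for u, d in plan.items()})  # 3, 2, 2 documents per instance
```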

Plan-Specific Optimization

Free Plan Optimization

Challenge: Burstable VM with CPU credits

Strategy:

  1. Spread processing throughout month:

    • 70 docs over 30 days = 2-3 docs/day
    • Avoid processing all at once
  2. Process during idle recovery:

    • Process in morning (credits regenerated overnight)
    • Avoid back-to-back processing
  3. Use Quick Extract aggressively:

    • Reduces CPU load
    • Conserves CPU credits
    • Still gets text extraction

Expected performance:

  • First 5 docs/day: 12-15s each
  • Additional docs: 20-30s each (credits depleting)

Pro Plan Optimization

Challenge: Balance throughput with limited concurrency

Strategy:

  1. Optimal concurrency: 3-4 jobs

    • Don't max out at 5
    • Leave headroom for stability
  2. Batch size: 25 PDFs

    • Optimal for Pro Plan
    • ~8-10 minute batches
  3. Model distribution:

    • 85% GPT-4o mini (fast)
    • 15% GPT-4o (quality)

Expected performance:

  • 300 docs/month: 12 batches of 25
  • Total processing time: 2-3 hours/month
  • Average: 18-20s per document

Pro+ Plan Optimization

Challenge: Maximize high-capacity infrastructure

Strategy:

  1. Optimal concurrency: 8-10 jobs

    • Leverage 4 vCPUs
    • Significantly faster than Pro
  2. Batch size: 50 PDFs

    • Take advantage of higher limits
    • ~10-12 minute batches
  3. Enable all features:

    • Vector storage (plenty of resources)
    • Deep analysis when needed
    • Advanced processing

Expected performance:

  • 1,500 docs/month: 30 batches of 50
  • Total processing time: 5-6 hours/month
  • Average: 14-16s per document
  • Throughput: 300-350 docs/hour (peak)

Troubleshooting Slow Performance

Diagnosis Checklist

Step 1: Check VM health

# SSH into VM
ssh -i ~/.ssh/id_rsa appuser@your-vm-ip

# Check CPU usage
top
# Look for processes using more than 90% CPU

# Check memory
free -h
# Should have at least 1 GB available

# Check disk
df -h
# Should have at least 10% free

Step 2: Check service status

sudo systemctl status alactic-api
sudo systemctl status alactic-worker

# Look for recent errors
sudo journalctl -u alactic-worker -n 100

Step 3: Check Azure OpenAI connectivity

curl -H "X-Deployment-Key: ak-xxxxx" \
https://your-vm-ip/api/v1/health

# Check "azure_openai" status
# Should show "connected" with low latency (less than 500ms)
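A small check like the one below can gate automated processing on that health response. The JSON shape here is assumed for illustration, not the documented schema:

```python
def azure_openai_healthy(payload, max_latency_ms=500):
    """Return True if the health payload shows a connected, low-latency
    Azure OpenAI backend (field names are assumptions)."""
    svc = payload.get("azure_openai", {})
    return (svc.get("status") == "connected"
            and svc.get("latency_ms", float("inf")) < max_latency_ms)

healthy = azure_openai_healthy({"azure_openai": {"status": "connected", "latency_ms": 120}})
slow = azure_openai_healthy({"azure_openai": {"status": "connected", "latency_ms": 900}})
print(healthy, slow)  # True False
```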

Step 4: Review recent changes

  • Did you change analysis depth settings?
  • Did you enable vector storage recently?
  • Are you processing larger documents than before?
  • Did Azure region have issues?

Common Performance Issues

Issue 1: Slow processing (more than 30s per document)

Causes:

  • Azure OpenAI throttling (HTTP 429 errors)
  • VM resource exhaustion
  • Network connectivity issues
  • Corrupted/complex PDFs

Solutions:

  • Check Azure OpenAI quota and request increase
  • Review VM metrics for CPU/memory exhaustion
  • Test network connectivity to Azure OpenAI
  • Try processing simpler test document

Issue 2: Degrading performance over time

Causes:

  • CPU credit depletion (Free Plan)
  • Database bloat (too many documents)
  • Disk space exhaustion
  • Memory leaks (rare)

Solutions:

  • For Free Plan: Space out processing
  • Delete old documents to reduce database size
  • Clean up disk space
  • Restart services: sudo systemctl restart alactic-worker

Issue 3: Inconsistent processing times

Causes:

  • Variable document complexity
  • Azure OpenAI variable latency
  • Other workloads on VM
  • Network congestion

Solutions:

  • Group similar documents together
  • Process during off-peak hours
  • Ensure no other intensive processes running
  • Consider upgrading to higher plan

Performance Checklist

Weekly Tasks

  • Monitor average processing time
  • Check resource utilization (CPU, memory, disk)
  • Review success rate
  • Identify any performance degradation

Monthly Tasks

  • Analyze performance trends
  • Benchmark against baselines
  • Clean up old documents
  • Review and optimize settings

Quarterly Tasks

  • Comprehensive performance audit
  • Consider plan upgrade if needed
  • Implement new optimization strategies
  • Update performance baselines