
Cost Optimization

Maximize the value of your Alactic AGI deployment by optimizing costs across infrastructure, model usage, and operational efficiency. This comprehensive guide covers strategies to reduce spending while maintaining quality and performance.

Cost Structure Overview

Fixed vs Variable Costs

Fixed Costs (Monthly Infrastructure):

| Plan | Fixed Cost | Percentage of Total |
| --- | --- | --- |
| Free | $72 | 98% (minimal model costs) |
| Pro | $147 | 98% |
| Pro+ | $295 | 98% |
| Enterprise | Custom | 95-98% |

Key Insight: Infrastructure is the dominant cost. Model usage (variable cost) is typically less than 2-3% of total.

Variable Costs:

  • Model API calls (Azure OpenAI tokens)
  • Typically $0.0015 to $0.025 per document
  • Scales with volume and model choice

Implication: Focus on maximizing document volume per infrastructure dollar rather than micro-optimizing model costs.

Infrastructure Optimization

Right-Sizing Your Plan

Choose the appropriate plan based on actual usage:

Decision Matrix:

| Monthly Documents | Storage Need | Recommended Plan | Monthly Cost |
| --- | --- | --- | --- |
| Less than 70 | Less than 100 MB | Free | $72 |
| 70-300 | 100 MB - 5 GB | Pro | $147 |
| 300-1,500 | 5-20 GB | Pro+ | $295 |
| More than 1,500 | More than 20 GB | Enterprise | Custom |

Avoid over-provisioning: Don't upgrade to Pro+ if you only process 200 docs/month. Pro is sufficient and saves $148/month.

Example:

  • Currently on Pro+ ($295/month)
  • Average usage: 250 documents/month
  • Action: Downgrade to Pro ($147/month)
  • Savings: $148/month = $1,776/year
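The decision matrix can be encoded as a small helper. This is a minimal sketch, not an Alactic tool. `recommend_plan`, `annual_savings` and `PLAN_COSTS` are hypothetical names. The thresholds come from the table, and values that sit exactly on a boundary (such as 300 documents or 5 GB) are assigned to the smaller plan, which is an assumption. The 1 GB storage figure used for the example is also assumed, because the example does not state storage.

```python
# Monthly list prices from the decision matrix (Enterprise is custom-priced).
PLAN_COSTS = {"Free": 72, "Pro": 147, "Pro+": 295}


def recommend_plan(monthly_docs: int, storage_gb: float) -> str:
    """Return the smallest plan whose document and storage limits both fit.

    Hypothetical helper; boundary values go to the smaller plan (an assumption).
    """
    if monthly_docs < 70 and storage_gb < 0.1:  # fewer than 70 docs, under 100 MB
        return "Free"
    if monthly_docs <= 300 and storage_gb <= 5:
        return "Pro"
    if monthly_docs <= 1500 and storage_gb <= 20:
        return "Pro+"
    return "Enterprise"


def annual_savings(current_plan: str, target_plan: str) -> int:
    """Yearly savings from moving between two fixed-price plans."""
    return (PLAN_COSTS[current_plan] - PLAN_COSTS[target_plan]) * 12


# The example above: 250 documents/month on Pro+ (1 GB storage is assumed)
plan = recommend_plan(250, storage_gb=1.0)
print(plan, annual_savings("Pro+", plan))  # Pro 1776
```

The helper checks both dimensions, so a low-volume deployment that stores a lot of data is still routed to a larger plan.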

VM Management Strategies

Deallocate During Downtime

For non-production deployments:

If you have predictable downtime periods (nights, weekends), deallocate the VM to save compute costs.

Savings Calculation:

Pro Plan VM (Standard_D2s_v3): $105/month = $3.50/day

If deallocated 50% of the time:

  • Compute cost while running: $52.50/month
  • Savings: $52.50/month
  • Costs that continue while deallocated: Storage ($19.20), Cosmos DB ($12), etc.

How to Deallocate:

# Deallocate VM (stop billing for compute)
az vm deallocate --resource-group alactic-rg --name alactic-vm

# Start VM when needed
az vm start --resource-group alactic-rg --name alactic-vm

Automation with Azure Automation:

Schedule automatic start/stop:

Start: Monday-Friday 8 AM
Stop: Monday-Friday 6 PM
Weekend: Deallocated

Savings: ~70% of VM compute costs (the VM runs 50 of the 168 hours in a week)

Caution: Not recommended for production deployments with SLA requirements.
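Before you wire this schedule into Azure Automation, you can check its decision logic and its savings locally. The sketch below is an illustration only. `should_be_running` and `deallocated_fraction` are hypothetical names, and times are read from whatever clock the `datetime` carries. The scheduled runbook that actually calls `az vm start` / `az vm deallocate` (or the equivalent SDK calls) is not shown.

```python
from datetime import datetime

# Schedule from the text: run Monday-Friday, 8 AM to 6 PM; deallocated otherwise.
START_HOUR, STOP_HOUR = 8, 18
WORKDAYS = range(0, 5)  # Monday=0 ... Friday=4, as returned by datetime.weekday()


def should_be_running(now: datetime) -> bool:
    """True if the VM should be up at `now` under the weekday schedule."""
    return now.weekday() in WORKDAYS and START_HOUR <= now.hour < STOP_HOUR


def deallocated_fraction() -> float:
    """Share of the 168-hour week during which the VM is deallocated."""
    running_hours = len(WORKDAYS) * (STOP_HOUR - START_HOUR)  # 5 * 10 = 50
    return 1 - running_hours / (7 * 24)


monthly_compute = 105  # Pro Plan VM (Standard_D2s_v3)
saved = monthly_compute * deallocated_fraction()
print(f"{deallocated_fraction():.0%} deallocated, ~${saved:.2f}/month saved")
# 70% deallocated, ~$73.75/month saved
```

A runbook that fires every hour could call `should_be_running(datetime.now())` and start or deallocate the VM whenever the VM's current state disagrees with the result.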

Resize to Lower SKU During Low Usage

Temporary downgrade during low-activity periods:

Example:

  • Normal: Pro+ Plan (D4s_v3, $210/month)
  • Low season: Downgrade to Pro Plan (D2s_v3, $105/month)
  • Duration: 3 months
  • Savings: $315 over low season

How to Resize:

# Stop VM
az vm deallocate --resource-group alactic-rg --name alactic-vm

# Resize to smaller SKU
az vm resize --resource-group alactic-rg --name alactic-vm --size Standard_D2s_v3

# Start VM
az vm start --resource-group alactic-rg --name alactic-vm

Downtime: 5-15 minutes

Storage Optimization

Regular Cleanup

Delete old processed documents:

Strategy 1: Age-based cleanup

# Delete documents older than 90 days via API
curl -X DELETE "https://your-vm-ip/api/v1/documents?older_than=90d" \
-H "X-Deployment-Key: ak-xxxxx"

Strategy 2: Export and archive

# Export to cheaper storage before deleting
import json

import boto3  # or use the Azure Blob Storage SDK instead
import requests

API_BASE = "https://your-vm-ip/api/v1"
HEADERS = {"X-Deployment-Key": "ak-xxxxx"}
s3 = boto3.client("s3")

def archive_and_delete(document_id):
    # Download the processed result
    resp = requests.get(f"{API_BASE}/results/{document_id}", headers=HEADERS)
    resp.raise_for_status()
    result = resp.json()

    # Upload to S3 or Azure Blob (much cheaper per GB)
    s3.put_object(
        Bucket="alactic-archive",
        Key=f"results/{document_id}.json",
        Body=json.dumps(result),
    )

    # Delete from Alactic only after the archive upload succeeded
    requests.delete(
        f"{API_BASE}/documents/{document_id}",
        headers=HEADERS,
    ).raise_for_status()

Cost comparison:

| Storage Type | Cost per GB/Month | 100 GB Cost |
| --- | --- | --- |
| Alactic (Premium SSD) | $2.30 | $230 |
| Azure Blob (Hot) | $0.18 | $18 |
| Azure Blob (Cool) | $0.01 | $1 |
| AWS S3 Glacier | $0.004 | $0.40 |

Savings: Moving 100 GB of old results to Azure Blob (Cool) saves about $229/month.
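The savings figure is the per-GB price difference multiplied by the volume you move. The sketch below uses the prices from the comparison table, which will vary by region and over time. `monthly_archive_savings` is a hypothetical helper name.

```python
# Per-GB monthly prices from the comparison table (verify current pricing for your region).
PRICE_PER_GB = {
    "Alactic (Premium SSD)": 2.30,
    "Azure Blob (Hot)": 0.18,
    "Azure Blob (Cool)": 0.01,
    "AWS S3 Glacier": 0.004,
}


def monthly_archive_savings(gb: float, source: str, target: str) -> float:
    """Monthly savings from moving `gb` of results from `source` to `target` storage."""
    return gb * (PRICE_PER_GB[source] - PRICE_PER_GB[target])


for tier in ("Azure Blob (Hot)", "Azure Blob (Cool)", "AWS S3 Glacier"):
    saving = monthly_archive_savings(100, "Alactic (Premium SSD)", tier)
    print(f"{tier}: ${saving:.2f}/month")
```

The colder tiers differ by only a few cents per GB, so retrieval speed and retrieval fees usually decide between Cool and Glacier more than storage price does.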

Disable Vector Storage

If not using semantic search:

curl -X POST https://your-vm-ip/api/v1/process \
-H "X-Deployment-Key: ak-xxxxx" \
-F "file=@document.pdf" \
-F "enable_vectors=false"

Savings:

  • Vector embeddings: ~2 KB per page
  • 1,000 pages = 2 MB saved per processing
  • Also saves embedding API costs
  • Faster processing (about 2-3 seconds saved per document)

Recommendation: Disable unless specifically using semantic search features.

Compress PDFs Before Upload

Reduce storage and bandwidth costs:

# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf input.pdf

Typical compression:

  • Original: 10 MB
  • Compressed: 2-3 MB
  • 70-80% size reduction
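  • Text extraction quality is typically unchanged for text-based PDFs; the /ebook preset downsamples images to about 150 dpi, which can reduce OCR accuracy on scanned documents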

Benefits:

  • Less storage used
  • Faster upload
  • Faster processing
  • Lower bandwidth costs
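To compress a whole folder before upload, you can wrap the same Ghostscript command in a short script. This is a sketch that assumes `gs` is installed and on `PATH`. The folder arguments and function names are hypothetical. One design choice is that the original file is kept whenever Ghostscript does not make it smaller, which can happen with PDFs that are already optimized.

```python
import subprocess
from pathlib import Path


def build_gs_command(src, dst, preset="/ebook"):
    """Ghostscript arguments matching the command shown above."""
    return [
        "gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}", str(src),
    ]


def compress_folder(src_dir, out_dir):
    """Compress every PDF in src_dir into out_dir, keeping originals that gs cannot shrink."""
    src, out = Path(src_dir), Path(out_dir)
    if src.resolve() == out.resolve():
        raise ValueError("out_dir must differ from src_dir")
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(src.glob("*.pdf")):
        dst = out / pdf.name
        subprocess.run(build_gs_command(pdf, dst), check=True)
        if dst.stat().st_size >= pdf.stat().st_size:
            dst.write_bytes(pdf.read_bytes())  # compression did not help
        print(f"{pdf.name}: {pdf.stat().st_size:,} -> {dst.stat().st_size:,} bytes")
```

For scanned documents, passing `preset="/printer"` (300 dpi) gives up some of the size reduction in exchange for better OCR input.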

Model Cost Optimization

Strategic Model Selection

Cost per document by model:

| Model | Typical 10-Page Doc Cost | Use Case |
| --- | --- | --- |
| GPT-4o mini | $0.0015 | Simple extraction, routine processing |
| GPT-4o | $0.025 | Complex analysis, critical accuracy |

GPT-4o is 16.7x more expensive than mini.

Optimization Strategies

Strategy 1: Default to Mini

Set GPT-4o mini as default for all processing:

Dashboard → Settings → Processing → Default Model → GPT-4o mini

Override to GPT-4o only when necessary:

  • Legal contracts requiring nuanced understanding
  • Financial analysis requiring precision
  • Medical documents requiring critical accuracy
  • Complex technical specifications

Impact:

If processing 500 documents/month:

  • All GPT-4o: 500 × $0.025 = $12.50/month
  • All mini: 500 × $0.0015 = $0.75/month
  • Savings: $11.75/month = $141/year

Strategy 2: Cascade Approach

Process with mini first, escalate if needed:

def smart_process(document):
    # process_document is your own wrapper around the Alactic processing API
    # Try with GPT-4o mini first
    result = process_document(document, model="gpt-4o-mini")

    # Check confidence score
    if result["confidence"] < 0.85:
        # Low confidence: reprocess with GPT-4o
        result = process_document(document, model="gpt-4o")

    return result

Typical outcome:

  • 85% of documents: GPT-4o mini only
  • 15% of documents: Reprocessed with GPT-4o

Cost calculation (500 docs):

  • 425 docs: GPT-4o mini only = $0.64
  • 75 docs: Mini + GPT-4o = $0.11 + $1.88 = $1.99
  • Total: $2.63/month
  • vs All GPT-4o: $12.50/month
  • Savings: $9.87/month = $118/year
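The cascade arithmetic generalizes to any escalation rate. The sketch below assumes the per-document costs from the model table, and it assumes that every escalated document is billed twice (once on mini, once on GPT-4o). The function names are illustrative.

```python
MINI_COST = 0.0015  # GPT-4o mini, typical 10-page document
GPT4O_COST = 0.025  # GPT-4o, typical 10-page document


def cascade_monthly_cost(docs: int, escalation_rate: float) -> float:
    """Every document runs on mini; a fraction is re-run on GPT-4o."""
    return docs * (MINI_COST + escalation_rate * GPT4O_COST)


def break_even_escalation_rate() -> float:
    """Escalation rate at which the cascade costs as much as all-GPT-4o."""
    # mini + r * gpt4o == gpt4o  =>  r = 1 - mini / gpt4o
    return 1 - MINI_COST / GPT4O_COST


print(f"${cascade_monthly_cost(500, 0.15):.3f}")  # $2.625, the $2.63 above before rounding
print(f"{break_even_escalation_rate():.0%}")  # 94%
```

The cascade stays cheaper than sending everything to GPT-4o unless roughly 94% of documents escalate. The real tuning question is therefore where to set the confidence threshold so that quality holds, not whether the cascade saves money.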

Strategy 3: Document Type-Based Routing

Route documents to appropriate model based on type:

def route_by_type(document):
    # identify_document_type and process_document are your own helpers
    doc_type = identify_document_type(document)

    if doc_type in ["contract", "financial_report", "medical"]:
        model = "gpt-4o"  # High stakes
    else:
        model = "gpt-4o-mini"  # Standard processing

    return process_document(document, model=model)

Example distribution (500 docs/month):

  • 50 contracts: GPT-4o ($1.25)
  • 100 financial reports: GPT-4o ($2.50)
  • 350 other docs: GPT-4o mini ($0.53)
  • Total: $4.28/month
  • Savings: $8.22/month vs all GPT-4o

Optimize Analysis Depth

Three analysis modes with different costs:

| Mode | Output Tokens | Cost (10-page doc, GPT-4o mini) | Processing Time |
| --- | --- | --- | --- |
| Quick Extract | ~50 | $0.0003 | 8s |
| Standard Analysis | ~500 | $0.0015 | 15s |
| Deep Analysis | ~1,500 | $0.0045 | 25s |

Deep Analysis costs 3x more than Standard.

Optimization:

Use appropriate depth for task:

  • Quick Extract: Only need text, no analysis
  • Standard: Need summary and key points (most common)
  • Deep: Need entities, sentiment, detailed analysis

Impact (500 docs with mini):

| Scenario | Monthly Cost | Savings |
| --- | --- | --- |
| All Deep | $2.25 | Baseline |
| All Standard | $0.75 | $1.50/month |
| 80% Standard, 20% Deep | $1.05 | $1.20/month |

Recommendation: Default to Standard, use Deep selectively.
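The rule "default to Standard, use Deep selectively" can be expressed as a lookup from what a job needs to the cheapest mode that covers it. The need labels and returned mode strings below are illustrative. They are not Alactic API values, and the mapping follows the bullet list above.

```python
# Which outputs require which depth (mapping follows the list above).
DEEP_NEEDS = {"entities", "sentiment", "detailed_analysis"}
STANDARD_NEEDS = {"summary", "key_points"}


def choose_depth(needs):
    """Pick the cheapest analysis mode that covers every requested output."""
    if needs & DEEP_NEEDS:
        return "deep"  # ~$0.0045 per doc with mini
    if needs & STANDARD_NEEDS or not needs:
        return "standard"  # the default: ~$0.0015 per doc
    return "quick"  # text only: ~$0.0003 per doc


print(choose_depth({"text"}), choose_depth({"summary"}), choose_depth({"summary", "sentiment"}))
```

An empty request falls back to Standard rather than Deep, which keeps the expensive mode opt-in.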

Minimize Reprocessing

Avoid processing the same document multiple times:

Common causes of reprocessing:

  • Wrong settings first time
  • Experimenting with different models
  • Testing different analysis depths
  • User errors (wrong file uploaded)

Prevention strategies:

  1. Validate settings before processing:

def validate_before_process(file, settings):
    # Show what is about to be processed and ask for confirmation
    print(f"Processing: {file.name}")
    print(f"Model: {settings['model']}")
    print(f"Analysis depth: {settings['depth']}")

    confirm = input("Proceed? (y/n): ")
    if confirm.lower() != "y":
        return None  # Cancelled: nothing is sent for processing

    return process_document(file, **settings)

  2. Cache results: Store results locally to avoid re-fetching them from the API.

  3. Use a staging environment: Test with sample documents before processing the full batch.
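One way to catch duplicate uploads is to key a local cache on a hash of the file contents plus the processing settings. Renamed copies of the same file then still count as "already processed". This is a sketch under assumptions: `process_once`, the cache file name, and the `process_fn` callable (your own wrapper around the processing API) are hypothetical, and the result must be JSON-serializable.

```python
import hashlib
import json
from pathlib import Path

CACHE_FILE = Path("processed_cache.json")  # hypothetical local cache location


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, so renamed copies are still recognized."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def process_once(path: Path, settings: dict, process_fn, cache_file: Path = CACHE_FILE):
    """Return the cached result if this file + settings pair was processed before."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    key = file_digest(path) + ":" + json.dumps(settings, sort_keys=True)
    if key not in cache:
        cache[key] = process_fn(path, **settings)  # the only billable call
        cache_file.write_text(json.dumps(cache))
    return cache[key]
```

The settings are included in the key on purpose. Deliberately re-running a document with a different model or depth is still allowed, while accidental repeats are served from the cache.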

Cost impact:

If 10% of documents reprocessed unnecessarily:

  • 500 docs at $0.0015 = $0.75
  • 50 reprocessed = $0.08 wasted
  • Prevent reprocessing: Save $0.08/month

Small at GPT-4o mini prices, but with GPT-4o the same 50 reprocessed documents would waste $1.25/month.

Operational Efficiency

Batch Processing

Process documents in batches for efficiency:

Benefits:

  • Lower API overhead
  • Better resource utilization
  • Faster total processing time
  • Same cost per document

Optimal batch sizes:

| Plan | Optimal Batch Size | Processing Time | Cost per Doc |
| --- | --- | --- | --- |
| Free | 10 PDFs | ~3 minutes | Same |
| Pro | 25 PDFs | ~15 minutes | Same |
| Pro+ | 50 PDFs | ~35 minutes | Same |
| Enterprise | 500 PDFs | ~5 hours | Same |

Example:

Process 500 PDFs:

  • Individual: 500 API calls, 2.5 hours
  • Batched (10 batches of 50): 10 API calls, 2.5 hours
  • Same cost, simpler workflow
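Splitting a document list into plan-sized batches takes only a few lines. In the sketch below, the batch sizes come from the table. Submitting each batch would use whatever batch endpoint your client exposes (for example the `api.batch_process` call shown later), which is not included here.

```python
from itertools import islice

# Optimal batch sizes from the table above.
OPTIMAL_BATCH_SIZE = {"Free": 10, "Pro": 25, "Pro+": 50, "Enterprise": 500}


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


documents = [f"doc_{i:03}.pdf" for i in range(500)]
plan_batches = list(batches(documents, OPTIMAL_BATCH_SIZE["Pro+"]))
print(len(plan_batches), len(plan_batches[-1]))  # 10 50
```

On Python 3.12 and later, `itertools.batched` does the same job, although it yields tuples rather than lists.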

Scheduling During Off-Peak

Process during off-peak hours for better performance:

Azure OpenAI has variable load throughout the day:

  • Peak hours (9 AM - 5 PM PT): Higher latency, potential throttling
  • Off-peak (6 PM - 8 AM PT): Lower latency, faster processing

Benefits:

  • Faster processing (20-30% improvement)
  • Lower risk of rate limiting
  • Same cost

Implementation:

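from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

PACIFIC = ZoneInfo("America/Los_Angeles")

def is_off_peak(now=None):
    # Off-peak: 6 PM to 8 AM PT, regardless of the server's own time zone
    now = datetime.now(PACIFIC) if now is None else now.astimezone(PACIFIC)
    return now.hour >= 18 or now.hour < 8

def schedule_processing(documents):
    # process_immediately / schedule_for_evening are your own queueing helpers
    if is_off_peak():
        process_immediately(documents)
    else:
        schedule_for_evening(documents)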
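The snippet leaves `schedule_for_evening` to you. Computing how long to wait until the off-peak window opens is the core of it. This sketch assumes the caller passes a datetime that is already in Pacific Time, and `delay_until_off_peak` is a hypothetical name.

```python
from datetime import datetime, timedelta

OFF_PEAK_START, OFF_PEAK_END = 18, 8  # 6 PM to 8 AM Pacific, per the text


def delay_until_off_peak(now_pt: datetime) -> timedelta:
    """How long to wait before the off-peak window opens; zero if already off-peak.

    `now_pt` must already be in Pacific Time, for example
    datetime.now(ZoneInfo("America/Los_Angeles")).
    """
    if now_pt.hour >= OFF_PEAK_START or now_pt.hour < OFF_PEAK_END:
        return timedelta(0)
    start = now_pt.replace(hour=OFF_PEAK_START, minute=0, second=0, microsecond=0)
    return start - now_pt


print(delay_until_off_peak(datetime(2024, 3, 5, 15, 30)))  # 2:30:00
```

A caller could pass `delay.total_seconds()` to `threading.Timer`, or to a job queue's delayed-enqueue option, to start the batch when the window opens.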

Optimize API Usage

Reduce unnecessary API calls:

1. Use batch endpoints instead of individual:

Bad:

for doc in documents:
    result = api.process(doc)  # 100 API calls

Good:

results = api.batch_process(documents)  # 1 API call

2. Poll less frequently:

Bad:

# Assumes check_status returns a status string such as "completed"
while check_status(job_id) != "completed":
    time.sleep(1)  # Check every second

Good:

while check_status(job_id) != "completed":
    time.sleep(10)  # Check every 10 seconds
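A step beyond a fixed 10-second interval is exponential backoff with a timeout. Short jobs are picked up quickly, and long ones are not polled hundreds of times. This is a sketch: `check_status` is your own helper, and treating "completed" and "failed" as the terminal status values is an assumption about the API's responses. The `sleep` parameter exists only so that tests can skip real waiting.

```python
import time


def wait_for_job(check_status, job_id, initial=2.0, factor=2.0, max_interval=60.0,
                 timeout=3600.0, sleep=time.sleep):
    """Poll with exponential backoff until check_status(job_id) is terminal."""
    waited, interval = 0.0, initial
    while True:
        status = check_status(job_id)
        if status in ("completed", "failed"):  # assumed terminal values
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        interval = min(interval * factor, max_interval)  # 2, 4, 8, ... capped at 60 s
```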

3. Use webhooks instead of polling:

# Set webhook URL
result = api.process(doc, webhook_url="https://yourapp.com/webhook")

# Receive notification when done
# No polling needed

Savings: Minimal cost impact but improves efficiency and reduces load.
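On the receiving side, a webhook endpoint only has to accept a POST and acknowledge it quickly. The standard-library sketch below is illustrative. The payload shape is an assumption because this page does not document what Alactic sends, so the handler simply stores whatever JSON arrives. A production endpoint would also need HTTPS and some verification of the sender, which this sketch omits.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # notification payloads, in arrival order


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes and parse them as JSON
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        received.append(payload)
        # Acknowledge immediately; do any heavy work elsewhere
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_webhook_server(host="0.0.0.0", port=8080):
    """Serve webhooks on a background thread; call .shutdown() to stop."""
    server = HTTPServer((host, port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```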

Monitoring and Cost Control

Set Budget Alerts

Configure alerts to prevent overspending:

Azure Budget Configuration:

  1. Azure Portal → Cost Management → Budgets
  2. Create budget:
    • Amount: $200/month (Pro Plan with buffer)
    • Period: Monthly
  3. Add alerts:
    • 75% ($150): Warning email
    • 90% ($180): Critical email
    • 100% ($200): Action (email + consider stopping processing)

Benefits:

  • Early warning of unexpected costs
  • Prevent budget overruns
  • Identify cost anomalies quickly
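If your own pipeline should react to these thresholds (for example, by pausing new submissions at 100%), you can mirror them locally. The Azure budget stays the authoritative alert. The sketch below uses the amounts configured above, and its month-end projection is a naive linear extrapolation. All names are hypothetical.

```python
import calendar
from datetime import date

BUDGET = 200.0  # Pro Plan budget with buffer
THRESHOLDS = [(1.00, "action"), (0.90, "critical"), (0.75, "warning")]  # highest first


def alert_level(spend, budget=BUDGET):
    """Highest threshold crossed by month-to-date spend, or None."""
    for fraction, level in THRESHOLDS:
        if spend >= fraction * budget:
            return level
    return None


def forecast_month_end(spend_to_date, today):
    """Linear projection of month-to-date spend to the last day of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


print(alert_level(forecast_month_end(100.0, date(2024, 3, 15))))  # action
```

Checking the projection as well as actual spend lets you notice mid-month that you are on course to exceed the budget.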

Track Cost per Document

Monitor cost efficiency over time:

Formula:

Cost per Document = (Infrastructure + Model Costs) / Documents Processed

Target cost per document:

| Plan | Target Cost/Doc | Good Performance | Needs Optimization |
| --- | --- | --- | --- |
| Free | $1.03 (70 docs) | Less than $1.20 | More than $1.50 |
| Pro | $0.49 (300 docs) | Less than $0.55 | More than $0.70 |
| Pro+ | $0.20 (1,500 docs) | Less than $0.22 | More than $0.30 |

If cost per document is high:

  • Not processing enough documents (underutilized infrastructure)
  • Consider downgrading plan
  • Or increase document volume to amortize fixed costs
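The formula and the target table combine into a quick monthly check. The thresholds come from the table. Calling the band between the two columns "acceptable" is an assumption because the table does not name it. The sample figures are those from the March 2024 report in Usage Analytics.

```python
# (good if below, needs optimization if above) per plan, from the target table
TARGETS = {"Free": (1.20, 1.50), "Pro": (0.55, 0.70), "Pro+": (0.22, 0.30)}


def cost_per_document(infrastructure, model_costs, documents):
    """(Infrastructure + Model Costs) / Documents Processed"""
    return (infrastructure + model_costs) / documents


def rate_cost(plan, cost):
    good_below, optimize_above = TARGETS[plan]
    if cost < good_below:
        return "good"
    if cost > optimize_above:
        return "needs optimization"
    return "acceptable"  # between the two columns (label is an assumption)


cpd = cost_per_document(147.00, 2.85, 285)  # March 2024 example report
print(f"${cpd:.2f} -> {rate_cost('Pro', cpd)}")  # $0.53 -> good
```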

Usage Analytics

Review monthly cost breakdown:

Dashboard → Settings → Usage Statistics → Cost Analysis

Key metrics to track:

  1. Infrastructure costs: Should be constant
  2. Model costs: Should scale linearly with volume
  3. Cost per document trend: Should decrease as volume increases
  4. Model distribution: Percentage using GPT-4o vs mini

Example report:

March 2024 Cost Analysis:

Infrastructure: $147.00 (98%)
Model API: $2.85 (2%)
Total: $149.85

Documents Processed: 285
Cost per Document: $0.53

Model Distribution:
GPT-4o mini: 245 docs (86%) - $0.37
GPT-4o: 40 docs (14%) - $1.00

Trend: +12% docs vs February
Cost efficiency: Improved 8%

Assessment:

  • On track for quota
  • Good model distribution
  • Cost efficiency improving

Cost Comparison: Build vs Buy

Building Custom Solution

Estimated costs for equivalent functionality:

Development Costs:

  • Backend API: 200 hours × $100/hr = $20,000
  • Frontend dashboard: 150 hours × $100/hr = $15,000
  • DevOps/Infrastructure: 80 hours × $100/hr = $8,000
  • Testing and QA: 100 hours × $100/hr = $10,000
  • Total Development: $53,000

Ongoing Monthly Costs:

  • VM compute: $105-210/month
  • Azure OpenAI API: $5-50/month
  • Database (Cosmos DB): $12-30/month
  • Storage: $8-15/month
  • Monitoring tools: $10-30/month
  • Maintenance (20 hrs/month): $2,000/month
  • Total Monthly: $2,140-2,335

Annual Total:

  • Year 1: $53,000 + $25,680 (12 × $2,140, low estimate) = $78,680
  • Year 2+: $25,680/year (low estimate)

Using Alactic AGI

Costs:

  • Deployment: $0
  • Pro Plan: $147/month = $1,764/year
  • Pro+ Plan: $295/month = $3,540/year

Savings:

  • vs Custom (Year 1): $76,916 (Pro) or $75,140 (Pro+)
  • vs Custom (Year 2+): $23,916 (Pro) or $22,140 (Pro+)

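ROI: With no upfront development cost, Alactic AGI is cheaper than a custom build from the first month ($147-295/month vs at least $2,140/month in ongoing custom costs).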
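The build-vs-buy figures can be reproduced, and stress-tested with the high-end monthly estimate, in a few lines. This sketch uses only the numbers listed above. It defaults to the low-end $2,140/month custom running cost, as the annual totals do, and the function names are illustrative.

```python
DEV_COST = 53_000                # one-time custom development
CUSTOM_MONTHLY = (2_140, 2_335)  # low / high ongoing custom costs
ALACTIC_MONTHLY = {"Pro": 147, "Pro+": 295}


def custom_cost(year, monthly=CUSTOM_MONTHLY[0]):
    """Custom-build cost in a given year (development counted in year 1 only)."""
    return monthly * 12 + (DEV_COST if year == 1 else 0)


def savings(plan, year, monthly=CUSTOM_MONTHLY[0]):
    """How much less the Alactic plan costs than the custom build in that year."""
    return custom_cost(year, monthly) - ALACTIC_MONTHLY[plan] * 12


for plan in ALACTIC_MONTHLY:
    print(plan, savings(plan, 1), savings(plan, 2))
# Pro 76916 23916
# Pro+ 75140 22140
```

Passing `monthly=CUSTOM_MONTHLY[1]` shows the upper end of the range.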

Using Direct OpenAI API

Estimated costs for 500 documents/month:

Infrastructure (Self-Managed):

  • VM: $105/month
  • Storage: $8/month
  • Database: $12/month
  • Subtotal: $125/month

OpenAI API Costs:

  • GPT-4o mini: $0.0015 × 500 = $0.75
  • Or GPT-4o: $0.025 × 500 = $12.50

Total: $125.75 - $137.50/month

Missing from direct API:

  • No PDF parsing (must implement)
  • No URL scraping (must implement)
  • No vector storage (must implement)
  • No dashboard UI (must implement)
  • No batch processing (must implement)
  • No usage tracking (must implement)

Development to add these: $20,000-40,000

Alactic AGI Pro Plan:

  • $147/month
  • All features included
  • No development required

Value: $20,000-40,000 in features
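The direct-API route runs slightly cheaper each month ($125.75 vs $147 with GPT-4o mini), so it only pays off if that monthly gap repays the development cost. The payback arithmetic below uses the figures above. It ignores maintenance effort, which would lengthen the payback further. `payback_months` is an illustrative name.

```python
def payback_months(dev_cost, alactic_monthly, diy_monthly):
    """Months until lower DIY running costs repay the up-front development."""
    monthly_gap = alactic_monthly - diy_monthly
    if monthly_gap <= 0:
        return float("inf")  # DIY is not cheaper to run, so it never pays back
    return dev_cost / monthly_gap


for dev in (20_000, 40_000):
    print(dev, round(payback_months(dev, 147.00, 125.75)))
# 20000 941
# 40000 1882
```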

Optimization Checklist

Monthly Tasks

Week 1:

  • Review previous month costs
  • Check cost per document trend
  • Verify model distribution (aim for 80%+ mini)
  • Clean up old documents (more than 90 days)

Week 2:

  • Analyze usage patterns
  • Optimize model selection rules
  • Review processing efficiency
  • Check storage usage

Week 3:

  • Test cost optimization strategies
  • Implement improvements
  • Update documentation
  • Review budget alerts

Week 4:

  • Month-end cost analysis
  • Plan for next month
  • Adjust quotas if needed
  • Consider plan changes

Quarterly Tasks

  • Evaluate plan suitability: Right-sized?
  • Review architecture: Any optimizations?
  • Benchmark performance: Meeting targets?
  • Cost trend analysis: Improving or degrading?
  • Budget planning: Adjust for next quarter

Annual Tasks

  • Comprehensive cost review: Full year analysis
  • ROI assessment: Value delivered vs cost
  • Technology updates: New features to leverage?
  • Long-term planning: Scaling requirements?

Advanced Cost Strategies

Multi-Tenant Cost Allocation

For agencies serving multiple clients:

Track costs per client:

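def process_with_client_tag(document, client_id):
    # Tag each request with the client it belongs to
    result = api.process(
        document,
        metadata={"client_id": client_id},
    )

    # Track cost per client (log_cost is your own bookkeeping helper)
    log_cost(client_id, result["cost"])
    return result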

Generate client cost reports:

Client A: 50 docs, $25.50
Client B: 120 docs, $61.20
Client C: 30 docs, $15.30

Allocate infrastructure costs:

Fixed cost: $295/month (Pro+)
Total docs: 200
Cost per doc: $1.48

Client A (50 docs): $74
Client B (120 docs): $177.60
Client C (30 docs): $44.40

Set pricing: Charge $3-5 per document, profit margin 50-70%.
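A proportional split that does not round the per-document rate first makes the client shares add up exactly to the fixed cost. Rounding to $1.48 first, as above, allocates $296 instead of $295. The sketch uses the Pro+ figures from this section, and both function names are hypothetical.

```python
def allocate_fixed_cost(fixed_cost, docs_by_client):
    """Split a fixed monthly cost across clients in proportion to document volume."""
    total_docs = sum(docs_by_client.values())
    return {client: fixed_cost * docs / total_docs for client, docs in docs_by_client.items()}


def margin(price_per_doc, cost_per_doc):
    """Profit margin as a fraction of the price."""
    return (price_per_doc - cost_per_doc) / price_per_doc


shares = allocate_fixed_cost(295, {"Client A": 50, "Client B": 120, "Client C": 30})
print(shares)  # {'Client A': 73.75, 'Client B': 177.0, 'Client C': 44.25}
print(f"{margin(3, 295 / 200):.1%} to {margin(5, 295 / 200):.1%}")  # 50.8% to 70.5%
```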

Reserved Capacity Planning

For Enterprise customers:

Negotiate with Azure for reserved instances:

  • Commit to 1-year or 3-year VM usage
  • Receive 30-40% discount on compute
  • Best for stable, predictable workloads

Example:

  • D4s_v3 on-demand: $210/month
  • D4s_v3 reserved (1-year): $140/month
  • Savings: $70/month = $840/year

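Break-even: A 1-year reservation commits you to $1,680 (12 × $140), the same as 8 months at the $210 on-demand rate, so it is worth it if the VM would run consistently for more than 8 months of the year.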
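The break-even point is the total reservation commitment divided by the on-demand monthly rate. This sketch assumes a flat monthly reservation price billed for the full term whether or not the VM runs. The function names are illustrative.

```python
def break_even_months(on_demand_monthly, reserved_monthly, term_months=12):
    """Months of on-demand use that cost the same as the full reservation term."""
    return reserved_monthly * term_months / on_demand_monthly


def worth_reserving(expected_running_months, on_demand_monthly, reserved_monthly, term_months=12):
    """True if expected usage over the term exceeds the break-even point."""
    return expected_running_months > break_even_months(
        on_demand_monthly, reserved_monthly, term_months
    )


print(break_even_months(210, 140))  # 8.0
```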

Spot Instance Strategy

For non-production or flexible workloads:

Use Azure Spot VMs for development/testing:

  • 60-90% discount vs on-demand
  • Can be evicted with only 30 seconds' notice
  • Only for non-critical workloads

Example:

  • Pro Plan VM (D2s_v3): $105/month
  • Spot price: $15-30/month
  • Savings: $75-90/month for dev environment
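Azure announces Spot evictions through the Instance Metadata Service (IMDS) Scheduled Events endpoint as an event with `EventType` "Preempt". A dev VM can watch for that event and use the notice window to stop taking work. The sketch below is illustrative: `watch_for_eviction` and the `on_evict` callback (for example, checkpointing or re-queueing in-flight jobs) are hypothetical. The endpoint is only reachable from inside an Azure VM.

```python
import json
import time
import urllib.request

# IMDS Scheduled Events endpoint (link-local; only reachable from inside an Azure VM)
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout=10.0):
    req = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def has_preempt_event(document):
    """True if the Scheduled Events document reports a pending Spot eviction."""
    return any(e.get("EventType") == "Preempt" for e in document.get("Events", []))


def watch_for_eviction(on_evict, poll_seconds=5):
    """Poll well inside the 30-second notice window and call on_evict once."""
    while True:
        try:
            if has_preempt_event(fetch_scheduled_events()):
                on_evict()  # e.g. stop accepting work, re-queue in-flight documents
                return
        except OSError:
            pass  # transient metadata-service error: try again next poll
        time.sleep(poll_seconds)
```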