Cost Optimization
Maximize the value of your Alactic AGI deployment by optimizing costs across infrastructure, model usage, and operational efficiency. This comprehensive guide covers strategies to reduce spending while maintaining quality and performance.
Cost Structure Overview
Fixed vs Variable Costs
Fixed Costs (Monthly Infrastructure):
| Plan | Fixed Cost | Percentage of Total |
|---|---|---|
| Free | $72 | 98% (minimal model costs) |
| Pro | $147 | 98% |
| Pro+ | $295 | 98% |
| Enterprise | Custom | 95-98% |
Key Insight: Infrastructure is the dominant cost. Model usage (variable cost) is typically less than 2-3% of total.
Variable Costs:
- Model API calls (Azure OpenAI tokens)
- Typically $0.0015 to $0.025 per document
- Scales with volume and model choice
Implication: Focus on maximizing document volume per infrastructure dollar rather than micro-optimizing model costs.
Infrastructure Optimization
Right-Sizing Your Plan
Choose the appropriate plan based on actual usage:
Decision Matrix:
| Monthly Documents | Storage Need | Recommended Plan | Monthly Cost |
|---|---|---|---|
| Less than 70 | Less than 100 MB | Free | $72 |
| 70-300 | 100 MB - 5 GB | Pro | $147 |
| 300-1,500 | 5-20 GB | Pro+ | $295 |
| More than 1,500 | More than 20 GB | Enterprise | Custom |
Avoid over-provisioning: Don't upgrade to Pro+ if you only process 200 docs/month. Pro is sufficient and saves $148/month.
Example:
- Currently on Pro+ ($295/month)
- Average usage: 250 documents/month
- Action: Downgrade to Pro ($147/month)
- Savings: $148/month = $1,776/year
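The decision matrix above can be sketched as a small helper. This is a hypothetical `recommend_plan` function for illustration; the thresholds are taken directly from the table:

```python
def recommend_plan(docs_per_month):
    """Map monthly document volume to a plan tier, per the decision matrix."""
    if docs_per_month < 70:
        return "Free"        # $72/month
    elif docs_per_month <= 300:
        return "Pro"         # $147/month
    elif docs_per_month <= 1500:
        return "Pro+"        # $295/month
    return "Enterprise"      # custom pricing

print(recommend_plan(250))  # the example above: 250 docs/month -> Pro
```

Running this against your own usage numbers each month makes the downgrade/upgrade decision mechanical rather than a judgment call.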
VM Management Strategies
Deallocate During Downtime
For non-production deployments:
If you have predictable downtime periods (nights, weekends), deallocate the VM to save compute costs.
Savings Calculation:
Pro Plan VM (Standard_D2s_v3): $105/month = $3.50/day
If deallocated 50% of time:
- Active time: $52.50/month
- Savings: $52.50/month
- Remaining costs: Storage ($19.20), Cosmos DB ($12), etc.
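The savings calculation generalizes to any deallocation schedule. A minimal sketch (compute stops billing when deallocated; storage and Cosmos DB continue regardless):

```python
def monthly_compute_cost(vm_monthly_rate, fraction_active):
    """VM compute cost scales with the fraction of time the VM is running."""
    return round(vm_monthly_rate * fraction_active, 2)

# Pro Plan VM at $105/month, deallocated half the time:
print(monthly_compute_cost(105, 0.5))  # 52.5
```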
How to Deallocate:
# Deallocate VM (stop billing for compute)
az vm deallocate --resource-group alactic-rg --name alactic-vm
# Start VM when needed
az vm start --resource-group alactic-rg --name alactic-vm
Automation with Azure Automation:
Schedule automatic start/stop:
Start: Monday-Friday 8 AM
Stop: Monday-Friday 6 PM
Weekend: Deallocated
Savings: ~60% of VM compute costs
Caution: Not recommended for production deployments with SLA requirements.
Resize to Lower SKU During Low Usage
Temporary downgrade during low-activity periods:
Example:
- Normal: Pro+ Plan (D4s_v3, $210/month)
- Low season: Downgrade to Pro Plan (D2s_v3, $105/month)
- Duration: 3 months
- Savings: $315 over low season
How to Resize:
# Stop VM
az vm deallocate --resource-group alactic-rg --name alactic-vm
# Resize to smaller SKU
az vm resize --resource-group alactic-rg --name alactic-vm --size Standard_D2s_v3
# Start VM
az vm start --resource-group alactic-rg --name alactic-vm
Downtime: 5-15 minutes
Storage Optimization
Regular Cleanup
Delete old processed documents:
Strategy 1: Age-based cleanup
# Delete documents older than 90 days via API
curl -X DELETE "https://your-vm-ip/api/v1/documents?older_than=90d" \
-H "X-Deployment-Key: ak-xxxxx"
Strategy 2: Export and archive
# Export to cheaper storage before deleting
import json
import requests
import boto3  # or the Azure Blob Storage SDK

s3 = boto3.client("s3")

def archive_and_delete(document_id):
    # Download the processed result
    result = requests.get(
        f"https://your-vm-ip/api/v1/results/{document_id}",
        headers={"X-Deployment-Key": "ak-xxxxx"}
    ).json()
    # Upload to S3 or Azure Blob (much cheaper)
    s3.put_object(
        Bucket="alactic-archive",
        Key=f"results/{document_id}.json",
        Body=json.dumps(result)
    )
    # Delete from Alactic
    requests.delete(
        f"https://your-vm-ip/api/v1/documents/{document_id}",
        headers={"X-Deployment-Key": "ak-xxxxx"}
    )
Cost comparison:
| Storage Type | Cost per GB/Month | 100 GB Cost |
|---|---|---|
| Alactic (Premium SSD) | $2.30 | $230 |
| Azure Blob (Hot) | $0.18 | $18 |
| Azure Blob (Cool) | $0.01 | $1 |
| AWS S3 Glacier | $0.004 | $0.40 |
Savings: Archiving 100 GB of old results to Azure Blob (Cool) saves roughly $229/month.
Disable Vector Storage
If not using semantic search:
curl -X POST https://your-vm-ip/api/v1/process \
-H "X-Deployment-Key: ak-xxxxx" \
-F "file=@document.pdf" \
-F "enable_vectors=false"
Savings:
- Vector embeddings: ~2 KB per page (1,000 pages = ~2 MB saved)
- Also saves embedding API costs
- Faster processing (saves 2-3 seconds per doc)
Recommendation: Disable unless specifically using semantic search features.
Compress PDFs Before Upload
Reduce storage and bandwidth costs:
# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf input.pdf
Typical compression:
- Original: 10 MB
- Compressed: 2-3 MB
- 70-80% size reduction
- Same text extraction quality
Benefits:
- Less storage used
- Faster upload
- Faster processing
- Lower bandwidth costs
Model Cost Optimization
Strategic Model Selection
Cost per document by model:
| Model | Typical 10-Page Doc Cost | Use Case |
|---|---|---|
| GPT-4o mini | $0.0015 | Simple extraction, routine processing |
| GPT-4o | $0.025 | Complex analysis, critical accuracy |
GPT-4o is 16.7x more expensive than mini.
Optimization Strategies
Strategy 1: Default to Mini
Set GPT-4o mini as default for all processing:
Dashboard → Settings → Processing → Default Model → GPT-4o mini
Override to GPT-4o only when necessary:
- Legal contracts requiring nuanced understanding
- Financial analysis requiring precision
- Medical documents requiring critical accuracy
- Complex technical specifications
Impact:
If processing 500 documents/month:
- All GPT-4o: 500 × $0.025 = $12.50/month
- All mini: 500 × $0.0015 = $0.75/month
- Savings: $11.75/month = $141/year
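The impact calculation above can be reproduced with a small sketch (per-document prices taken from the model table earlier in this section):

```python
PRICE_PER_DOC = {"gpt-4o-mini": 0.0015, "gpt-4o": 0.025}  # typical 10-page doc

def monthly_model_cost(docs, model):
    """Monthly model spend for a given volume and model choice."""
    return round(docs * PRICE_PER_DOC[model], 2)

all_4o = monthly_model_cost(500, "gpt-4o")        # 12.5
all_mini = monthly_model_cost(500, "gpt-4o-mini")  # 0.75
print(round(all_4o - all_mini, 2))                 # 11.75
```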
Strategy 2: Cascade Approach
Process with mini first, escalate if needed:
def smart_process(document):
    # Try with GPT-4o mini first
    result = process_document(document, model="gpt-4o-mini")
    # Check confidence score
    if result["confidence"] < 0.85:
        # Low confidence - reprocess with GPT-4o
        result = process_document(document, model="gpt-4o")
    return result
Typical outcome:
- 85% of documents: GPT-4o mini only
- 15% of documents: Reprocessed with GPT-4o
Cost calculation (500 docs):
- 425 docs: GPT-4o mini only = $0.64
- 75 docs: Mini + GPT-4o = $0.11 + $1.88 = $1.99
- Total: $2.63/month
- vs All GPT-4o: $12.50/month
- Savings: $9.87/month = $118/year
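The expected cost of the cascade can be modeled directly. A sketch: every document pays the mini price, and the escalated fraction also pays the GPT-4o price for the second pass (prices from the model table above):

```python
def cascade_cost(total_docs, escalation_rate,
                 mini_price=0.0015, full_price=0.025):
    """Expected monthly model cost for the mini-first cascade."""
    escalated = total_docs * escalation_rate
    return total_docs * mini_price + escalated * full_price

# 500 docs, 15% escalated -- roughly the $2.63/month computed above
# (the figures above round each line item separately)
print(f"${cascade_cost(500, 0.15):.2f}/month")
```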
Strategy 3: Document Type-Based Routing
Route documents to appropriate model based on type:
def route_by_type(document):
    doc_type = identify_document_type(document)
    if doc_type in ["contract", "financial_report", "medical"]:
        model = "gpt-4o"  # High stakes
    else:
        model = "gpt-4o-mini"  # Standard processing
    return process_document(document, model=model)
Example distribution (500 docs/month):
- 50 contracts: GPT-4o ($1.25)
- 100 financial reports: GPT-4o ($2.50)
- 350 other docs: GPT-4o mini ($0.53)
- Total: $4.28/month
- Savings: $8.22/month vs all GPT-4o
Optimize Analysis Depth
Three analysis modes with different costs:
| Mode | Output Tokens | Cost (10-page doc) | Processing Time |
|---|---|---|---|
| Quick Extract | ~50 | $0.0003 | 8s |
| Standard Analysis | ~500 | $0.0015 | 15s |
| Deep Analysis | ~1,500 | $0.0045 | 25s |
Deep Analysis costs 3x more than Standard.
Optimization:
Use appropriate depth for task:
- Quick Extract: Only need text, no analysis
- Standard: Need summary and key points (most common)
- Deep: Need entities, sentiment, detailed analysis
Impact (500 docs with mini):
| Scenario | Monthly Cost | Savings |
|---|---|---|
| All Deep | $2.25 | Baseline |
| All Standard | $0.75 | $1.50/month |
| 80% Standard, 20% Deep | $1.05 | $1.20/month |
Recommendation: Default to Standard, use Deep selectively.
Minimize Reprocessing
Avoid processing the same document multiple times:
Common causes of reprocessing:
- Wrong settings first time
- Experimenting with different models
- Testing different analysis depths
- User errors (wrong file uploaded)
Prevention strategies:
- Validate settings before processing:
def validate_before_process(file, settings):
    # Check file is correct
    print(f"Processing: {file.name}")
    print(f"Model: {settings['model']}")
    print(f"Analysis depth: {settings['depth']}")
    confirm = input("Proceed? (y/n): ")
    if confirm.lower() != 'y':
        return None
    return process_document(file, **settings)
- Cache results: Store results locally to avoid re-fetching from API
- Use staging environment: Test with sample documents before processing full batch
Cost impact:
If 10% of documents reprocessed unnecessarily:
- 500 docs at $0.0015 = $0.75
- 50 reprocessed = $0.08 wasted
- Prevent reprocessing: Save $0.08/month
Small but adds up over time.
Operational Efficiency
Batch Processing
Process documents in batches for efficiency:
Benefits:
- Lower API overhead
- Better resource utilization
- Faster total processing time
- Same cost per document
Optimal batch sizes:
| Plan | Optimal Batch Size | Processing Time | Cost per Doc |
|---|---|---|---|
| Free | 10 PDFs | ~3 minutes | Same |
| Pro | 25 PDFs | ~15 minutes | Same |
| Pro+ | 50 PDFs | ~35 minutes | Same |
| Enterprise | 500 PDFs | ~5 hours | Same |
Example:
Process 500 PDFs:
- Individual: 500 API calls, 2.5 hours
- Batched (10 batches of 50): 10 API calls, 2.5 hours
- Same cost, simpler workflow
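The batching above can be sketched with a small helper. This assumes documents are held as a list (of file paths, say) and that the batch endpoint accepts one list per call; batch size should follow the optimal sizes in the table (e.g. 50 for Pro+):

```python
def make_batches(documents, batch_size):
    """Split a document list into fixed-size batches for the batch endpoint."""
    return [documents[i:i + batch_size]
            for i in range(0, len(documents), batch_size)]

docs = [f"doc_{n}.pdf" for n in range(500)]
batches = make_batches(docs, 50)
print(len(batches))  # 10 batches of 50 -> 10 API calls instead of 500
```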
Scheduling During Off-Peak
Process during off-peak hours for better performance:
Azure OpenAI has variable load throughout the day:
- Peak hours (9 AM - 5 PM PT): Higher latency, potential throttling
- Off-peak (6 PM - 8 AM PT): Lower latency, faster processing
Benefits:
- Faster processing (20-30% improvement)
- Lower risk of rate limiting
- Same cost
Implementation:
from datetime import datetime

def is_off_peak():
    current_hour = datetime.now().hour
    # Off-peak: 6 PM to 8 AM PT
    return current_hour >= 18 or current_hour < 8

def schedule_processing(documents):
    if is_off_peak():
        process_immediately(documents)
    else:
        schedule_for_evening(documents)
Optimize API Usage
Reduce unnecessary API calls:
1. Use batch endpoints instead of individual:
Bad:
for doc in documents:
    result = api.process(doc)  # 100 API calls
Good:
results = api.batch_process(documents) # 1 API call
2. Poll less frequently:
Bad:
while not complete:
    status = check_status(job_id)
    time.sleep(1)  # Check every second
Good:
while not complete:
    status = check_status(job_id)
    time.sleep(10)  # Check every 10 seconds
3. Use webhooks instead of polling:
# Set webhook URL
result = api.process(doc, webhook_url="https://yourapp.com/webhook")
# Receive notification when done
# No polling needed
Savings: Minimal cost impact but improves efficiency and reduces load.
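If webhooks are not an option, polling with exponential backoff is a middle ground between the fixed intervals above. A sketch: `check_status` is a caller-supplied function (hypothetical here) that returns the job's status string:

```python
import time

def wait_for_job(job_id, check_status, initial_delay=5, max_delay=60):
    """Poll a job with exponential backoff instead of a fixed interval."""
    delay = initial_delay
    while True:
        status = check_status(job_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # 5s, 10s, 20s, ... capped at 60s
```

This keeps the first checks responsive for short jobs while cutting API calls sharply for long-running ones.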
Monitoring and Cost Control
Set Budget Alerts
Configure alerts to prevent overspending:
Azure Budget Configuration:
- Azure Portal → Cost Management → Budgets
- Create budget:
- Amount: $200/month (Pro Plan with buffer)
- Period: Monthly
- Add alerts:
- 75% ($150): Warning email
- 90% ($180): Critical email
- 100% ($200): Action (email + consider stopping processing)
Benefits:
- Early warning of unexpected costs
- Prevent budget overruns
- Identify cost anomalies quickly
Track Cost per Document
Monitor cost efficiency over time:
Formula:
Cost per Document = (Infrastructure + Model Costs) / Documents Processed
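The formula translates to a one-line helper. The example values are the plan targets from the table below, assuming negligible model costs:

```python
def cost_per_document(infrastructure, model_costs, documents):
    """Total monthly spend divided by documents processed."""
    return round((infrastructure + model_costs) / documents, 2)

print(cost_per_document(72, 0, 70))    # 1.03 (Free target)
print(cost_per_document(147, 0, 300))  # 0.49 (Pro target)
```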
Target cost per document:
| Plan | Target Cost/Doc | Good Performance | Needs Optimization |
|---|---|---|---|
| Free | $1.03 (70 docs) | Less than $1.20 | More than $1.50 |
| Pro | $0.49 (300 docs) | Less than $0.55 | More than $0.70 |
| Pro+ | $0.20 (1,500 docs) | Less than $0.22 | More than $0.30 |
If cost per document is high:
- Not processing enough documents (underutilized infrastructure)
- Consider downgrading plan
- Or increase document volume to amortize fixed costs
Usage Analytics
Review monthly cost breakdown:
Dashboard → Settings → Usage Statistics → Cost Analysis
Key metrics to track:
- Infrastructure costs: Should be constant
- Model costs: Should scale linearly with volume
- Cost per document trend: Should decrease as volume increases
- Model distribution: Percentage using GPT-4o vs mini
Example report:
March 2024 Cost Analysis:
Infrastructure: $147.00 (99%)
Model API: $1.37 (1%)
Total: $148.37
Documents Processed: 285
Cost per Document: $0.52
Model Distribution:
GPT-4o mini: 245 docs (86%) - $0.37
GPT-4o: 40 docs (14%) - $1.00
Trend: +12% docs vs February
Cost efficiency: Improved 8%
Action items:
- On track for quota
- Good model distribution
- Cost efficiency improving
Cost Comparison: Build vs Buy
Building Custom Solution
Estimated costs for equivalent functionality:
Development Costs:
- Backend API: 200 hours × $100/hr = $20,000
- Frontend dashboard: 150 hours × $100/hr = $15,000
- DevOps/Infrastructure: 80 hours × $100/hr = $8,000
- Testing and QA: 100 hours × $100/hr = $10,000
- Total Development: $53,000
Ongoing Monthly Costs:
- VM compute: $105-210/month
- Azure OpenAI API: $5-50/month
- Database (Cosmos DB): $12-30/month
- Storage: $8-15/month
- Monitoring tools: $10-30/month
- Maintenance (20 hrs/month): $2,000/month
- Total Monthly: $2,140-2,335
Annual Total:
- Year 1: $53,000 + $25,680 = $78,680
- Year 2+: $25,680/year
Using Alactic AGI
Costs:
- Deployment: $0
- Pro Plan: $147/month = $1,764/year
- Pro+ Plan: $295/month = $3,540/year
Savings:
- vs Custom (Year 1): $76,916 (Pro) or $75,140 (Pro+)
- vs Custom (Year 2+): $23,916 (Pro) or $22,140 (Pro+)
ROI: Alactic AGI pays for itself in the first month.
Using Direct OpenAI API
Estimated costs for 500 documents/month:
Infrastructure (Self-Managed):
- VM: $105/month
- Storage: $8/month
- Database: $12/month
- Subtotal: $125/month
OpenAI API Costs:
- GPT-4o mini: $0.0015 × 500 = $0.75
- Or GPT-4o: $0.025 × 500 = $12.50
Total: $125.75 - $137.50/month
Missing from direct API:
- No PDF parsing (must implement)
- No URL scraping (must implement)
- No vector storage (must implement)
- No dashboard UI (must implement)
- No batch processing (must implement)
- No usage tracking (must implement)
Development to add these: $20,000-40,000
Alactic AGI Pro Plan:
- $147/month
- All features included
- No development required
Value: $20,000-40,000 in features
Optimization Checklist
Monthly Tasks
Week 1:
- Review previous month costs
- Check cost per document trend
- Verify model distribution (aim for 80%+ mini)
- Clean up old documents (more than 90 days)
Week 2:
- Analyze usage patterns
- Optimize model selection rules
- Review processing efficiency
- Check storage usage
Week 3:
- Test cost optimization strategies
- Implement improvements
- Update documentation
- Review budget alerts
Week 4:
- Month-end cost analysis
- Plan for next month
- Adjust quotas if needed
- Consider plan changes
Quarterly Tasks
- Evaluate plan suitability: Right-sized?
- Review architecture: Any optimizations?
- Benchmark performance: Meeting targets?
- Cost trend analysis: Improving or degrading?
- Budget planning: Adjust for next quarter
Annual Tasks
- Comprehensive cost review: Full year analysis
- ROI assessment: Value delivered vs cost
- Technology updates: New features to leverage?
- Long-term planning: Scaling requirements?
Advanced Cost Strategies
Multi-Tenant Cost Allocation
For agencies serving multiple clients:
Track costs per client:
def process_with_client_tag(document, client_id):
    result = api.process(
        document,
        metadata={"client_id": client_id}
    )
    # Track cost per client
    log_cost(client_id, result["cost"])
Generate client cost reports:
Client A: 50 docs, $25.50
Client B: 120 docs, $61.20
Client C: 30 docs, $15.30
Allocate infrastructure costs:
Fixed cost: $295/month (Pro+)
Total docs: 200
Cost per doc: $1.48
Client A (50 docs): $74
Client B (120 docs): $177.60
Client C (30 docs): $44.40
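The allocation above can be computed exactly with a small helper (the figures above round the per-document rate to $1.48 first, so they differ by a few cents):

```python
def allocate_fixed_costs(fixed_cost, docs_by_client):
    """Pro-rate the monthly infrastructure cost by each client's document share."""
    total_docs = sum(docs_by_client.values())
    per_doc = fixed_cost / total_docs
    return {client: round(docs * per_doc, 2)
            for client, docs in docs_by_client.items()}

shares = allocate_fixed_costs(295, {"A": 50, "B": 120, "C": 30})
print(shares)  # {'A': 73.75, 'B': 177.0, 'C': 44.25}
```

Unlike rounding the per-document rate first, this guarantees the client shares sum back to the fixed cost (up to cent rounding).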
Set pricing: Charge $3-5 per document, profit margin 50-70%.
Reserved Capacity Planning
For Enterprise customers:
Negotiate with Azure for reserved instances:
- Commit to 1-year or 3-year VM usage
- Receive 30-40% discount on compute
- Best for stable, predictable workloads
Example:
- D4s_v3 on-demand: $210/month
- D4s_v3 reserved (1-year): $140/month
- Savings: $70/month = $840/year
Break-even: Worth it if running consistently for more than 8 months.
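The break-even figure follows from the commitment: a 1-year reservation pre-pays `reserved_monthly × 12`, and it wins once on-demand spend over your actual usage would exceed that. A sketch:

```python
import math

def break_even_months(on_demand_monthly, reserved_monthly, term_months=12):
    """Months of steady usage needed for a reservation to beat on-demand."""
    commitment = reserved_monthly * term_months
    return math.ceil(commitment / on_demand_monthly)

print(break_even_months(210, 140))  # 8 -- matches the figure above
```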
Spot Instance Strategy
For non-production or flexible workloads:
Use Azure Spot VMs for development/testing:
- 60-90% discount vs on-demand
- Can be evicted with 30-second notice
- Only for non-critical workloads
Example:
- Pro Plan VM (D2s_v3): $105/month
- Spot price: $15-30/month
- Savings: $75-90/month for dev environment