Cost Optimization
Maximize the value of your Alactic AGI deployment by optimizing costs across infrastructure, model usage, and operational efficiency. This comprehensive guide covers strategies to reduce spending while maintaining quality and performance.
Cost Structure Overview
Fixed vs Variable Costs
Fixed Costs (Monthly Infrastructure):
| Plan | Fixed Cost | Percentage of Total |
|---|---|---|
| Free | $72 | 98% (minimal model costs) |
| Pro | $147 | 98% |
| Pro+ | $295 | 98% |
| Enterprise | Custom | 95-98% |
Key Insight: Infrastructure is the dominant cost. Model usage (variable cost) is typically less than 2-3% of total.
Variable Costs:
- Model API calls (Azure OpenAI tokens)
- Typically $0.0015 to $0.025 per document
- Scales with volume and model choice
Implication: Focus on maximizing document volume per infrastructure dollar rather than micro-optimizing model costs.
Infrastructure Optimization
Right-Sizing Your Plan
Choose the appropriate plan based on actual usage:
Decision Matrix:
| Monthly Documents | Storage Need | Recommended Plan | Monthly Cost |
|---|---|---|---|
| Less than 70 | Less than 100 MB | Free | $72 |
| 70-300 | 100 MB - 5 GB | Pro | $147 |
| 300-1,500 | 5-20 GB | Pro+ | $295 |
| More than 1,500 | More than 20 GB | Enterprise | Custom |
Avoid over-provisioning: Don't upgrade to Pro+ if you only process 200 docs/month. Pro is sufficient and saves $148/month.
Example:
- Currently on Pro+ ($295/month)
- Average usage: 250 documents/month
- Action: Downgrade to Pro ($147/month)
- Savings: $148/month = $1,776/year
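The decision matrix above can be sketched as a small helper. This is a hypothetical `recommend_plan` function for illustration; the thresholds are taken directly from the table:

```python
def recommend_plan(docs_per_month):
    """Map monthly document volume to a plan tier, per the decision matrix."""
    if docs_per_month < 70:
        return "Free"        # $72/month
    elif docs_per_month <= 300:
        return "Pro"         # $147/month
    elif docs_per_month <= 1500:
        return "Pro+"        # $295/month
    return "Enterprise"      # custom pricing

print(recommend_plan(250))  # the example above: 250 docs/month -> Pro
```

Running this against your own usage numbers each month makes the downgrade/upgrade decision mechanical rather than a judgment call.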
VM Management Strategies
Deallocate During Downtime
For non-production deployments:
If you have predictable downtime periods (nights, weekends), deallocate the VM to save compute costs.
Savings Calculation:
Pro Plan VM (Standard_D2s_v3): $105/month = $3.50/day
If deallocated 50% of time:
- Active time: $52.50/month
- Savings: $52.50/month
- Remaining costs: Storage ($19.20), Cosmos DB ($12), etc.
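The savings calculation generalizes to any deallocation schedule. A minimal sketch (compute stops billing when deallocated; storage and Cosmos DB continue regardless):

```python
def monthly_compute_cost(vm_monthly_rate, fraction_active):
    """VM compute cost scales with the fraction of time the VM is running."""
    return round(vm_monthly_rate * fraction_active, 2)

# Pro Plan VM at $105/month, deallocated half the time:
print(monthly_compute_cost(105, 0.5))  # 52.5
```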
How to Deallocate:
# Deallocate VM (stop billing for compute)
az vm deallocate --resource-group alactic-rg --name alactic-vm
# Start VM when needed
az vm start --resource-group alactic-rg --name alactic-vm
Automation with Azure Automation:
Schedule automatic start/stop:
Start: Monday-Friday 8 AM
Stop: Monday-Friday 6 PM
Weekend: Deallocated
Savings: ~60% of VM compute costs
Caution: Not recommended for production deployments with SLA requirements.
Resize to Lower SKU During Low Usage
Temporary downgrade during low-activity periods:
Example:
- Normal: Pro+ Plan (D4s_v3, $210/month)
- Low season: Downgrade to Pro Plan (D2s_v3, $105/month)
- Duration: 3 months
- Savings: $315 over low season
How to Resize:
# Stop VM
az vm deallocate --resource-group alactic-rg --name alactic-vm
# Resize to smaller SKU
az vm resize --resource-group alactic-rg --name alactic-vm --size Standard_D2s_v3
# Start VM
az vm start --resource-group alactic-rg --name alactic-vm
Downtime: 5-15 minutes
Storage Optimization
Regular Cleanup
Delete old processed documents:
Strategy 1: Age-based cleanup
# Delete documents older than 90 days via API
curl -X DELETE "https://your-vm-ip/api/v1/documents?older_than=90d" \
-H "X-Deployment-Key: ak-xxxxx"
Strategy 2: Export and archive
# Export to cheaper storage before deleting
import json
import requests
import boto3  # or the Azure Blob Storage SDK

s3 = boto3.client("s3")

def archive_and_delete(document_id):
    # Download the processed result
    result = requests.get(
        f"https://your-vm-ip/api/v1/results/{document_id}",
        headers={"X-Deployment-Key": "ak-xxxxx"}
    ).json()
    # Upload to S3 or Azure Blob (much cheaper)
    s3.put_object(
        Bucket="alactic-archive",
        Key=f"results/{document_id}.json",
        Body=json.dumps(result)
    )
    # Delete from Alactic
    requests.delete(
        f"https://your-vm-ip/api/v1/documents/{document_id}",
        headers={"X-Deployment-Key": "ak-xxxxx"}
    )
Cost comparison:
| Storage Type | Cost per GB/Month | 100 GB Cost |
|---|---|---|
| Alactic (Premium SSD) | $2.30 | $230 |
| Azure Blob (Hot) | $0.18 | $18 |
| Azure Blob (Cool) | $0.01 | $1 |
| AWS S3 Glacier | $0.004 | $0.40 |
Savings: Archiving 100 GB of old results to Azure Blob (Cool) saves roughly $229/month.
Disable Vector Storage
If not using semantic search:
curl -X POST https://your-vm-ip/api/v1/process \
-H "X-Deployment-Key: ak-xxxxx" \
-F "file=@document.pdf" \
-F "enable_vectors=false"
Savings:
- Vector embeddings: ~2 KB per page (1,000 pages = ~2 MB saved)
- Also saves embedding API costs
- Faster processing (saves 2-3 seconds per doc)
Recommendation: Disable unless specifically using semantic search features.
Compress PDFs Before Upload
Reduce storage and bandwidth costs:
# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf input.pdf
Typical compression:
- Original: 10 MB
- Compressed: 2-3 MB
- 70-80% size reduction
- Same text extraction quality
Benefits:
- Less storage used
- Faster upload
- Faster processing
- Lower bandwidth costs
Model Cost Optimization
Strategic Model Selection
Cost per document by model:
| Model | Typical 10-Page Doc Cost | Use Case |
|---|---|---|
| GPT-4o mini | $0.0015 | Simple extraction, routine processing |
| GPT-4o | $0.025 | Complex analysis, critical accuracy |
GPT-4o is 16.7x more expensive than mini.
Optimization Strategies
Strategy 1: Default to Mini
Set GPT-4o mini as default for all processing:
Dashboard → Settings → Processing → Default Model → GPT-4o mini
Override to GPT-4o only when necessary:
- Legal contracts requiring nuanced understanding
- Financial analysis requiring precision
- Medical documents requiring critical accuracy
- Complex technical specifications
Impact:
If processing 500 documents/month:
- All GPT-4o: 500 × $0.025 = $12.50/month
- All mini: 500 × $0.0015 = $0.75/month
- Savings: $11.75/month = $141/year
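The impact calculation above can be reproduced with a small sketch (per-document prices taken from the model table earlier in this section):

```python
PRICE_PER_DOC = {"gpt-4o-mini": 0.0015, "gpt-4o": 0.025}  # typical 10-page doc

def monthly_model_cost(docs, model):
    """Monthly model spend for a given volume and model choice."""
    return round(docs * PRICE_PER_DOC[model], 2)

all_4o = monthly_model_cost(500, "gpt-4o")        # 12.5
all_mini = monthly_model_cost(500, "gpt-4o-mini")  # 0.75
print(round(all_4o - all_mini, 2))                 # 11.75
```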
Strategy 2: Cascade Approach
Process with mini first, escalate if needed:
def smart_process(document):
    # Try with GPT-4o mini first
    result = process_document(document, model="gpt-4o-mini")
    # Check confidence score
    if result["confidence"] < 0.85:
        # Low confidence - reprocess with GPT-4o
        result = process_document(document, model="gpt-4o")
    return result
Typical outcome:
- 85% of documents: GPT-4o mini only
- 15% of documents: Reprocessed with GPT-4o
Cost calculation (500 docs):
- 425 docs: GPT-4o mini only = $0.64
- 75 docs: Mini + GPT-4o = $0.11 + $1.88 = $1.99
- Total: $2.63/month
- vs All GPT-4o: $12.50/month
- Savings: $9.87/month = $118/year
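The expected cost of the cascade can be modeled directly. A sketch: every document pays the mini price, and the escalated fraction also pays the GPT-4o price for the second pass (prices from the model table above):

```python
def cascade_cost(total_docs, escalation_rate,
                 mini_price=0.0015, full_price=0.025):
    """Expected monthly model cost for the mini-first cascade."""
    escalated = total_docs * escalation_rate
    return total_docs * mini_price + escalated * full_price

# 500 docs, 15% escalated -- roughly the $2.63/month computed above
# (the figures above round each line item separately)
print(f"${cascade_cost(500, 0.15):.2f}/month")
```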
Strategy 3: Document Type-Based Routing
Route documents to appropriate model based on type:
def route_by_type(document):
    doc_type = identify_document_type(document)
    if doc_type in ["contract", "financial_report", "medical"]:
        model = "gpt-4o"  # High stakes
    else:
        model = "gpt-4o-mini"  # Standard processing
    return process_document(document, model=model)
Example distribution (500 docs/month):
- 50 contracts: GPT-4o ($1.25)
- 100 financial reports: GPT-4o ($2.50)
- 350 other docs: GPT-4o mini ($0.53)
- Total: $4.28/month
- Savings: $8.22/month vs all GPT-4o
Optimize Analysis Depth
Three analysis modes with different costs:
| Mode | Output Tokens | Cost (10-page doc) | Processing Time |
|---|---|---|---|
| Quick Extract | ~50 | $0.0003 | 8s |
| Standard Analysis | ~500 | $0.0015 | 15s |
| Deep Analysis | ~1,500 | $0.0045 | 25s |
Deep Analysis costs 3x more than Standard.
Optimization:
Use appropriate depth for task:
- Quick Extract: Only need text, no analysis
- Standard: Need summary and key points (most common)
- Deep: Need entities, sentiment, detailed analysis
Impact (500 docs with mini):
| Scenario | Monthly Cost | Savings |
|---|---|---|
| All Deep | $2.25 | Baseline |
| All Standard | $0.75 | $1.50/month |
| 80% Standard, 20% Deep | $1.05 | $1.20/month |
Recommendation: Default to Standard, use Deep selectively.
Minimize Reprocessing
Avoid processing the same document multiple times:
Common causes of reprocessing:
- Wrong settings first time
- Experimenting with different models
- Testing different analysis depths
- User errors (wrong file uploaded)
Prevention strategies:
- Validate settings before processing:
def validate_before_process(file, settings):
    # Check file is correct
    print(f"Processing: {file.name}")
    print(f"Model: {settings['model']}")
    print(f"Analysis depth: {settings['depth']}")
    confirm = input("Proceed? (y/n): ")
    if confirm.lower() != 'y':
        return None
    return process_document(file, **settings)
- Cache results: Store results locally to avoid re-fetching from API
- Use staging environment: Test with sample documents before processing full batch
Cost impact:
If 10% of documents reprocessed unnecessarily:
- 500 docs at $0.0015 = $0.75
- 50 reprocessed = $0.08 wasted
- Prevent reprocessing: Save $0.08/month
Small but adds up over time.
Operational Efficiency
Batch Processing
Process documents in batches for efficiency:
Benefits:
- Lower API overhead
- Better resource utilization
- Faster total processing time
- Same cost per document
Optimal batch sizes:
| Plan | Optimal Batch Size | Processing Time | Cost per Doc |
|---|---|---|---|
| Free | 10 PDFs | ~3 minutes | Same |
| Pro | 25 PDFs | ~15 minutes | Same |
| Pro+ | 50 PDFs | ~35 minutes | Same |
| Enterprise | 500 PDFs | ~5 hours | Same |
Example:
Process 500 PDFs:
- Individual: 500 API calls, 2.5 hours
- Batched (10 batches of 50): 10 API calls, 2.5 hours
- Same cost, simpler workflow
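The batching above can be sketched with a small helper. This assumes documents are held as a list (of file paths, say) and that the batch endpoint accepts one list per call; batch size should follow the optimal sizes in the table (e.g. 50 for Pro+):

```python
def make_batches(documents, batch_size):
    """Split a document list into fixed-size batches for the batch endpoint."""
    return [documents[i:i + batch_size]
            for i in range(0, len(documents), batch_size)]

docs = [f"doc_{n}.pdf" for n in range(500)]
batches = make_batches(docs, 50)
print(len(batches))  # 10 batches of 50 -> 10 API calls instead of 500
```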
Scheduling During Off-Peak
Process during off-peak hours for better performance:
Azure OpenAI has variable load throughout the day:
- Peak hours (9 AM - 5 PM PT): Higher latency, potential throttling
- Off-peak (6 PM - 8 AM PT): Lower latency, faster processing
Benefits:
- Faster processing (20-30% improvement)
- Lower risk of rate limiting
- Same cost
Implementation:
from datetime import datetime

def is_off_peak():
    current_hour = datetime.now().hour
    # Off-peak: 6 PM to 8 AM PT
    return current_hour >= 18 or current_hour < 8

def schedule_processing(documents):
    if is_off_peak():
        process_immediately(documents)
    else:
        schedule_for_evening(documents)
Optimize API Usage
Reduce unnecessary API calls:
1. Use batch endpoints instead of individual:
Bad:
for doc in documents:
    result = api.process(doc)  # 100 API calls
Good:
results = api.batch_process(documents) # 1 API call
2. Poll less frequently:
Bad:
while not complete:
    status = check_status(job_id)
    time.sleep(1)  # Check every second
Good:
while not complete:
    status = check_status(job_id)
    time.sleep(10)  # Check every 10 seconds
3. Use webhooks instead of polling:
# Set webhook URL
result = api.process(doc, webhook_url="https://yourapp.com/webhook")
# Receive notification when done
# No polling needed
Savings: Minimal cost impact but improves efficiency and reduces load.
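If webhooks are not an option, polling with exponential backoff is a middle ground between the fixed intervals above. A sketch: `check_status` is a caller-supplied function (hypothetical here) that returns the job's status string:

```python
import time

def wait_for_job(job_id, check_status, initial_delay=5, max_delay=60):
    """Poll a job with exponential backoff instead of a fixed interval."""
    delay = initial_delay
    while True:
        status = check_status(job_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # 5s, 10s, 20s, ... capped at 60s
```

This keeps the first checks responsive for short jobs while cutting API calls sharply for long-running ones.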
Monitoring and Cost Control
Set Budget Alerts
Configure alerts to prevent overspending:
Azure Budget Configuration:
- Azure Portal → Cost Management → Budgets
- Create budget:
- Amount: $200/month (Pro Plan with buffer)
- Period: Monthly
- Add alerts:
- 75% ($150): Warning email
- 90% ($180): Critical email
- 100% ($200): Action (email + consider stopping processing)
Benefits:
- Early warning of unexpected costs
- Prevent budget overruns
- Identify cost anomalies quickly
Track Cost per Document
Monitor cost efficiency over time:
Formula:
Cost per Document = (Infrastructure + Model Costs) / Documents Processed
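The formula translates to a one-line helper. The example values are the plan targets from the table below, assuming negligible model costs:

```python
def cost_per_document(infrastructure, model_costs, documents):
    """Total monthly spend divided by documents processed."""
    return round((infrastructure + model_costs) / documents, 2)

print(cost_per_document(72, 0, 70))    # 1.03 (Free target)
print(cost_per_document(147, 0, 300))  # 0.49 (Pro target)
```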
Target cost per document:
| Plan | Target Cost/Doc | Good Performance | Needs Optimization |
|---|---|---|---|
| Free | $1.03 (70 docs) | Less than $1.20 | More than $1.50 |
| Pro | $0.49 (300 docs) | Less than $0.55 | More than $0.70 |
| Pro+ | $0.20 (1,500 docs) | Less than $0.22 | More than $0.30 |
If cost per document is high:
- Not processing enough documents (underutilized infrastructure)
- Consider downgrading plan
- Or increase document volume to amortize fixed costs
Usage Analytics
Review monthly cost breakdown:
Dashboard → Settings → Usage Statistics → Cost Analysis
Key metrics to track:
- Infrastructure costs: Should be constant
- Model costs: Should scale linearly with volume
- Cost per document trend: Should decrease as volume increases
- Model distribution: Percentage using GPT-4o vs mini
Example report:
March 2024 Cost Analysis:
Infrastructure: $147.00 (99%)
Model API: $1.37 (1%)
Total: $148.37
Documents Processed: 285
Cost per Document: $0.52
Model Distribution:
GPT-4o mini: 245 docs (86%) - $0.37
GPT-4o: 40 docs (14%) - $1.00
Trend: +12% docs vs February
Cost efficiency: Improved 8%
Action items:
- On track for quota
- Good model distribution
- Cost efficiency improving
Cost Comparison: Build vs Buy
Building Custom Solution
Estimated costs for equivalent functionality:
Development Costs:
- Backend API: 200 hours × $100/hr = $20,000
- Frontend dashboard: 150 hours × $100/hr = $15,000
- DevOps/Infrastructure: 80 hours × $100/hr = $8,000
- Testing and QA: 100 hours × $100/hr = $10,000
- Total Development: $53,000
Ongoing Monthly Costs:
- VM compute: $105-210/month
- Azure OpenAI API: $5-50/month
- Database (Cosmos DB): $12-30/month
- Storage: $8-15/month
- Monitoring tools: $10-30/month
- Maintenance (20 hrs/month): $2,000/month
- Total Monthly: $2,140-2,335
Annual Total:
- Year 1: $53,000 + $25,680 = $78,680
- Year 2+: $25,680/year
Using Alactic AGI
Costs:
- Deployment: $0
- Pro Plan: $147/month = $1,764/year
- Pro+ Plan: $295/month = $3,540/year
Savings:
- vs Custom (Year 1): $76,916 (Pro) or $75,140 (Pro+)
- vs Custom (Year 2+): $23,916 (Pro) or $22,140 (Pro+)
ROI: Alactic AGI pays for itself in the first month.
Using Direct OpenAI API
Estimated costs for 500 documents/month:
Infrastructure (Self-Managed):
- VM: $105/month
- Storage: $8/month
- Database: $12/month
- Subtotal: $125/month
OpenAI API Costs:
- GPT-4o mini: $0.0015 × 500 = $0.75
- Or GPT-4o: $0.025 × 500 = $12.50
Total: $125.75 - $137.50/month
Missing from direct API:
- No PDF parsing (must implement)
- No URL scraping (must implement)
- No vector storage (must implement)
- No dashboard UI (must implement)
- No batch processing (must implement)
- No usage tracking (must implement)
Development to add these: $20,000-40,000
Alactic AGI Pro Plan:
- $147/month
- All features included
- No development required
Value: $20,000-40,000 in features
Optimization Checklist
Monthly Tasks
Week 1:
- Review previous month costs
- Check cost per document trend
- Verify model distribution (aim for 80%+ mini)
- Clean up old documents (more than 90 days)
Week 2:
- Analyze usage patterns
- Optimize model selection rules
- Review processing efficiency
- Check storage usage
Week 3:
- Test cost optimization strategies
- Implement improvements
- Update documentation
- Review budget alerts
Week 4:
- Month-end cost analysis
- Plan for next month
- Adjust quotas if needed
- Consider plan changes
Quarterly Tasks
- Evaluate plan suitability: Right-sized?
- Review architecture: Any optimizations?
- Benchmark performance: Meeting targets?
- Cost trend analysis: Improving or degrading?
- Budget planning: Adjust for next quarter
Annual Tasks
- Comprehensive cost review: Full year analysis
- ROI assessment: Value delivered vs cost
- Technology updates: New features to leverage?
- Long-term planning: Scaling requirements?
Advanced Cost Strategies
Multi-Tenant Cost Allocation
For agencies serving multiple clients:
Track costs per client:
def process_with_client_tag(document, client_id):
    result = api.process(
        document,
        metadata={"client_id": client_id}
    )
    # Track cost per client
    log_cost(client_id, result["cost"])
Generate client cost reports:
Client A: 50 docs, $25.50
Client B: 120 docs, $61.20
Client C: 30 docs, $15.30
Allocate infrastructure costs:
Fixed cost: $295/month (Pro+)
Total docs: 200
Cost per doc: $1.48
Client A (50 docs): $74
Client B (120 docs): $177.60
Client C (30 docs): $44.40
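The allocation above can be computed exactly with a small helper (the figures above round the per-document rate to $1.48 first, so they differ by a few cents):

```python
def allocate_fixed_costs(fixed_cost, docs_by_client):
    """Pro-rate the monthly infrastructure cost by each client's document share."""
    total_docs = sum(docs_by_client.values())
    per_doc = fixed_cost / total_docs
    return {client: round(docs * per_doc, 2)
            for client, docs in docs_by_client.items()}

shares = allocate_fixed_costs(295, {"A": 50, "B": 120, "C": 30})
print(shares)  # {'A': 73.75, 'B': 177.0, 'C': 44.25}
```

Unlike rounding the per-document rate first, this guarantees the client shares sum back to the fixed cost (up to cent rounding).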
Set pricing: Charge $3-5 per document, profit margin 50-70%.
Reserved Capacity Planning
For Enterprise customers:
Negotiate with Azure for reserved instances:
- Commit to 1-year or 3-year VM usage
- Receive 30-40% discount on compute
- Best for stable, predictable workloads
Example:
- D4s_v3 on-demand: $210/month
- D4s_v3 reserved (1-year): $140/month
- Savings: $70/month = $840/year
Break-even: Worth it if running consistently for more than 8 months.
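The break-even figure follows from the commitment: a 1-year reservation pre-pays `reserved_monthly × 12`, and it wins once on-demand spend over your actual usage would exceed that. A sketch:

```python
import math

def break_even_months(on_demand_monthly, reserved_monthly, term_months=12):
    """Months of steady usage needed for a reservation to beat on-demand."""
    commitment = reserved_monthly * term_months
    return math.ceil(commitment / on_demand_monthly)

print(break_even_months(210, 140))  # 8 -- matches the figure above
```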
Spot Instance Strategy
For non-production or flexible workloads:
Use Azure Spot VMs for development/testing:
- 60-90% discount vs on-demand
- Can be evicted with 30-second notice
- Only for non-critical workloads
Example:
- Pro Plan VM (D2s_v3): $105/month
- Spot price: $15-30/month
- Savings: $75-90/month for dev environment