Batch Processing
Batch processing allows you to process multiple documents or URLs simultaneously, dramatically improving efficiency for large-scale workflows. This guide covers everything from basic batch uploads to advanced automation.
What is Batch Processing?
Batch processing enables you to:
- Upload multiple PDFs at once (up to 50 files)
- Submit multiple URLs together (up to 100 URLs)
- Queue jobs for sequential processing
- Monitor progress across all items
- Download all results together
Benefits:
- Save time: Process 50 documents in the same time as one
- Reduce manual work: One submission instead of 50
- Better tracking: See all jobs in one view
- Bulk downloads: Get all results as a single ZIP file
Plan Limits for Batch Processing
Batch processing limits vary by plan:
| Feature | Free | Pro | Pro+ | Enterprise |
|---|---|---|---|---|
| Max PDFs per batch | 10 | 25 | 50 | 100 |
| Max URLs per batch | 25 | 50 | 100 | 500 |
| Concurrent jobs | 2 | 5 | 10 | 25 |
| Queue size | 50 | 200 | 500 | Unlimited |
Concurrent jobs: How many documents process simultaneously
Queue size: Total jobs that can wait in queue
Step 1: Prepare Your Batch
For PDF Batch Processing
Organize Your Files:
- Create a folder with all PDFs you want to process
- Name files clearly (e.g., invoice_2024_01.pdf, report_q1.pdf)
- Check file sizes - ensure each file is within your plan's limits
- Remove duplicates - avoid processing the same file twice
Recommended Folder Structure:
batch_2024_03_20/
├── invoice_001.pdf
├── invoice_002.pdf
├── invoice_003.pdf
├── ...
└── invoice_050.pdf
File Naming Best Practices:
- Use consistent naming convention
- Include dates in filename (YYYY-MM-DD format)
- Avoid special characters (use underscores instead of spaces)
- Keep names under 100 characters
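The naming rules above can be enforced automatically before upload. Here is a minimal sketch of a filename cleaner; the function name and exact character whitelist are illustrative choices, not part of the product:

```python
import re

def sanitize_filename(name: str, max_len: int = 100) -> str:
    """Apply the naming best practices above: underscores instead of
    spaces, no special characters, and a 100-character cap."""
    stem, dot, ext = name.rpartition(".")
    if not dot:                      # no extension present
        stem, ext = name, ""
    stem = stem.replace(" ", "_")
    stem = re.sub(r"[^A-Za-z0-9_\-]", "", stem)  # drop special characters
    cleaned = f"{stem}.{ext}" if ext else stem
    return cleaned[:max_len]

print(sanitize_filename("Q1 Report (final).pdf"))  # → Q1_Report_final.pdf
```

Running this over a whole folder before creating the batch avoids validation surprises at upload time.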
For URL Batch Processing
Create a URL List:
- Open text editor (Notepad, VS Code, etc.)
- Paste one URL per line
- Remove any blank lines
- Save as .txt file (optional, for record keeping)
Example URL List:
https://www.example.com/article-1
https://www.example.com/article-2
https://www.example.com/article-3
https://www.example.com/article-4
https://www.example.com/article-5
URL Validation:
- Each URL must start with http:// or https://
- No spaces before or after the URL
- No comments or labels (just URLs)
- Maximum 2,000 characters per URL
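These validation rules are easy to check locally before pasting the list into the dashboard. A small sketch, assuming only the rules listed above (protocol prefix, no blank lines, 2,000-character cap):

```python
def validate_urls(raw: str, max_len: int = 2000):
    """Split a pasted URL list into (valid, invalid) lists using the
    rules above: http(s) prefix, trimmed whitespace, length cap."""
    valid, invalid = [], []
    for line in raw.splitlines():
        url = line.strip()
        if not url:
            continue  # blank lines are simply dropped
        if url.startswith(("http://", "https://")) and len(url) <= max_len:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid

good, bad = validate_urls("https://example.com/a\n\nwww.example.com/b\n")
print(len(good), len(bad))  # 1 valid, 1 invalid (missing protocol)
```

Pre-validating this way means every URL shows ✓ on the review screen in Step 4.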
Step 2: Access Batch Processing
1. Login to Dashboard - navigate to your Alactic AGI dashboard
2. Click the "Batch Processing" Tab - located in the main content area (third tab)
3. Choose Batch Type:
   - PDF Batch - for uploading multiple PDF files
   - URL Batch - for scraping multiple websites
Step 3: Upload PDF Batch
Select Files
Method 1: Drag and Drop
- Open your folder with PDFs
- Select all files (Ctrl+A or Cmd+A)
- Drag files to batch upload area
- Drop to upload
Method 2: Click to Browse
- Click "Select PDF Files" button
- Navigate to your folder
- Select multiple files:
- Windows: Hold Ctrl, click each file
- Mac: Hold Cmd, click each file
- Or: Select first file, hold Shift, select last file (selects range)
- Click "Open"
Review File List
After upload, you'll see a list of all files:
✓ invoice_001.pdf (124 KB, 1 page)
✓ invoice_002.pdf (156 KB, 1 page)
✓ invoice_003.pdf (98 KB, 1 page)
...
✓ invoice_050.pdf (142 KB, 1 page)
Total: 50 files, 6.2 MB, 50 pages
File Status Indicators:
- ✓ Valid (ready to process)
- ⚠ Warning (large file, may take longer)
- ✗ Invalid (too large, corrupted, wrong format)
Remove Files (Optional)
Remove any files you don't want to process:
- Hover over file in list
- Click "X" button
- File removed from batch
Step 4: Submit URL Batch
Enter URLs
Method 1: Paste List
- Copy your URL list from text file
- Click in URL textarea
- Paste (Ctrl+V or Cmd+V)
- Each URL appears on its own line
Method 2: Manual Entry
- Type or paste first URL
- Press Enter
- Type or paste next URL
- Repeat for all URLs
Review URL List
After entry, you'll see validation:
✓ https://www.example.com/article-1
✓ https://www.example.com/article-2
✓ https://www.example.com/article-3
✗ www.example.com/article-4 (missing http://)
✓ https://www.example.com/article-5
Total: 4 valid URLs, 1 invalid
URL Status Indicators:
- ✓ Valid (proper format, will be scraped)
- ✗ Invalid (missing protocol, malformed)
Fix Invalid URLs
- Click "Edit" on the invalid URL
- Fix the issue (add https://)
- Click "Update"
- Status changes to ✓ Valid
Step 5: Configure Batch Options
Before submitting, configure how all documents will be processed:
Model Selection
Apply Same Model to All:
○ Alactic GPT-4o mini
- Faster, lower cost
- Good for straightforward docs
- Recommended for large batches
○ Alactic GPT-4o
- More powerful, higher cost
- Better for complex analysis
- Use for important documents
Cost Estimate:
50 documents × ~3,000 tokens each × $0.150/1M
= 150,000 tokens × $0.150/1M
= $0.0225 (GPT-4o mini)
vs.
50 documents × ~3,000 tokens each × $2.50/1M
= 150,000 tokens × $2.50/1M
= $0.375 (GPT-4o)
Recommendation: Use GPT-4o mini for batches unless documents require deep reasoning.
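The worked estimate above generalizes to any batch size. A minimal sketch of the same arithmetic, using the per-1M-token prices from the example (input-side only; actual billing also counts output tokens):

```python
# Prices per 1M input tokens, taken from the worked example above
PRICES_PER_M = {"gpt-4o-mini": 0.150, "gpt-4o": 2.50}

def estimate_batch_cost(n_docs: int, tokens_per_doc: int, model: str) -> float:
    """docs x tokens-per-doc x (price / 1M tokens), as in the estimate above."""
    return n_docs * tokens_per_doc * PRICES_PER_M[model] / 1_000_000

mini = estimate_batch_cost(50, 3000, "gpt-4o-mini")   # ~$0.0225
full = estimate_batch_cost(50, 3000, "gpt-4o")        # ~$0.375
```

Comparing the two numbers before submitting makes the mini-vs-full trade-off concrete for your own batch sizes.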
Analysis Depth
○ Quick Extract
- ~5 seconds per document
- Total batch time: ~4 minutes (50 docs)
- Basic text only
● Standard Analysis (Recommended)
- ~20 seconds per document
- Total batch time: ~17 minutes (50 docs)
- Text + summary + key points
○ Deep Analysis
- ~60 seconds per document
- Total batch time: ~50 minutes (50 docs)
- Full analysis + entities + sentiment
Recommendation: Use Standard Analysis for most batches. Reserve Deep Analysis for critical documents.
Output Format
● JSON (Recommended)
- Structured data
- Easy to parse
- Best for automation
○ Markdown
- Human-readable
- Good for review
○ Plain Text
- Simple text only
- Smallest file size
Recommendation: Use JSON for batch processing (easier to analyze programmatically).
Additional Options
☑ Enable Vector Storage
- Allows semantic search across batch
- Adds ~3 seconds per document
- Recommended: Enable for batches you'll search later
☐ Enable Content Chunking
- For documents with 50+ pages
- Enable if batch has long PDFs
☑ Stop on Error
- Stops entire batch if one document fails
- Disable to process all valid documents
- Recommended: Disable (process all, skip failures)
☐ Send Email When Complete
- Receive email notification when batch finishes
- Useful for very large batches (100+ docs)
Step 6: Submit Batch Job
1. Review Configuration:
   - Files/URLs: 50
   - Model: GPT-4o mini
   - Depth: Standard Analysis
   - Format: JSON
   - Estimated time: 17 minutes
   - Estimated cost: $0.0225
2. Click "Process Batch":
   - Batch job submitted
   - Processing begins immediately
   - Progress screen appears
3. Monitor Progress - a real-time progress dashboard appears:
Processing Batch Job
Progress: [████████░░░░░░░░] 8/50 (16%)
Completed: 8 documents
Processing: 2 documents
Queued: 40 documents
Failed: 0 documents
Elapsed: 2m 14s
Estimated Remaining: 15m 23s
Step 7: Monitor Processing
Progress View
The progress screen shows:
Overall Progress Bar:
- Visual progress (0-100%)
- Completed/total count
- Time elapsed and remaining
Individual Document Status:
invoice_001.pdf - Complete (18.2s, $0.0004)
invoice_002.pdf - Complete (21.5s, $0.0005)
invoice_003.pdf - Processing... 45%
invoice_004.pdf - Processing... 12%
invoice_005.pdf - Queued
invoice_006.pdf - Queued
...
Status Icons:
- Complete (green)
- Processing (blue spinner)
- Queued (gray)
- Failed (red)
Processing Strategy
Documents process based on plan's concurrent job limit:
Example (Pro Plan, 5 concurrent jobs):
Time | Processing
------|------------------------------------------
0:00 | Doc 1, 2, 3, 4, 5 start
0:20 | Doc 1 completes, Doc 6 starts
0:25 | Doc 2 completes, Doc 7 starts
0:30 | Doc 3 completes, Doc 8 starts
...
Parallel Processing:
- Free: 2 documents at once
- Pro: 5 documents at once
- Pro+: 10 documents at once
- Enterprise: 25 documents at once
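The concurrent-job limit behaves like a semaphore: documents queue up, and only N hold a processing slot at a time. A sketch of that scheduling pattern (the sleep is a stand-in for real per-document work; names here are illustrative, not the platform's internals):

```python
import asyncio
import random

CONCURRENT_LIMIT = 5  # Pro plan: 5 documents at once

async def process(doc: str, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most CONCURRENT_LIMIT docs hold a slot at a time
        await asyncio.sleep(random.uniform(0.001, 0.005))  # stand-in for real work
        return f"{doc}: done"

async def run_batch(docs):
    sem = asyncio.Semaphore(CONCURRENT_LIMIT)
    # gather preserves input order even though completion order varies
    return await asyncio.gather(*(process(d, sem) for d in docs))

results = asyncio.run(run_batch([f"doc_{i:03d}" for i in range(1, 11)]))
```

This is why upgrading the plan shortens wall-clock time: more slots, same per-document time.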
Can I Close the Browser?
Yes! Batch processing continues even if you close your browser.
To check status later:
- Return to dashboard
- Click "Batch Processing" tab
- See "Active Batches" section
- Click on your batch to view progress
Email Notification: If you enabled "Send Email When Complete", you'll receive notification when batch finishes.
Step 8: Review Batch Results
Once processing completes, review results:
Results Summary
Batch Job Complete!
Processed: 48/50 documents (96%)
Failed: 2/50 documents (4%)
Total Time: 16m 42s
Total Cost: $0.0218
Average Processing Time: 20.1s per document
Average Cost: $0.00045 per document
Success/Failure Breakdown
Successful Documents:
invoice_001.pdf
invoice_002.pdf
invoice_003.pdf
invoice_048.pdf
**Failed Documents:**
invoice_023.pdf - Error: File corrupted
invoice_041.pdf - Error: No text extracted (scanned image)
### View Individual Results
Click on any completed document to see:
- Extracted text
- Summary
- Key information
- Metadata
- Token usage
- Cost
Same view as single document processing.
## Step 9: Download Batch Results
Download all results at once:
### Bulk Download Options
1. **Click "Download All Results"**
Dropdown menu appears:
○ ZIP file (all JSONs)
○ ZIP file (all Markdowns)
○ CSV file (metadata only)
○ Excel file (metadata + summaries)
2. **Select Format**
- JSON: Complete data, best for automation
- Markdown: Human-readable, good for review
- CSV: Spreadsheet, good for analysis
- Excel: Formatted spreadsheet with summaries
3. **Click "Download"**
- File downloads to browser downloads folder
- Filename: `batch_2024_03_20_results.zip`
### ZIP File Contents
**For JSON format:**
batch_2024_03_20_results.zip
├── invoice_001_results.json
├── invoice_002_results.json
├── invoice_003_results.json
├── ...
├── invoice_048_results.json
└── batch_summary.json
**batch_summary.json includes:**
- Total documents processed
- Success/failure counts
- Total cost
- Total tokens used
- Processing time statistics
- List of failed documents
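Once downloaded, the ZIP can be processed entirely with the Python standard library. A runnable sketch: it builds a tiny stand-in archive so the example is self-contained, and the summary field names (`processed`, `failed`) are illustrative, not the exact schema:

```python
import io
import json
import zipfile

# Build a tiny stand-in ZIP so this sketch runs end to end.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("batch_summary.json", json.dumps({"processed": 48, "failed": 2}))
    z.writestr("invoice_001_results.json", json.dumps({"summary": "..."}))

# Reading a real download works the same way (pass the file path instead).
with zipfile.ZipFile(buf) as z:
    summary = json.loads(z.read("batch_summary.json"))
    docs = [n for n in z.namelist() if n.endswith("_results.json")]

print(summary["processed"], "documents,", len(docs), "result files")
```

From here each `*_results.json` can be loaded individually for downstream automation.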
### CSV Export Structure
**Columns:**
- Document Name
- Status (Success/Failed)
- Processing Time (seconds)
- Input Tokens
- Output Tokens
- Cost (USD)
- Summary (first 500 chars)
- Key Information Extracted
**Best for:**
- Analyzing costs across batch
- Identifying slow documents
- Reviewing summaries in spreadsheet
- Generating reports
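The CSV export lends itself to exactly these analyses. A small sketch using the column layout listed above (the sample rows are made up; the header strings are assumed to match the export):

```python
import csv
import io

# Illustrative rows matching the column layout above
csv_text = """Document Name,Status,Processing Time (seconds),Cost (USD)
invoice_001.pdf,Success,18.2,0.0004
invoice_002.pdf,Success,21.5,0.0005
invoice_023.pdf,Failed,0.0,0.0000
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
total_cost = sum(float(r["Cost (USD)"]) for r in rows)
slowest = max(rows, key=lambda r: float(r["Processing Time (seconds)"]))
failed = [r["Document Name"] for r in rows if r["Status"] == "Failed"]
```

For a real export, replace `io.StringIO(csv_text)` with `open("batch_results.csv")`; the rest is unchanged.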
## Advanced Batch Processing
### Scheduling Batch Jobs
**Enterprise plans only** - schedule recurring batch processing:
1. **Go to Settings → Batch Scheduler**
2. **Configure Schedule:**
- Frequency: Daily, Weekly, Monthly
- Time: Select hour (in UTC)
- Source: Folder path or URL list
3. **Set Options:**
- Model, analysis depth, output format
- Email notifications
4. **Activate Schedule**
**Use Cases:**
- Daily news article scraping
- Weekly report processing
- Monthly invoice batch processing
### Monitoring Large Batches
For batches with 100+ documents:
**Progress Notifications:**
- Email at 25%, 50%, 75%, 100% complete
- Webhook calls for real-time monitoring
- SMS alerts (Enterprise only)
**Performance Monitoring:**
- Average processing time trending
- Cost tracking per batch
- Failure rate monitoring
- Queue depth alerts
### Reprocessing Failed Documents
If some documents failed:
1. **Click "Reprocess Failed" button**
2. **Review Failed List:**
invoice_023.pdf - File corrupted
invoice_041.pdf - Scanned image
3. **Fix Issues:**
- Replace corrupted files
- Use OCR for scanned images
4. **Resubmit Only Failed Docs**
- Only processes documents that failed
- Saves time and cost
### Batch Processing via API
Automate batch processing with API:
```bash
# Submit batch job
curl -X POST https://<your-vm-ip>/api/v1/batch \
-H "X-Deployment-Key: ak-xxxxx" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/doc1.pdf",
"https://example.com/doc2.pdf"
],
"model": "gpt-4o-mini",
"analysis_depth": "standard",
"output_format": "json"
}'
# Returns batch_id
{
"batch_id": "batch_abc123",
"status": "queued",
"total_documents": 2
}
# Check batch status
curl https://<your-vm-ip>/api/v1/batch/batch_abc123 \
-H "X-Deployment-Key: ak-xxxxx"
# Returns progress
{
"batch_id": "batch_abc123",
"status": "processing",
"completed": 1,
"failed": 0,
"total": 2,
"progress_percent": 50
}
```

Full API documentation: see the API Reference.
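In a script, the status check above usually becomes a polling loop that waits for a terminal state. A sketch of that loop; the `fetch_status` callable is injected so the example runs without a live deployment, and the `"completed"` status value is an assumption (the doc shows `"queued"` and `"processing"`; check the API Reference for the exact terminal states):

```python
import time

def wait_for_batch(fetch_status, batch_id: str,
                   poll_every: float = 0.01, timeout: float = 5.0):
    """Poll until the batch reports a terminal status. `fetch_status`
    is any callable returning the status JSON shown above."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(batch_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_every)
    raise TimeoutError(f"batch {batch_id} did not finish in {timeout}s")

# Stand-in for an HTTP GET to /api/v1/batch/<batch_id>
# (real code would issue the curl request shown above).
responses = iter([
    {"status": "processing", "progress_percent": 50},
    {"status": "completed", "progress_percent": 100},
])
final = wait_for_batch(lambda _id: next(responses), "batch_abc123")
```

In production, use a longer `poll_every` (for example 10-30 seconds) so the loop does not hammer the endpoint.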
Best Practices
Optimize Batch Size
Small Batches (1-10 documents):
- Faster to complete
- Easier to review
- Lower risk if issues occur
- Good for testing
Medium Batches (10-50 documents):
- Good balance
- Complete in reasonable time (20-30 minutes)
- Manageable result set
- Recommended for most use cases
Large Batches (50-200 documents):
- Maximum efficiency
- Longer completion time (1-3 hours)
- Harder to review all results
- Good for bulk processing
Very Large Batches (200+ documents):
- Enterprise plans only
- Consider splitting into multiple batches
- Use scheduling feature
- Monitor progress closely
Optimize for Cost
Use GPT-4o mini by default:
- 16x cheaper than GPT-4o
- Good quality for most documents
- Reserve GPT-4o for complex documents
Use Quick Extract when appropriate:
- If you only need text, not analysis
- 3x faster than Standard Analysis
- Lower token usage
Disable Vector Storage if not needed:
- Saves processing time
- Reduces storage costs
- Only enable if you'll search these documents later
Monitor and Validate
Spot Check Results:
- Review 5-10 random documents from batch
- Verify extraction quality
- Check summary accuracy
- Adjust settings for future batches if needed
Track Failure Patterns:
- If >10% documents fail, investigate why
- Common causes: file corruption, scanned PDFs, password protection
- Fix issues before reprocessing
Monitor Costs:
- Check batch cost after completion
- Compare to estimate
- If significantly higher, investigate why (larger documents, more tokens than expected)
Troubleshooting
Common Issues
Issue: Batch Stuck at 0%
Solution:
- Wait 2-3 minutes (queue processing)
- Refresh page
- Check Settings → Service Status
- Contact support if stuck >5 minutes
Issue: High Failure Rate (>20%)
Causes:
- Corrupted files
- Scanned PDFs (no searchable text)
- Password-protected files
- URLs returning 404 errors
Solution:
- Download failed documents list
- Validate each file individually
- Remove problematic files
- Resubmit cleaned batch
Issue: Processing Much Slower Than Expected
Causes:
- Documents larger than estimated
- Complex documents requiring more processing
- High server load (other users' jobs)
Solution:
- Check individual document processing times
- Consider splitting into smaller batches
- Use Quick Extract for faster processing
- Upgrade plan for more concurrent jobs
Issue: Cost Much Higher Than Expected
Causes:
- Documents had more text than estimated
- Used Deep Analysis (high token usage)
- Used GPT-4o instead of GPT-4o mini
Solution:
- Review token counts in results
- Check which model was used
- Use cost estimation tool before submitting
- Switch to GPT-4o mini for future batches
Next Steps
Now that you understand batch processing:
- Automate with API - submit and monitor batches programmatically
- Search Your Batch - use Semantic Search to find specific documents across large batches
- Analyze Results - use Export and Analysis to generate reports from batch results
- Optimize Costs - see the Cost Optimization Guide to reduce per-document costs
Support
Need help with batch processing?
- Email: support@alactic.ai
- GitHub: github.com/Alactic-Inc/alactic-agi/issues
- Community: community.alactic.ai