Batch Processing
Batch processing allows you to process multiple documents or URLs simultaneously, dramatically improving efficiency for large-scale workflows. This guide covers everything from basic batch uploads to advanced automation.
What is Batch Processing?
Batch processing enables you to:
- Upload multiple PDFs at once (up to 50 files)
- Submit multiple URLs together (up to 100 URLs)
- Queue jobs for sequential processing
- Monitor progress across all items
- Download all results together
Benefits:
- Save time: Process 50 documents in the same time as one
- Reduce manual work: One submission instead of 50
- Better tracking: See all jobs in one view
- Bulk downloads: Get all results as a single ZIP file
Plan Limits for Batch Processing
Batch processing limits vary by plan:
| Feature | Free | Pro | Pro+ | Enterprise |
|---|---|---|---|---|
| Max PDFs per batch | 10 | 25 | 50 | 100 |
| Max URLs per batch | 25 | 50 | 100 | 500 |
| Concurrent jobs | 2 | 5 | 10 | 25 |
| Queue size | 50 | 200 | 500 | Unlimited |
Concurrent jobs: How many documents process simultaneously
Queue size: Total jobs that can wait in queue
Step 1: Prepare Your Batch
For PDF Batch Processing
Organize Your Files:
- Create a folder with all PDFs you want to process
- Name files clearly (e.g., invoice_2024_01.pdf, report_q1.pdf)
- Check file sizes - ensure each file is within your plan's limits
- Remove duplicates - avoid processing the same file twice
Recommended Folder Structure:
batch_2024_03_20/
├── invoice_001.pdf
├── invoice_002.pdf
├── invoice_003.pdf
├── ...
└── invoice_050.pdf
File Naming Best Practices:
- Use consistent naming convention
- Include dates in filename (YYYY-MM-DD format)
- Avoid special characters (use underscores instead of spaces)
- Keep names under 100 characters
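The naming rules above can be enforced automatically before upload. Here is a minimal sketch of a filename cleaner; the function name and exact character whitelist are illustrative choices, not part of the product:

```python
import re

def sanitize_filename(name: str, max_len: int = 100) -> str:
    """Apply the naming best practices above: underscores instead of
    spaces, no special characters, and a 100-character cap."""
    stem, dot, ext = name.rpartition(".")
    if not dot:                      # no extension present
        stem, ext = name, ""
    stem = stem.replace(" ", "_")
    stem = re.sub(r"[^A-Za-z0-9_\-]", "", stem)  # drop special characters
    cleaned = f"{stem}.{ext}" if ext else stem
    return cleaned[:max_len]

print(sanitize_filename("Q1 Report (final).pdf"))  # → Q1_Report_final.pdf
```

Running this over a whole folder before creating the batch avoids validation surprises at upload time.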
For URL Batch Processing
Create a URL List:
- Open text editor (Notepad, VS Code, etc.)
- Paste one URL per line
- Remove any blank lines
- Save as .txt file (optional, for record keeping)
Example URL List:
https://www.example.com/article-1
https://www.example.com/article-2
https://www.example.com/article-3
https://www.example.com/article-4
https://www.example.com/article-5
URL Validation:
- Each URL must start with http:// or https://
- No spaces before or after the URL
- No comments or labels (just URLs)
- Maximum 2,000 characters per URL
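These validation rules are easy to check locally before pasting the list into the dashboard. A small sketch, assuming only the rules listed above (protocol prefix, no blank lines, 2,000-character cap):

```python
def validate_urls(raw: str, max_len: int = 2000):
    """Split a pasted URL list into (valid, invalid) lists using the
    rules above: http(s) prefix, trimmed whitespace, length cap."""
    valid, invalid = [], []
    for line in raw.splitlines():
        url = line.strip()
        if not url:
            continue  # blank lines are simply dropped
        if url.startswith(("http://", "https://")) and len(url) <= max_len:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid

good, bad = validate_urls("https://example.com/a\n\nwww.example.com/b\n")
print(len(good), len(bad))  # 1 valid, 1 invalid (missing protocol)
```

Pre-validating this way means every URL shows ✓ on the review screen in Step 4.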
Step 2: Access Batch Processing
1. Login to Dashboard - navigate to your Alactic AGI dashboard
2. Click the "Batch Processing" Tab - located in the main content area (third tab)
3. Choose Batch Type:
   - PDF Batch - for uploading multiple PDF files
   - URL Batch - for scraping multiple websites
Step 3: Upload PDF Batch
Select Files
Method 1: Drag and Drop
- Open your folder with PDFs
- Select all files (Ctrl+A or Cmd+A)
- Drag files to batch upload area
- Drop to upload
Method 2: Click to Browse
- Click "Select PDF Files" button
- Navigate to your folder
- Select multiple files:
- Windows: Hold Ctrl, click each file
- Mac: Hold Cmd, click each file
- Or: Select first file, hold Shift, select last file (selects range)
- Click "Open"
Review File List
After upload, you'll see a list of all files:
✓ invoice_001.pdf (124 KB, 1 page)
✓ invoice_002.pdf (156 KB, 1 page)
✓ invoice_003.pdf (98 KB, 1 page)
...
✓ invoice_050.pdf (142 KB, 1 page)
Total: 50 files, 6.2 MB, 50 pages
File Status Indicators:
- ✓ Valid (ready to process)
- ⚠ Warning (large file, may take longer)
- ✗ Invalid (too large, corrupted, wrong format)
Remove Files (Optional)
Remove any files you don't want to process:
- Hover over file in list
- Click "X" button
- File removed from batch
Step 4: Submit URL Batch
Enter URLs
Method 1: Paste List
- Copy your URL list from text file
- Click in URL textarea
- Paste (Ctrl+V or Cmd+V)
- Each URL appears on its own line
Method 2: Manual Entry
- Type or paste first URL
- Press Enter
- Type or paste next URL
- Repeat for all URLs
Review URL List
After entry, you'll see validation:
✓ https://www.example.com/article-1
✓ https://www.example.com/article-2
✓ https://www.example.com/article-3
✗ www.example.com/article-4 (missing http://)
✓ https://www.example.com/article-5
Total: 4 valid URLs, 1 invalid
URL Status Indicators:
- ✓ Valid (proper format, will be scraped)
- ✗ Invalid (missing protocol, malformed)
Fix Invalid URLs
- Click "Edit" on the invalid URL
- Fix the issue (add https://)
- Click "Update"
- Status changes to ✓ Valid
Step 5: Configure Batch Options
Before submitting, configure how all documents will be processed:
Model Selection
Apply Same Model to All:
○ Alactic GPT-4o mini
- Faster, lower cost
- Good for straightforward docs
- Recommended for large batches
○ Alactic GPT-4o
- More powerful, higher cost
- Better for complex analysis
- Use for important documents
Cost Estimate:
50 documents × ~3,000 tokens each × $0.150/1M
= 150,000 tokens × $0.150/1M
= $0.0225 (GPT-4o mini)
vs.
50 documents × ~3,000 tokens each × $2.50/1M
= 150,000 tokens × $2.50/1M
= $0.375 (GPT-4o)
Recommendation: Use GPT-4o mini for batches unless documents require deep reasoning.
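The worked estimate above generalizes to any batch size. A minimal sketch of the same arithmetic, using the per-1M-token prices from the example (input-side only; actual billing also counts output tokens):

```python
# Prices per 1M input tokens, taken from the worked example above
PRICES_PER_M = {"gpt-4o-mini": 0.150, "gpt-4o": 2.50}

def estimate_batch_cost(n_docs: int, tokens_per_doc: int, model: str) -> float:
    """docs x tokens-per-doc x (price / 1M tokens), as in the estimate above."""
    return n_docs * tokens_per_doc * PRICES_PER_M[model] / 1_000_000

mini = estimate_batch_cost(50, 3000, "gpt-4o-mini")   # ~$0.0225
full = estimate_batch_cost(50, 3000, "gpt-4o")        # ~$0.375
```

Comparing the two numbers before submitting makes the mini-vs-full trade-off concrete for your own batch sizes.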
Analysis Depth
○ Quick Extract
- ~5 seconds per document
- Total batch time: ~4 minutes (50 docs)
- Basic text only
● Standard Analysis (Recommended)
- ~20 seconds per document
- Total batch time: ~17 minutes (50 docs)
- Text + summary + key points
○ Deep Analysis
- ~60 seconds per document
- Total batch time: ~50 minutes (50 docs)
- Full analysis + entities + sentiment
Recommendation: Use Standard Analysis for most batches. Reserve Deep Analysis for critical documents.
Output Format
● JSON (Recommended)
- Structured data
- Easy to parse
- Best for automation
○ Markdown
- Human-readable
- Good for review
○ Plain Text
- Simple text only
- Smallest file size
Recommendation: Use JSON for batch processing (easier to analyze programmatically).
Additional Options
☑ Enable Vector Storage
- Allows semantic search across batch
- Adds ~3 seconds per document
- Recommended: Enable for batches you'll search later
☐ Enable Content Chunking
- For documents with 50+ pages
- Enable if batch has long PDFs
☑ Stop on Error
- Stops entire batch if one document fails
- Disable to process all valid documents
- Recommended: Disable (process all, skip failures)
☐ Send Email When Complete
- Receive email notification when batch finishes
- Useful for very large batches (100+ docs)
Step 6: Submit Batch Job
1. Review Configuration:
   - Files/URLs: 50
   - Model: GPT-4o mini
   - Depth: Standard Analysis
   - Format: JSON
   - Estimated time: 17 minutes
   - Estimated cost: $0.0225
2. Click "Process Batch":
   - Batch job submitted
   - Processing begins immediately
   - Progress screen appears
3. Monitor Progress - a real-time progress dashboard appears:
Processing Batch Job
Progress: [████████░░░░░░░░] 8/50 (16%)
Completed: 8 documents
Processing: 2 documents
Queued: 40 documents
Failed: 0 documents
Elapsed: 2m 14s
Estimated Remaining: 15m 23s
Step 7: Monitor Processing
Progress View
The progress screen shows:
Overall Progress Bar:
- Visual progress (0-100%)
- Completed/total count
- Time elapsed and remaining
Individual Document Status:
invoice_001.pdf - Complete (18.2s, $0.0004)
invoice_002.pdf - Complete (21.5s, $0.0005)
invoice_003.pdf - Processing... 45%
invoice_004.pdf - Processing... 12%
invoice_005.pdf - Queued
invoice_006.pdf - Queued
...
Status Icons:
- Complete (green)
- Processing (blue spinner)
- Queued (gray)
- Failed (red)
Processing Strategy
Documents process based on plan's concurrent job limit:
Example (Pro Plan, 5 concurrent jobs):
Time | Processing
------|------------------------------------------
0:00 | Doc 1, 2, 3, 4, 5 start
0:20 | Doc 1 completes, Doc 6 starts
0:25 | Doc 2 completes, Doc 7 starts
0:30 | Doc 3 completes, Doc 8 starts
...
Parallel Processing:
- Free: 2 documents at once
- Pro: 5 documents at once
- Pro+: 10 documents at once
- Enterprise: 25 documents at once
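The concurrent-job limit behaves like a semaphore: documents queue up, and only N hold a processing slot at a time. A sketch of that scheduling pattern (the sleep is a stand-in for real per-document work; names here are illustrative, not the platform's internals):

```python
import asyncio
import random

CONCURRENT_LIMIT = 5  # Pro plan: 5 documents at once

async def process(doc: str, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most CONCURRENT_LIMIT docs hold a slot at a time
        await asyncio.sleep(random.uniform(0.001, 0.005))  # stand-in for real work
        return f"{doc}: done"

async def run_batch(docs):
    sem = asyncio.Semaphore(CONCURRENT_LIMIT)
    # gather preserves input order even though completion order varies
    return await asyncio.gather(*(process(d, sem) for d in docs))

results = asyncio.run(run_batch([f"doc_{i:03d}" for i in range(1, 11)]))
```

This is why upgrading the plan shortens wall-clock time: more slots, same per-document time.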
Can I Close the Browser?
Yes! Batch processing continues even if you close your browser.
To check status later:
- Return to dashboard
- Click "Batch Processing" tab
- See "Active Batches" section
- Click on your batch to view progress
Email Notification: If you enabled "Send Email When Complete", you'll receive notification when batch finishes.
Step 8: Review Batch Results
Once processing completes, review results:
Results Summary
Batch Job Complete!
Processed: 48/50 documents (96%)
Failed: 2/50 documents (4%)
Total Time: 16m 42s
Total Cost: $0.0218
Average Processing Time: 20.1s per document
Average Cost: $0.00045 per document
Success/Failure Breakdown
Successful Documents:
invoice_001.pdf
invoice_002.pdf
invoice_003.pdf
invoice_048.pdf
**Failed Documents:**
invoice_023.pdf - Error: File corrupted
invoice_041.pdf - Error: No text extracted (scanned image)
### View Individual Results
Click on any completed document to see:
- Extracted text
- Summary
- Key information
- Metadata
- Token usage
- Cost
Same view as single document processing.
## Step 9: Download Batch Results
Download all results at once:
### Bulk Download Options
1. **Click "Download All Results"**
Dropdown menu appears:
○ ZIP file (all JSONs)
○ ZIP file (all Markdowns)
○ CSV file (metadata only)
○ Excel file (metadata + summaries)
2. **Select Format**
- JSON: Complete data, best for automation
- Markdown: Human-readable, good for review
- CSV: Spreadsheet, good for analysis
- Excel: Formatted spreadsheet with summaries
3. **Click "Download"**
- File downloads to browser downloads folder
- Filename: `batch_2024_03_20_results.zip`
### ZIP File Contents
**For JSON format:**
batch_2024_03_20_results.zip
├── invoice_001_results.json
├── invoice_002_results.json
├── invoice_003_results.json
├── ...
├── invoice_048_results.json
└── batch_summary.json
**batch_summary.json includes:**
- Total documents processed
- Success/failure counts
- Total cost
- Total tokens used
- Processing time statistics
- List of failed documents
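Once downloaded, the ZIP can be processed entirely with the Python standard library. A runnable sketch: it builds a tiny stand-in archive so the example is self-contained, and the summary field names (`processed`, `failed`) are illustrative, not the exact schema:

```python
import io
import json
import zipfile

# Build a tiny stand-in ZIP so this sketch runs end to end.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("batch_summary.json", json.dumps({"processed": 48, "failed": 2}))
    z.writestr("invoice_001_results.json", json.dumps({"summary": "..."}))

# Reading a real download works the same way (pass the file path instead).
with zipfile.ZipFile(buf) as z:
    summary = json.loads(z.read("batch_summary.json"))
    docs = [n for n in z.namelist() if n.endswith("_results.json")]

print(summary["processed"], "documents,", len(docs), "result files")
```

From here each `*_results.json` can be loaded individually for downstream automation.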
### CSV Export Structure
**Columns:**
- Document Name
- Status (Success/Failed)
- Processing Time (seconds)
- Input Tokens
- Output Tokens
- Cost (USD)
- Summary (first 500 chars)
- Key Information Extracted
**Best for:**
- Analyzing costs across batch
- Identifying slow documents
- Reviewing summaries in spreadsheet
- Generating reports
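The CSV export lends itself to exactly these analyses. A small sketch using the column layout listed above (the sample rows are made up; the header strings are assumed to match the export):

```python
import csv
import io

# Illustrative rows matching the column layout above
csv_text = """Document Name,Status,Processing Time (seconds),Cost (USD)
invoice_001.pdf,Success,18.2,0.0004
invoice_002.pdf,Success,21.5,0.0005
invoice_023.pdf,Failed,0.0,0.0000
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
total_cost = sum(float(r["Cost (USD)"]) for r in rows)
slowest = max(rows, key=lambda r: float(r["Processing Time (seconds)"]))
failed = [r["Document Name"] for r in rows if r["Status"] == "Failed"]
```

For a real export, replace `io.StringIO(csv_text)` with `open("batch_results.csv")`; the rest is unchanged.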
## Advanced Batch Processing
### Scheduling Batch Jobs
**Enterprise plans only** - schedule recurring batch processing:
1. **Go to Settings → Batch Scheduler**
2. **Configure Schedule:**
- Frequency: Daily, Weekly, Monthly
- Time: Select hour (in UTC)
- Source: Folder path or URL list
3. **Set Options:**
- Model, analysis depth, output format
- Email notifications
4. **Activate Schedule**
**Use Cases:**
- Daily news article scraping
- Weekly report processing
- Monthly invoice batch processing
### Monitoring Large Batches
For batches with 100+ documents:
**Progress Notifications:**
- Email at 25%, 50%, 75%, 100% complete
- Webhook calls for real-time monitoring
- SMS alerts (Enterprise only)
**Performance Monitoring:**
- Average processing time trending
- Cost tracking per batch
- Failure rate monitoring
- Queue depth alerts
### Reprocessing Failed Documents
If some documents failed:
1. **Click "Reprocess Failed" button**
2. **Review Failed List:**
invoice_023.pdf - File corrupted
invoice_041.pdf - Scanned image
3. **Fix Issues:**
- Replace corrupted files
- Use OCR for scanned images
4. **Resubmit Only Failed Docs**
- Only processes documents that failed
- Saves time and cost
### Batch Processing via API
Automate batch processing with API:
```bash
# Submit batch job
curl -X POST https://<your-vm-ip>/api/v1/batch \
-H "X-Deployment-Key: ak-xxxxx" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/doc1.pdf",
"https://example.com/doc2.pdf"
],
"model": "gpt-4o-mini",
"analysis_depth": "standard",
"output_format": "json"
}'
# Returns batch_id
{
"batch_id": "batch_abc123",
"status": "queued",
"total_documents": 2
}
# Check batch status
curl https://<your-vm-ip>/api/v1/batch/batch_abc123 \
-H "X-Deployment-Key: ak-xxxxx"
# Returns progress
{
"batch_id": "batch_abc123",
"status": "processing",
"completed": 1,
"failed": 0,
"total": 2,
"progress_percent": 50
}
```

Full API documentation: see the API Reference.
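In a script, the status check above usually becomes a polling loop that waits for a terminal state. A sketch of that loop; the `fetch_status` callable is injected so the example runs without a live deployment, and the `"completed"` status value is an assumption (the doc shows `"queued"` and `"processing"`; check the API Reference for the exact terminal states):

```python
import time

def wait_for_batch(fetch_status, batch_id: str,
                   poll_every: float = 0.01, timeout: float = 5.0):
    """Poll until the batch reports a terminal status. `fetch_status`
    is any callable returning the status JSON shown above."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(batch_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_every)
    raise TimeoutError(f"batch {batch_id} did not finish in {timeout}s")

# Stand-in for an HTTP GET to /api/v1/batch/<batch_id>
# (real code would issue the curl request shown above).
responses = iter([
    {"status": "processing", "progress_percent": 50},
    {"status": "completed", "progress_percent": 100},
])
final = wait_for_batch(lambda _id: next(responses), "batch_abc123")
```

In production, use a longer `poll_every` (for example 10-30 seconds) so the loop does not hammer the endpoint.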
Best Practices
Optimize Batch Size
Small Batches (1-10 documents):
- Faster to complete
- Easier to review
- Lower risk if issues occur
- Good for testing
Medium Batches (10-50 documents):
- Good balance
- Complete in reasonable time (20-30 minutes)
- Manageable result set
- Recommended for most use cases
Large Batches (50-200 documents):
- Maximum efficiency
- Longer completion time (1-3 hours)
- Harder to review all results
- Good for bulk processing
Very Large Batches (200+ documents):
- Enterprise plans only
- Consider splitting into multiple batches
- Use scheduling feature
- Monitor progress closely
Optimize for Cost
Use GPT-4o mini by default:
- 16x cheaper than GPT-4o
- Good quality for most documents
- Reserve GPT-4o for complex documents
Use Quick Extract when appropriate:
- If you only need text, not analysis
- 3x faster than Standard Analysis
- Lower token usage
Disable Vector Storage if not needed:
- Saves processing time
- Reduces storage costs
- Only enable if you'll search these documents later
Monitor and Validate
Spot Check Results:
- Review 5-10 random documents from batch
- Verify extraction quality
- Check summary accuracy
- Adjust settings for future batches if needed
Track Failure Patterns:
- If >10% documents fail, investigate why
- Common causes: file corruption, scanned PDFs, password protection
- Fix issues before reprocessing
Monitor Costs:
- Check batch cost after completion
- Compare to estimate
- If significantly higher, investigate why (larger documents, more tokens than expected)
Troubleshooting
Common Issues
Issue: Batch Stuck at 0%
Solution:
- Wait 2-3 minutes (queue processing)
- Refresh page
- Check Settings → Service Status
- Contact support if stuck >5 minutes
Issue: High Failure Rate (>20%)
Causes:
- Corrupted files
- Scanned PDFs (no searchable text)
- Password-protected files
- URLs returning 404 errors
Solution:
- Download failed documents list
- Validate each file individually
- Remove problematic files
- Resubmit cleaned batch
Issue: Processing Much Slower Than Expected
Causes:
- Documents larger than estimated
- Complex documents requiring more processing
- High server load (other users' jobs)
Solution:
- Check individual document processing times
- Consider splitting into smaller batches
- Use Quick Extract for faster processing
- Upgrade plan for more concurrent jobs
Issue: Cost Much Higher Than Expected
Causes:
- Documents had more text than estimated
- Used Deep Analysis (high token usage)
- Used GPT-4o instead of GPT-4o mini
Solution:
- Review token counts in results
- Check which model was used
- Use cost estimation tool before submitting
- Switch to GPT-4o mini for future batches
Next Steps
Now that you understand batch processing:
- Automate with API - submit and monitor batches programmatically
- Search Your Batch - use Semantic Search to find specific documents across large batches
- Analyze Results - use Export and Analysis to generate reports from batch results
- Optimize Costs - see the Cost Optimization Guide to reduce per-document costs
Support
Need help with batch processing?
- Email: support@alactic.ai
- GitHub: github.com/Alactic-Inc/alactic-agi/issues
- Community: community.alactic.ai