
Batch Processing

Batch processing allows you to process multiple documents or URLs simultaneously, dramatically improving efficiency for large-scale workflows. This guide covers everything from basic batch uploads to advanced automation.

What is Batch Processing?

Batch processing enables you to:

  • Upload multiple PDFs at once (up to 100 files, depending on your plan)
  • Submit multiple URLs together (up to 500 URLs, depending on your plan)
  • Queue jobs for sequential processing
  • Monitor progress across all items
  • Download all results together

Benefits:

  • Save time: Process 50 documents in the same time as one
  • Reduce manual work: One submission instead of 50
  • Better tracking: See all jobs in one view
  • Bulk downloads: Get all results as ZIP file

Plan Limits for Batch Processing

Batch processing limits vary by plan:

Feature            | Free | Pro | Pro+ | Enterprise
-------------------|------|-----|------|-----------
Max PDFs per batch | 10   | 25  | 50   | 100
Max URLs per batch | 25   | 50  | 100  | 500
Concurrent jobs    | 2    | 5   | 10   | 25
Queue size         | 50   | 200 | 500  | Unlimited

Concurrent jobs: How many documents process simultaneously
Queue size: Total jobs that can wait in queue
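
If you are scripting batch submissions, the limits above can be checked up front so a batch is never rejected server-side. This is a minimal sketch; the `PLAN_LIMITS` dict simply transcribes the table above, and `fits_plan` is a hypothetical helper, not part of the product API:

```python
# Plan limits transcribed from the table above.
PLAN_LIMITS = {
    "free":       {"pdfs": 10,  "urls": 25,  "concurrent": 2,  "queue": 50},
    "pro":        {"pdfs": 25,  "urls": 50,  "concurrent": 5,  "queue": 200},
    "pro+":       {"pdfs": 50,  "urls": 100, "concurrent": 10, "queue": 500},
    "enterprise": {"pdfs": 100, "urls": 500, "concurrent": 25, "queue": None},  # None = unlimited
}

def fits_plan(plan, num_pdfs=0, num_urls=0):
    """Return True if a proposed batch fits within the plan's per-batch limits."""
    limits = PLAN_LIMITS[plan]
    return num_pdfs <= limits["pdfs"] and num_urls <= limits["urls"]

print(fits_plan("pro", num_pdfs=25))   # True: exactly at the Pro PDF limit
print(fits_plan("free", num_urls=30))  # False: Free allows only 25 URLs per batch
```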

Step 1: Prepare Your Batch

For PDF Batch Processing

Organize Your Files:

  1. Create a folder with all PDFs you want to process
  2. Name files clearly (e.g., invoice_2024_01.pdf, report_q1.pdf)
  3. Check file sizes - ensure within plan limits
  4. Remove duplicates - avoid processing same file twice

Recommended Folder Structure:

batch_2024_03_20/
├── invoice_001.pdf
├── invoice_002.pdf
├── invoice_003.pdf
├── ...
└── invoice_050.pdf

File Naming Best Practices:

  • Use consistent naming convention
  • Include dates in filename (YYYY-MM-DD format)
  • Avoid special characters (use underscores instead of spaces)
  • Keep names under 100 characters
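
The naming rules above are easy to enforce before upload. The checker below is a sketch of one way to do that, assuming "special characters" means anything outside letters, digits, dots, underscores, and hyphens; `check_filename` is a hypothetical helper, not a product feature:

```python
import re

def check_filename(name, max_len=100):
    """Check a PDF filename against the naming guidelines above.

    Returns (ok, suggestion): ok is True when the name complies;
    suggestion is a cleaned-up alternative when it does not.
    """
    if len(name) <= max_len and not re.search(r"[^A-Za-z0-9._-]", name):
        return True, name
    # Replace spaces and special characters with underscores, then truncate.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name)[:max_len]
    return False, cleaned

print(check_filename("invoice_2024_01.pdf"))    # (True, 'invoice_2024_01.pdf')
print(check_filename("Q1 report (final).pdf"))  # (False, 'Q1_report__final_.pdf')
```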

For URL Batch Processing

Create a URL List:

  1. Open text editor (Notepad, VS Code, etc.)
  2. Paste one URL per line
  3. Remove any blank lines
  4. Save as .txt file (optional, for record keeping)

Example URL List:

https://www.example.com/article-1
https://www.example.com/article-2
https://www.example.com/article-3
https://www.example.com/article-4
https://www.example.com/article-5

URL Validation:

  • Each URL must start with http:// or https://
  • No spaces before or after URL
  • No comments or labels (just URLs)
  • Maximum 2,000 characters per URL
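
The validation rules above can be applied to a pasted list before submission. This sketch mirrors the dashboard's validation (protocol prefix, trimmed whitespace, blank lines skipped, 2,000-character cap); `validate_urls` is an illustrative helper, not part of the product:

```python
import re

MAX_URL_LEN = 2000

def validate_urls(text):
    """Split a pasted URL list into (valid, invalid) per the rules above."""
    valid, invalid = [], []
    for line in text.splitlines():
        url = line.strip()        # no spaces before or after the URL
        if not url:
            continue              # blank lines are skipped
        if not re.match(r"https?://", url):
            invalid.append((url, "missing http:// or https://"))
        elif len(url) > MAX_URL_LEN:
            invalid.append((url, "over 2,000 characters"))
        else:
            valid.append(url)
    return valid, invalid

valid, invalid = validate_urls("""\
https://www.example.com/article-1
www.example.com/article-4
https://www.example.com/article-5
""")
print(len(valid), "valid,", len(invalid), "invalid")  # 2 valid, 1 invalid
```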

Step 2: Access Batch Processing

  1. Log in to Dashboard - Navigate to your Alactic AGI dashboard

  2. Click the "Batch Processing" Tab - Located in the main content area (third tab)

  3. Choose Batch Type

    • PDF Batch - For uploading multiple PDF files
    • URL Batch - For scraping multiple websites

Step 3: Upload PDF Batch

Select Files

Method 1: Drag and Drop

  1. Open your folder with PDFs
  2. Select all files (Ctrl+A or Cmd+A)
  3. Drag files to batch upload area
  4. Drop to upload

Method 2: Click to Browse

  1. Click "Select PDF Files" button
  2. Navigate to your folder
  3. Select multiple files:
    • Windows: Hold Ctrl, click each file
    • Mac: Hold Cmd, click each file
    • Or: Select first file, hold Shift, select last file (selects range)
  4. Click "Open"

Review File List

After upload, you'll see a list of all files:

✓ invoice_001.pdf (124 KB, 1 page)
✓ invoice_002.pdf (156 KB, 1 page)
✓ invoice_003.pdf (98 KB, 1 page)
...
✓ invoice_050.pdf (142 KB, 1 page)

Total: 50 files, 6.2 MB, 50 pages

File Status Indicators:

  • ✓ Valid (ready to process)
  • ⚠ Warning (large file, may take longer)
  • ✗ Invalid (too large, corrupted, wrong format)

Remove Files (Optional)

Remove any files you don't want to process:

  1. Hover over file in list
  2. Click "X" button
  3. File removed from batch

Step 4: Submit URL Batch

Enter URLs

Method 1: Paste List

  1. Copy your URL list from text file
  2. Click in URL textarea
  3. Paste (Ctrl+V or Cmd+V)
  4. Each URL appears on its own line

Method 2: Manual Entry

  1. Type or paste first URL
  2. Press Enter
  3. Type or paste next URL
  4. Repeat for all URLs

Review URL List

After entry, you'll see validation:

✓ https://www.example.com/article-1
✓ https://www.example.com/article-2
✓ https://www.example.com/article-3
✗ www.example.com/article-4 (missing http://)
✓ https://www.example.com/article-5

Total: 4 valid URLs, 1 invalid

URL Status Indicators:

  • ✓ Valid (proper format, will be scraped)
  • ✗ Invalid (missing protocol, malformed)

Fix Invalid URLs

  1. Click "Edit" on invalid URL
  2. Fix the issue (add https://)
  3. Click "Update"
  4. Status changes to ✓ Valid

Step 5: Configure Batch Options

Before submitting, configure how all documents will be processed:

Model Selection

Apply Same Model to All:

○ Alactic GPT-4o mini
- Faster, lower cost
- Good for straightforward docs
- Recommended for large batches

○ Alactic GPT-4o
- More powerful, higher cost
- Better for complex analysis
- Use for important documents

Cost Estimate:

50 documents × ~3,000 tokens each × $0.150/1M
= 150,000 tokens × $0.150/1M
= $0.0225 (GPT-4o mini)

vs.

50 documents × ~3,000 tokens each × $2.50/1M
= 150,000 tokens × $2.50/1M
= $0.375 (GPT-4o)

Recommendation: Use GPT-4o mini for batches unless documents require deep reasoning.
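
The arithmetic above is easy to reproduce for your own batch sizes. A minimal sketch, assuming the per-1M-token input prices shown in the estimate (output tokens are billed separately and are not included in this rough figure):

```python
# Input-token prices per 1M tokens, taken from the estimate above.
PRICE_PER_1M_INPUT = {"gpt-4o-mini": 0.150, "gpt-4o": 2.50}

def estimate_batch_cost(num_docs, tokens_per_doc, model):
    """Rough input-side cost of a batch in USD."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens * PRICE_PER_1M_INPUT[model] / 1_000_000

print(round(estimate_batch_cost(50, 3000, "gpt-4o-mini"), 4))  # 0.0225
print(round(estimate_batch_cost(50, 3000, "gpt-4o"), 4))       # 0.375
```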

Analysis Depth

○ Quick Extract
- ~5 seconds per document
- Total batch time: ~4 minutes (50 docs)
- Basic text only

● Standard Analysis (Recommended)
- ~20 seconds per document
- Total batch time: ~17 minutes (50 docs)
- Text + summary + key points

○ Deep Analysis
- ~60 seconds per document
- Total batch time: ~50 minutes (50 docs)
- Full analysis + entities + sentiment

Recommendation: Use Standard Analysis for most batches. Reserve Deep Analysis for critical documents.
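
The batch-time figures above are a simple product of document count and per-document time. A sketch of that estimate, using the approximate per-document seconds listed for each depth (a sequential upper bound; concurrent jobs can finish sooner):

```python
# Approximate per-document times (seconds) from the depth options above.
SECONDS_PER_DOC = {"quick": 5, "standard": 20, "deep": 60}

def estimate_batch_minutes(num_docs, depth):
    """Sequential upper-bound estimate of total batch time in minutes."""
    return num_docs * SECONDS_PER_DOC[depth] / 60

print(round(estimate_batch_minutes(50, "standard")))  # 17
print(round(estimate_batch_minutes(50, "deep")))      # 50
```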

Output Format

● JSON (Recommended)
- Structured data
- Easy to parse
- Best for automation

○ Markdown
- Human-readable
- Good for review

○ Plain Text
- Simple text only
- Smallest file size

Recommendation: Use JSON for batch processing (easier to analyze programmatically).

Additional Options

☑ Enable Vector Storage
- Allows semantic search across batch
- Adds ~3 seconds per document
- Recommended: Enable for batches you'll search later

☐ Enable Content Chunking
- For documents with 50+ pages
- Enable if batch has long PDFs

☐ Stop on Error
- Stops entire batch if one document fails
- Disable to process all valid documents
- Recommended: Disable (process all, skip failures)

☐ Send Email When Complete
- Receive email notification when batch finishes
- Useful for very large batches (100+ docs)

Step 6: Submit Batch Job

  1. Review Configuration

    • Files/URLs: 50
    • Model: GPT-4o mini
    • Depth: Standard Analysis
    • Format: JSON
    • Estimated time: 17 minutes
    • Estimated cost: $0.0225
  2. Click "Process Batch"

    • Batch job submitted
    • Processing begins immediately
    • Progress screen appears
  3. Monitor Progress - A real-time progress dashboard appears:

    Processing Batch Job

    Progress: [████████░░░░░░░░] 8/50 (16%)

    Completed: 8 documents
    Processing: 2 documents
    Queued: 40 documents
    Failed: 0 documents

    Elapsed: 2m 14s
    Estimated Remaining: 15m 23s

Step 7: Monitor Processing

Progress View

The progress screen shows:

Overall Progress Bar:

  • Visual progress (0-100%)
  • Completed/total count
  • Time elapsed and remaining

Individual Document Status:

✓ invoice_001.pdf - Complete (18.2s, $0.00042)
✓ invoice_002.pdf - Complete (21.5s, $0.00051)
invoice_003.pdf - Processing... 45%
invoice_004.pdf - Processing... 12%
invoice_005.pdf - Queued
invoice_006.pdf - Queued
...

Status Icons:

  • Complete (green)
  • Processing (blue spinner)
  • Queued (gray)
  • Failed (red)

Processing Strategy

Documents process based on plan's concurrent job limit:

Example (Pro Plan, 5 concurrent jobs):

Time  | Processing
------|------------------------------------------
0:00 | Doc 1, 2, 3, 4, 5 start
0:20 | Doc 1 completes, Doc 6 starts
0:25 | Doc 2 completes, Doc 7 starts
0:30 | Doc 3 completes, Doc 8 starts
...

Parallel Processing:

  • Free: 2 documents at once
  • Pro: 5 documents at once
  • Pro+: 10 documents at once
  • Enterprise: 25 documents at once
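
The staggered schedule in the example above can be modeled to see how concurrency affects wall-clock time. This is an idealized sketch (it ignores queueing overhead and server load, so real times will vary); `batch_wall_time` is an illustrative helper, not a product API:

```python
import heapq

def batch_wall_time(durations, concurrent):
    """Simulate the schedule above: at most `concurrent` documents run at
    once, and each freed slot immediately takes the next queued document.
    Returns total wall-clock seconds."""
    finish_times = []  # min-heap of finish times for the active slots
    for d in durations:
        if len(finish_times) < concurrent:
            heapq.heappush(finish_times, d)        # free slot: start at t=0
        else:
            t = heapq.heappop(finish_times)        # wait for earliest finish
            heapq.heappush(finish_times, t + d)    # next document starts there
    return max(finish_times)

# 50 documents at ~20s each on a Pro plan (5 concurrent jobs):
print(batch_wall_time([20] * 50, 5))  # 200
```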

Can I Close the Browser?

Yes! Batch processing continues even if you close the browser.

To check status later:

  1. Return to dashboard
  2. Click "Batch Processing" tab
  3. See "Active Batches" section
  4. Click on your batch to view progress

Email Notification: If you enabled "Send Email When Complete", you'll receive notification when batch finishes.
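
If you check status from a script rather than the dashboard, a simple polling loop works. This is a sketch, not an official client: `fetch_status` is any zero-argument callable returning a status dict shaped like the API responses later in this guide, and the terminal state names ("complete", "failed") are assumptions to adapt to what your API actually returns:

```python
import time

def wait_for_batch(fetch_status, poll_seconds=10, sleep=time.sleep):
    """Poll a status source until the batch reaches a terminal state."""
    while True:
        status = fetch_status()
        if status["status"] in ("complete", "failed"):  # assumed terminal states
            return status
        sleep(poll_seconds)

# Demo with a stubbed status source (no network needed):
states = iter([
    {"status": "processing", "progress_percent": 50},
    {"status": "complete", "progress_percent": 100},
])
final = wait_for_batch(lambda: next(states), sleep=lambda s: None)
print(final["status"])  # complete
```

In practice `fetch_status` would wrap the `GET /api/v1/batch/<batch_id>` call shown in the API section.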

Step 8: Review Batch Results

Once processing completes, review results:

Results Summary

Batch Job Complete!

Processed: 48/50 documents (96%)
Failed: 2/50 documents (4%)
Total Time: 16m 42s
Total Cost: $0.0218

Average Processing Time: 20.1s per document
Average Cost: $0.00045 per document

Success/Failure Breakdown

Successful Documents:

✓ invoice_001.pdf
✓ invoice_002.pdf
✓ invoice_003.pdf
...
✓ invoice_048.pdf

Failed Documents:

✗ invoice_023.pdf - Error: File corrupted
✗ invoice_041.pdf - Error: No text extracted (scanned image)


View Individual Results

Click on any completed document to see:

  • Extracted text
  • Summary
  • Key information
  • Metadata
  • Token usage
  • Cost

Same view as single document processing.

Step 9: Download Batch Results

Download all results at once:

Bulk Download Options

  1. Click "Download All Results" - A dropdown menu appears:

    ○ ZIP file (all JSONs)
    ○ ZIP file (all Markdowns)
    ○ CSV file (metadata only)
    ○ Excel file (metadata + summaries)

  2. Select Format

    • JSON: Complete data, best for automation
    • Markdown: Human-readable, good for review
    • CSV: Spreadsheet, good for analysis
    • Excel: Formatted spreadsheet with summaries

  3. Click "Download"

    • File downloads to browser downloads folder
    • Filename: batch_2024_03_20_results.zip

ZIP File Contents

For JSON format:

batch_2024_03_20_results.zip
├── invoice_001_results.json
├── invoice_002_results.json
├── invoice_003_results.json
├── ...
├── invoice_048_results.json
└── batch_summary.json

batch_summary.json includes:

  • Total documents processed
  • Success/failure counts
  • Total cost
  • Total tokens used
  • Processing time statistics
  • List of failed documents

CSV Export Structure

Columns:

  • Document Name
  • Status (Success/Failed)
  • Processing Time (seconds)
  • Input Tokens
  • Output Tokens
  • Cost (USD)
  • Summary (first 500 chars)
  • Key Information Extracted

Best for:

  • Analyzing costs across batch
  • Identifying slow documents
  • Reviewing summaries in spreadsheet
  • Generating reports
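
A CSV export with the columns listed above can be analyzed with the standard library alone. A minimal sketch (the sample rows and their values are invented for the demo; only the column names come from this guide):

```python
import csv
import io

# Rows shaped like the CSV export described above (values invented).
sample = """\
Document Name,Status,Processing Time (seconds),Cost (USD)
invoice_001.pdf,Success,18.2,0.00042
invoice_002.pdf,Success,21.5,0.00051
invoice_023.pdf,Failed,0.0,0.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
succeeded = [r for r in rows if r["Status"] == "Success"]
total_cost = sum(float(r["Cost (USD)"]) for r in succeeded)
slowest = max(succeeded, key=lambda r: float(r["Processing Time (seconds)"]))
print(f"{len(succeeded)} succeeded, total ${total_cost:.5f}, slowest {slowest['Document Name']}")
```

In real use you would pass the downloaded file to `csv.DictReader` directly instead of an in-memory string.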

Advanced Batch Processing

Scheduling Batch Jobs

Enterprise plans only - schedule recurring batch processing:

  1. Go to Settings → Batch Scheduler

  2. Configure Schedule:

    • Frequency: Daily, Weekly, Monthly
    • Time: Select hour (in UTC)
    • Source: Folder path or URL list

  3. Set Options:

    • Model, analysis depth, output format
    • Email notifications

  4. Activate Schedule

Use Cases:

  • Daily news article scraping
  • Weekly report processing
  • Monthly invoice batch processing

Monitoring Large Batches

For batches with 100+ documents:

Progress Notifications:

  • Email at 25%, 50%, 75%, 100% complete
  • Webhook calls for real-time monitoring
  • SMS alerts (Enterprise only)

Performance Monitoring:

  • Average processing time trending
  • Cost tracking per batch
  • Failure rate monitoring
  • Queue depth alerts

Reprocessing Failed Documents

If some documents failed:

  1. Click "Reprocess Failed" button

  2. Review Failed List:

    invoice_023.pdf - File corrupted
    invoice_041.pdf - Scanned image

  3. Fix Issues:

    • Replace corrupted files
    • Use OCR for scanned images

  4. Resubmit Only Failed Docs

    • Only processes documents that failed
    • Saves time and cost

Batch Processing via API

Automate batch processing with the API:

```bash
# Submit batch job
curl -X POST https://<your-vm-ip>/api/v1/batch \
  -H "X-Deployment-Key: ak-xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/doc1.pdf",
      "https://example.com/doc2.pdf"
    ],
    "model": "gpt-4o-mini",
    "analysis_depth": "standard",
    "output_format": "json"
  }'

# Returns batch_id
{
  "batch_id": "batch_abc123",
  "status": "queued",
  "total_documents": 2
}

# Check batch status
curl https://<your-vm-ip>/api/v1/batch/batch_abc123 \
  -H "X-Deployment-Key: ak-xxxxx"

# Returns progress
{
  "batch_id": "batch_abc123",
  "status": "processing",
  "completed": 1,
  "failed": 0,
  "total": 2,
  "progress_percent": 50
}
```

Full API documentation: API Reference

Best Practices

Optimize Batch Size

Small Batches (1-10 documents):

  • Faster to complete
  • Easier to review
  • Lower risk if issues occur
  • Good for testing

Medium Batches (10-50 documents):

  • Good balance
  • Complete in reasonable time (20-30 minutes)
  • Manageable result set
  • Recommended for most use cases

Large Batches (50-200 documents):

  • Maximum efficiency
  • Longer completion time (1-3 hours)
  • Harder to review all results
  • Good for bulk processing

Very Large Batches (200+ documents):

  • Enterprise plans only
  • Consider splitting into multiple batches
  • Use scheduling feature
  • Monitor progress closely
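
Splitting an oversized document list into plan-sized sub-batches, as suggested above, is a one-liner; `split_batch` here is a hypothetical helper for client-side scripting:

```python
def split_batch(items, max_per_batch):
    """Split a large document list into sub-batches of at most max_per_batch."""
    return [items[i:i + max_per_batch] for i in range(0, len(items), max_per_batch)]

# 230 documents split for a plan with a 100-PDF batch limit:
docs = [f"doc_{i:03}.pdf" for i in range(230)]
print([len(b) for b in split_batch(docs, 100)])  # [100, 100, 30]
```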

Optimize for Cost

Use GPT-4o mini by default:

  • 16x cheaper than GPT-4o
  • Good quality for most documents
  • Reserve GPT-4o for complex documents

Use Quick Extract when appropriate:

  • If you only need text, not analysis
  • 4x faster than Standard Analysis (~5s vs ~20s per document)
  • Lower token usage

Disable Vector Storage if not needed:

  • Saves processing time
  • Reduces storage costs
  • Only enable if you'll search these documents later

Monitor and Validate

Spot Check Results:

  • Review 5-10 random documents from batch
  • Verify extraction quality
  • Check summary accuracy
  • Adjust settings for future batches if needed

Track Failure Patterns:

  • If >10% documents fail, investigate why
  • Common causes: file corruption, scanned PDFs, password protection
  • Fix issues before reprocessing
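
Checking a batch against the >10% guideline above is straightforward once you have per-document results. A sketch, assuming result records with `status` and `error` fields (the field names and the helper are illustrative, not a product API):

```python
def failure_report(results, threshold=0.10):
    """Summarize failures and flag batches over the 10% guideline above."""
    failed = [r for r in results if r["status"] == "Failed"]
    causes = {}
    for r in failed:
        causes[r["error"]] = causes.get(r["error"], 0) + 1  # group by reported cause
    rate = len(failed) / len(results)
    return {"failure_rate": rate, "investigate": rate > threshold, "causes": causes}

report = failure_report([
    {"status": "Success"},
    {"status": "Failed", "error": "File corrupted"},
    {"status": "Failed", "error": "No text extracted (scanned image)"},
    {"status": "Success"},
])
print(report["investigate"], report["causes"])
```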

Monitor Costs:

  • Check batch cost after completion
  • Compare to estimate
  • If significantly higher, investigate why (larger documents, more tokens than expected)

Troubleshooting

Common Issues

Issue: Batch Stuck at 0%

Solution:

  • Wait 2-3 minutes (queue processing)
  • Refresh page
  • Check Settings → Service Status
  • Contact support if stuck >5 minutes

Issue: High Failure Rate (>20%)

Causes:

  • Corrupted files
  • Scanned PDFs (no searchable text)
  • Password-protected files
  • URLs returning 404 errors

Solution:

  • Download failed documents list
  • Validate each file individually
  • Remove problematic files
  • Resubmit cleaned batch

Issue: Processing Much Slower Than Expected

Causes:

  • Documents larger than estimated
  • Complex documents requiring more processing
  • High server load (other users' jobs)

Solution:

  • Check individual document processing times
  • Consider splitting into smaller batches
  • Use Quick Extract for faster processing
  • Upgrade plan for more concurrent jobs

Issue: Cost Much Higher Than Expected

Causes:

  • Documents had more text than estimated
  • Used Deep Analysis (high token usage)
  • Used GPT-4o instead of GPT-4o mini

Solution:

  • Review token counts in results
  • Check which model was used
  • Use cost estimation tool before submitting
  • Switch to GPT-4o mini for future batches

Next Steps

Now that you understand batch processing:

  1. Automate with API

  2. Search Your Batch

  3. Analyze Results

  4. Optimize Costs

Support

Need help with batch processing?