Processing Your First Document

This guide walks you through uploading and processing your first document with Alactic AGI. You'll learn how to extract content, view results, and understand the processing pipeline.

Prerequisites

Before processing your first document, confirm that:

Deployment completed successfully
Dashboard accessible at your VM's IP address
Deployment key retrieved and saved
Logged into dashboard

Step 1: Prepare Your Test Document

For your first processing job, choose a simple document:

Recommended Test Documents:

Single-page PDF (invoice, receipt, form)
Short research paper (5-10 pages)
News article URL
Blog post URL

Avoid for First Test:

Very large PDFs (100+ pages)
Scanned documents with poor quality
Password-protected PDFs
Websites with heavy JavaScript

Sample Documents: If you don't have a document ready, try these public URLs:

News article: https://www.bbc.com/news/technology (any recent article)
Research paper: https://arxiv.org/abs/2303.08774 (GPT-4 paper)
Blog post: Any Medium article URL

Step 2: Access the Dashboard

Open your browser and navigate to your dashboard URL:
```
https://<your-vm-ip-address>
```
Login
- Enter your deployment key (format: ak-xxxxx)
- Click "Sign In"
- You'll see the main dashboard
Verify Dashboard Loaded
- Top navigation visible
- Processing tabs visible (PDF Upload, URL Scraping, Batch)
- Status panel shows your plan and usage

Step 3A: Upload a PDF Document

If you're processing a PDF file:

Upload Interface

Click "PDF Upload" tab
- Located in the main content area
- Default tab when dashboard loads
Select Your File

Method 1: Drag and Drop
- Drag PDF file from your computer
- Drop onto the upload area
- File will be added to queue
Method 2: Click to Browse
- Click "Select PDF Files" button
- Browse to your PDF location
- Select file and click "Open"
Verify File Added. You'll see a preview card showing:
- File name
- File size
- Page count (if detected)
- Ready to process

Configure Processing Options

Before submitting, configure how you want the document processed:

1. Select Model

Choose the AI model:

○ Alactic GPT-4o mini (Recommended for first test)
  - Faster processing
  - Lower cost ($0.150/1M input tokens)
  - Good for straightforward extraction
  
○ Alactic GPT-4o
  - More powerful analysis
  - Higher cost ($2.50/1M input tokens)
  - Better for complex documents

For first test: Use GPT-4o mini

2. Select Analysis Depth

○ Quick Extract
  - 5-10 seconds
  - Basic text extraction
  - No analysis
  
● Standard Analysis (Recommended)
  - 15-30 seconds
  - Text extraction + summary
  - Key points identified
  
○ Deep Analysis
  - 45-90 seconds
  - Full content understanding
  - Entity extraction
  - Sentiment analysis

For first test: Use Standard Analysis

3. Select Output Format

● JSON (Recommended)
  - Structured data
  - Easy to review
  - Includes metadata
  
○ Markdown
  - Human-readable
  - Preserves formatting
  
○ Plain Text
  - Simple text only

For first test: Use JSON

4. Optional Settings

☑ Enable Vector Storage
  - Allows semantic search later
  - Adds 2-5 seconds to processing
  - Recommended: Enable
  
☐ Enable Content Chunking
  - For large documents (50+ pages)
  - First test: Leave disabled
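
Content chunking generally means splitting a long document into smaller pieces that are processed separately. How Alactic AGI chunks documents internally is not described in this guide, so the sketch below only illustrates the general idea. The `chunk_words` helper, its default sizes, and the word-based strategy are all assumptions made for illustration.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping windows of words (illustrative only)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means that text near a boundary appears in both neighbouring chunks, which is one common way to avoid losing context at the cut.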

Submit Processing Job

Review Configuration
- Model: GPT-4o mini
- Depth: Standard Analysis
- Format: JSON
- Vector Storage: Enabled
Click "Process Document"
- Job will be submitted
- Processing starts immediately
- Progress bar appears
Monitor Progress. You'll see real-time updates:
```
Processing document... 25%
Extracting text...
```
Wait for Completion
- Standard analysis: 15-30 seconds
- Don't close browser
- Don't navigate away

Step 3B: Scrape a URL

If you're processing a website URL:

URL Input Interface

Click "URL Scraping" tab
- Second tab in main content area
Enter URL
- Paste URL into input field
- Example: https://www.bbc.com/news/technology-12345678
- Must start with http:// or https://
Verify URL Valid
- Green checkmark appears if valid
- Red X if invalid
- Fix any typos
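
If you want to check URLs before pasting them in, the rule above (the URL must start with http:// or https://) can be approximated locally. This is a minimal sketch using Python's standard `urllib.parse`. The `is_processable_url` helper is hypothetical, and the dashboard's own validation may be stricter or looser than this.

```python
from urllib.parse import urlparse


def is_processable_url(url: str) -> bool:
    """Return True if the URL uses http or https and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in (
        "https://www.bbc.com/news/technology-12345678",
        "www.bbc.com/news/technology",  # missing scheme
    ):
        print(candidate, is_processable_url(candidate))
```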

Configure Processing Options

Same options as PDF processing:

Quick Settings for First URL:

Model: GPT-4o mini
Depth: Standard Analysis
Format: JSON
Vector Storage: Enabled
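
For keeping track of which settings you used across test runs, the first-test configuration can be written down as a small data structure. This is a sketch for your own notes, not an Alactic AGI API. The `ProcessingOptions` class and its field names are hypothetical. The identifiers `gpt-4o-mini` and `standard` match the example output in this guide, while the other allowed values are assumed spellings.

```python
from dataclasses import dataclass

# "gpt-4o-mini" and "standard" appear in this guide's example output;
# the remaining identifiers are assumed spellings of the dashboard options.
MODELS = {"gpt-4o-mini", "gpt-4o"}
DEPTHS = {"quick", "standard", "deep"}
FORMATS = {"json", "markdown", "text"}


@dataclass(frozen=True)
class ProcessingOptions:
    """Recommended first-test settings, used as defaults."""
    model: str = "gpt-4o-mini"
    analysis_depth: str = "standard"
    output_format: str = "json"
    vector_storage: bool = True
    content_chunking: bool = False

    def __post_init__(self) -> None:
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model}")
        if self.analysis_depth not in DEPTHS:
            raise ValueError(f"unknown analysis depth: {self.analysis_depth}")
        if self.output_format not in FORMATS:
            raise ValueError(f"unknown output format: {self.output_format}")


if __name__ == "__main__":
    print(ProcessingOptions())
```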

Submit Scraping Job

Click "Scrape URL"
- Job submitted
- Processing begins

Monitor Progress

Fetching page...
Extracting content...
Processing with AI...

Wait for Completion
- URL scraping: 10-25 seconds
- Depends on page size and complexity

Step 4: View Results

Once processing completes, you'll see results immediately:

Results Card

Document Information:

Invoice_2024.pdf
Processed: Just now
Model: GPT-4o mini
Status: Complete
Time: 18.3 seconds
Tokens: 2,847 in / 421 out
Cost: $0.0049

Actions:

View - See full content
Download - Save JSON file
Reprocess - Try different options
Delete - Remove result

View Full Content

Click "View" button to see detailed results:

Summary Section:

Summary:
This is an invoice from Acme Corp dated March 15, 2024.
Total amount: $1,247.82. Services include web development 
and hosting fees. Payment due by April 1, 2024.

Key Information Extracted:

- Invoice Number: INV-2024-0315
- Date: March 15, 2024
- Vendor: Acme Corp
- Amount: $1,247.82
- Due Date: April 1, 2024
- Services: Web development, hosting

Extracted Text:

Full text content from the document...
[Shows complete text with formatting preserved]

Metadata:

{
  "filename": "Invoice_2024.pdf",
  "pages": 1,
  "file_size": "124 KB",
  "processed_at": "2024-03-20T10:15:32Z",
  "model": "gpt-4o-mini",
  "language": "en",
  "word_count": 342
}

Understanding the Results

Summary:

Auto-generated overview (typically 150-300 words; shorter for brief documents such as the sample invoice)
Captures main points
Good for quick review

Key Information:

Structured data extracted
Entities identified (people, companies, dates, amounts)
Only shown if Standard or Deep Analysis was used

Extracted Text:

Complete text from document
Preserves structure (paragraphs, lists)
May include formatting from original

Metadata:

Technical details about processing
Useful for debugging or optimization

Step 5: Download Results

Save your results locally:

Download Options

Click "Download" button
- Dropdown menu appears

Choose Format

○ JSON (original format)
○ Markdown (.md file)
○ Plain Text (.txt file)
○ CSV (metadata only)

Select Format and Confirm
- File downloads to your browser's download folder
- Filename: Invoice_2024_results.json

Example JSON Output

{
  "document_id": "doc_abc123xyz",
  "filename": "Invoice_2024.pdf",
  "processed_at": "2024-03-20T10:15:32Z",
  "model": "gpt-4o-mini",
  "analysis_depth": "standard",
  "summary": "This is an invoice from Acme Corp...",
  "key_information": {
    "invoice_number": "INV-2024-0315",
    "date": "2024-03-15",
    "vendor": "Acme Corp",
    "amount": 1247.82,
    "currency": "USD",
    "due_date": "2024-04-01"
  },
  "extracted_text": "Full text content...",
  "metadata": {
    "pages": 1,
    "file_size": 126976,
    "language": "en",
    "word_count": 342
  },
  "processing_stats": {
    "processing_time_seconds": 18.3,
    "input_tokens": 2847,
    "output_tokens": 421,
    "total_tokens": 3268,
    "cost_usd": 0.0049
  }
}
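
Once downloaded, the JSON can be read with any JSON library. The sketch below uses Python's standard `json` module on a trimmed copy of the example above. The `summarize_result` helper is hypothetical, and it assumes your real results use the same field names as the example.

```python
import json

# Trimmed copy of the example output above.
SAMPLE = """
{
  "document_id": "doc_abc123xyz",
  "filename": "Invoice_2024.pdf",
  "key_information": {"vendor": "Acme Corp", "amount": 1247.82, "currency": "USD"},
  "processing_stats": {
    "input_tokens": 2847,
    "output_tokens": 421,
    "total_tokens": 3268,
    "cost_usd": 0.0049
  }
}
"""


def summarize_result(raw: str) -> dict:
    """Pull a few fields out of a result and sanity-check the token counts."""
    result = json.loads(raw)
    stats = result["processing_stats"]
    if stats["input_tokens"] + stats["output_tokens"] != stats["total_tokens"]:
        raise ValueError("token counts do not add up")
    info = result.get("key_information", {})
    return {
        "document_id": result["document_id"],
        "vendor": info.get("vendor"),
        "amount": info.get("amount"),
        "total_tokens": stats["total_tokens"],
        "cost_usd": stats["cost_usd"],
    }


if __name__ == "__main__":
    print(summarize_result(SAMPLE))
```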

Step 6: Verify Processing Costs

Check how much this processing cost:

View Cost Breakdown

In Results Card:

Cost: $0.0049

Calculation:

Input tokens: 2,847 × $0.150 / 1,000,000 = $0.0004
Output tokens: 421 × $0.600 / 1,000,000 = $0.0003
Token total: $0.0007
Displayed cost: $0.0049 (token total plus processing overhead)
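
The token portion of this calculation can be reproduced in a few lines. The sketch below uses the GPT-4o mini rates from the calculation above. The `token_cost_usd` helper is hypothetical, and it covers token charges only, since the guide does not say how the processing overhead is computed.

```python
# Per-million-token rates for GPT-4o mini, taken from the calculation above.
INPUT_RATE_PER_MILLION = 0.150
OUTPUT_RATE_PER_MILLION = 0.600


def token_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_RATE_PER_MILLION,
    output_rate: float = OUTPUT_RATE_PER_MILLION,
) -> float:
    """Token charges only; processing overhead is not modelled here."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    cost = token_cost_usd(2847, 421)
    print(f"Token cost: ${cost:.4f}")
```

For the sample invoice this gives roughly $0.0007, so most of the displayed $0.0049 comes from overhead rather than tokens.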

Cost Factors:

Model used (GPT-4o vs GPT-4o mini)
Document size (more text = more tokens)
Analysis depth (Deep uses more tokens)
Output format (JSON includes more structure)

View Total Monthly Usage

Go to Settings → Usage Statistics

See cumulative costs:

Documents Processed: 1
Total Tokens: 3,268
Total Cost: $0.0049
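
If you keep your downloaded JSON results, a similar running tally can be reproduced locally and compared against the dashboard. This is a minimal sketch. The `total_usage` helper is hypothetical and assumes each result carries the `processing_stats` fields shown in the example JSON output.

```python
def total_usage(results: list[dict]) -> dict:
    """Sum token counts and costs across parsed result dictionaries."""
    total_tokens = sum(r["processing_stats"]["total_tokens"] for r in results)
    total_cost = sum(r["processing_stats"]["cost_usd"] for r in results)
    return {
        "documents_processed": len(results),
        "total_tokens": total_tokens,
        "total_cost_usd": round(total_cost, 4),
    }


if __name__ == "__main__":
    first_result = {"processing_stats": {"total_tokens": 3268, "cost_usd": 0.0049}}
    print(total_usage([first_result]))
```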

Step 7: Try Different Options

Now that you've processed one document successfully, experiment:

Reprocess with Different Model

Click "Reprocess" on your result
Change model to GPT-4o (more powerful)
Keep other settings same
Click "Process"
Compare results and cost

Expect:

Similar or better quality
Higher cost (~$0.08 for same document)
Possibly more detailed insights

Try Deep Analysis

Reprocess again
Select Deep Analysis
Process
Review entity extraction and sentiment

Expect:

Longer processing time (45-90 seconds)
More structured data extracted
Entities identified (people, companies, locations)
Sentiment analysis included

Try Different Output Format

Reprocess one more time
Select Markdown format
Process
Download and open in text editor

Expect:

More human-readable format
Better for documentation
Still includes all extracted data

Understanding Processing Results

Success Indicators

Processing succeeded if:

✓ Status shows "Complete"
✓ Summary generated
✓ Extracted text visible
✓ Cost calculated
✓ Download button available
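
Some of these checks can also be run against a downloaded JSON result. The sketch below covers only the indicators that have a counterpart in the example JSON output (summary, extracted text, cost). The `find_problems` helper is hypothetical, and the status and download-button checks still have to be made in the dashboard.

```python
def find_problems(result: dict) -> list[str]:
    """Return a list of failed checks; an empty list means all passed."""
    problems = []
    if not result.get("summary"):
        problems.append("summary missing")
    if not result.get("extracted_text"):
        problems.append("no extracted text")
    if result.get("processing_stats", {}).get("cost_usd") is None:
        problems.append("cost not calculated")
    return problems


if __name__ == "__main__":
    good = {
        "summary": "This is an invoice from Acme Corp...",
        "extracted_text": "Full text content...",
        "processing_stats": {"cost_usd": 0.0049},
    }
    print(find_problems(good))
    print(find_problems({"summary": "", "extracted_text": ""}))
```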

Common Issues and Solutions

Issue: Processing Stuck at 50%

Solution:

Wait 60 seconds (some jobs take longer than the typical range)
If it is still stuck, refresh the page
Check Settings → Service Status
Reprocess the document if the job failed

Issue: "No text extracted"

Causes:

PDF is scanned image (not searchable text)
PDF is password-protected
PDF is corrupted

Solution:

For scanned PDFs, use a version with searchable text (OCR support is coming soon)
Remove password and re-upload
Try different PDF

Issue: Cost Higher Than Expected

Causes:

Document larger than estimated
Used GPT-4o instead of mini
Deep analysis used more tokens

Solution:

Check token counts in results
Use GPT-4o mini for cost savings
Use Quick Extract for simple documents

Issue: Summary Seems Incomplete

Causes:

Document too long to process in a single pass
Complex structure (tables, images)
Used Quick Extract (minimal analysis)

Solution:

Use Deep Analysis instead
Enable content chunking for large docs
Try GPT-4o for better understanding

Best Practices for Document Processing

Start Simple:

Test with 1-2 page documents first
Use Standard Analysis (not Deep)
Use GPT-4o mini initially
Verify results quality

Scale Gradually:

Once comfortable, try larger documents
Experiment with Deep Analysis
Compare GPT-4o vs mini results
Test batch processing

Optimize Costs:

Use GPT-4o mini for straightforward extraction
Reserve GPT-4o for complex analysis
Use Quick Extract when speed matters
Monitor monthly usage in Settings

Monitor Quality:

Review extracted text accuracy
Check if summaries capture key points
Verify entity extraction correctness
Adjust settings based on results

Next Steps

Now that you've processed your first document:

Process Multiple Documents
- Set Up Batch Processing
- Learn to handle 10+ documents at once
Integrate with API
- API Authentication
- Automate document processing
Search Your Documents
- Semantic Search Guide
- Find documents by meaning
Optimize Costs
- Cost Optimization
- Reduce monthly spending

Troubleshooting

For detailed troubleshooting:

Support

Need help?
