
Processing Your First Document

This guide walks you through uploading and processing your first document with Alactic AGI. You'll learn how to extract content, view results, and understand the processing pipeline.

Prerequisites

Before processing your first document:

  • Deployment completed successfully
  • Dashboard accessible at your VM's IP address
  • Deployment key retrieved and saved
  • Logged into dashboard

Step 1: Prepare Your Test Document

For your first processing job, choose a simple document:

Recommended Test Documents:

  • Single-page PDF (invoice, receipt, form)
  • Short research paper (5-10 pages)
  • News article URL
  • Blog post URL

Avoid for First Test:

  • Very large PDFs (100+ pages)
  • Scanned documents with poor quality
  • Password-protected PDFs
  • Websites with heavy JavaScript
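A quick local check can catch two of these cases before you upload. The sketch below is a heuristic written for this guide, not a feature of Alactic AGI: the function name `precheck_pdf` is hypothetical, and scanning for `/Encrypt` only hints that a file may be password-protected.

```python
from pathlib import Path


def precheck_pdf(path: str) -> list[str]:
    """Return warnings for a PDF file; an empty list means no obvious problem."""
    data = Path(path).read_bytes()
    warnings = []
    # A PDF file carries a "%PDF-" header near the start of the file.
    if b"%PDF-" not in data[:1024]:
        warnings.append("file does not look like a PDF")
    # Encrypted PDFs reference an /Encrypt dictionary.
    # Heuristic only: unusual files may be missed or misreported.
    if b"/Encrypt" in data:
        warnings.append("file may be password-protected")
    return warnings
```

An empty list does not guarantee that processing will succeed. A scanned PDF with poor image quality, for example, passes both checks.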

Sample Documents: If you don't have a document ready, try these public URLs:

  • News article: https://www.bbc.com/news/technology (any recent article)
  • Research paper: https://arxiv.org/abs/2303.08774 (GPT-4 paper)
  • Blog post: Any Medium article URL

Step 2: Access the Dashboard

  1. Open your browser

    Navigate to your dashboard URL:

    https://<your-vm-ip-address>
  2. Login

    • Enter your deployment key (format: ak-xxxxx)
    • Click "Sign In"
    • You'll see the main dashboard
  3. Verify Dashboard Loaded

    • Top navigation visible
    • Processing tabs visible (PDF Upload, URL Scraping, Batch)
    • Status panel shows your plan and usage
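If sign-in fails, it can help to rule out a copy-and-paste error in the key first. The sketch below checks only the `ak-` prefix shown above. The function name is hypothetical, and the length and character set of the rest of the key are not specified in this guide, so they are not validated.

```python
def looks_like_deployment_key(key: str) -> bool:
    """Check the documented "ak-" prefix; the rest of the format is unspecified."""
    key = key.strip()  # stray whitespace is a common paste error
    return key.startswith("ak-") and len(key) > len("ak-")
```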

Step 3A: Upload a PDF Document

If you're processing a PDF file:

Upload Interface

  1. Click "PDF Upload" tab

    • Located in the main content area
    • Default tab when dashboard loads
  2. Select Your File

    Method 1: Drag and Drop

    • Drag PDF file from your computer
    • Drop onto the upload area
    • File will be added to queue

    Method 2: Click to Browse

    • Click "Select PDF Files" button
    • Browse to your PDF location
    • Select file and click "Open"
  3. Verify File Added

    You'll see a preview card showing:

    • File name
    • File size
    • Page count (if detected)
    • Ready to process

Configure Processing Options

Before submitting, configure how you want the document processed:

1. Select Model

Choose the AI model:

○ Alactic GPT-4o mini (Recommended for first test)
- Faster processing
- Lower cost ($0.150/1M input tokens, $0.600/1M output tokens)
- Good for straightforward extraction

○ Alactic GPT-4o
- More powerful analysis
- Higher cost ($2.50/1M tokens)
- Better for complex documents

For first test: Use GPT-4o mini

2. Select Analysis Depth

○ Quick Extract
- 5-10 seconds
- Basic text extraction
- No analysis

● Standard Analysis (Recommended)
- 15-30 seconds
- Text extraction + summary
- Key points identified

○ Deep Analysis
- 45-90 seconds
- Full content understanding
- Entity extraction
- Sentiment analysis

For first test: Use Standard Analysis

3. Select Output Format

● JSON (Recommended)
- Structured data
- Easy to review
- Includes metadata

○ Markdown
- Human-readable
- Preserves formatting

○ Plain Text
- Simple text only

For first test: Use JSON

4. Optional Settings

☑ Enable Vector Storage
- Allows semantic search later
- Adds 2-5 seconds to processing
- Recommended: Enable

☐ Enable Content Chunking
- For large documents (50+ pages)
- First test: Leave disabled
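The four choices above can be summarized as a small data structure. This is an illustrative sketch, not the dashboard's internal representation: the class name, the field names, and the string values `"quick"`, `"deep"`, and `"json"` are assumptions. It also assumes the vector storage overhead is added on top of the listed depth timings.

```python
from dataclasses import dataclass

# Expected processing time per analysis depth, in seconds, as listed in this guide.
DEPTH_SECONDS = {
    "quick": (5, 10),
    "standard": (15, 30),
    "deep": (45, 90),
}


@dataclass(frozen=True)
class ProcessingOptions:
    """Hypothetical container; the defaults match the first-test recommendations."""
    model: str = "gpt-4o-mini"
    depth: str = "standard"
    output_format: str = "json"
    vector_storage: bool = True
    chunking: bool = False

    def expected_seconds(self) -> tuple[int, int]:
        """Return a (low, high) estimate of processing time in seconds."""
        low, high = DEPTH_SECONDS[self.depth]
        # Vector storage adds 2-5 seconds (assumed to be on top of the depth timing).
        if self.vector_storage:
            low, high = low + 2, high + 5
        return low, high
```

With the recommended defaults, `ProcessingOptions().expected_seconds()` gives a rough range to expect before the job completes.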

Submit Processing Job

  1. Review Configuration

    • Model: GPT-4o mini
    • Depth: Standard Analysis
    • Format: JSON
    • Vector Storage: Enabled
  2. Click "Process Document"

    • Job will be submitted
    • Processing starts immediately
    • Progress bar appears
  3. Monitor Progress

    You'll see real-time updates:

    Processing document... 25%
    Extracting text...
  4. Wait for Completion

    • Standard analysis: 15-30 seconds
    • Don't close browser
    • Don't navigate away

Step 3B: Scrape a URL

If you're processing a website URL:

URL Input Interface

  1. Click "URL Scraping" tab

    • Second tab in main content area
  2. Enter URL

    • Paste URL into input field
    • Example: https://www.bbc.com/news/technology-12345678
    • Must start with http:// or https://
  3. Verify URL Valid

    • Green checkmark appears if valid
    • Red X if invalid
    • Fix any typos
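The rule above (the URL must start with http:// or https://) can be checked locally before pasting. The sketch below uses Python's standard `urllib.parse`. The function name is hypothetical, and the dashboard's own validation may apply further rules not described here.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only http:// or https:// URLs that include a host name."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `is_valid_url("www.bbc.com/news")` returns `False` because the scheme is missing.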

Configure Processing Options

Same options as PDF processing:

Quick Settings for First URL:

  • Model: GPT-4o mini
  • Depth: Standard Analysis
  • Format: JSON
  • Vector Storage: Enabled

Submit Scraping Job

  1. Click "Scrape URL"

    • Job submitted
    • Processing begins
  2. Monitor Progress

    Fetching page...
    Extracting content...
    Processing with AI...
  3. Wait for Completion

    • URL scraping: 10-25 seconds
    • Depends on page size and complexity

Step 4: View Results

Once processing completes, you'll see results immediately:

Results Card

Document Information:

 Invoice_2024.pdf
Processed: Just now
Model: GPT-4o mini
Status: Complete
Time: 18.3 seconds
Tokens: 2,847 in / 421 out
Cost: $0.0049

Actions:

  • View - See full content
  • Download - Save JSON file
  • Reprocess - Try different options
  • Delete - Remove result

View Full Content

Click "View" button to see detailed results:

Summary Section:

Summary:
This is an invoice from Acme Corp dated March 15, 2024.
Total amount: $1,247.82. Services include web development
and hosting fees. Payment due by April 1, 2024.

Key Information Extracted:

- Invoice Number: INV-2024-0315
- Date: March 15, 2024
- Vendor: Acme Corp
- Amount: $1,247.82
- Due Date: April 1, 2024
- Services: Web development, hosting

Extracted Text:

Full text content from the document...
[Shows complete text with formatting preserved]

Metadata:

{
"filename": "Invoice_2024.pdf",
"pages": 1,
"file_size": "124 KB",
"processed_at": "2024-03-20T10:15:32Z",
"model": "gpt-4o-mini",
"language": "en",
"word_count": 342
}

Understanding the Results

Summary:

  • Auto-generated overview (150-300 words)
  • Captures main points
  • Good for quick review

Key Information:

  • Structured data extracted
  • Entities identified (people, companies, dates, amounts)
  • Only shown if Standard or Deep analysis used

Extracted Text:

  • Complete text from document
  • Preserves structure (paragraphs, lists)
  • May include formatting from original

Metadata:

  • Technical details about processing
  • Useful for debugging or optimization

Step 5: Download Results

Save your results locally:

Download Options

  1. Click "Download" button

    • Dropdown menu appears
  2. Choose Format

    ○ JSON (original format)
    ○ Markdown (.md file)
    ○ Plain Text (.txt file)
    ○ CSV (metadata only)
  3. Select Format and Confirm

    • File downloads to your browser's download folder
    • Filename: Invoice_2024_results.json
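If you script anything around downloaded files, the naming pattern shown above can be reproduced as follows. Only the JSON filename is confirmed by this guide. Applying the same `_results` suffix to the other formats, and using `.csv` for the CSV export, are assumptions, as are the function name and format keys.

```python
from pathlib import Path

# File extensions for the four download formats listed above.
EXTENSIONS = {"json": ".json", "markdown": ".md", "text": ".txt", "csv": ".csv"}


def result_filename(source_name: str, fmt: str = "json") -> str:
    """Build the download filename from the source document name."""
    return f"{Path(source_name).stem}_results{EXTENSIONS[fmt]}"
```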

Example JSON Output

{
"document_id": "doc_abc123xyz",
"filename": "Invoice_2024.pdf",
"processed_at": "2024-03-20T10:15:32Z",
"model": "gpt-4o-mini",
"analysis_depth": "standard",
"summary": "This is an invoice from Acme Corp...",
"key_information": {
"invoice_number": "INV-2024-0315",
"date": "2024-03-15",
"vendor": "Acme Corp",
"amount": 1247.82,
"currency": "USD",
"due_date": "2024-04-01"
},
"extracted_text": "Full text content...",
"metadata": {
"pages": 1,
"file_size": 126976,
"language": "en",
"word_count": 342
},
"processing_stats": {
"processing_time_seconds": 18.3,
"input_tokens": 2847,
"output_tokens": 421,
"total_tokens": 3268,
"cost_usd": 0.0049
}
}
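Once downloaded, the JSON file can be read with any JSON parser. The sketch below prints a short overview using the field names from the example output above. The function name is hypothetical, and it assumes your own results follow the same structure.

```python
import json


def summarize_result(raw_json: str) -> str:
    """Return a short, human-readable overview of one result file."""
    result = json.loads(raw_json)
    stats = result["processing_stats"]
    # key_information is only present for Standard or Deep analysis.
    info = result.get("key_information", {})
    lines = [
        f"{result['filename']} ({result['model']}, {result['analysis_depth']})",
        f"tokens: {stats['input_tokens']} in / {stats['output_tokens']} out",
        f"cost: ${stats['cost_usd']:.4f}",
    ]
    lines += [f"{key}: {value}" for key, value in info.items()]
    return "\n".join(lines)
```

To try it, read the downloaded file as text (for example with `pathlib.Path("Invoice_2024_results.json").read_text()`) and pass the contents to `summarize_result`.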

Step 6: Verify Processing Costs

Check how much this processing cost:

View Cost Breakdown

In Results Card:

Cost: $0.0049

Calculation:

Input tokens: 2,847 × $0.150 / 1,000,000 = $0.0004
Output tokens: 421 × $0.600 / 1,000,000 = $0.0003
Token total: $0.0007
Displayed cost: $0.0049 (token total plus processing overhead)
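The token-based part of this calculation can be reproduced in a few lines. The prices are the GPT-4o mini rates used in the calculation above, and the function name is hypothetical. The processing overhead is not modeled because this guide does not say how it is computed.

```python
# GPT-4o mini prices in USD per 1M tokens, as used in the calculation above.
INPUT_PRICE_PER_M = 0.150
OUTPUT_PRICE_PER_M = 0.600


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the token-only cost in USD (processing overhead excluded)."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


cost = token_cost(2847, 421)
print(f"${cost:.4f}")  # prints $0.0007
```

The gap between this figure and the $0.0049 on the results card is the processing overhead.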

Cost Factors:

  • Model used (GPT-4o vs GPT-4o mini)
  • Document size (more text = more tokens)
  • Analysis depth (Deep uses more tokens)
  • Output format (JSON includes more structure)

View Total Monthly Usage

  1. Go to Settings → Usage Statistics
  2. See cumulative costs:
    Documents Processed: 1
    Total Tokens: 3,268
    Total Cost: $0.0049
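If you keep the downloaded JSON results, you can cross-check these totals yourself. The sketch below sums the `processing_stats` fields from the example output. The function name and the keys of the returned dictionary are hypothetical.

```python
def total_usage(results: list[dict]) -> dict:
    """Aggregate document count, tokens, and cost across parsed result files."""
    return {
        "documents_processed": len(results),
        "total_tokens": sum(r["processing_stats"]["total_tokens"] for r in results),
        # Round to four decimals to match the precision shown in the dashboard.
        "total_cost_usd": round(
            sum(r["processing_stats"]["cost_usd"] for r in results), 4
        ),
    }
```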

Step 7: Try Different Options

Now that you've processed one document successfully, experiment:

Reprocess with Different Model

  1. Click "Reprocess" on your result
  2. Change model to GPT-4o (more powerful)
  3. Keep other settings same
  4. Click "Process"
  5. Compare results and cost

Expect:

  • Similar or better quality
  • Higher cost (~$0.08 for same document)
  • Possibly more detailed insights
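One rough way to arrive at a figure near $0.08 is to scale the displayed GPT-4o mini cost by the ratio of the two listed prices. This is an assumption made for illustration: the guide does not state how overhead scales between models, so treat the result as an estimate only.

```python
MINI_PRICE_PER_M = 0.150   # USD per 1M tokens, GPT-4o mini (from the model list)
GPT4O_PRICE_PER_M = 2.50   # USD per 1M tokens, GPT-4o (from the model list)


def scale_cost(mini_cost_usd: float) -> float:
    """Scale a GPT-4o mini cost by the listed price ratio (assumed proportional)."""
    return mini_cost_usd * GPT4O_PRICE_PER_M / MINI_PRICE_PER_M


estimate = scale_cost(0.0049)
print(f"~${estimate:.2f}")  # prints ~$0.08
```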

Try Deep Analysis

  1. Reprocess again
  2. Select Deep Analysis
  3. Process
  4. Review entity extraction and sentiment

Expect:

  • Longer processing time (45-90 seconds)
  • More structured data extracted
  • Entities identified (people, companies, locations)
  • Sentiment analysis included

Try Different Output Format

  1. Reprocess one more time
  2. Select Markdown format
  3. Process
  4. Download and open in text editor

Expect:

  • More human-readable format
  • Better for documentation
  • Still includes all extracted data

Understanding Processing Results

Success Indicators

Processing succeeded if:

  • ✓ Status shows "Complete"
  • ✓ Summary generated
  • ✓ Extracted text visible
  • ✓ Cost calculated
  • ✓ Download button available
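Some of these indicators can also be checked against a downloaded JSON result. The sketch below uses the field names from the example output earlier in this guide, and the function name is hypothetical. The "Complete" status and the download button are visible only in the dashboard, so they are not covered.

```python
def check_result(result: dict) -> list[str]:
    """Return a list of problems found in a parsed result; empty means it looks fine."""
    problems = []
    if not result.get("summary"):
        problems.append("summary missing")
    if not result.get("extracted_text"):
        problems.append("no text extracted")
    if "cost_usd" not in result.get("processing_stats", {}):
        problems.append("cost not calculated")
    return problems
```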

Common Issues and Solutions

Issue: Processing Stuck at 50%

Solution:

  • Wait 60 seconds; some jobs take longer than the estimate
  • If still stuck, refresh the page
  • Check Settings → Service Status
  • If the job failed, reprocess the document

Issue: "No text extracted"

Causes:

  • PDF is scanned image (not searchable text)
  • PDF is password-protected
  • PDF is corrupted

Solution:

  • Upload an OCR-processed version of the PDF (built-in OCR is coming soon)
  • Remove password and re-upload
  • Try different PDF

Issue: Cost Higher Than Expected

Causes:

  • Document larger than estimated
  • Used GPT-4o instead of mini
  • Deep analysis used more tokens

Solution:

  • Check token counts in results
  • Use GPT-4o mini for cost savings
  • Use Quick Extract for simple documents

Issue: Summary Seems Incomplete

Causes:

  • Document too long for single processing
  • Complex structure (tables, images)
  • Used Quick Extract (minimal analysis)

Solution:

  • Use Deep Analysis instead
  • Enable content chunking for large docs
  • Try GPT-4o for better understanding

Best Practices for Document Processing

Start Simple:

  • Test with 1-2 page documents first
  • Use Standard Analysis (not Deep)
  • Use GPT-4o mini initially
  • Verify results quality

Scale Gradually:

  • Once comfortable, try larger documents
  • Experiment with Deep Analysis
  • Compare GPT-4o vs mini results
  • Test batch processing

Optimize Costs:

  • Use GPT-4o mini for straightforward extraction
  • Reserve GPT-4o for complex analysis
  • Use Quick Extract when speed matters
  • Monitor monthly usage in Settings

Monitor Quality:

  • Review extracted text accuracy
  • Check if summaries capture key points
  • Verify entity extraction correctness
  • Adjust settings based on results

Next Steps

Now that you've processed your first document:

  1. Process Multiple Documents

  2. Integrate with API

  3. Search Your Documents

  4. Optimize Costs

Troubleshooting

For detailed troubleshooting:

Support

Need help?