Processing Your First Document
This guide walks you through uploading and processing your first document with Alactic AGI. You'll learn how to extract content, view results, and understand the processing pipeline.
Prerequisites
Before processing your first document:
- Deployment completed successfully
- Dashboard accessible at your VM's IP address
- Deployment key retrieved and saved
- Logged into dashboard
Step 1: Prepare Your Test Document
For your first processing job, choose a simple document:
Recommended Test Documents:
- Single-page PDF (invoice, receipt, form)
- Short research paper (5-10 pages)
- News article URL
- Blog post URL
Avoid for First Test:
- Very large PDFs (100+ pages)
- Scanned documents with poor quality
- Password-protected PDFs
- Websites with heavy JavaScript
Sample Documents: If you don't have a document ready, try these public URLs:
- News article: https://www.bbc.com/news/technology (any recent article)
- Research paper: https://arxiv.org/abs/2303.08774 (GPT-4 paper)
- Blog post: Any Medium article URL
Step 2: Access the Dashboard
1. Open your browser
   - Navigate to your dashboard URL: https://<your-vm-ip-address>
2. Login
   - Enter your deployment key (format: ak-xxxxx)
   - Click "Sign In"
   - You'll see the main dashboard
3. Verify Dashboard Loaded
   - Top navigation visible
   - Processing tabs visible (PDF Upload, URL Scraping, Batch)
   - Status panel shows your plan and usage
Step 3A: Upload a PDF Document
If you're processing a PDF file:
Upload Interface
1. Click "PDF Upload" tab
   - Located in the main content area
   - Default tab when dashboard loads
2. Select Your File
   Method 1: Drag and Drop
   - Drag PDF file from your computer
   - Drop onto the upload area
   - File will be added to queue
   Method 2: Click to Browse
   - Click "Select PDF Files" button
   - Browse to your PDF location
   - Select file and click "Open"
3. Verify File Added
   You'll see a preview card showing:
   - File name
   - File size
   - Page count (if detected)
   - Ready to process
Configure Processing Options
Before submitting, configure how you want the document processed:
1. Select Model
Choose the AI model:
○ Alactic GPT-4o mini (Recommended for first test)
- Faster processing
- Lower cost ($0.150 per 1M input tokens)
- Good for straightforward extraction
○ Alactic GPT-4o
- More powerful analysis
- Higher cost ($2.50 per 1M input tokens)
- Better for complex documents
For first test: Use GPT-4o mini
2. Select Analysis Depth
○ Quick Extract
- 5-10 seconds
- Basic text extraction
- No analysis
● Standard Analysis (Recommended)
- 15-30 seconds
- Text extraction + summary
- Key points identified
○ Deep Analysis
- 45-90 seconds
- Full content understanding
- Entity extraction
- Sentiment analysis
For first test: Use Standard Analysis
3. Select Output Format
● JSON (Recommended)
- Structured data
- Easy to review
- Includes metadata
○ Markdown
- Human-readable
- Preserves formatting
○ Plain Text
- Simple text only
For first test: Use JSON
4. Optional Settings
☑ Enable Vector Storage
- Allows semantic search later
- Adds 2-5 seconds to processing
- Recommended: Enable
☐ Enable Content Chunking
- For large documents (50+ pages)
- First test: Leave disabled
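As a rough planning aid, the timings listed for each analysis depth and for vector storage can be combined into an estimate. The sketch below assumes the vector storage overhead simply adds to the analysis time, which this guide does not state explicitly. The names `DEPTH_SECONDS` and `estimate_seconds` are illustrative and not part of Alactic AGI.

```python
# Time ranges in seconds, copied from the option descriptions in this guide.
DEPTH_SECONDS = {
    "quick": (5, 10),
    "standard": (15, 30),
    "deep": (45, 90),
}
VECTOR_STORAGE_SECONDS = (2, 5)  # extra time when vector storage is enabled


def estimate_seconds(depth: str, vector_storage: bool = True) -> tuple[int, int]:
    """Return a (min, max) estimate in seconds for one document."""
    low, high = DEPTH_SECONDS[depth]
    if vector_storage:
        # Assumption: the overhead is additive on top of the analysis time.
        low += VECTOR_STORAGE_SECONDS[0]
        high += VECTOR_STORAGE_SECONDS[1]
    return low, high


print(estimate_seconds("standard"))  # (17, 35)
```

With the recommended first-test settings (Standard Analysis, vector storage enabled), this gives roughly 17 to 35 seconds.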
Submit Processing Job
1. Review Configuration
   - Model: GPT-4o mini
   - Depth: Standard Analysis
   - Format: JSON
   - Vector Storage: Enabled
2. Click "Process Document"
   - Job will be submitted
   - Processing starts immediately
   - Progress bar appears
3. Monitor Progress
   You'll see real-time updates:
   Processing document... 25%
   Extracting text...
4. Wait for Completion
   - Standard analysis: 15-30 seconds
   - Don't close browser
   - Don't navigate away
Step 3B: Scrape a URL
If you're processing a website URL:
URL Input Interface
1. Click "URL Scraping" tab
   - Second tab in main content area
2. Enter URL
   - Paste URL into input field
   - Example: https://www.bbc.com/news/technology-12345678
   - Must start with http:// or https://
3. Verify URL Valid
   - Green checkmark appears if valid
   - Red X if invalid
   - Fix any typos
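If you collect URLs in a script before pasting them into the dashboard, you can pre-check them locally. This is a minimal sketch using Python's standard library. The dashboard's own validation rules are not documented here, so the helper `looks_like_valid_url` is a hypothetical approximation that checks only the scheme requirement and the presence of a host.

```python
from urllib.parse import urlparse


def looks_like_valid_url(url: str) -> bool:
    """Return True if the URL has an http/https scheme and a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_valid_url("https://www.bbc.com/news/technology-12345678"))  # True
print(looks_like_valid_url("www.bbc.com/news/technology"))  # False
```

The second call fails because the URL has no scheme, which is the most common typo when pasting addresses.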
Configure Processing Options
Same options as PDF processing:
Quick Settings for First URL:
- Model: GPT-4o mini
- Depth: Standard Analysis
- Format: JSON
- Vector Storage: Enabled
Submit Scraping Job
1. Click "Scrape URL"
   - Job submitted
   - Processing begins
2. Monitor Progress
   Fetching page...
   Extracting content...
   Processing with AI...
3. Wait for Completion
   - URL scraping: 10-25 seconds
   - Depends on page size and complexity
Step 4: View Results
Once processing completes, you'll see results immediately:
Results Card
Document Information:
Invoice_2024.pdf
Processed: Just now
Model: GPT-4o mini
Status: Complete
Time: 18.3 seconds
Tokens: 2,847 in / 421 out
Cost: $0.0049
Actions:
- View - See full content
- Download - Save JSON file
- Reprocess - Try different options
- Delete - Remove result
View Full Content
Click "View" button to see detailed results:
Summary Section:
Summary:
This is an invoice from Acme Corp dated March 15, 2024.
Total amount: $1,247.82. Services include web development
and hosting fees. Payment due by April 1, 2024.
Key Information Extracted:
- Invoice Number: INV-2024-0315
- Date: March 15, 2024
- Vendor: Acme Corp
- Amount: $1,247.82
- Due Date: April 1, 2024
- Services: Web development, hosting
Extracted Text:
Full text content from the document...
[Shows complete text with formatting preserved]
Metadata:
{
"filename": "Invoice_2024.pdf",
"pages": 1,
"file_size": "124 KB",
"processed_at": "2024-03-20T10:15:32Z",
"model": "gpt-4o-mini",
"language": "en",
"word_count": 342
}
Understanding the Results
Summary:
- Auto-generated overview (150-300 words)
- Captures main points
- Good for quick review
Key Information:
- Structured data extracted
- Entities identified (people, companies, dates, amounts)
- Only shown if Standard or Deep analysis used
Extracted Text:
- Complete text from document
- Preserves structure (paragraphs, lists)
- May include formatting from original
Metadata:
- Technical details about processing
- Useful for debugging or optimization
Step 5: Download Results
Save your results locally:
Download Options
1. Click "Download" button
   - Dropdown menu appears
2. Choose Format
   ○ JSON (original format)
   ○ Markdown (.md file)
   ○ Plain Text (.txt file)
   ○ CSV (metadata only)
3. Select Format and Confirm
   - File downloads to your browser's download folder
   - Filename: Invoice_2024_results.json
Example JSON Output
{
"document_id": "doc_abc123xyz",
"filename": "Invoice_2024.pdf",
"processed_at": "2024-03-20T10:15:32Z",
"model": "gpt-4o-mini",
"analysis_depth": "standard",
"summary": "This is an invoice from Acme Corp...",
"key_information": {
"invoice_number": "INV-2024-0315",
"date": "2024-03-15",
"vendor": "Acme Corp",
"amount": 1247.82,
"currency": "USD",
"due_date": "2024-04-01"
},
"extracted_text": "Full text content...",
"metadata": {
"pages": 1,
"file_size": 126976,
"language": "en",
"word_count": 342
},
"processing_stats": {
"processing_time_seconds": 18.3,
"input_tokens": 2847,
"output_tokens": 421,
"total_tokens": 3268,
"cost_usd": 0.0049
}
}
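Once downloaded, the JSON file can be read with any JSON parser. The sketch below uses Python's standard `json` module on a trimmed copy of the example output and checks that the token counts are internally consistent. The function name `summarize_result` is illustrative, and the field names are assumed to match the example exactly.

```python
import json

# Trimmed copy of the example output above; a real download would be read
# from Invoice_2024_results.json instead.
raw = """
{
  "filename": "Invoice_2024.pdf",
  "key_information": {"vendor": "Acme Corp", "amount": 1247.82, "currency": "USD"},
  "processing_stats": {"input_tokens": 2847, "output_tokens": 421, "total_tokens": 3268}
}
"""


def summarize_result(result: dict) -> str:
    """Build a one-line summary and sanity-check the token totals."""
    info = result["key_information"]
    stats = result["processing_stats"]
    # total_tokens should equal input_tokens + output_tokens.
    if stats["input_tokens"] + stats["output_tokens"] != stats["total_tokens"]:
        raise ValueError("token counts do not add up")
    return f'{result["filename"]}: {info["vendor"]} {info["amount"]:.2f} {info["currency"]}'


print(summarize_result(json.loads(raw)))
# Invoice_2024.pdf: Acme Corp 1247.82 USD
```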
Step 6: Verify Processing Costs
Check how much this processing cost:
View Cost Breakdown
In Results Card:
Cost: $0.0049
Calculation:
Input tokens: 2,847 × $0.150 / 1,000,000 = $0.0004
Output tokens: 421 × $0.600 / 1,000,000 = $0.0003
Token subtotal: $0.0007
Total: $0.0049 (token subtotal plus processing overhead)
Cost Factors:
- Model used (GPT-4o vs GPT-4o mini)
- Document size (more text = more tokens)
- Analysis depth (Deep uses more tokens)
- Output format (JSON includes more structure)
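The token portion of the calculation above can be reproduced in a few lines. This sketch uses the GPT-4o mini rates quoted in the calculation. It covers token charges only and does not model the processing overhead, whose formula this guide does not give. The function name `token_cost_usd` is illustrative.

```python
# Per-million-token prices for GPT-4o mini, taken from the calculation above.
INPUT_PRICE_PER_M = 0.150
OUTPUT_PRICE_PER_M = 0.600


def token_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Token-only cost in USD; excludes any processing overhead."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


cost = token_cost_usd(2847, 421)
print(f"${cost:.4f}")  # $0.0007
```

Comparing this figure with the cost on the results card shows how much of the charge comes from tokens and how much from overhead.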
View Total Monthly Usage
- Go to Settings → Usage Statistics
- See cumulative costs:
Documents Processed: 1
Total Tokens: 3,268
Total Cost: $0.0049
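If you keep the downloaded JSON results, you can cross-check these cumulative figures yourself. The sketch below assumes each result contains a `processing_stats` object shaped like the example JSON output in Step 5. The function name `usage_totals` and its output keys are illustrative.

```python
def usage_totals(results: list[dict]) -> dict:
    """Sum document count, tokens, and cost across processing results."""
    stats = [r["processing_stats"] for r in results]
    return {
        "documents_processed": len(stats),
        "total_tokens": sum(s["total_tokens"] for s in stats),
        "total_cost_usd": round(sum(s["cost_usd"] for s in stats), 4),
    }


first = {"processing_stats": {"total_tokens": 3268, "cost_usd": 0.0049}}
print(usage_totals([first]))
# {'documents_processed': 1, 'total_tokens': 3268, 'total_cost_usd': 0.0049}
```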
Step 7: Try Different Options
Now that you've processed one document successfully, experiment:
Reprocess with Different Model
- Click "Reprocess" on your result
- Change model to GPT-4o (more powerful)
- Keep other settings same
- Click "Process"
- Compare results and cost
Expect:
- Similar or better quality
- Higher cost (~$0.08 for same document)
- Possibly more detailed insights
Try Deep Analysis
- Reprocess again
- Select Deep Analysis
- Process
- Review entity extraction and sentiment
Expect:
- Longer processing time (45-90 seconds)
- More structured data extracted
- Entities identified (people, companies, locations)
- Sentiment analysis included
Try Different Output Format
- Reprocess one more time
- Select Markdown format
- Process
- Download and open in text editor
Expect:
- More human-readable format
- Better for documentation
- Still includes all extracted data
Understanding Processing Results
Success Indicators
Processing succeeded if:
- ✓ Status shows "Complete"
- ✓ Summary generated
- ✓ Extracted text visible
- ✓ Cost calculated
- ✓ Download button available
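Some of these indicators can also be checked on a downloaded JSON result. This is a minimal sketch that assumes the field names from the example JSON output (`summary`, `extracted_text`, `processing_stats.cost_usd`). The dashboard status and the Download button are not part of that file, so they are not covered. The function name `missing_indicators` is illustrative.

```python
def missing_indicators(result: dict) -> list[str]:
    """Return the success indicators that a downloaded result is missing."""
    missing = []
    if not (result.get("summary") or "").strip():
        missing.append("summary")
    if not (result.get("extracted_text") or "").strip():
        missing.append("extracted_text")
    if "cost_usd" not in result.get("processing_stats", {}):
        missing.append("cost")
    return missing


ok = {
    "summary": "This is an invoice from Acme Corp...",
    "extracted_text": "Full text content...",
    "processing_stats": {"cost_usd": 0.0049},
}
print(missing_indicators(ok))                # []
print(missing_indicators({"summary": "x"}))  # ['extracted_text', 'cost']
```

An empty list means the file contains a summary, extracted text, and a calculated cost.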
Common Issues and Solutions
Issue: Processing Stuck at 50%
Solution:
- Wait 60 seconds (some documents take longer at this stage)
- If still stuck, refresh page
- Check Settings → Service Status
- Try reprocessing if failed
Issue: "No text extracted"
Causes:
- PDF is scanned image (not searchable text)
- PDF is password-protected
- PDF is corrupted
Solution:
- Use a searchable (text-based) version of the PDF; OCR support is coming soon
- Remove password and re-upload
- Try different PDF
Issue: Cost Higher Than Expected
Causes:
- Document larger than estimated
- Used GPT-4o instead of mini
- Deep analysis used more tokens
Solution:
- Check token counts in results
- Use GPT-4o mini for cost savings
- Use Quick Extract for simple documents
Issue: Summary Seems Incomplete
Causes:
- Document too long for single processing
- Complex structure (tables, images)
- Used Quick Extract (minimal analysis)
Solution:
- Use Deep Analysis instead
- Enable content chunking for large docs
- Try GPT-4o for better understanding
Best Practices for Document Processing
Start Simple:
- Test with 1-2 page documents first
- Use Standard Analysis (not Deep)
- Use GPT-4o mini initially
- Verify results quality
Scale Gradually:
- Once comfortable, try larger documents
- Experiment with Deep Analysis
- Compare GPT-4o vs mini results
- Test batch processing
Optimize Costs:
- Use GPT-4o mini for straightforward extraction
- Reserve GPT-4o for complex analysis
- Use Quick Extract when speed matters
- Monitor monthly usage in Settings
Monitor Quality:
- Review extracted text accuracy
- Check if summaries capture key points
- Verify entity extraction correctness
- Adjust settings based on results
Next Steps
Now that you've processed your first document:
1. Process Multiple Documents
   - Set Up Batch Processing
   - Learn to handle 10+ documents at once
2. Integrate with API
   - API Authentication
   - Automate document processing
3. Search Your Documents
   - Semantic Search Guide
   - Find documents by meaning
4. Optimize Costs
   - Cost Optimization
   - Reduce monthly spending
Troubleshooting
For detailed troubleshooting:
Support
Need help?
- Email: support@alacticai.com
- Support Portal: alactic.io/support
- Community: community.alactic.ai