Document Intelligence
Info
This product is currently in beta and subject to change.
Overview
Document Intelligence classifies PDF documents and extracts key information from them. Processing is exclusively in-memory, and the zero-retention architecture guarantees that your sensitive data never leaves Switzerland.
Security and Data Privacy
GuardOS is committed to ensuring the security and privacy of your data. See Security and Data Privacy for more information.
How It Works
Document Intelligence Schema
Document Intelligence applies a multi-step, mixture-of-experts approach to extract key information from a given PDF document.
Step 1 - Pre-Processing
Each document undergoes an initial pre-processing stage to remove noise and optimize it for accurate classification.
This step covers text box extraction and image extraction. For each text box, the position, size, font size, and the text itself are extracted. Images inside the PDF are photometrically cleaned, geometrically normalized, and denoised, and their contrast and illumination are adjusted. Each image is then converted to grayscale, resized, and processed with morphological operations.
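As a rough illustration of three of the image steps named above (grayscaling, contrast adjustment, and a morphological operation), the following sketch works on a tiny image held as nested lists. It is not the GuardOS implementation: the function names, the luma weights, the min-max stretch, and the 3x3 dilation are all assumptions chosen for illustration.

```python
# Illustrative only: a toy version of grayscale -> contrast stretch -> dilation.
# All function names are hypothetical and not part of the GuardOS API.

def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]

def stretch_contrast(gray):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]

def dilate(gray):
    """3x3 grayscale dilation: each pixel takes the maximum of its neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbours = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(max(neighbours))
        out.append(row)
    return out

# A 3x3 dark image with one bright pixel in the centre.
image = [
    [(10, 10, 10), (10, 10, 10), (10, 10, 10)],
    [(10, 10, 10), (200, 200, 200), (10, 10, 10)],
    [(10, 10, 10), (10, 10, 10), (10, 10, 10)],
]
gray = to_grayscale(image)
stretched = stretch_contrast(gray)
dilated = dilate(stretched)
```

A production pipeline would typically rely on an imaging library rather than nested lists; the sketch only shows the order and intent of the operations.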
Step 2 - Data Extraction
Each document is processed using a mixture-of-experts approach: specialized models, ranging from machine learning and natural language processing to OCR and large language models, are each applied to the specific tasks they handle best.
For instance, determining whether a document includes a signature is handled by a custom-trained computer vision model designed for high precision.
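The routing idea can be pictured as a dispatch table that maps each extraction task to a specialized handler. The sketch below is purely illustrative: the `EXPERTS` registry and the stand-in handlers are hypothetical, and the simple string and regex checks merely take the place of real models. It says nothing about how GuardOS models are actually built or selected.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for specialized models; real experts would be
# computer vision, OCR, NLP, or LLM components.
def detect_signature(page_text: str) -> bool:
    # Placeholder heuristic standing in for a computer vision model.
    return "[signature]" in page_text.lower()

def extract_date(page_text: str) -> Optional[str]:
    # Placeholder regex standing in for a date extraction model.
    match = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", page_text)
    return match.group(1) if match else None

# Each output field is handled by exactly one expert.
EXPERTS: Dict[str, Callable[[str], object]] = {
    "hasSignature": detect_signature,
    "documentDate": extract_date,
}

def run_experts(page_text: str) -> dict:
    """Run every registered expert on the text and collect results by field name."""
    return {field: expert(page_text) for field, expert in EXPERTS.items()}

result = run_experts("Registered on 2023-05-15. [Signature]")
```

Keeping the experts behind a single registry means a handler can be swapped for a better model without touching the calling code.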
Step 3 - Post-Processing
In the post-processing phase, extracted information is verified, refined, and consolidated into a clean, standardized format.
This step ensures consistency across outputs by validating key data points, resolving ambiguities, and applying business logic as needed.
The result is structured, reliable information that is ready for integration into downstream systems or workflows.
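To make the validation and standardization idea concrete, here is a minimal sketch that normalizes a few raw values into the documented output shape. The helper names and the list of accepted input date formats are assumptions; the actual rules applied by the service are not documented here.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats, chosen for illustration only.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")

def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    if not raw:
        return None
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_name(raw: Optional[str]) -> Optional[str]:
    """Collapse runs of whitespace; treat empty strings as missing."""
    if raw is None:
        return None
    cleaned = " ".join(raw.split())
    return cleaned or None

def consolidate(raw_fields: dict) -> dict:
    """Build a standardized metadata record from raw extracted values."""
    return {
        "companyName": normalize_name(raw_fields.get("companyName")),
        "documentDate": normalize_date(raw_fields.get("documentDate")),
        "hasSignature": bool(raw_fields.get("hasSignature", False)),
    }

record = consolidate({"companyName": "  Acme   Corporation AG ", "documentDate": "15.05.2023"})
```

Returning `None` for values that cannot be validated mirrors the nullable metadata fields of the API response, rather than passing through a guess.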
Input File Formats
Document Intelligence supports the following input document file types:
- PDF (up to 15 MB, unlimited pages)
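A client can catch obviously invalid uploads before sending them. The sketch below checks the size limit and the `%PDF-` header that PDF files start with. The helper name `check_pdf` is hypothetical, and reading "15 MB" as 15 × 1024 × 1024 bytes is an assumption, since the exact cutoff is not specified.

```python
# Assumption: "15 MB" is read as 15 MiB; the exact server-side cutoff is not documented.
MAX_BYTES = 15 * 1024 * 1024

def check_pdf(data: bytes) -> list:
    """Return a list of problems found; an empty list means the upload looks acceptable."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds the documented size limit")
    # Strict check: the PDF header is expected at the very start of the file.
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with the PDF header '%PDF-'")
    return problems

ok = check_pdf(b"%PDF-1.7\n...")
bad = check_pdf(b"PK\x03\x04 not a pdf")
```

This is only a cheap pre-filter; the server remains the authority on whether a file is accepted.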
Pricing
Pricing is based on expected volume. Please contact us for a quote.
Info
Clients are NOT charged for:
- Failed requests (e.g. invalid API key or invalid file type)
- Failed data processing (e.g. documents that are too large or are not PDFs)
- Failed data extraction
- Documents classified as "other"
Supported Document Types
The Document Intelligence API currently supports the following document types:
| Document Type | Description |
|---|---|
| certificate_of_incorporation | Legal documents establishing the creation of a corporation |
| certificate_of_registration | Official registration certificates for businesses and entities |
| change_of_address | Documents recording official address changes for companies |
| change_of_name | Documents recording legal company name changes |
| commercial_register | Commercial registry extracts and business registrations |
| confirmation_statement | Annual statements confirming company details remain current |
| director_appointment | Documents recording the appointment of new company directors |
| director_resignation | Documents recording the resignation of company directors |
| dissolvement_of_lp | Documents related to the dissolution of limited partnerships |
| financial_statements | Company financial reports, balance sheets and profit/loss statements |
| loan_agreement | Financial loan contracts and credit agreements |
| register_of_shareholders | Records of company shareholders and their holdings |
| rental_agreement | Property rental and lease agreements |
| other | Documents not matching any of the above categories |
Preview Stage Document Types
The following document types are currently in preview.
| Document Type | Description |
|---|---|
| bank_statement | Summary of financial transactions occurring within a given period on a bank account |
| birth_certificate | Official record documenting the birth of a person |
| employment_contract | Legal agreement between an employer and an employee detailing terms of employment |
| government_id | Official government-issued identification documents (e.g., ID card, driver's license) |
| invoice | Commercial document itemizing a transaction between a buyer and a seller |
| passport | Official travel document issued by a government, verifying identity and nationality |
| real_estate_purchase_agreement | Contract for the sale and purchase of real property |
| share_purchase_agreement | Contract for the sale and purchase of company shares |
| shareholder_agreement | Agreement among shareholders regarding company operations and share ownership rights |
| utility_bill | Bill for essential services like electricity, water, or gas |
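Client code that branches on the detected document type may want to treat preview types more cautiously than generally supported ones. The following sketch encodes the two tables above as sets; the helper `type_stage` and its return values are hypothetical, and the sets would need updating whenever the tables change.

```python
# Type identifiers copied from the two tables above.
SUPPORTED_TYPES = frozenset({
    "certificate_of_incorporation", "certificate_of_registration",
    "change_of_address", "change_of_name", "commercial_register",
    "confirmation_statement", "director_appointment", "director_resignation",
    "dissolvement_of_lp", "financial_statements", "loan_agreement",
    "register_of_shareholders", "rental_agreement", "other",
})

PREVIEW_TYPES = frozenset({
    "bank_statement", "birth_certificate", "employment_contract",
    "government_id", "invoice", "passport",
    "real_estate_purchase_agreement", "share_purchase_agreement",
    "shareholder_agreement", "utility_bill",
})

def type_stage(doc_type: str) -> str:
    """Classify a returned type as 'supported', 'preview', or 'unknown'."""
    if doc_type in SUPPORTED_TYPES:
        return "supported"
    if doc_type in PREVIEW_TYPES:
        return "preview"
    # New types may appear while the product is in beta.
    return "unknown"

stage = type_stage("invoice")
```

Handling the `unknown` case explicitly keeps a client from failing if a new type is introduced.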
Supported Document Metadata
Info
Document metadata is currently experimental and subject to change.
The Document Intelligence API currently supports the following document-type-agnostic metadata:
| Metadata Field | Description |
|---|---|
| companyName | Company name detected in document (if available) |
| companyRegistrationNumber | Company registration number (if available) |
| documentDate | Document date in YYYY-MM-DD format (if available) |
| hasSignature | Whether a handwritten signature was detected |
Endpoints
| Method | Endpoint |
|---|---|
| POST | https://api.guardos.ai/api/v1/document-intelligence |
POST: /api/v1/document-intelligence
Request Headers
interface DocumentIntelligenceRequest {
  'Content-Type': 'multipart/form-data'
  'x-api-key': string
}
| Header Name | Type | Description |
|---|---|---|
| Content-Type | string | multipart/form-data |
| x-api-key | string | Your API key |
Request Body
The request body must be encoded as multipart/form-data and contain a single field named file that holds a PDF document.
| Field Name | Type | Description |
|---|---|---|
| file | File | PDF file to analyze (max 15MB) |
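For readers who want to see what such a request body looks like on the wire, the sketch below builds one with only the Python standard library. The helper names `build_multipart` and `build_request` are hypothetical; in practice an HTTP client library usually handles this encoding. The request object is only constructed here, not sent.

```python
import urllib.request
import uuid

API_URL = "https://api.guardos.ai/api/v1/document-intelligence"

def build_multipart(filename: str, pdf_bytes: bytes):
    """Encode a single 'file' field as multipart/form-data; returns (content_type, body)."""
    boundary = uuid.uuid4().hex
    # Note: this sketch does not escape quotes or special characters in the filename.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + pdf_bytes + tail

def build_request(api_key: str, filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST request with both required headers."""
    content_type, body = build_multipart(filename, pdf_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": content_type, "x-api-key": api_key},
        method="POST",
    )

content_type, body = build_multipart("document.pdf", b"%PDF-1.7 example bytes")
request = build_request("YOUR_API_KEY", "document.pdf", b"%PDF-1.7 example bytes")
```

Passing `request` to `urllib.request.urlopen` would perform the upload. Note that the Content-Type header carries the boundary, which is why clients should not set a bare multipart/form-data value by hand.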
Success Response
interface DocumentIntelligenceResponse {
  document: {
    type: 'commercial_register' | 'rental_agreement' | 'loan_agreement' | ... | 'other'
    metadata: {
      companyName: string | null
      companyRegistrationNumber: string | null
      documentDate: string | null // Format: YYYY-MM-DD
      hasSignature: boolean
    }
  }
  pages: number // Number of pages in the document
}
| Field Name | Type | Description |
|---|---|---|
| document.type | string | Type of document detected |
| document.metadata.companyName | string or null | Company name detected in document (null if not available) |
| document.metadata.companyRegistrationNumber | string or null | Company registration number (null if not available) |
| document.metadata.documentDate | string or null | Document date in YYYY-MM-DD format (null if not available) |
| document.metadata.hasSignature | boolean | Whether a handwritten signature was detected |
| pages | number | Number of pages in the processed document |
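On the client side, the response fields can be mapped onto a typed record. The sketch below is one possible shape; the `DocumentResult` class and `parse_result` helper are hypothetical. Looking for `pages` in two places is a defensive assumption for a beta API, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocumentResult:
    type: str
    company_name: Optional[str]
    company_registration_number: Optional[str]
    document_date: Optional[str]  # YYYY-MM-DD when present
    has_signature: bool
    pages: Optional[int]

def parse_result(payload: dict) -> DocumentResult:
    """Map a decoded JSON response onto a DocumentResult."""
    document = payload["document"]
    metadata = document.get("metadata") or {}
    # Defensive: accept "pages" at the top level or nested under "document".
    pages = payload.get("pages", document.get("pages"))
    return DocumentResult(
        type=document["type"],
        company_name=metadata.get("companyName"),
        company_registration_number=metadata.get("companyRegistrationNumber"),
        document_date=metadata.get("documentDate"),
        has_signature=bool(metadata.get("hasSignature", False)),
        pages=pages,
    )

parsed = parse_result({
    "document": {
        "type": "commercial_register",
        "metadata": {
            "companyName": "Acme Corporation AG",
            "companyRegistrationNumber": "CHE-123.456.789",
            "documentDate": "2023-05-15",
            "hasSignature": True,
        },
    },
    "pages": 3,
})
```

Because the metadata fields are experimental, reading them with `.get` keeps the parser from raising if a field is absent.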
Error Response
interface DocumentIntelligenceError {
error: string
details?: string
}
| Status Code | Description |
|---|---|
| 400 | Bad Request - No file or invalid file type uploaded |
| 401 | Unauthorized - Invalid or missing API key |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Server Error - Processing error |
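A client might translate these status codes into a retry decision. The sketch below is one possible policy, not guidance from GuardOS: treating 429 and 500 as retryable, the exponential backoff schedule, and all function names are assumptions.

```python
# Assumption: rate limiting (429) and server errors (500) may succeed on retry,
# while 400 and 401 need the request itself to be fixed first.
RETRYABLE_STATUS = frozenset({429, 500})

STATUS_MEANING = {
    400: "Bad Request - No file or invalid file type uploaded",
    401: "Unauthorized - Invalid or missing API key",
    429: "Too Many Requests - Rate limit exceeded",
    500: "Server Error - Processing error",
}

def should_retry(status: int) -> bool:
    """Return True if repeating the same request could plausibly succeed."""
    return status in RETRYABLE_STATUS

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential delays in seconds: base, 2*base, 4*base, ... each capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]

def describe(status: int) -> str:
    """Human-readable description of a documented status code."""
    return STATUS_MEANING.get(status, f"Unexpected status {status}")

delays = backoff_delays(4)
```

A caller would sleep for each delay in turn between attempts and stop as soon as `should_retry` returns False or a request succeeds.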
Examples
Example Request
cURL
curl -X POST https://api.guardos.ai/api/v1/document-intelligence \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@document.pdf"
JavaScript (browser)
// Using a file input element: <input type="file" id="pdfFile" accept="application/pdf">
const fileInput = document.getElementById('pdfFile');
const file = fileInput.files[0];

// The browser sets the multipart Content-Type header (including the boundary) itself
const form = new FormData();
form.append('file', file);

const response = await fetch('https://api.guardos.ai/api/v1/document-intelligence', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY'
  },
  body: form
});

const result = await response.json();
console.log(result);
Node.js
import fetch from 'node-fetch';
import { createReadStream } from 'fs';
import FormData from 'form-data';

async function analyzeDocument() {
  // Stream the PDF from disk into the multipart form
  const form = new FormData();
  form.append('file', createReadStream('document.pdf'));

  const response = await fetch('https://api.guardos.ai/api/v1/document-intelligence', {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY',
      // form.getHeaders() supplies the multipart Content-Type with its boundary
      ...form.getHeaders()
    },
    body: form
  });

  const result = await response.json();
  console.log(result);
}

analyzeDocument();
Python
import requests

url = 'https://api.guardos.ai/api/v1/document-intelligence'
headers = {
    'x-api-key': 'YOUR_API_KEY'
}

# Open the file in a context manager so the handle is closed after the upload
with open('document.pdf', 'rb') as pdf:
    files = {
        'file': ('document.pdf', pdf, 'application/pdf')
    }
    response = requests.post(url, headers=headers, files=files)

print(response.json())
Example Response
{
  "document": {
    "type": "commercial_register",
    "metadata": {
      "companyName": "Acme Corporation AG",
      "companyRegistrationNumber": "CHE-123.456.789",
      "documentDate": "2023-05-15",
      "hasSignature": true
    }
  },
  "pages": 3
}