Document Intelligence

Info

This product is currently in beta and subject to change.

Overview

Document Intelligence classifies PDF documents into the supported document types listed below and extracts key information from them. All processing happens exclusively in memory under a zero-retention architecture, and your sensitive data never leaves Switzerland.

Security and Data Privacy

GuardOS is committed to ensuring the security and privacy of your data. See Security and Data Privacy for more information.

How It Works

[Figure: Document Intelligence schema]

Document Intelligence applies a multi-step, mixture-of-experts approach to extract key information from a PDF document. Processing runs in three steps.

Step 1 - Pre-Processing

Each document undergoes an initial pre-processing stage to remove noise and optimize it for accurate classification.

This step extracts text boxes and images. For each text box, the position, size, text size, and text content are extracted. Images embedded in the PDF are photometrically cleaned, geometrically normalized, and denoised, and their contrast and illumination are adjusted. Each image is then converted to grayscale and resized, and morphological operations are applied.
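The image steps above can be sketched in plain Python as follows. This is a toy illustration on a nested-list RGB image, not the GuardOS pipeline: the min-max contrast stretch, BT.601 grayscale weights, nearest-neighbour resize, fixed binarization threshold, and 3x3 morphological opening are all assumptions chosen for simplicity. The binarization step is an addition of this sketch, because binary morphology needs a mask.

```python
# Toy illustration of the image clean-up described above, not GuardOS code.
# Images are nested lists of (r, g, b) tuples with values in 0..255.

def stretch_contrast(img):
    # Linear min-max stretch of all channel values to the full 0..255 range
    values = [c for row in img for px in row for c in px]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [list(row) for row in img]
    return [[tuple(round((c - lo) * 255 / (hi - lo)) for c in px) for px in row] for row in img]


def to_grayscale(img):
    # ITU-R BT.601 luma weights, a common grayscale conversion
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def resize_nearest(gray, width, height):
    # Nearest-neighbour resize to a fixed size
    h, w = len(gray), len(gray[0])
    return [[gray[y * h // height][x * w // width] for x in range(width)] for y in range(height)]


def binarize(gray, threshold=128):
    # 1 = dark pixel (ink), 0 = light pixel (paper); threshold is an assumption
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _neighbourhood(mask, y, x):
    # Values of the in-bounds 3x3 neighbourhood around (y, x)
    h, w = len(mask), len(mask[0])
    return [mask[j][i] for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    return [[int(all(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))] for y in range(len(mask))]


def dilate(mask):
    return [[int(any(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))] for y in range(len(mask))]


def preprocess(img, width, height):
    gray = to_grayscale(stretch_contrast(img))
    mask = binarize(resize_nearest(gray, width, height))
    # Morphological opening (erosion, then dilation) removes isolated specks
    return dilate(erode(mask))
```

Opening removes ink regions smaller than the 3x3 structuring element (e.g. a single-pixel speck) while restoring larger shapes such as a 3x3 block to their original extent.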

Step 2 - Data Extraction

Each document is processed with a mixture-of-experts approach: every extraction task is assigned to a specialized model suited to it. These models range from classical machine learning and natural language processing to optical character recognition (OCR) and large language models.

For instance, determining whether a document includes a signature is handled by a custom-trained computer vision model designed for high precision.
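As a rough sketch of task-specific routing, the following assumes a hypothetical `PreprocessedDocument` holding the extracted text and a signature score from some vision model, and routes each metadata field to its own stand-in expert through a static table. This is a simplification: the documentation does not say how GuardOS selects or combines its models, and the regex and threshold rules below are placeholders for trained models.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class PreprocessedDocument:
    text: str               # text extracted during pre-processing
    signature_score: float  # hypothetical output of a signature-detection vision model


# Hypothetical stand-in "experts": in a real system these would be trained
# models; here each is a tiny rule so that the routing logic can run.
def registration_number_expert(doc: PreprocessedDocument) -> Optional[str]:
    match = re.search(r"CHE-\d{3}\.\d{3}\.\d{3}", doc.text)
    return match.group(0) if match else None


def document_date_expert(doc: PreprocessedDocument) -> Optional[str]:
    match = re.search(r"\b\d{4}-\d{2}-\d{2}\b", doc.text)
    return match.group(0) if match else None


def signature_expert(doc: PreprocessedDocument) -> bool:
    return doc.signature_score >= 0.5  # hypothetical decision threshold


# Routing table: each task is sent to the one expert responsible for it
EXPERTS: Dict[str, Callable[[PreprocessedDocument], Any]] = {
    "companyRegistrationNumber": registration_number_expert,
    "documentDate": document_date_expert,
    "hasSignature": signature_expert,
}


def extract(doc: PreprocessedDocument) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    for task, expert in EXPERTS.items():
        try:
            results[task] = expert(doc)
        except Exception:
            # One failing expert should not sink the whole document
            results[task] = None
    return results
```

Keeping the routing in a table means a better model for one task can be swapped in without touching the others.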

Step 3 - Post-Processing

In the post-processing phase, extracted information is verified, refined, and consolidated into a clean, standardized format.

This step ensures consistency across outputs by validating key data points, resolving ambiguities, and applying business logic as needed.

The result is structured, reliable information, ready for integration into downstream systems or workflows.
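A minimal sketch of what such validation and normalization could look like, assuming raw extractor output arrives as a dict keyed by the metadata field names. The accepted date formats (ISO and dotted DD.MM.YYYY) and the registration-number pattern (modeled on the `CHE-123.456.789` example response; format only, no check digit) are illustrative assumptions, not the service's actual rules.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical rule: Swiss UID-style format only; the check digit is not verified
UID_PATTERN = re.compile(r"CHE-\d{3}\.\d{3}\.\d{3}")


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return YYYY-MM-DD for ISO or DD.MM.YYYY input, else None."""
    if not raw:
        return None
    raw = raw.strip()
    m = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", raw)
    if m:
        y, mo, d = (int(g) for g in m.groups())
    else:
        m = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", raw)
        if not m:
            return None
        d, mo, y = (int(g) for g in m.groups())
    try:
        return date(y, mo, d).isoformat()
    except ValueError:  # impossible dates such as 31.02.2023
        return None


def normalize_company_name(raw: Optional[str]) -> Optional[str]:
    if raw is None:
        return None
    name = " ".join(raw.split())  # collapse runs of whitespace
    return name or None


def normalize_registration_number(raw: Optional[str]) -> Optional[str]:
    if raw is None:
        return None
    candidate = raw.strip().upper()
    return candidate if UID_PATTERN.fullmatch(candidate) else None


def post_process(raw: dict) -> dict:
    # Always emit every field so downstream consumers see a stable shape
    return {
        "companyName": normalize_company_name(raw.get("companyName")),
        "companyRegistrationNumber": normalize_registration_number(raw.get("companyRegistrationNumber")),
        "documentDate": normalize_date(raw.get("documentDate")),
        "hasSignature": bool(raw.get("hasSignature", False)),
    }
```

Values that fail validation become `None` rather than being passed through, which matches the nullable fields in the response format.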

Input File Formats

Document Intelligence supports the following input document file types:

PDF (up to 15MB, unlimited pages)
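Because oversized or non-PDF files are rejected anyway, a client can catch them before uploading. The sketch below assumes "15MB" means at most 15,000,000 bytes (the stricter reading; the documentation does not say whether MB is 10^6 or 2^20 bytes) and checks for the `%PDF-` header that PDF files begin with. `check_pdf_upload` is a hypothetical helper, not part of any GuardOS SDK.

```python
# Assumption: "15MB" read as 15,000,000 bytes, the stricter interpretation
MAX_UPLOAD_BYTES = 15_000_000


def check_pdf_upload(data: bytes) -> None:
    """Raise ValueError if the upload would clearly be rejected."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {len(data)} bytes; limit is {MAX_UPLOAD_BYTES} bytes")
    # PDF files begin with a "%PDF-" header line such as "%PDF-1.7"
    if not data.startswith(b"%PDF-"):
        raise ValueError("file does not start with a %PDF- header")


# Usage: read the file once, check it, then upload the same bytes
# with open("document.pdf", "rb") as f:
#     pdf_bytes = f.read()
# check_pdf_upload(pdf_bytes)
```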

Pricing

Pricing is based on expected volume. Please contact us for a quote.

Info

Clients are NOT charged for:

Failed requests (e.g. invalid API key, invalid file type, etc.)
Failed data processing (e.g. documents that are too large, documents that are not PDFs, etc.)
Failed data extraction
Documents classified as "other"
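To reconcile usage on your side, you might tally the responses that are likely to be charged. This sketch assumes, without confirmation from the documentation, that every failure case above surfaces as a non-2xx status code and that successful responses carry `document.type` as documented. Treat the result as an estimate, not a billing record.

```python
from typing import Iterable, Optional, Tuple


def is_probably_billable(status_code: int, body: Optional[dict]) -> bool:
    # Assumption: failed requests, processing, and extraction all return non-2xx codes
    if not 200 <= status_code < 300:
        return False
    document = (body or {}).get("document") or {}
    doc_type = document.get("type")
    # Documents classified as "other" are not charged
    return doc_type is not None and doc_type != "other"


def count_probably_billable(responses: Iterable[Tuple[int, Optional[dict]]]) -> int:
    return sum(is_probably_billable(status, body) for status, body in responses)
```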

Supported Document Types

The Document Intelligence API currently supports the following document types:

Document Type	Description
certificate_of_incorporation	Legal documents establishing the creation of a corporation
certificate_of_registration	Official registration certificates for businesses and entities
change_of_address	Documents recording official address changes for companies
change_of_name	Documents recording legal company name changes
commercial_register	Commercial registry extracts and business registrations
confirmation_statement	Annual statements confirming company details remain current
director_appointment	Documents recording the appointment of new company directors
director_resignation	Documents recording the resignation of company directors
dissolvement_of_lp	Documents related to the dissolution of limited partnerships
financial_statements	Company financial reports, balance sheets and profit/loss statements
loan_agreement	Financial loan contracts and credit agreements
register_of_shareholders	Records of company shareholders and their holdings
rental_agreement	Property rental and lease agreements
other	Documents not matching any of the above categories

Preview Stage Document Types

The following document types are currently in preview.

Document Type	Description
bank_statement	Summary of financial transactions occurring within a given period on a bank account
birth_certificate	Official record documenting the birth of a person
employment_contract	Legal agreement between an employer and an employee detailing terms of employment
government_id	Official government-issued identification documents (e.g., ID card, driver's license)
invoice	Commercial document itemizing a transaction between a buyer and a seller
passport	Official travel document issued by a government, verifying identity and nationality
real_estate_purchase_agreement	Contract for the sale and purchase of real property
share_purchase_agreement	Contract for the sale and purchase of company shares
shareholder_agreement	Agreement among shareholders regarding company operations and share ownership rights
utility_bill	Bill for essential services like electricity, water, or gas
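Client code may want to treat preview types differently, for example by routing them to manual review. A minimal sketch, with the type names copied from the two tables above (the sets need updating when types are added or promoted out of preview; `type_stage` is a hypothetical helper):

```python
# Type names copied from the two tables above
SUPPORTED_TYPES = frozenset({
    "certificate_of_incorporation", "certificate_of_registration", "change_of_address",
    "change_of_name", "commercial_register", "confirmation_statement",
    "director_appointment", "director_resignation", "dissolvement_of_lp",
    "financial_statements", "loan_agreement", "register_of_shareholders",
    "rental_agreement", "other",
})

PREVIEW_TYPES = frozenset({
    "bank_statement", "birth_certificate", "employment_contract", "government_id",
    "invoice", "passport", "real_estate_purchase_agreement",
    "share_purchase_agreement", "shareholder_agreement", "utility_bill",
})


def type_stage(doc_type: str) -> str:
    """Return "supported", "preview", or "unknown" for a document.type value."""
    if doc_type in SUPPORTED_TYPES:
        return "supported"
    if doc_type in PREVIEW_TYPES:
        return "preview"
    return "unknown"  # e.g. a type introduced after these sets were written
```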

Supported Document Metadata

Info

Document metadata is currently experimental and subject to change.

The Document Intelligence API currently extracts the following metadata, independent of document type:

Metadata Field	Description
companyName	Company name detected in document (if available)
companyRegistrationNumber	Company registration number (if available)
documentDate	Document date in YYYY-MM-DD format (if available)
hasSignature	Whether a handwritten signature was detected (if available)

Endpoints

Method	Endpoint
POST	`https://api.guardos.ai/api/v1/document-intelligence`

POST: /api/v1/document-intelligence

Request Headers

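interface DocumentIntelligenceRequestHeaders {
	// The boundary parameter is required; most HTTP clients add it automatically for form data
	'Content-Type': `multipart/form-data; boundary=${string}`
	'x-api-key': string
}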

Header Name	Type	Description
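Content-Type	string	`multipart/form-data` with a `boundary` parameter (most HTTP clients set this automatically; do not set it manually when sending `FormData`)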
x-api-key	string	Your API key

Request Body

The request body must be `multipart/form-data` with a single field named `file` that contains the PDF document.

Field Name	Type	Description
file	File	PDF file to analyze (max 15MB)
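To show what the request looks like on the wire, here is a standard-library-only Python sketch that builds the multipart/form-data body by hand (per RFC 7578) and sends it with `urllib.request`. The helper names, the 120-second timeout, and the assumption that the filename contains no quote characters are illustrative choices; in practice an HTTP library that builds multipart bodies for you, as in the examples below, is simpler.

```python
import json
import urllib.request
import uuid
from typing import Tuple

API_URL = "https://api.guardos.ai/api/v1/document-intelligence"


def build_multipart(pdf_bytes: bytes, filename: str = "document.pdf") -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for a single "file" field."""
    # A random boundary is very unlikely to occur inside the PDF bytes
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        # Assumes filename contains no double quotes (no escaping is done)
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        b"Content-Type: application/pdf\r\n",
        b"\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    return f"multipart/form-data; boundary={boundary}", body


def analyze(pdf_bytes: bytes, api_key: str) -> dict:
    content_type, body = build_multipart(pdf_bytes)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": content_type, "x-api-key": api_key},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```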

Success Response

interface DocumentIntelligenceResponse {
	document: {
		type: 'commercial_register' | 'rental_agreement' | 'loan_agreement' | ... | 'other'
		metadata: {
			companyName: string | null
			companyRegistrationNumber: string | null
			documentDate: string | null // Format: YYYY-MM-DD
			hasSignature: boolean
		}
	}
	pages: number // Number of pages in the document (top level, as in the example response)
}

Field Name	Type	Description
document.type	string	Detected document type (see Supported Document Types)
document.metadata.companyName	string | null	Company name detected in the document, or null if not found
document.metadata.companyRegistrationNumber	string | null	Company registration number, or null if not found
document.metadata.documentDate	string | null	Document date in YYYY-MM-DD format, or null if not found
document.metadata.hasSignature	boolean	Whether a handwritten signature was detected
pages	number	Number of pages in the processed document
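A sketch of parsing a success response into typed Python objects, converting `documentDate` into a `datetime.date` and keeping missing values as `None`. The dataclass names are hypothetical. The example response places `pages` at the top level; the parser also accepts `document.pages` as a defensive fallback.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocumentMetadata:
    company_name: Optional[str]
    company_registration_number: Optional[str]
    document_date: Optional[date]
    has_signature: bool


@dataclass(frozen=True)
class DocumentResult:
    doc_type: str
    metadata: DocumentMetadata
    pages: Optional[int]


def parse_response(payload: dict) -> DocumentResult:
    document = payload["document"]
    metadata = document.get("metadata") or {}
    raw_date = metadata.get("documentDate")
    return DocumentResult(
        doc_type=document["type"],
        metadata=DocumentMetadata(
            company_name=metadata.get("companyName"),
            company_registration_number=metadata.get("companyRegistrationNumber"),
            # documentDate is documented as YYYY-MM-DD, which date.fromisoformat accepts
            document_date=date.fromisoformat(raw_date) if raw_date else None,
            has_signature=bool(metadata.get("hasSignature", False)),
        ),
        # Top level per the example response; fall back to document.pages just in case
        pages=payload.get("pages", document.get("pages")),
    )
```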

Error Response

interface DocumentIntelligenceError {
	error: string
	details?: string
}

Status Code	Description
400	Bad Request - No file or invalid file type uploaded
401	Unauthorized - Invalid or missing API key
429	Too Many Requests - Rate limit exceeded
500	Server Error - Processing error
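Of these errors, only 429 seems worth retrying automatically: 400 and 401 will fail again unchanged, and the documentation does not say whether a 500 is transient. The sketch below retries 429 responses with exponential backoff and otherwise raises an exception built from the `error` and `details` fields. `send` is a hypothetical zero-argument callable returning `(status_code, parsed_json_body)`, which keeps the retry logic independent of the HTTP library; the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, Optional, Tuple


class DocumentIntelligenceError(Exception):
    """Error built from the documented { error, details? } response body."""

    def __init__(self, status: int, error: str, details: Optional[str] = None):
        message = f"{status}: {error}" + (f" ({details})" if details else "")
        super().__init__(message)
        self.status = status
        self.error = error
        self.details = details


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        if status == 429 and attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
            continue
        raise DocumentIntelligenceError(status, body.get("error", "Unknown error"), body.get("details"))
    raise RuntimeError("unreachable")  # the last iteration always returns or raises
```

Injecting `sleep` lets the backoff be exercised in tests without actually waiting.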

Examples

Example Request

curl

curl -X POST https://api.guardos.ai/api/v1/document-intelligence \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@document.pdf"

JavaScript (Browser)

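// Using a file input element: <input type="file" id="pdfFile" accept="application/pdf">
// Top-level await requires a module script (<script type="module">).
// Note: an API key embedded in browser code is visible to anyone using the page.
const fileInput = document.getElementById('pdfFile');
const file = fileInput.files[0];
if (!file) {
  throw new Error('Select a PDF file first');
}

const form = new FormData();
form.append('file', file);

// Do not set Content-Type yourself: the browser adds it, including the multipart boundary
const response = await fetch('https://api.guardos.ai/api/v1/document-intelligence', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY'
  },
  body: form
});

const result = await response.json();
if (!response.ok) {
  // Error responses have the shape { error, details? }
  throw new Error(`${response.status}: ${result.error}${result.details ? ` (${result.details})` : ''}`);
}
console.log(result);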

Node.js

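// Node.js 18+ (built-in fetch, FormData and Blob); save as an ES module, e.g. analyze.mjs
import { readFile } from 'node:fs/promises';
import { basename } from 'node:path';

async function analyzeDocument(path) {
  const pdf = await readFile(path);
  const form = new FormData();
  form.append('file', new Blob([pdf], { type: 'application/pdf' }), basename(path));

  // fetch sets the multipart Content-Type (including the boundary) from the FormData body
  const response = await fetch('https://api.guardos.ai/api/v1/document-intelligence', {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
    },
    body: form
  });

  const result = await response.json();
  if (!response.ok) {
    throw new Error(`${response.status}: ${result.error}`);
  }
  return result;
}

analyzeDocument('document.pdf')
  .then((result) => console.log(result))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });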

Python

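import requests

url = 'https://api.guardos.ai/api/v1/document-intelligence'
headers = {
    'x-api-key': 'YOUR_API_KEY'
}

# Open the file in a context manager so it is closed after the upload
with open('document.pdf', 'rb') as pdf:
    files = {
        'file': ('document.pdf', pdf, 'application/pdf')
    }
    # timeout is in seconds; adjust for large documents
    response = requests.post(url, headers=headers, files=files, timeout=120)

response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
print(response.json())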

Example Response

{
	"document": {
		"type": "commercial_register",
		"metadata": {
			"companyName": "Acme Corporation AG",
			"companyRegistrationNumber": "CHE-123.456.789",
			"documentDate": "2023-05-15",
			"hasSignature": true
		}
	},
	"pages": 3
}