GuardOS Docs

Document Intelligence

Info

This product is currently in beta and subject to change.

Overview

Document Intelligence classifies PDF documents and extracts key information from them. Processing is exclusively in-memory, and a zero-retention architecture guarantees that your sensitive data never leaves Switzerland.

Security and Data Privacy

GuardOS is committed to ensuring the security and privacy of your data. See Security and Data Privacy for more information.

How It Works

[Figure: Document Intelligence Schema]

Document Intelligence applies a multi-step, mixture-of-experts approach to extract key information from a given PDF document.

Step 1 - Pre-Processing

Each document undergoes an initial pre-processing stage to remove noise and optimize it for accurate classification.

This step extracts text boxes and images. For each text box, the position, size, text size, and the text itself are extracted. Images inside the PDF are photometrically cleaned, geometrically normalized, and denoised, and their contrast and illumination are adjusted. Each image is then converted to grayscale, resized, and processed with morphological operations.
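To make two of the image operations named above concrete, here is a minimal pure-Python sketch of grayscale conversion and a min-max contrast stretch. It is an illustration only, not the service's actual pipeline: the function names, the BT.601 luminance weights, and the nested-list image representation are all assumptions chosen for readability.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB pixels to luminance using the common BT.601 weights (an assumption)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def stretch_contrast(gray: List[List[int]]) -> List[List[int]]:
    """Linearly rescale intensities so the darkest pixel is 0 and the brightest is 255.

    Expects a non-empty image.
    """
    values = [v for row in gray for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]
```

A production pipeline would typically use an image library for these steps, along with resizing and morphological operations, which are omitted here.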

Step 2 - Data Extraction

Each document is processed with a mixture-of-experts approach. Specialized models, ranging from machine learning and natural language processing to OCR and large language models, are each applied to the specific tasks they are best suited for.

For instance, determining whether a document includes a signature is handled by a custom-trained computer vision model designed for high precision.

Step 3 - Post-Processing

In the post-processing phase, extracted information is verified, refined, and consolidated into a clean, standardized format.

This step ensures consistency across outputs by validating key data points, resolving ambiguities, and applying business logic as needed.

The result is structured, reliable information that is ready for integration into downstream systems or workflows.

Input File Formats

Document Intelligence supports the following input document file types:

  • PDF (up to 15MB, unlimited pages)
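Because oversized or non-PDF uploads are rejected, it can be worth checking a file locally before sending it. The sketch below is one possible pre-flight check. The function name is hypothetical, and reading "15MB" as 15 × 1024 × 1024 bytes is an assumption, since the documentation does not say whether the limit is decimal or binary.

```python
from typing import List

MAX_BYTES = 15 * 1024 * 1024  # assumption: "15MB" interpreted as 15 MiB
PDF_SIGNATURE = b"%PDF-"      # PDF files begin with this header


def check_pdf(data: bytes, max_bytes: int = MAX_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if len(data) > max_bytes:
        problems.append(f"file is {len(data)} bytes, limit is {max_bytes}")
    if not data.startswith(PDF_SIGNATURE):
        problems.append("file does not start with the %PDF- signature")
    return problems
```

For a file on disk this could be called as `check_pdf(pathlib.Path("document.pdf").read_bytes())`. The check is deliberately strict about the signature position; the server remains the final authority on what it accepts.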

Pricing

Pricing is based on expected volume. Please contact us for a quote.

Info

Clients are NOT charged for:

  • Failed requests (e.g. invalid API key or invalid file type)
  • Failed data processing (e.g. documents that are too large or are not PDFs)
  • Failed data extraction
  • Documents classified as "other"
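For rough client-side usage tracking, the rules above can be approximated as follows. This is only an estimate under stated assumptions: the helper name is hypothetical, and it assumes that every failure case listed above surfaces as a non-200 status code, which the documentation does not confirm for failed data extraction.

```python
from typing import Optional


def is_billable(status_code: int, body: Optional[dict]) -> bool:
    """Client-side estimate only; the provider's own accounting is authoritative."""
    # Assumption: failed requests, processing, and extraction all return non-200.
    if status_code != 200 or not body:
        return False
    doc_type = (body.get("document") or {}).get("type")
    # Documents classified as "other" are not charged.
    return doc_type is not None and doc_type != "other"
```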

Supported Document Types

The Document Intelligence API currently supports the following document types:

| Document Type | Description |
| --- | --- |
| certificate_of_incorporation | Legal documents establishing the creation of a corporation |
| certificate_of_registration | Official registration certificates for businesses and entities |
| change_of_address | Documents recording official address changes for companies |
| change_of_name | Documents recording legal company name changes |
| commercial_register | Commercial registry extracts and business registrations |
| confirmation_statement | Annual statements confirming company details remain current |
| director_appointment | Documents recording the appointment of new company directors |
| director_resignation | Documents recording the resignation of company directors |
| dissolvement_of_lp | Documents related to the dissolution of limited partnerships |
| financial_statements | Company financial reports, balance sheets and profit/loss statements |
| loan_agreement | Financial loan contracts and credit agreements |
| register_of_shareholders | Records of company shareholders and their holdings |
| rental_agreement | Property rental and lease agreements |
| other | Documents not matching any of the above categories |

Preview Stage Document Types

The following document types are currently in preview.

| Document Type | Description |
| --- | --- |
| bank_statement | Summary of financial transactions occurring within a given period on a bank account |
| birth_certificate | Official record documenting the birth of a person |
| employment_contract | Legal agreement between an employer and an employee detailing terms of employment |
| government_id | Official government-issued identification documents (e.g., ID card, driver's license) |
| invoice | Commercial document itemizing a transaction between a buyer and a seller |
| passport | Official travel document issued by a government, verifying identity and nationality |
| real_estate_purchase_agreement | Contract for the sale and purchase of real property |
| share_purchase_agreement | Contract for the sale and purchase of company shares |
| shareholder_agreement | Agreement among shareholders regarding company operations and share ownership rights |
| utility_bill | Bill for essential services like electricity, water, or gas |
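Client code may want to treat preview-stage types differently from generally supported ones, for example by routing them to manual review. A minimal sketch, assuming the type identifiers listed above are the exact strings returned in the response; the constant and function names are hypothetical:

```python
SUPPORTED_TYPES = frozenset({
    "certificate_of_incorporation", "certificate_of_registration",
    "change_of_address", "change_of_name", "commercial_register",
    "confirmation_statement", "director_appointment", "director_resignation",
    "dissolvement_of_lp", "financial_statements", "loan_agreement",
    "register_of_shareholders", "rental_agreement", "other",
})

PREVIEW_TYPES = frozenset({
    "bank_statement", "birth_certificate", "employment_contract",
    "government_id", "invoice", "passport",
    "real_estate_purchase_agreement", "share_purchase_agreement",
    "shareholder_agreement", "utility_bill",
})


def type_stage(doc_type: str) -> str:
    """Classify a returned document type as 'supported', 'preview', or 'unknown'."""
    if doc_type in SUPPORTED_TYPES:
        return "supported"
    if doc_type in PREVIEW_TYPES:
        return "preview"
    # The product is in beta, so new types may appear; handle them gracefully.
    return "unknown"
```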

Supported Document Metadata

Info

Document metadata is currently experimental and subject to change.

The Document Intelligence API currently supports the following metadata fields, which are independent of the document type:

| Metadata Field | Description |
| --- | --- |
| companyName | Company name detected in document (if available) |
| companyRegistrationNumber | Company registration number (if available) |
| documentDate | Document date in YYYY-MM-DD format (if available) |
| hasSignature | Whether a handwritten signature was detected (if available) |

Endpoints

| Method | Endpoint |
| --- | --- |
| POST | https://api.guardos.ai/api/v1/document-intelligence |

POST: /api/v1/document-intelligence

Request Headers

interface DocumentIntelligenceRequest {
	'Content-Type': 'multipart/form-data'
	'x-api-key': string
}

| Header Name | Type | Description |
| --- | --- | --- |
| Content-Type | string | multipart/form-data |
| x-api-key | string | Your API key |

Request Body

The request body must be encoded as multipart/form-data and contain a single field named file that holds the PDF document.

| Field Name | Type | Description |
| --- | --- | --- |
| file | File | PDF file to analyze (max 15MB) |
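To show what the multipart body and headers look like on the wire, the following sketch builds the request using only the Python standard library. The `build_request` helper and its parameters are hypothetical; the endpoint URL, the `x-api-key` header, and the `file` field name come from this page.

```python
import urllib.request
import uuid

API_URL = "https://api.guardos.ai/api/v1/document-intelligence"


def build_request(pdf_bytes: bytes, api_key: str,
                  filename: str = "document.pdf") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one 'file' part."""
    boundary = uuid.uuid4().hex  # random delimiter, unlikely to occur in the PDF
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "x-api-key": api_key,
        # The boundary parameter must match the delimiter used in the body.
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

The request can then be sent with `urllib.request.urlopen(build_request(data, "YOUR_API_KEY"))`. Libraries such as requests generate the boundary automatically, which is why the examples on this page do not set Content-Type by hand.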

Success Response

interface DocumentIntelligenceResponse {
	document: {
		type: 'commercial_register' | 'rental_agreement' | 'loan_agreement' | ... | 'other'
		metadata: {
			companyName: string | null
			companyRegistrationNumber: string | null
			documentDate: string | null // Format: YYYY-MM-DD
			hasSignature: boolean
		}
	}
	pages: number // Number of pages in the document
}

| Field Name | Type | Description |
| --- | --- | --- |
| document.type | string | Type of document detected |
| document.metadata.companyName | string or null | Company name detected in document (if available) |
| document.metadata.companyRegistrationNumber | string or null | Company registration number (if available) |
| document.metadata.documentDate | string or null | Document date in YYYY-MM-DD format (if available) |
| document.metadata.hasSignature | boolean | Whether a handwritten signature was detected |
| pages | number | Number of pages in the processed document |
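On the client side, the response can be mapped onto a typed structure so that missing metadata and date parsing are handled in one place. The sketch below is one way to do that; the `DocumentResult` class and `parse_result` function are hypothetical names. As a defensive assumption, it accepts `pages` either at the top level or nested under `document`.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentResult:
    type: str
    company_name: Optional[str]
    company_registration_number: Optional[str]
    document_date: Optional[dt.date]
    has_signature: Optional[bool]
    pages: Optional[int]


def parse_result(payload: dict) -> DocumentResult:
    """Map a decoded JSON success response onto a DocumentResult."""
    document = payload["document"]
    metadata = document.get("metadata") or {}
    raw_date = metadata.get("documentDate")
    return DocumentResult(
        type=document["type"],
        company_name=metadata.get("companyName"),
        company_registration_number=metadata.get("companyRegistrationNumber"),
        # documentDate is documented as YYYY-MM-DD, which fromisoformat accepts.
        document_date=dt.date.fromisoformat(raw_date) if raw_date else None,
        has_signature=metadata.get("hasSignature"),
        # Defensive: look for pages at the top level first, then under document.
        pages=payload.get("pages", document.get("pages")),
    )
```

Because metadata is marked experimental, every metadata field is treated as optional here rather than required.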

Error Response

interface DocumentIntelligenceError {
	error: string
	details?: string
}

| Status Code | Description |
| --- | --- |
| 400 | Bad Request - No file or invalid file type uploaded |
| 401 | Unauthorized - Invalid or missing API key |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Server Error - Processing error |
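A client usually wants to fail fast on 400 and 401 but retry on 429. The sketch below shows one possible retry loop with exponential backoff. The helper name and its parameters are hypothetical, and treating 500 as retryable is an assumption, since a processing error may be permanent for a given document. The `send` callable is expected to perform the HTTP call and return the status code with the decoded JSON body.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUS = {429, 500}  # assumption: 500 may be transient


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting. On a final failure, the `error` and optional `details` fields of the returned body can be logged.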

Examples

Example Request

curl
curl -X POST https://api.guardos.ai/api/v1/document-intelligence \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@document.pdf"

JavaScript (Browser)
// Using a file input element: <input type="file" id="pdfFile" accept="application/pdf">
const fileInput = document.getElementById('pdfFile');
const file = fileInput.files[0];

const form = new FormData();
form.append('file', file);

const response = await fetch('https://api.guardos.ai/api/v1/document-intelligence', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY'
  },
  body: form
});

const result = await response.json();
console.log(result);

Node.js
import fetch from 'node-fetch';
import { createReadStream } from 'fs';
import FormData from 'form-data';

async function analyzeDocument() {
  const form = new FormData();
  form.append('file', createReadStream('document.pdf'));
  
  const response = await fetch('https://api.guardos.ai/api/v1/document-intelligence', {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY',
      ...form.getHeaders()
    },
    body: form
  });
  
  const result = await response.json();
  console.log(result);
}

analyzeDocument();

Python
import requests

url = 'https://api.guardos.ai/api/v1/document-intelligence'
headers = {
    'x-api-key': 'YOUR_API_KEY'
}

# Open the PDF in a context manager so the file handle is always closed.
with open('document.pdf', 'rb') as pdf:
    files = {
        'file': ('document.pdf', pdf, 'application/pdf')
    }
    response = requests.post(url, headers=headers, files=files)

print(response.json())

Example Response

{
	"document": {
		"type": "commercial_register",
		"metadata": {
			"companyName": "Acme Corporation AG",
			"companyRegistrationNumber": "CHE-123.456.789",
			"documentDate": "2023-05-15",
			"hasSignature": true
		}
	},
	"pages": 3
}