What kind of businesses do you help?

I help agencies and service businesses that already have clients, repeated work, and growing demand, but whose internal setup is becoming too messy to support the next stage of growth.

What can the system help with?

It can help with client context, research, documents, reports, company knowledge, planning, and execution tools that support the work your team already does.

Do you have a software development background?

Yes. Before focusing on AI systems and internal business tools, I worked across Python, JS/TS, React, Assembly, C++, Java, Spring Boot and SQL-backed applications using PostgreSQL and MySQL. That background helps me build beyond simple AI wrappers and think in terms of architecture, data models, processes and maintainable systems.

What kind of internal systems do you build?

I build custom tools around the internal work that limits capacity: research, client context, documents, reports, planning, and delivery support. We start with the bottleneck making it harder for your team to handle more client work, then shape the system around it.

Who is the best fit for your work?

The best fit is usually an agency, consultancy, recruitment firm, dev studio or service business that already has clients, but whose internal process is becoming scattered across tools, documents, spreadsheets and messages.

What does a first project usually look like?

Usually one focused internal problem: scattered client context, repeated research, document preparation, reporting, planning, or execution support. We scope one useful first version, build it as real software, put it in the team's hands, and extend from there if it works.

Case Studies

Document Intelligence / OCR / Vision-Language Models

Scanned Document Processing

Self-Hosted Multi-Model OCR/VL Document Preprocessing Pipeline

A self-hosted OCR/VL preprocessing pipeline for turning messy technical documents into structured, retrieval-ready data for downstream AI systems.

OCR/VL, Document preprocessing, Self-hosted GPU infrastructure, CUDA, Technical documents, Retrieval preparation

Overview

I built a self-hosted OCR/VL document preprocessing pipeline for turning messy technical documents into structured, retrieval-ready data for AI systems.

The project focused on the ingestion layer: parsing, cleaning, structuring, and preparing complex documents before they could be used by downstream retrieval and AI workflows.

This was not a simple PDF upload flow. The source files were technical documents where no two documents followed the same structure. Some pages were text-heavy, some were table-heavy, some depended on visual layout, and some required OCR/VL interpretation because ordinary PDF text extraction lost too much meaning.

I researched, tested, and evaluated multiple OCR and vision-language model approaches, then designed a self-hosted architecture that combined three different models because no single OCR/VL model handled every document type well enough on its own.

The work also involved dealing with real infrastructure issues: GPU setup, NVIDIA/CUDA compatibility, CUBLAS errors, VRAM limits, memory issues, model loading problems, and speed bottlenecks.

Problem

The source files were technical documents, not clean text documents.

No two documents followed the same structure. Some contained dense paragraphs, some relied heavily on tables, some had diagrams or visual sections, some were scanned, and some lost important meaning when processed through ordinary PDF text extraction.

A single OCR model could not handle all document types well. Plain text extraction lost structure from tables, layouts, and visual sections. Each OCR/VL model showed different strengths and weaknesses depending on the page type.

Large PDFs and image-heavy pages created speed, memory, and GPU bottlenecks. Self-hosting the models introduced CUDA, NVIDIA, CUBLAS, dependency, and VRAM issues.

The challenge was to build a preprocessing architecture that could handle heterogeneous technical documents and produce structured outputs despite inconsistent source formats.

Technical challenges I solved

No single OCR/VL model was good enough. I tested multiple OCR and vision-language model approaches and compared their outputs on different document types. Some models were better at plain text extraction, some handled layout better, some were better for tables, some were more useful for visual or image-heavy sections, some produced cleaner markdown-style output, and some were faster but less structurally accurate.

Because of that, I designed the pipeline as a multi-model OCR/VL system instead of depending on one model. The architecture combined three models so the system could use the strengths of each one and produce better structured document outputs. The goal was not simply to extract text. The goal was to extract usable technical structure.

The documents were highly inconsistent. Some pages were mostly text, some were dominated by tables, some had multi-column layouts, some included diagrams, image regions, or visual references, some were scanned or partially scanned, and some had repeated headers, footers, numbering, and formatting noise.

This meant the pipeline had to be flexible enough to handle different page types instead of treating every page as the same extraction problem. I designed the preprocessing flow around page-level processing and structured output generation so the system could preserve more meaning from the original documents.
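One kind of formatting noise mentioned above, repeated headers and footers, can be caught with a simple frequency check once text is available per page. The following is a minimal sketch rather than the project's actual code; the function name `strip_repeated_lines` and the `min_share` threshold are assumptions made for illustration.

```python
from collections import Counter


def strip_repeated_lines(pages: list[list[str]], min_share: float = 0.6) -> list[list[str]]:
    """Drop lines that recur on at least `min_share` of pages (likely headers/footers)."""
    counts: Counter[str] = Counter()
    for lines in pages:
        # Count each distinct line once per page, so repeats inside
        # a single page do not inflate the total.
        counts.update({line.strip() for line in lines if line.strip()})

    # Require at least two pages so a one-page document loses nothing.
    threshold = max(2, round(min_share * len(pages)))
    noise = {line for line, n in counts.items() if n >= threshold}
    return [[line for line in lines if line.strip() not in noise] for lines in pages]
```

Lines that change from page to page, such as "Page 3 of 40", would need an extra normalization step (for example, masking digits) before counting.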

Self-hosting created GPU and runtime problems. I worked through NVIDIA/CUDA setup, driver compatibility, CUDA version mismatches, CUBLAS errors, model loading problems, dependency issues, VRAM limits, and runtime instability.

Speed became a serious constraint. OCR/VL pipelines can be slow, especially when processing large technical PDFs page by page. Running multiple models made the quality better, but it also introduced a performance problem.

I worked on batching where possible, tuning image resolution and DPI, caching reusable prompt templates, monitoring GPU and VRAM usage, reducing unnecessary repeated processing, improving memory handling, and testing acceleration modules where relevant.
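Two of these ideas, batching and prompt-template caching, are easy to show in isolation. This is a hedged sketch only: the template texts, the `prompt_template` and `batched` helpers, and the page-type keys are hypothetical and stand in for whatever the real pipeline used.

```python
from functools import lru_cache
from string import Template
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical prompt texts keyed by page type (assumed for illustration).
_PROMPTS = {
    "table": "Extract every table on page $page as markdown.",
    "text": "Transcribe page $page as markdown, keeping headings.",
}


@lru_cache(maxsize=None)
def prompt_template(page_type: str) -> Template:
    """Build each template once and reuse it for every page of that type."""
    return Template(_PROMPTS[page_type])


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A batch size would normally be chosen from the available VRAM and the page resolution, since higher DPI means larger inputs per page.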

The output had to be useful for downstream AI systems. Raw OCR text was not enough. The output needed to be structured enough to support chunking, metadata enrichment, embedding, retrieval, citations, and later evaluation.

Architecture and implementation

The architecture followed a multi-model preprocessing flow.

Raw technical documents were first converted into page-level inputs. Each page could then be processed through OCR/VL components depending on the type of content it contained: text-heavy pages, table-heavy pages, scanned pages, or visually structured pages.
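As an illustration of routing pages by content type, the sketch below classifies a page from a few cheap signals and maps it to a model. Everything named here is an assumption: the `PageStats` fields, the thresholds, and the model names in `ROUTES` are placeholders, not the three models or rules the project used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    """Cheap per-page signals; how they are measured is outside this sketch."""
    text_chars: int           # characters recovered by ordinary PDF text extraction
    table_cells: int          # cells reported by a table detector
    image_area_ratio: float   # share of the page covered by images, 0.0 to 1.0


def classify_page(stats: PageStats) -> str:
    """Return 'scanned', 'visual', 'table', or 'text'. Thresholds are illustrative."""
    if stats.text_chars < 50 and stats.image_area_ratio > 0.8:
        return "scanned"  # almost no extractable text, page is mostly one image
    if stats.image_area_ratio > 0.4:
        return "visual"
    if stats.table_cells >= 12:
        return "table"
    return "text"


# Hypothetical model names, one per route.
ROUTES = {
    "scanned": "ocr_model",
    "visual": "vl_model",
    "table": "layout_model",
    "text": "ocr_model",
}


def route(stats: PageStats) -> str:
    """Pick the model name for a page based on its classified type."""
    return ROUTES[classify_page(stats)]
```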

Because testing showed that no single OCR/VL model performed best across every document type, I designed the architecture around combining three models into one preprocessing system.

Each model contributed where it was strongest, and the pipeline produced cleaner structured output than relying on one OCR path.

The system generated structured markdown or JSON-style outputs that preserved more of the original document meaning: headings, sections, tables, visual descriptions, and page-level context.
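To make "structured markdown or JSON-style output" concrete, here is one possible shape for a page-level record. It is a sketch under assumptions: the `Block` and `PageOutput` classes and their field names are invented for this example and are not the project's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Block:
    kind: str        # "heading", "paragraph", "table", or "figure"
    text: str        # markdown text, a markdown table, or a visual description
    level: int = 0   # heading depth; 0 for non-headings


@dataclass
class PageOutput:
    source: str
    page: int
    blocks: list[Block] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render blocks in order, prefixing headings with '#' markers."""
        parts = []
        for block in self.blocks:
            if block.kind == "heading":
                parts.append("#" * block.level + " " + block.text)
            else:
                parts.append(block.text)
        return "\n\n".join(parts)

    def to_json(self) -> str:
        """Serialize the page, including nested blocks, as a JSON string."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Keeping the source file and page number on every record is what later allows citations to point back to the original document.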

That output was then prepared for chunking, metadata enrichment, embedding, retrieval, and evaluation.
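A rough sketch of the chunking and metadata step might look like the following. It assumes each page arrives as a list of `(kind, text)` pairs; the `chunk_blocks` function, the metadata keys, and the `max_chars` limit are illustrative choices, not the project's implementation.

```python
def chunk_blocks(source: str, page: int, blocks: list[tuple[str, str]],
                 max_chars: int = 800) -> list[dict]:
    """Group (kind, text) blocks into chunks tagged with source, page, and section."""
    chunks: list[dict] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered text as one chunk under the current heading.
        if buffer:
            chunks.append({"source": source, "page": page,
                           "section": heading, "text": "\n\n".join(buffer)})
            buffer.clear()

    for kind, text in blocks:
        if kind == "heading":
            flush()              # close the previous section first
            heading = text
        elif kind == "table":
            flush()              # keep each table whole, in its own chunk
            buffer.append(text)
            flush()
        else:
            if buffer and sum(len(t) for t in buffer) + len(text) > max_chars:
                flush()
            buffer.append(text)
    flush()
    return chunks
```

Keeping tables in their own chunks avoids splitting a row from its header, which is one of the ways structure gets lost before retrieval.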

The infrastructure was self-hosted, so part of the architecture involved making the models run reliably on GPU hardware. I had to solve NVIDIA/CUDA compatibility problems, CUBLAS errors, model loading issues, VRAM limits, memory issues, and speed bottlenecks.

Speed was a major constraint, so I worked on acceleration strategies such as batching, DPI tuning, prompt-template caching, GPU/VRAM monitoring, memory handling, and acceleration module testing.
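For the GPU/VRAM monitoring part, one simple approach is to poll `nvidia-smi` and check free memory before loading a model or starting a batch. This is a sketch, not the monitoring the project ran; the helper names and the `min_free_mib` parameter are assumptions, and `read_vram` only works on a host with an NVIDIA driver installed.

```python
import subprocess

# Query used and total memory per GPU, in MiB, as bare CSV rows.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_vram(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' MiB rows, one row per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part) for part in line.split(","))
        rows.append((used, total))
    return rows


def vram_headroom_ok(csv_text: str, min_free_mib: int) -> bool:
    """True if every listed GPU has at least `min_free_mib` MiB free."""
    return all(total - used >= min_free_mib for used, total in parse_vram(csv_text))


def read_vram() -> str:
    """Run nvidia-smi and return its raw CSV output."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
```

A typical use would be `vram_headroom_ok(read_vram(), min_free_mib=8000)` before scheduling the next batch.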

What I built

I built the OCR/VL preprocessing workflow for converting messy technical documents into structured AI-ready content.

The result was a preprocessing system designed for messy real-world technical documents, not clean demo PDFs. The work covered:

Researching OCR/VL model options
Testing different extraction approaches
Evaluating output quality across different document types
Designing a three-model OCR/VL architecture
Self-hosting the OCR/VL stack
Setting up GPU infrastructure
Debugging NVIDIA/CUDA compatibility issues
Resolving CUBLAS and model loading problems
Handling VRAM and memory constraints
Improving speed through batching, caching, DPI tuning, and acceleration testing
Generating structured markdown or JSON-style outputs
Handling tables, visual sections, scanned pages, and inconsistent layouts
Preparing outputs for chunking, metadata, embedding, retrieval, and evaluation

System pieces

Self-hosted OCR/VL stack
Three-model OCR/VL architecture
OCR model research and comparison
Vision-language model testing
Model output evaluation
GPU VM setup
NVIDIA/CUDA configuration
CUDA compatibility debugging
CUBLAS issue resolution
Model loading debugging
VRAM and memory management
Memory issue investigation
Batch OCR processing
DPI tuning
Prompt-template caching
GPU/VRAM monitoring
Acceleration module testing
PDF preprocessing
Page-level processing
Visual document understanding
Table extraction and handling
Image/visual section description
Structured markdown output
JSON-style layout output
Chunk preparation
Metadata enrichment
Embedding preparation
Retrieval preparation
Evaluation preparation
Retries and failure handling
Logging and debugging
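The "Retries and failure handling" piece in the list above can be sketched as a small wrapper with exponential backoff. This is a generic pattern under stated assumptions, not the project's code: the `with_retries` name, the default attempt count, and the injectable `sleep` parameter are all illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(task: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `task`, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In a page-level pipeline, a page that still fails after the last attempt would typically be logged and skipped so one bad page does not stop the whole document.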

Why it was technically hard

This was technically hard because it combined three difficult problems at the same time.

First, the documents were inconsistent. There was no stable template, no predictable layout, and no single extraction method that worked everywhere.

Second, the model layer was imperfect. Every OCR/VL model had strengths and weaknesses, so I had to evaluate the tradeoffs and design a system that combined multiple models instead of trusting one.

Third, the infrastructure was heavy. Self-hosting OCR/VL models meant dealing with GPU setup, CUDA/NVIDIA problems, VRAM limits, memory behavior, speed bottlenecks, and runtime instability.

The system had to balance quality, speed, reliability, and output structure. That is what made it a real AI systems project instead of a basic document parser.

Why this matters

Most useful AI systems depend on the quality of their source layer.

If the documents are processed badly, the retrieval system will be bad. If the chunks are bad, the answers will be bad. If the structure is lost, the AI cannot reliably recover it later.

This project shows that I can build the infrastructure underneath serious AI systems: the part that turns messy technical documents into structured, searchable, retrieval-ready data.

It also shows that I can make practical architecture decisions under real constraints: model limitations, document inconsistency, GPU issues, speed constraints, and downstream retrieval requirements.

Put your documents to work.

Tell me what kinds of documents your business handles. I’ll identify the right way to process, structure, and use them.

Tell me about your documents