stefan.onl

Case Studies

Detailed builds behind production AI systems.

Architecture, infrastructure, model evaluation, and implementation notes from systems built around messy real-world sources.

01

Document Intelligence / OCR / Vision-Language Models

Self-Hosted Multi-Model OCR/VL Document Preprocessing Pipeline

A self-hosted OCR/VL preprocessing pipeline for turning messy technical documents into structured, retrieval-ready data for downstream AI systems.

OCR/VL, Document preprocessing, Self-hosted GPU infrastructure, CUDA, Technical documents, Retrieval preparation

02

Data Ingestion / Web Extraction / AI-Assisted Structuring

4,000-Link Web Data Extraction Pipeline

A large-scale web data extraction pipeline that filtered roughly 4,000 candidate links, processed valid targets, used discovery fallbacks, and produced structured data for downstream systems.

Web scraping, Data extraction, AI-assisted extraction, Schema validation, Deep crawling, JSON pipelines

© 2026 Stefan Matić. All rights reserved.