Auto-Documenting Codebases Using LLMs and Language Parsers

In this post, I share a documentation approach that uses Large Language Models (LLMs) to turn a completely undocumented or partially documented codebase into a well-documented one. By combining LLMs with language-specific parsers, we can extract function definitions and send each function, along with its surrounding “context,” to the LLM to generate documentation. The method isn’t perfect, but it provides a solid starting point for automating code documentation. ...

March 31, 2025 · 2 min · Zeeshan Khan

Extracting Structured Content from PDFs Using OCR and LLM

In this post, I document the steps I took to parse PDFs and extract structured output using OCR and an LLM. The goal is to extract structured content from PDFs and other documents with high accuracy. The results of this experiment were impressive: the process was straightforward, and the data extracted from a variety of documents was accurate, well-structured, and useful. ...

August 30, 2024 · 3 min · Zeeshan Khan