26th EAAAI (EANN) 2025, 26 - 29 June 2025, Limassol, Cyprus

A Norm-Chatbot: Local LLM-Vision with Vision-based RAG for Complex Production Documents and Task-Specific Responses

Michael Sahl Lystbæk, Ulrik Sahl Lystbæk

Abstract:

  Large language models (LLMs) have a significant impact on the development of applied solutions for industrial purposes, with their general knowledge capabilities supporting more autonomous work processes. The weights learned by these extensively trained models provide the backbone for fine-tuning and retrieval-augmented generation (RAG), allowing LLMs to address specific work-task operations in unique enterprise contexts. In this study, a locally running vision-based LLM framework is proposed, including custom vision-based RAG and a business-oriented evaluation method for handling sensitive core business documents. Because all processing stays local, this approach enhances data privacy and prevents leakage of sensitive data. The LLM adapts core enterprise knowledge from in-house data spaces using affordable computational resources. Our proposed norm-chatbot for enterprise LLM data processing runs on a commercially available NVIDIA RTX 4070 Ti GPU in an offline environment. A vision-based RAG technique is applied to handle complex document layouts by transforming them into image content, addressing the challenges that traditional text-based indexing and RAG methods face when documents consist of mixed data types such as text, tables, images, illustrations, and graphs. The proposed framework uses RAG data processing with a PDF-to-image converter on top of an LLM vision model that embeds the images with textual information. The framework combines ColPali-based RAG with a vision LLM and was tested with the Llama 3.2 11B Vision and LLaVA 1.6 models for highly semantic document processing, covering both descriptive and question answering (Q&A) tasks. The study found a clear trade-off between fast content generation and precision. Nevertheless, the proposed framework shows promising results in generating contextually meaningful content in two different knowledge generation tasks, processing custom datasets of enterprise norms and a standard database to support tool development and operations in a Danish tool manufacturing company with a global customer segment.
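
The abstract does not spell out how retrieved pages are ranked. As a rough illustration only, and assuming the late-interaction (MaxSim) scoring that ColPali is published with, the retrieval step could be sketched as below. The function names, the toy two-dimensional embeddings, and the page labels are hypothetical and are not taken from the authors' implementation; in a real pipeline the vectors would come from the ColPali model applied to the query tokens and to the patches of each page image.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for every query-token vector take the best
    matching page-patch vector, then sum those maxima over the query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: List[Vector],
    pages: Dict[str, List[Vector]],
    top_k: int = 1,
) -> List[str]:
    """Return the names of the top_k pages with the highest MaxSim score."""
    ranked = sorted(
        pages,
        key=lambda name: maxsim_score(query_vecs, pages[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data (assumption): two query-token vectors and two pages,
# each page represented by two patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[0.9, 0.1], [0.2, 0.8]],
    "page_2": [[0.1, 0.2], [0.3, 0.1]],
}

best = rank_pages(query, pages, top_k=1)
```

In such a setup, the images of the top-ranked pages would then be passed, together with the user's question, to the locally running vision LLM to generate the answer.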

*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.