
Build a "Chat with PDF/Doc/TextFiles" App using Spring AI and PostgreSQL And Docker

In this tutorial, we will build a RAG (Retrieval Augmented Generation) application. It lets users upload their own documents (PDFs, Word docs, text files) and ask the AI questions answered specifically from those files.

We will use Google Gemini (for its cheap/free AI models), PostgreSQL (to store the document chunks and their embeddings), and Spring AI.

We will build a pipeline that:

  1. Extracts text from PDFs and Office documents using Apache Tika.
  2. Transforms the text into mathematical vectors (Embeddings).
  3. Loads them into a PostgreSQL (pgvector) database.
  4. Retrieves the data when a user asks a question via a REST API.

Tools and Technologies

  • Java 21
  • Spring Boot 3.4.x
  • Spring AI 1.0.0+
  • Database: PostgreSQL with pgvector extension (via Docker)
  • Docker: We use the pgvector/pgvector image (PostgreSQL with the pgvector extension preinstalled)
  • AI Model: Google Gemini Flash (Fast & Cost-effective)
  • Document Reader: Apache Tika

Development Steps

  • Step 1: Project Configuration (Gradle)
  • Step 2: Database Setup (Docker)
  • Step 3: Application Properties
  • Step 4: The ETL Service (Ingestion Pipeline)
  • Step 5: The RAG Controller (Chat with Data)
  • Step 6: How to Test This?

1. Step 1: Project Configuration (Gradle)

We need to add the Tika document reader (for file parsing), the PGVector vector store (for the database), and a chat/embedding model starter.

1. File: build.gradle

Code Example
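
A minimal sketch of the build file. The exact version numbers are assumptions; pin them to your setup. Because the tutorial authenticates with a plain Google API key, this sketch reaches Gemini through its OpenAI-compatible endpoint via the Spring AI OpenAI starter; if you authenticate through Google Cloud instead, swap in the Vertex AI Gemini starters.

plugins {
    id 'java'
    id 'org.springframework.boot' version '3.4.5'
    id 'io.spring.dependency-management' version '1.1.7'
}

group = 'com.planetlearning'
version = '0.0.1-SNAPSHOT'

java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(21)
    }
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'

    // Chat + embedding model. We reach Gemini via its OpenAI-compatible
    // endpoint, so the OpenAI starter is used here (an assumption; use the
    // Vertex AI Gemini starter if you authenticate via Google Cloud).
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'

    // PGVector vector store (auto-configures a VectorStore bean).
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

    // Apache Tika reader: extracts text from PDF, DOCX, TXT, HTML, ...
    implementation 'org.springframework.ai:spring-ai-tika-document-reader'

    // QuestionAnswerAdvisor lives in this module as of Spring AI 1.0.0 GA.
    implementation 'org.springframework.ai:spring-ai-advisors-vector-store'

    // JDBC driver (may already arrive transitively via the pgvector starter).
    runtimeOnly 'org.postgresql:postgresql'
}

dependencyManagement {
    imports {
        mavenBom 'org.springframework.ai:spring-ai-bom:1.0.0'
    }
}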

2. Step 2: Database Setup (Docker)

We need a specific version of PostgreSQL that supports vector math. We will use Docker Compose.

1. File: docker-compose.yml

Create these files in the project root, in the same folder as your build.gradle file.

Create an empty init.sql file (it is mounted into the container below; any SQL you add to it later runs on the container's first startup).

docker-compose.yml

services:
  db:
    image: pgvector/pgvector:pg16
    container_name: planet-learning-db
    restart: unless-stopped
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=vectordb
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:

Run this command to start the DB:

docker-compose up -d
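
To verify the database came up healthy, you can check the container status and run a throwaway query (the container name comes from the compose file above):

docker-compose ps
docker exec -it planet-learning-db psql -U postgres -d vectordb -c "SELECT 1;"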

3. Step 3: Application Properties

Configure your Gemini API key and the database connection. With initialize-schema enabled, Spring AI will automatically create the vector_store table for you at startup.

1. File: src/main/resources/application.properties

Code Example
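
A minimal sketch, assuming Gemini is reached through its OpenAI-compatible endpoint with a Gemini API key (e.g. from Google AI Studio). The base URL, the paths, the model names, and the 768-dimension embedding size are assumptions to verify against Google's current documentation.

spring.application.name=rag

# --- Database (matches docker-compose.yml) ---
spring.datasource.url=jdbc:postgresql://localhost:5432/vectordb
spring.datasource.username=postgres
spring.datasource.password=password

# --- Google Gemini via its OpenAI-compatible endpoint (assumption) ---
spring.ai.openai.api-key=${GEMINI_API_KEY}
spring.ai.openai.base-url=https://generativelanguage.googleapis.com/v1beta/openai
spring.ai.openai.chat.completions-path=/chat/completions
spring.ai.openai.chat.options.model=gemini-2.0-flash
spring.ai.openai.embedding.embeddings-path=/embeddings
spring.ai.openai.embedding.options.model=text-embedding-004

# --- PGVector: let Spring AI create the vector_store table ---
spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.dimensions=768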

4. Step 4: The ETL Service (Ingestion Pipeline)

This service reads your files (PDFs, Text, etc.), splits them into small pieces (chunks), and saves them into the database.


1. File: src/main/java/com/planetlearning/rag/service/EtlService.java

Code Example
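
A sketch of the ingestion pipeline, assuming your files live in src/main/resources/docs and that the default TokenTextSplitter chunk sizes are acceptable:

package com.planetlearning.rag.service;

import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.stereotype.Service;

@Service
public class EtlService {

    private static final Logger log = LoggerFactory.getLogger(EtlService.class);

    private final VectorStore vectorStore;

    public EtlService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    /**
     * Extract -> Transform -> Load:
     * read every file under classpath:docs/, split it into token-sized
     * chunks, then embed the chunks and store them in PGVector.
     */
    public int ingest() throws Exception {
        Resource[] files = new PathMatchingResourcePatternResolver()
                .getResources("classpath:docs/*");

        TokenTextSplitter splitter = new TokenTextSplitter();
        int totalChunks = 0;

        for (Resource file : files) {
            // Extract: Tika handles PDF, DOCX, TXT, HTML, ...
            List<Document> documents = new TikaDocumentReader(file).read();

            // Transform: split into chunks that fit the embedding model.
            List<Document> chunks = splitter.apply(documents);

            // Load: the VectorStore embeds and persists each chunk.
            vectorStore.add(chunks);

            totalChunks += chunks.size();
            log.info("Ingested {} -> {} chunks", file.getFilename(), chunks.size());
        }
        return totalChunks;
    }
}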

5. Step 5: The RAG Controller (Chat with Data)

This is where the magic happens. We use the QuestionAnswerAdvisor. This component automatically searches your database for relevant info and adds it to the prompt before sending it to Google Gemini.

1. File: src/main/java/com/planetlearning/rag/controller/RagController.java

Code Example
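
A sketch of the controller, assuming the EtlService above and the spring-ai-advisors-vector-store module (where QuestionAnswerAdvisor lives as of Spring AI 1.0.0 GA):

package com.planetlearning.rag.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import com.planetlearning.rag.service.EtlService;

@RestController
@RequestMapping("/api/v1/rag")
public class RagController {

    private final ChatClient chatClient;
    private final EtlService etlService;

    public RagController(ChatClient.Builder builder,
                         VectorStore vectorStore,
                         EtlService etlService) {
        this.etlService = etlService;
        // The QuestionAnswerAdvisor runs a similarity search against the
        // vector store and injects the matching chunks into the prompt.
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @PostMapping("/ingest")
    public String ingest() throws Exception {
        int chunks = etlService.ingest();
        return "Documents parsed and stored. Chunks: " + chunks;
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String query) {
        return chatClient.prompt()
                .user(query)
                .call()
                .content();
    }
}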

6. Step 6: How to Test This?

  1. Prepare Files: Create a folder src/main/resources/docs and put a PDF inside (e.g., company_policy.pdf).
  2. Start DB: Run docker-compose up -d.
  3. Run App: Start the Spring Boot Application.
  4. Ingest Data:
    • Open Postman.
    • POST http://localhost:8080/api/v1/rag/ingest
    • Result: "Documents parsed and stored..." (Check your logs to see chunk counts).
  5. Ask a Question:
    • GET http://localhost:8080/api/v1/rag/chat?query=What is the policy on remote work?
    • Result: The AI will answer based specifically on the PDF you ingested.
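
If you prefer the command line over Postman, the same two calls work with curl (the -G/--data-urlencode form URL-encodes the spaces in the question for you):

# Ingest the documents from src/main/resources/docs
curl -X POST http://localhost:8080/api/v1/rag/ingest

# Ask a question about the ingested documents
curl -G "http://localhost:8080/api/v1/rag/chat" --data-urlencode "query=What is the policy on remote work?"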