March 17, 2025

Building an AI-Powered Web Scraper with Ollama & ChromaDB

A Step-by-Step Breakdown of a Python Script for Intelligent Web Scraping and Q&A

In this blog post, we’ll analyze a Python script that automates web scraping, text processing, and AI-powered question answering using Ollama and ChromaDB. The script extracts the content of a web page and lets users ask questions about that content interactively.

View the script: RAG Web Scraper 6

What Does This Script Do?

This Python script:

Detects and Stops Running Ollama Processes
Restarts Ollama to Ensure a Fresh AI Model is Running
Scrapes Webpage Content Dynamically
Extracts and Displays the Webpage Title
Processes and Stores Text Data in ChromaDB for Fast Retrieval
Uses Ollama’s AI Model to Answer User Questions
Allows Users to Change URLs & Scrape Different Pages Without Restarting

Let’s break down how each part of the script works.

1️⃣ System Information & Ollama Process Management

The script starts by printing system information and managing Ollama processes to avoid conflicts.

Checking System Information

Before starting, the script prints:

Operating System Name
Platform Details
CPU Core Count
Available Memory Details
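As a rough illustration, the four items above could be gathered like this. This is a minimal sketch, not the script's actual code: the function names are hypothetical, and reading memory through the third-party psutil package is an assumption (the sketch falls back to None when psutil is not installed).

```python
import os
import platform


def collect_system_info() -> dict:
    """Gather the basic facts printed at startup (hypothetical helper)."""
    info = {
        "os_name": platform.system(),     # e.g. "Linux", "Windows", "Darwin"
        "platform": platform.platform(),  # detailed platform string
        "cpu_cores": os.cpu_count(),      # logical cores; may be None
    }
    try:
        import psutil  # third-party; assumed here, optional in this sketch

        mem = psutil.virtual_memory()
        info["memory_total_gb"] = round(mem.total / 1024**3, 2)
        info["memory_available_gb"] = round(mem.available / 1024**3, 2)
    except ImportError:
        # Without psutil the standard library has no portable memory query
        info["memory_total_gb"] = None
        info["memory_available_gb"] = None
    return info


def print_system_info() -> None:
    """Print one 'key: value' line per collected item."""
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```

Calling print_system_info() once at the top of the script is enough to reproduce the startup report.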

Stopping Any Running Ollama Processes

The script checks for existing Ollama processes and terminates them to ensure a clean restart.
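One way to do this is to call the operating system's own process-kill command. The sketch below assumes pkill on Linux and macOS and taskkill on Windows; the helper names are hypothetical, and the original script may use a different mechanism.

```python
import platform
import subprocess
from typing import List


def build_stop_command(system: str) -> List[str]:
    """Return an OS-specific command that terminates Ollama processes."""
    if system == "Windows":
        # taskkill: /F forces termination, /IM matches by image name
        return ["taskkill", "/F", "/IM", "ollama.exe"]
    # pkill matches by process name on Linux and macOS
    return ["pkill", "ollama"]


def stop_ollama() -> bool:
    """Try to stop running Ollama processes; True if the command succeeded."""
    cmd = build_stop_command(platform.system())
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The kill utility itself is missing on this system
        return False
    # Both tools exit with 0 only when at least one process was terminated
    return result.returncode == 0
```

A return value of False is not an error here: it usually just means no Ollama process was running.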

Restarting Ollama

Once all running instances are stopped, the script restarts Ollama to serve AI models.
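A restart can be sketched as launching ollama serve in the background and polling until the server answers. The helper names and the ten-second wait are assumptions; the address is Ollama's default local endpoint, http://localhost:11434.

```python
import subprocess
import time
import urllib.error
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 1.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def start_ollama(wait_seconds: float = 10.0) -> subprocess.Popen:
    """Launch `ollama serve` and wait (up to wait_seconds) for it to respond."""
    proc = subprocess.Popen(
        ["ollama", "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if is_ollama_up():
            break
        time.sleep(0.5)  # give the server a moment before polling again
    return proc
```

Keeping the returned Popen handle lets the caller terminate the server when the script exits.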

2️⃣ Web Scraping & Dynamic Content Extraction

The script asks the user for a URL and extracts content from the page.

Asking for the Webpage URL
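A prompt like this benefits from a small validation step so that a typo does not reach the scraper. The following is a sketch under assumptions: the prompt text, the helper names, and the choice to default to HTTPS when no scheme is typed are all illustrative.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_url(raw: str) -> Optional[str]:
    """Return a cleaned http(s) URL, or None if the input is not usable."""
    candidate = raw.strip()
    if not candidate:
        return None
    if "://" not in candidate:
        candidate = "https://" + candidate  # assume HTTPS when no scheme is given
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return candidate


def ask_for_url() -> str:
    """Prompt repeatedly until the user enters a usable URL."""
    while True:
        url = normalize_url(input("Enter the webpage URL: "))
        if url:
            return url
        print("That does not look like a valid URL. Please try again.")
```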

Extracting the Page Title

The script looks for the tag that holds the article title and extracts its text.
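Title extraction can be sketched with Python's built-in HTML parser. The example assumes the title sits in the first h1 element, which is a guess that should be adjusted to the target page; the class and function names are hypothetical, and the original script may rely on a different parsing library.

```python
from html.parser import HTMLParser


class FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of a given tag."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while inside the target tag
        self.done = False   # True once the first match has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def extract_title(html: str, tag: str = "h1") -> str:
    """Return the whitespace-normalized text of the first `tag`, or 'Untitled'."""
    parser = FirstTagText(tag)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split()) or "Untitled"
```

Nested inline markup such as emphasis inside the heading is flattened into plain text.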

Extracting the Article Content

The script then extracts the main article content from inside the element that contains the article body.
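The body text can be pulled out in a similar way by keeping only the paragraphs nested in the content container. This sketch assumes the container is an article element and that paragraphs use explicitly closed p tags; both are assumptions about the page, and the names are hypothetical.

```python
from html.parser import HTMLParser


class ArticleParagraphs(HTMLParser):
    """Collect the text of every <p> nested inside a container tag."""

    def __init__(self, container: str = "article") -> None:
        super().__init__()
        self.container = container
        self.container_depth = 0  # > 0 while inside the container
        self.in_paragraph = False
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container:
            self.container_depth += 1
        elif tag == "p" and self.container_depth:
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == self.container and self.container_depth:
            self.container_depth -= 1
        elif tag == "p" and self.in_paragraph:
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.current.append(data)


def extract_article_text(html: str, container: str = "article") -> str:
    """Join the container's paragraphs with blank lines between them."""
    parser = ArticleParagraphs(container)
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

Paragraphs outside the container, such as navigation or footer text, are ignored.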

Storing Extracted Text as a Document

To process the extracted text efficiently, it is wrapped in a Document object (from LangChain).
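Wrapping the text might look like the sketch below. The metadata keys (source, title) and the helper name are assumptions; the small fallback class exists only so the example runs when LangChain is not installed.

```python
from dataclasses import dataclass, field

try:
    from langchain_core.documents import Document
except ImportError:
    # Fallback stand-in so the sketch runs without LangChain installed
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def to_document(text: str, url: str, title: str) -> Document:
    """Wrap scraped text with metadata recording where it came from."""
    return Document(page_content=text, metadata={"source": url, "title": title})
```

Keeping the URL in the metadata makes it possible to tell later which page a retrieved chunk came from.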

3️⃣ Text Processing & ChromaDB Storage

The extracted text is split into smaller chunks and stored in ChromaDB for fast retrieval.

✂️ Splitting the Text into Chunks

To enable efficient question-answering, the script splits the text into small, searchable chunks.
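To show what chunking does, here is a simplified stand-in written in plain Python: fixed-size character windows that overlap slightly, so a sentence cut at a boundary still appears whole in one of the chunks. LangChain's RecursiveCharacterTextSplitter is the usual tool for this and also prefers to break on paragraph, line, and word boundaries. The chunk size and overlap below are illustrative values, not the script's settings.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Smaller chunks make retrieval more precise, while larger ones give the model more surrounding context per hit.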

Storing Data in ChromaDB

The text chunks are converted into vector embeddings and stored in ChromaDB.

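from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Embed each chunk with the "all-minilm" model served by Ollama
local_embeddings = OllamaEmbeddings(model="all-minilm")
# Build the vector store from the chunks in all_splits and persist it under ./chroma_db
vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings, persist_directory="chroma_db")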

4️⃣ AI-Powered Question Answering

Once the content is processed and stored, the script allows the user to ask questions about the article.

Interactive Q&A Loop

Users can ask multiple questions without restarting the script.
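The loop itself can be quite small. In this sketch the exit keywords and prompt text are assumptions, and the answer function is passed in as a parameter so the loop stays independent of how answers are produced.

```python
from typing import Callable


def qa_loop(
    answer: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Answer questions until the user types 'exit'; return how many were answered."""
    answered = 0
    while True:
        question = read("Ask a question (or 'exit' to quit): ").strip()
        if not question:
            continue  # ignore empty input and prompt again
        if question.lower() in ("exit", "quit"):
            break
        write(answer(question))
        answered += 1
    return answered
```

Passing read and write as parameters also makes the loop easy to exercise without a keyboard.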

Retrieving Relevant Information

When a question is asked, it is converted into an embedding, and ChromaDB returns the stored text chunks whose embeddings are closest to it.
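With a LangChain vector store, this lookup is normally a single call such as vectorstore.similarity_search(question, k=4). To show what happens underneath, the toy sketch below ranks chunks by cosine similarity over hand-made two-dimensional vectors. The vectors and function names are illustrative only, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    scored = sorted(
        chunk_vecs,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]
```

Real embeddings have hundreds of dimensions, but the ranking idea is the same.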

Answering the Question with AI

The retrieved text is sent to the Ollama model for generating a response.
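This step can be sketched as building a prompt from the retrieved chunks and posting it to Ollama's local REST endpoint, /api/generate. The prompt wording, the helper names, and the model name llama3.2 are assumptions; the original script may use a LangChain chat wrapper and a different model.

```python
import json
import urllib.request
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def ask_ollama(
    prompt: str,
    model: str = "llama3.2",  # placeholder model name; use one you have pulled
    base_url: str = "http://localhost:11434",
) -> str:
    """Send the prompt to a local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Telling the model to rely only on the supplied context helps keep answers tied to the scraped page.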

Displaying the Answer

5️⃣ Additional Features

Changing the URL Without Restarting

Users can type "change url" to scrape a new webpage dynamically without restarting the script.
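A session loop with this command could be organized as in the sketch below. The scrape and answer callables stand in for the earlier steps, and the function names, prompt text, and exit keywords are assumptions; only the "change url" command comes from the script's description.

```python
from typing import Callable, Tuple


def parse_command(user_input: str) -> Tuple[str, str]:
    """Classify input as ('change_url', ''), ('exit', '') or ('question', text)."""
    text = user_input.strip()
    lowered = text.lower()
    if lowered == "change url":
        return ("change_url", "")
    if lowered in ("exit", "quit"):
        return ("exit", "")
    return ("question", text)


def run_session(
    scrape: Callable[[str], object],
    answer: Callable[[object, str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Scrape a page, then answer questions until the user exits."""
    context = scrape(read("Enter the webpage URL: "))
    while True:
        action, text = parse_command(read("Question, 'change url', or 'exit': "))
        if action == "exit":
            return
        if action == "change_url":
            # Re-scrape and rebuild the context for the new page
            context = scrape(read("Enter the new webpage URL: "))
            continue
        if text:
            write(answer(context, text))
```

Because the context is rebuilt inside the loop, switching pages does not require restarting the script or the Ollama server.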

Key Takeaways

✔️ Automates web scraping and AI-powered Q&A
✔️ Handles dynamic URL changes efficiently
✔️ Uses ChromaDB for fast text retrieval
✔️ Manages system processes, ensuring Ollama runs smoothly
✔️ Provides a continuous chatbot-like experience

Final Thoughts

This Python script is a powerful AI-driven tool that combines web scraping, vector search, and AI question-answering into one seamless workflow. It can be used for automated research, knowledge extraction, and real-time information retrieval.

Would you like to integrate this into your own projects? Send me an email: info@mindstorm.gr
