A Code Implementation to Use Ollama through Google Colab and Build a Local RAG Pipeline Using DeepSeek-R1 1.5B, LangChain, FAISS, and ChromaDB for Q&A

In this tutorial, we will build a fully functional retrieval-augmented generation (RAG) pipeline using open-source tools that run smoothly on Google Colab. First, we will look at how to set up Ollama and serve models from within Colab. By combining the DeepSeek-R1 1.5B language model served by Ollama, LangChain for orchestration, and the ChromaDB vector store, users can query information extracted from uploaded PDFs in real time. Through the combination of local language model reasoning and retrieval over real data from PDF documents, the pipeline demonstrates a powerful, private, and cost-effective alternative to cloud-hosted solutions.
!pip install colab-xterm
%load_ext colabxterm
We use the colab-xterm extension to enable terminal access directly within the Colab environment. By installing it with !pip install colab-xterm and loading it via %load_ext colabxterm, users gain an interactive terminal window inside Colab, making it possible to run commands such as ollama serve or to monitor local processes.
The %xterm magic is used after loading the colab-xterm extension to launch an interactive terminal within the Colab notebook. This lets users execute shell commands in real time, just like a regular terminal, which is especially useful for running background services such as ollama serve, managing files, or debugging at the system level without leaving the notebook.
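For reference, the terminal itself is opened by running the %xterm line magic in its own cell:
%xterm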
Here, we install Ollama inside the terminal using its curl-based install script.

Then, we start the Ollama server by running ollama serve.

Finally, we pull the deepseek-r1:1.5b model through Ollama so it can be used locally to build the RAG pipeline. The exact commands are sketched below.
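A minimal sketch of these terminal steps, run inside the %xterm session (assuming the official Ollama install script URL):
curl -fsSL https://ollama.com/install.sh | sh  # install Ollama
ollama serve &  # start the Ollama server in the background
ollama pull deepseek-r1:1.5b  # download the DeepSeek-R1 1.5B model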
!pip install langchain langchain-community langchain-ollama sentence-transformers chromadb faiss-cpu
To prepare the core components of the RAG pipeline, we install the required libraries: langchain, langchain-community, langchain-ollama, sentence-transformers, chromadb, and faiss-cpu. These packages provide the document loading, embedding, vector storage, and retrieval functionality needed to build an efficient, modular, and fully local RAG system.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA
from google.colab import files
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
We import the main modules from the langchain-community and langchain-ollama libraries to handle PDF loading, text splitting, embedding generation, vector storage with Chroma, and LLM integration via Ollama. We also bring in Colab file uploads and prompt templates, enabling a smooth flow from document ingestion to query answering with a locally hosted model.
print("Please upload your PDF file...")
uploaded = files.upload()
file_path = list(uploaded.keys())[0]
print(f"File '{file_path}' successfully uploaded.")
if not file_path.lower().endswith('.pdf'):
    print("Warning: Uploaded file is not a PDF. This may cause issues.")
To let users add their own knowledge source, we prompt for a PDF upload using google.colab.files.upload(). The code checks the uploaded file's extension and prints feedback, ensuring that only PDFs proceed to embedding and retrieval.
!pip install pypdf
import pypdf
loader = PyPDFLoader(file_path)
documents = loader.load()
print(f"Successfully loaded {len(documents)} pages from PDF")
To extract the content from the PDF, we install the pypdf library and use PyPDFLoader from LangChain to load the document. This converts each PDF page into a structured Document object, enabling downstream tasks such as text splitting and embedding.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)
print(f"Split documents into {len(chunks)} chunks")
The loaded PDF is split into manageable chunks using RecursiveCharacterTextSplitter, with each chunk of 1,000 characters and a 200-character overlap. The overlap preserves context across chunk boundaries, improving the relevance of the passages retrieved when answering questions.
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)
persist_directory = "./chroma_db"
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=persist_directory
)
vectorstore.persist()
print(f"Vector store created and persisted to {persist_directory}")
The text chunks are embedded using the all-MiniLM-L6-v2 model from sentence-transformers, running on the CPU, to enable semantic search. The resulting embeddings are stored in a persistent ChromaDB vector store, allowing efficient similarity-based retrieval across sessions.
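Although this walkthrough persists embeddings in Chroma, faiss-cpu is installed as well. As an optional sketch (not part of the original flow), the same chunks and embeddings can also be indexed in an in-memory FAISS store:
from langchain_community.vectorstores import FAISS

# Optional alternative: build an in-memory FAISS index from the same chunks and embeddings
faiss_store = FAISS.from_documents(documents=chunks, embedding=embeddings)
faiss_retriever = faiss_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})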
llm = OllamaLLM(model="deepseek-r1:1.5b")
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
print("RAG pipeline created successfully!")
The RAG pipeline is completed by connecting the local DeepSeek-R1 model (via OllamaLLM) to the Chroma-based retriever. Using LangChain's RetrievalQA chain with the "stuff" strategy, the model retrieves the top 3 chunks most relevant to a query and generates context-aware answers, finishing the local RAG setup.
def query_rag(question):
    result = qa_chain({"query": question})
    print("\nQuestion:", question)
    print("\nAnswer:", result["result"])
    print("\nSources:")
    for i, doc in enumerate(result["source_documents"]):
        print(f"Source {i+1}:\n{doc.page_content[:200]}...\n")
    return result
question = "What is the main topic of this document?"
result = query_rag(question)
To test the RAG pipeline, the query_rag function takes a user question, retrieves the relevant context with the retriever, and generates an answer using the LLM. It also prints the top source documents, providing transparency and traceability for the model's output.
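The same helper can be reused for further questions; the prompts below are hypothetical examples, and any question relevant to the uploaded PDF works the same way:
result = query_rag("Summarize the key points of this document.")
result = query_rag("What conclusions does the document reach?")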
In conclusion, this tutorial combines Ollama, ChromaDB-based retrieval, LangChain orchestration, and DeepSeek-R1 reasoning to build a lightweight yet capable RAG system that runs on the free tier of Google Colab. The solution lets users ask questions grounded in the up-to-date content of uploaded documents, with answers generated by a local LLM. This architecture provides a foundation for building scalable, customizable, and privacy-preserving AI assistants without incurring cloud costs or compromising performance.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
