A Coding Implementation to Build a Conversational Research Assistant with FAISS, Langchain, Pypdf, and TinyLlama-1.1B-Chat-v1.0

Conversational research assistants built with retrieval-augmented generation (RAG) address the limitations of traditional language models by combining them with information retrieval systems. The system searches a specific knowledge base, retrieves relevant information, and generates conversational answers with appropriate citations. This approach reduces hallucinations, handles domain-specific knowledge, and grounds responses in the retrieved text. In this tutorial, we will demonstrate how to build such an assistant using the open-source TinyLlama-1.1B-Chat-v1.0 model from Hugging Face, FAISS from Meta, and the LangChain framework to answer questions about scientific papers.

First, let’s install the necessary libraries:

!pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops

Now, let’s import the required libraries:

import os
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import pandas as pd 
from IPython.display import display, Markdown

Next, we mount Google Drive so we can save or load papers in later steps:

from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted")

For our knowledge base, we will use PDF documents of scientific papers. Let’s create a function to download and load these documents:

def load_documents(pdf_folder_path):
    documents = []


    if not pdf_folder_path:
        print("Downloading a sample paper...")
        !wget -q https://arxiv.org/pdf/1706.03762 -O attention.pdf
        pdf_docs = ["attention.pdf"]
    else:
        pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                   if f.endswith('.pdf')]


    print(f"Found {len(pdf_docs)} PDF documents")


    for pdf_path in pdf_docs:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
            print(f"Loaded: {pdf_path}")
        except Exception as e:
            print(f"Error loading {pdf_path}: {e}")


    return documents




documents = load_documents("")
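
As a quick check (an addition beyond the original walkthrough), we can inspect what PyPDFLoader returned; each Document holds one page’s text plus metadata such as the source file and page number. If you keep papers in the Drive folder mounted earlier, you could instead pass that folder path, for example load_documents("/content/drive/MyDrive/papers") (an illustrative path):

# Inspect the loaded pages: metadata includes the source file and page number
print(f"Total pages loaded: {len(documents)}")
print(documents[0].metadata)
print(documents[0].page_content[:300])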

Next, we need to split these documents into smaller chunks for efficient retrieval:

def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks")
    return chunks


chunks = split_documents(documents)
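
Optionally, we can preview one chunk to confirm the 1,000-character size and 200-character overlap look reasonable (this check is an extra, not part of the original code):

# Preview a single chunk to verify the splitter's output
sample_chunk = chunks[0]
print(f"Chunk length: {len(sample_chunk.page_content)} characters")
print(sample_chunk.page_content[:200])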

We will use sentence-transformers to create vector embeddings for our document chunks:

def create_vector_store(chunks):
    print("Loading embedding model...")
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
    )


    print("Creating vector store...")
    vector_store = FAISS.from_documents(chunks, embedding_model)
    print("Vector store created successfully!")
    return vector_store


vector_store = create_vector_store(chunks)
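
Before wiring in the language model, it helps to verify retrieval on its own; the snippet below (an extra sanity check, not in the original code) pulls the three most similar chunks for a sample query:

# Retrieve the top-3 most similar chunks for a sample question
results = vector_store.similarity_search("What is self-attention?", k=3)
for i, doc in enumerate(results, start=1):
    print(f"--- Result {i} (page {doc.metadata.get('page', '?')}) ---")
    print(doc.page_content[:200])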

Now, let’s load an open-source language model to generate responses. We will use TinyLlama, which is small enough to run on Colab but still capable enough for our task:

def load_language_model():
    print("Loading language model...")
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"


    try:
        import subprocess
        print("Installing/updating bitsandbytes...")
        subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
        print("Successfully installed/updated bitsandbytes")
    except:
        print("Could not update bitsandbytes, will proceed without 8-bit quantization")


    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
    import torch


    tokenizer = AutoTokenizer.from_pretrained(model_id)


    if torch.cuda.is_available():
        try:
            quantization_config = BitsAndBytesConfig(
                load_in_8bit=True,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False
            )


            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                quantization_config=quantization_config
            )
            print("Model loaded with 8-bit quantization")
        except Exception as e:
            print(f"Error with quantization: {e}")
            print("Falling back to standard model loading without quantization")
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto"
            )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )


    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0.2,
        top_p=0.95,
        repetition_penalty=1.2,
        return_full_text=False
    )


    from langchain_community.llms import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)
    print("Language model loaded successfully!")
    return llm


llm = load_language_model()
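
As an optional smoke test (not in the original code), we can prompt the raw model directly before connecting it to the retrieval chain:

# Quick check that the pipeline generates text at all
print(llm.invoke("Question: What is self-attention in a Transformer?\nAnswer:"))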

Now, let’s build our assistant by combining the vector store and the language model:
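
Below is a minimal sketch of the create_research_assistant function used later in the tutorial, assuming a ConversationalRetrievalChain over the FAISS retriever with the top 3 chunks and a locally maintained chat history; the k=3 setting and the closure-based interface are assumptions:

def create_research_assistant(vector_store, llm):
    # Build a conversational RAG chain over the FAISS retriever (k=3 is an assumed setting)
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})
    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True,
    )
    chat_history = []

    def ask(query, return_sources=False):
        # Run the chain with the accumulated chat history, then record this turn
        result = qa_chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        chat_history.append((query, answer))
        if return_sources:
            return answer, result["source_documents"]
        return answer

    return ask

We also define a small helper that formats each query, the model’s response, and previews of the retrieved source chunks for readable console output: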

def format_research_assistant_output(query, response, sources):
    output = f"\n{'=' * 50}\n"
    output += f"USER QUERY: {query}\n"
    output += f"{'-' * 50}\n\n"
    output += f"ASSISTANT RESPONSE:\n{response}\n\n"
    output += f"{'-' * 50}\n"
    output += f"SOURCES REFERENCED:\n\n"


    for i, doc in enumerate(sources):
        output += f"Source #{i+1}:\n"
        content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
        wrapped_content = textwrap.fill(content_preview, width=80)
        output += f"{wrapped_content}\n\n"


    output += f"{'=' * 50}\n"
    return output


import textwrap


research_assistant = create_research_assistant(vector_store, llm)


test_queries = [
    "What is the key idea behind the Transformer model?",
    "Explain self-attention mechanism in simple terms.",
    "Who are the authors of the paper?",
    "What are the main advantages of using attention mechanisms?"
]


for query in test_queries:
    response, sources = research_assistant(query, return_sources=True)
    formatted_output = format_research_assistant_output(query, response, sources)
    print(formatted_output)
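
If you prefer to query the assistant interactively instead of using the fixed test list, a simple loop like this (an optional addition) works in Colab:

# Optional interactive loop; type 'quit' to stop
while True:
    user_query = input("\nAsk a question (or 'quit' to exit): ")
    if user_query.strip().lower() == "quit":
        break
    answer, docs = research_assistant(user_query, return_sources=True)
    print(format_research_assistant_output(user_query, answer, docs))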

In this tutorial, we built a conversational research assistant using retrieval-augmented generation with open-source models. RAG enhances language models by integrating document retrieval, reducing hallucinations, and grounding answers in domain-specific sources. The guide walked through preparing the environment, processing scientific papers, creating vector embeddings with FAISS and sentence-transformers, and integrating an open-source language model such as TinyLlama. The assistant retrieves the relevant document chunks and generates responses grounded in them. This implementation lets users query a knowledge base, making AI-assisted research more reliable and more effective at answering domain-specific questions.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of the artificial intelligence media platform Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
