Building an IBM AIX Expert Chatbot using RAG and FAISS

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1


    Introduction

    In this guide, we will build a RAG-based chatbot that acts as an IBM AIX expert and answers AIX-related questions. The chatbot retrieves information from IBM AIX documentation, indexes it with FAISS for vector search, and generates responses using OpenAI's LLM. We will also cover automating AIX documentation collection and deploying the chatbot with Docker.





    📌 System Design Overview

    Architecture

    1. Data Collection Layer: Scrapes IBM AIX documentation and extracts text from PDFs.
    2. Data Processing Layer: Splits text into chunks, converts them into vector embeddings with OpenAI's embedding model, and indexes them in FAISS.
    3. Retrieval-Augmented Generation (RAG) Layer: Uses the FAISS vector index to fetch relevant chunks and feeds them to OpenAI's LLM.
    4. User Interface: A Streamlit-based UI for chatbot interactions.
    5. Deployment: Docker containerization for easy deployment and scalability.
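The retrieval step in layer 3 boils down to "find the stored vectors closest to the query vector". The following is a minimal sketch of that idea in plain Python, assuming cosine similarity as the metric and tiny hand-made vectors in place of real embeddings. The names `cosine_similarity`, `top_k`, and `INDEX` are illustrative and not part of FAISS or LangChain, which use optimized index structures rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hand-made 3-dimensional "embeddings" (illustrative values, not model output)
INDEX = [
    ("lsvg lists volume groups", [0.9, 0.1, 0.0]),
    ("errpt shows the error log", [0.1, 0.9, 0.1]),
    ("lspv lists physical volumes", [0.8, 0.2, 0.1]),
]
```

Calling `top_k([1.0, 0.0, 0.0], INDEX)` ranks the two storage-related chunks ahead of the error-log chunk. The retrieved texts are what the RAG layer then passes to the LLM.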





    📌 Step 1: Setting Up the Project

    Install Required Dependencies

    Ensure you have Python 3.9+ installed, then install the required libraries:






    pip install streamlit langchain openai faiss-cpu bs4 requests PyMuPDF







    Create a new Python script Aix_Rag_Chatbot.py and add the following code.
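The OpenAI-backed LangChain classes used in later steps read the API key from the `OPENAI_API_KEY` environment variable, which the steps here do not set. A small check at the top of the script can fail fast with a readable message instead of an error deep inside a library call. This is a sketch; `require_env` is a hypothetical helper name, not part of any library.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the chatbot.")
    return value

# Typical use near the top of Aix_Rag_Chatbot.py:
# require_env("OPENAI_API_KEY")
```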





    📌 Step 2: Collecting IBM AIX Documentation

    We will automate the process of gathering IBM AIX documentation from IBM Knowledge Center, IBM Redbooks (PDFs), and online resources.


    🔹 Web Scraping IBM Docs





    import requests
    from bs4 import BeautifulSoup

    def scrape_aix_docs(url):
        """
        Fetches IBM AIX documentation from a given URL and extracts text content.
        """
        # Fail fast on network problems or HTTP error codes
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Keep only paragraph text, one paragraph per line
        text_content = "\n".join(p.get_text() for p in soup.find_all("p"))
        # Note: mode "w" overwrites aix_docs.txt on every call
        with open("aix_docs.txt", "w", encoding="utf-8") as file:
            file.write(text_content)
        print("AIX documentation scraped and saved!")







    Call this function with:






    scrape_aix_docs("https://www.ibm.com/docs/en/aix/7.3?topic=commands")
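Because `scrape_aix_docs` opens `aix_docs.txt` in write mode, each call replaces the previous page. To gather several topics, one option is to loop over a list of URLs and write them in a single pass. The sketch below also skips pages that yield very little text, which can happen when a site renders its content with JavaScript. `collect_pages`, `fetch_text`, and `min_chars` are hypothetical names: `fetch_text` stands for any callable that takes a URL and returns the page text, for example a variant of `scrape_aix_docs` that returns `text_content` instead of writing it.

```python
def collect_pages(urls, fetch_text, out_path="aix_docs.txt", min_chars=200):
    """Fetch each URL with fetch_text(url) -> str and write the pages to out_path.

    Pages with fewer than min_chars characters are skipped.
    Returns the list of skipped URLs so they can be checked by hand.
    """
    skipped = []
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            text = fetch_text(url).strip()
            if len(text) < min_chars:
                skipped.append(url)
                continue
            # Separate pages with a blank line
            out.write(text + "\n\n")
    return skipped
```

If the returned list is not empty, it is worth opening those URLs in a browser to see whether the content is loaded dynamically.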







    ⏳ Estimated Time: 10-20 seconds


    πŸ”Ή Extracting Text from AIX PDFs





    import fitz  # PyMuPDF

    def extract_text_from_pdf(pdf_path):
        """
        Extracts text from a given PDF file and appends it to aix_docs.txt.
        """
        output_text = ""
        with fitz.open(pdf_path) as doc:
            for page in doc:
                output_text += page.get_text()
        with open("aix_docs.txt", "a", encoding="utf-8") as file:
            file.write(output_text)
        print("Extracted text from PDF saved!")







    Call this function with:






    extract_text_from_pdf("aix_redbook.pdf")







    ⏳ Estimated Time: 20-60 seconds





    📌 Step 3: Creating a FAISS Vector Database

    We will now split the extracted text, convert it into embeddings, and store it using FAISS.






    from langchain.text_splitter import CharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    def build_vector_db():
        """
        Processes AIX documentation, splits it into smaller chunks,
        generates embeddings, and stores them in FAISS vector database.
        """
        with open("aix_docs.txt", "r", encoding="utf-8") as file:
            aix_text = file.read()
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        split_texts = text_splitter.split_text(aix_text)
        # Save the chunks to a file for inspection
        with open("aix_docs_cleaned.txt", "w", encoding="utf-8") as file:
            for chunk in split_texts:
                file.write(chunk + "\n\n")
        # Embed each chunk separately. Reloading the cleaned file with TextLoader
        # would yield one large document and discard the chunking above.
        embedding_model = OpenAIEmbeddings()
        vector_db = FAISS.from_texts(split_texts, embedding_model)
        vector_db.save_local("aix_vector_db")
        print("Vector database built successfully!")







    Call this function once:






    build_vector_db()
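To see what `chunk_size=1000` and `chunk_overlap=200` mean in practice, here is a simplified sliding-window splitter in plain Python. It is only an illustration of the overlap idea: LangChain's `CharacterTextSplitter` first splits on a separator and then merges pieces up to the size limit, so its chunk boundaries will differ. `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows where consecutive windows
    share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=2`, the string `"abcdefghij"` becomes `["abcd", "cdef", "efgh", "ghij"]`. The overlap keeps a sentence that straddles a boundary available in both neighbouring chunks, at the cost of embedding some text twice.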







    ⏳ Estimated Time: 1-2 minutes





    📌 Step 4: Implementing the AIX Expert Chatbot

    Now, let's create a Streamlit UI for interacting with our chatbot.






    import streamlit as st
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.llms import OpenAI
    from langchain.chains import RetrievalQA

    # Load FAISS vector DB (use the same embedding model it was built with)
    # Note: recent LangChain releases also require passing
    # allow_dangerous_deserialization=True to load_local
    embedding_model = OpenAIEmbeddings()
    vector_db = FAISS.load_local("aix_vector_db", embedding_model)
    retriever = vector_db.as_retriever()

    # Setup RAG Chain
    llm = OpenAI()
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

    # Streamlit UI
    st.title("IBM AIX Expert Chatbot")
    st.write("Ask me anything about IBM AIX!")

    user_input = st.text_input("Enter your question:")
    if st.button("Submit"):
        if user_input:
            response = qa_chain.run(user_input)
            st.write(response)
        else:
            st.write("Please enter a question.")







    ⏳ Estimated Time: 30-60 seconds


    Run the chatbot with:






    streamlit run Aix_Rag_Chatbot.py
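For readers wondering what `RetrievalQA` does with the retrieved chunks: with its default "stuff" chain type, it places all retrieved chunks into a single prompt together with the question. The sketch below shows the idea in plain Python. The prompt wording and the name `build_stuffed_prompt` are illustrative assumptions, not LangChain's actual template.

```python
def build_stuffed_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Since every retrieved chunk ends up in the prompt, chunk size and the number of retrieved chunks together determine how much of the model's context window each question consumes.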










    📌 Step 5: Deploying with Docker

    🔹 Build & Run Docker Container





    # Build the Docker image
    docker build -t aix-rag-chatbot .

    # Run the container
    docker run -p 8501:8501 aix-rag-chatbot







    Access your chatbot at http://localhost:8501
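The `docker build` command above expects a Dockerfile in the project directory, which is not shown in this guide. The following is one possible minimal version, offered as a sketch only. The base image, the decision to copy a pre-built `aix_vector_db` directory into the image, and the Streamlit server flags are all assumptions; adjust them to your setup.

```dockerfile
# Hypothetical Dockerfile; base image and copied paths are assumptions
FROM python:3.11-slim
WORKDIR /app

# Same libraries as in Step 1
RUN pip install --no-cache-dir streamlit langchain openai faiss-cpu bs4 requests PyMuPDF

# Chatbot script and the vector DB built in Step 3
COPY Aix_Rag_Chatbot.py .
COPY aix_vector_db ./aix_vector_db

EXPOSE 8501
# Listen on all interfaces so the published port is reachable from the host
CMD ["streamlit", "run", "Aix_Rag_Chatbot.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

The container also needs the OpenAI API key at run time, for example by passing it through from the host with `docker run -p 8501:8501 -e OPENAI_API_KEY aix-rag-chatbot`.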





    Now the chatbot is ready for action. Deploy it on your preferred on-prem or cloud environment and scale it as needed.



