A Study of LlamaIndex's RAG Pipeline

Background and Goals

Understand the "Building a RAG pipeline" guide in the official LlamaIndex documentation.
Apply a few small enhancements at each stage of the RAG pipeline.

References

LlamaIndex official documentation : Learn > Building a RAG pipeline

Tools Used

Visual Studio Code (Jupyter Notebook)
LlamaIndex
DBeaver

Walkthrough

RAG pipeline

What Was Applied at Each RAG Pipeline Stage

Loading : SimpleDirectoryReader
Indexing & Embedding : VectorStoreIndex
Storing : chromadb
Querying
- Retrieval : VectorIndexRetriever, similarity_top_k=2 (the two most similar nodes)
- PostProcessing
  - SimilarityPostprocessor(similarity_cutoff=0.2)
  - KeywordNodePostprocessor(required_keywords, exclude_keywords)
- Response Synthesis : get_response_synthesizer

#### Code

Import the required libraries and modules, load the environment variable (API key), and configure LlamaIndex Settings

# Import the required libraries and modules
from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
import chromadb

# Load environment variables (OPENAI_API_KEY) from the .env file.
# Done before creating the OpenAI objects so the key is already set when they are constructed.
load_dotenv()

# LlamaIndex Settings
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Chromadb setup

# Import the Chromadb-related modules
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# Initialize the Chromadb client (data is persisted under ./chroma_db)
db = chromadb.PersistentClient(path="./chroma_db")

# Create the "quickstart" collection, or fetch it if it already exists
chroma_collection = db.get_or_create_collection("quickstart")

Set up ChromaVectorStore and StorageContext

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Check whether the collection already holds data, then index or load accordingly

if chroma_collection.count() == 0:
    print("Collection is empty. Indexing documents...")
    # load documents
    documents = SimpleDirectoryReader("./data").load_data()
    
    # create your index
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context
    )
    print("Indexing completed.")
else:
    print(f"Collection already contains {chroma_collection.count()} items. Loading existing index...")
    # load your index from stored vectors
    index = VectorStoreIndex.from_vector_store(
        vector_store, storage_context=storage_context
    )
    print("Existing index loaded.")

Output

Collection already contains 22 items. Loading existing index...
Existing index loaded.
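
The branch above follows a build-once, load-afterwards pattern. As a minimal, library-free sketch of the same idea (the names `load_or_build`, `build_fake_index`, and `fake_index.json` are hypothetical, and a JSON file stands in for the Chroma collection):

```python
import json
import tempfile
from pathlib import Path


def load_or_build(store_path: Path, build):
    """Return (data, was_built): reuse stored data if present, else build and persist it."""
    if store_path.exists():
        # Counterpart of the "collection already contains items" branch.
        return json.loads(store_path.read_text(encoding="utf-8")), False
    # Counterpart of the "collection is empty" branch: do the expensive work once.
    data = build()
    store_path.write_text(json.dumps(data), encoding="utf-8")
    return data, True


def build_fake_index():
    """Hypothetical stand-in for the expensive embedding and indexing step."""
    return {"chunks": ["chunk-0", "chunk-1"]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fake_index.json"
    first, built_first = load_or_build(path, build_fake_index)     # builds and saves
    second, built_second = load_or_build(path, build_fake_index)   # loads from disk
```

The second call skips the build step entirely, which is why re-running the notebook only prints the "Loading existing index" message.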

Import the libraries needed for the Querying stage

from llama_index.core import get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor, KeywordNodePostprocessor

Configure the retriever

retriever = VectorIndexRetriever(index=index, similarity_top_k=2)
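
To make `similarity_top_k=2` concrete, here is a rough sketch of top-k retrieval in plain Python. It assumes cosine similarity as the scoring function (the actual metric depends on how the vector store is configured), and the node IDs and 3-dimensional vectors are made up for illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, node_vecs, k=2):
    """Return the k (node_id, score) pairs most similar to the query, best first."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real embedding vectors have far more dimensions.
node_vecs = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.8, 0.6, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}
top = retrieve_top_k([1.0, 0.1, 0.0], node_vecs, k=2)
for node_id, score in top:
    print(node_id, round(score, 3))
```

Only the two best-scoring nodes are passed on to the next step; everything else is dropped before postprocessing.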

Configure text_qa_template

from llama_index.core import PromptTemplate
text_qa_template_str = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n Write in Korean."
)

text_qa_template = PromptTemplate(text_qa_template_str)
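
The two placeholders behave like ordinary format fields. The sketch below fills the same template text with plain `str.format` to show roughly what the LLM ends up receiving. The retrieved texts are hypothetical, and joining them with blank lines is an assumption rather than a statement about what LlamaIndex does internally:

```python
# Same template text as above, filled with plain str.format for illustration.
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n Write in Korean."
)

# Hypothetical stand-ins for the retrieved node texts and the user's question.
retrieved_texts = ["First retrieved chunk.", "Second retrieved chunk."]
question = "What did the author do growing up?"

prompt = template.format(
    context_str="\n\n".join(retrieved_texts),  # assumed joining style
    query_str=question,
)
print(prompt)
```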

Configure the response synthesizer

response_synthesizer = get_response_synthesizer(text_qa_template=text_qa_template)

Configure the query engine

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    # node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.2), KeywordNodePostprocessor(required_keywords=["Combinator"], exclude_keywords=["Italy"])],
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.2), KeywordNodePostprocessor(exclude_keywords=["Italy"])]
)
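
The two postprocessors run in order, each one narrowing the list of retrieved nodes. A simplified sketch of that filtering, assuming plain substring matching for keywords (the library's own keyword matching is more elaborate) and using a hypothetical `ScoredNode` class with invented texts and scores:

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node with its similarity score."""
    text: str
    score: float


def similarity_filter(nodes, cutoff):
    """Keep nodes whose score is at least the cutoff."""
    return [n for n in nodes if n.score >= cutoff]


def keyword_filter(nodes, required=(), excluded=()):
    """Keep nodes that match a required keyword (if any are given) and no excluded keyword."""
    kept = []
    for n in nodes:
        if required and not any(word in n.text for word in required):
            continue  # none of the required keywords appear
        if any(word in n.text for word in excluded):
            continue  # an excluded keyword appears
        kept.append(n)
    return kept


nodes = [
    ScoredNode("We started Y Combinator.", 0.30),
    ScoredNode("I moved to Italy to paint.", 0.28),
    ScoredNode("Unrelated text.", 0.10),
]

# Apply the filters in the same order as node_postprocessors above.
after_similarity = similarity_filter(nodes, cutoff=0.2)
result = keyword_filter(after_similarity, excluded=["Italy"])
print([n.text for n in result])
```

In this toy data the low-scoring node is removed by the cutoff and the node mentioning "Italy" by the keyword filter, so a single node reaches response synthesis.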

Set the query

query = "What did the author do growing up?"

Run the query engine

response = query_engine.query(query)

Check the LLM's answer

print(response.response)

Answer (written in Korean, as the prompt instructs)

저자는 성장하면서 주로 글쓰기와 프로그래밍에 집중했습니다. 대학에 가기 전, 그는 짧은 이야기를 쓰며 글쓰기 연습을 했지만, 그 이야기들은 플롯이 거의 없고 강한 감정을 가진 캐릭터들로만 구성되어 있어 좋지 않았다고 회상합니다. 프로그래밍에 대해서는 9학년 때 IBM 1401 컴퓨터를 사용하여 처음 시도했으며, 당시에는 펀치 카드로 프로그램을 작성해야 했습니다. 이후 그는 TRS-80이라는 마이크로컴퓨터를 구입하여 간단한 게임과 모델 로켓 비행 예측 프로그램, 그리고 아버지를 위해 사용한 워드 프로세서를 만들었습니다. 이러한 경험들은 그가 나중에 Y Combinator를 시작하고, 새로운 프로그래밍 언어인 Arc를 개발하는 데 큰 도움이 되었습니다.

Inspect source_nodes

for node in response.source_nodes:
    print(node)

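Output --> the similarity scores are only in the 0.2 range (about 0.29), which is not very high, although both nodes still pass the similarity_cutoff of 0.2.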

Node ID: 9c1fd7de-910e-4212-a759-8c7857cec05d
Text: What I Worked On  February 2021  Before college the two main
things I worked on, outside of school, were writing and programming. I
didn't write essays. I wrote what beginning writers were supposed to
write then, and probably still are: short stories. My stories were
awful. They had hardly any plot, just characters with strong feelings,
which I ...
Score:  0.297

Node ID: 8ef64de0-d29b-48a2-a167-29722516b073
Text: Much to my surprise, the time I spent working on this stuff was
not wasted after all. After we started Y Combinator, I would often
encounter startups working on parts of this new architecture, and it
was very useful to have spent so much time thinking about it and even
trying to write some of it.  The subset I would build as an open
source proje...
Score:  0.291

Inspect metadata

response.metadata

Output

{'9c1fd7de-910e-4212-a759-8c7857cec05d': {'file_path': 'c:\\llamaindex\\01-Learn\\data\\paul_graham_essay.txt',
  'file_name': 'paul_graham_essay.txt',
  'file_type': 'text/plain',
  'file_size': 75042,
  'creation_date': '2024-08-26',
  'last_modified_date': '2024-08-21'},
 '8ef64de0-d29b-48a2-a167-29722516b073': {'file_path': 'c:\\llamaindex\\01-Learn\\data\\paul_graham_essay.txt',
  'file_name': 'paul_graham_essay.txt',
  'file_type': 'text/plain',
  'file_size': 75042,
  'creation_date': '2024-08-26',
  'last_modified_date': '2024-08-21'}}
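
Since the metadata is an ordinary dict keyed by node ID, it can be summarized with plain Python. A small sketch, using a trimmed copy of the output above (only two of the fields are kept) to group source nodes by file:

```python
# Shape copied from the output above, trimmed to two fields per node.
metadata = {
    "9c1fd7de-910e-4212-a759-8c7857cec05d": {
        "file_name": "paul_graham_essay.txt",
        "file_size": 75042,
    },
    "8ef64de0-d29b-48a2-a167-29722516b073": {
        "file_name": "paul_graham_essay.txt",
        "file_size": 75042,
    },
}

# Group node IDs by the file they came from.
nodes_by_file = {}
for node_id, info in metadata.items():
    nodes_by_file.setdefault(info["file_name"], []).append(node_id)

for file_name, node_ids in nodes_by_file.items():
    print(f"{file_name}: {len(node_ids)} source node(s)")
```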

Inspecting the Chromadb contents : DBeaver

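Results and Insights

I gained at least a basic understanding of what each stage behind the simple five-line code means and how it works.
Storing the indexing results in chromadb follows a pattern similar to the PERSIST_DIR approach: build the index once, then load it on later runs.
The Querying stage consists of three steps:
- Retrieval : finds and returns the documents most relevant to the query.
- PostProcessing : filters the retrieved results, for example by keyword or similarity cutoff.
- Response Synthesis : combines the query, the relevant and filtered data (context), and the prompt, then sends them to the LLM.
When configuring the response synthesizer, text_qa_template takes two input variables:
- context_str : receives the content returned by the retriever.
- query_str : receives the input passed to the query method, much like invoke in LangChain.
The final output, response, contains the following:
- response : the answer generated by the LLM
- source_nodes : the retrieval results (Node ID, Text, Score, etc.)
- metadata : the file_path, file_name, etc. of each source node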

Parameter values used for indexing (embedding); these are the LlamaIndex defaults, since the code above does not set them
- chunk_size = 1024 (tokens)
- chunk_overlap = 200 (tokens)
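
To show how these two values interact, here is a simplified fixed-window chunker. It is only a sketch: it treats the document as a flat list of tokens and ignores sentence boundaries, which a real splitter would normally respect, and the 2,500-token document is hypothetical:

```python
def chunk_tokens(tokens, chunk_size=1024, chunk_overlap=200):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Hypothetical document of 2,500 integer "tokens".
tokens = list(range(2500))
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

Each window starts 824 tokens (1024 - 200) after the previous one, so the last 200 tokens of one chunk reappear at the start of the next.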
