[Cohort 7 LangChain] Document-Based Question Generator

Subtitle: Fine-tuning? It’s your turn.

Intro

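Project change: ‘Cat chatbot’ → ‘Document-based question generation’
- Another cat chatbot already exists. (It isn't that fun, though…)
  Chat with the Cat Generative Dialogue Processor (CatGDP)
- Generating an image for every response takes a long time.
- I need a ‘document-based question generation’ service for my own studying.
Document-based question generation service
- Goals
  - Generate questions on CS and ML fundamentals
  - Generate questions about a document's content
  - Provide the answer and an explanation for each question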

Week 1 - Build a general chatbot and deploy it on Streamlit Cloud

0. Intro

I took this as a chance to study LangChain and prompting, and ended up building a chatbot along the way.
Chatbot info
- git: https://github.com/xcellentbird/streamlit-chatbot.git
- url: https://xcellentbird.streamlit.app/

1. Development environment

streamlit: lets you build the front end in Python.
langchain
openai
google: added so the Google Search tool can be used from within LangChain

2. Building the chat UI with Streamlit

3. Connecting LangChain

app.py

import traceback

import streamlit as st
from funcy import chunks

from llm_agent import OpenAIChatAgent, GenFAQsLLM

N_FAQS = st.sidebar.number_input("Number of FAQs", min_value=1, max_value=10, value=4)

st.title('🦜🔗 Wrtn Up')

# Create `messages` in the session to store the conversation history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Re-render the messages already stored in the session
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Create the chat input box
input_content = st.chat_input("What is up?")
if "clicked_faq" in st.session_state:
    input_content = st.session_state['clicked_faq']

    del st.session_state['clicked_faq']

try:
    # Create the LangChain agent and chain objects
    llm_agent = OpenAIChatAgent()
    gen_faq_llm = GenFAQsLLM()

except Exception as e:
    st.error(f"Error initializing agent...\n\n{traceback.format_exc()}\n{e.__class__.__name__}: {e}")
    st.stop()

else:
    def faq_button_callback(clicked_faq: str):
        # Remember the clicked FAQ so the next rerun treats it as user input
        st.session_state['clicked_faq'] = clicked_faq

    # Only respond when there is input
    if input_content:
        # Store the input in the session history
        st.session_state.messages.append({"role": "human", "content": input_content})

        # Show the input in the chat as the human speaker
        with st.chat_message("human"):
            st.markdown(input_content)

        # Respond as the AI speaker
        with st.chat_message("ai"):
            # Get the LLM response through the agent
            response = llm_agent.run(
                chat_history=st.session_state.messages[:-1],
                human_input=input_content,
                st_gen=st,
            )

            st.markdown(response)
            st.markdown("---")
            st.session_state.messages.append({"role": "ai", "content": response})

            # Generate follow-up FAQs in response to the exchange
            faqs = gen_faq_llm.run(
                chat_history=st.session_state.messages[:-1],
                human_input=input_content,
                ai_response=response,
                n_faqs=N_FAQS,  # use the count chosen in the sidebar
            )

            # Render the generated FAQs as buttons, n_cols per row
            btn_id = 1
            n_cols = 2
            for faq_row in chunks(n_cols, faqs):
                cols = st.columns([1] * n_cols)
                for col, faq in zip(cols, faq_row):
                    col.button(label=f"*{btn_id}.* {faq}", key=btn_id, on_click=faq_button_callback, args=(faq,))
                    btn_id += 1
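
The button grid above relies on `funcy.chunks(n, seq)` to split the FAQ list into rows of `n_cols` items. As a rough illustration of that grouping, here is a standard-library sketch. `chunk_rows` is a hypothetical helper written for this note and is not part of the original project.

```python
from typing import List, Sequence


def chunk_rows(n_cols: int, items: Sequence[str]) -> List[List[str]]:
    """Split items into consecutive rows of at most n_cols entries each."""
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    return [list(items[i:i + n_cols]) for i in range(0, len(items), n_cols)]


faqs = ["Q1", "Q2", "Q3", "Q4", "Q5"]
rows = chunk_rows(2, faqs)
# Five FAQs in two columns give three rows; the last row holds one item.
```

When the number of FAQs is not a multiple of `n_cols`, the last row is shorter, and `zip(cols, row)` simply leaves the remaining column empty.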

llm_agent.py

from langchain.agents.agent import AgentExecutor
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
from langchain.callbacks import StreamlitCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.llms.openai import OpenAI
from langchain.prompts.chat import SystemMessagePromptTemplate
from langchain.schema.messages import SystemMessage

from config import OPENAI_API_KEY
from faq_prompt import FAQ_PROMPT
from utils import custom_google_search_tool

Tools = [
    custom_google_search_tool
]

class OpenAIChatAgent:
    system_message = SystemMessage(
        content="You are a helpful AI assistant."
    )

    extra_prompt_messages = [
        SystemMessagePromptTemplate.from_template(
            """\n---CHAT HISTORY: {chat_history}\n---"""
        )
    ]

    def __init__(self):
        self.llm = ChatOpenAI(
            openai_api_key=OPENAI_API_KEY,
            model_name='gpt-3.5-turbo-0613',  # gpt 3.5 turbo snapshot with function calling data
        )
        self.agent_executor = AgentExecutor.from_agent_and_tools(
            agent=OpenAIFunctionsAgent.from_llm_and_tools(
                self.llm,
                Tools,
                system_message=self.system_message,
                extra_prompt_messages=self.extra_prompt_messages,
            ),
            tools=Tools,
            verbose=True,  # verbose is an executor setting, not a chain input
        )

    def run(self, chat_history, human_input, st_gen):
        # Stream the agent's intermediate steps into a Streamlit container
        st_callback = StreamlitCallbackHandler(st_gen.container())
        # Flatten the message dicts into "role: content" strings
        chat_history = [f"{message['role']}: {message['content']}" for message in chat_history]

        ai_response = self.agent_executor.run(
            chat_history=str(chat_history),
            input=human_input,
            callbacks=[st_callback],
        )

        return ai_response

class GenFAQsLLM:
    def __init__(self, llm_temp: float = 1.0):
        self.llm_temp = llm_temp
        self.faq_prompt_template = FAQ_PROMPT

        self.llm = OpenAI(
            openai_api_key=OPENAI_API_KEY,
            model_name='gpt-3.5-turbo-instruct',
            temperature=self.llm_temp,
            max_tokens=64,
        )

        self.llm_chain = LLMChain(
            llm=self.llm,
            prompt=self.faq_prompt_template,
        )

    def run(self, chat_history, human_input, ai_response, n_faqs=4):
        chat_history = [f"{message['role']}: {message['content']}" for message in chat_history]
        input_dict = {
            "chat_history": chat_history,
            "human_input": human_input,
            "ai_response": ai_response,
            "tools": str({tool.name: tool.description for tool in Tools}),
        }
        input_list = [input_dict for _ in range(n_faqs)]

        faqs = self.llm_chain.apply(input_list)

        return [faq['text'] for faq in faqs]
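
Both `run` methods flatten the stored messages into `"role: content"` strings before they reach the prompt. The sketch below shows what that text looks like for a two-message history, next to a newline-joined variant that may be easier for the model to read. `flatten_history` is a hypothetical name used only for illustration, and the newline variant is a suggestion rather than what the project does.

```python
from typing import Dict, List


def flatten_history(messages: List[Dict[str, str]]) -> List[str]:
    """Turn Streamlit-style message dicts into 'role: content' strings."""
    return [f"{m['role']}: {m['content']}" for m in messages]


messages = [
    {"role": "human", "content": "Hi"},
    {"role": "ai", "content": "Hello!"},
]

# What OpenAIChatAgent.run passes in: the repr of a Python list
as_list_repr = str(flatten_history(messages))
# A possible alternative: one message per line
as_lines = "\n".join(flatten_history(messages))
```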

utils.py

from langchain.tools.google_search.tool import GoogleSearchRun, GoogleSearchAPIWrapper

from config import GOOGLE_API_KEY, GOOGLE_CSE_ID

custom_google_search_tool = GoogleSearchRun(
    name="google_search",
    description=("A wrapper around Google Search. "
                 "Useful for when you need to understand user intents "
                 "and answer user's questions about current events. "
                 "Or Double Check the fact to avoid AI hallucinations. "
                 "Input should be a search query."),
    api_wrapper=GoogleSearchAPIWrapper(
        google_api_key=GOOGLE_API_KEY,
        google_cse_id=GOOGLE_CSE_ID,
        k=8
    )
)
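
Both `llm_agent.py` and `utils.py` import their keys from a `config.py` that the post does not show. The following is a minimal sketch of such a module, assuming the keys are supplied as environment variables. How the original project actually loads them is not stated, and `read_setting` is a hypothetical helper.

```python
import os


def read_setting(name: str, default: str = "") -> str:
    """Return the environment variable `name`, or `default` if it is unset."""
    return os.environ.get(name, default)


# The names match the imports used by llm_agent.py and utils.py.
OPENAI_API_KEY = read_setting("OPENAI_API_KEY")
GOOGLE_API_KEY = read_setting("GOOGLE_API_KEY")
GOOGLE_CSE_ID = read_setting("GOOGLE_CSE_ID")
```

Whatever the loading mechanism, the key values themselves should stay out of the git repo, since the repo is what gets connected to Streamlit Cloud.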

4. Deploying the chatbot

Streamlit Community Cloud
https://share.streamlit.io/
How to deploy
1. Create a git repo and push the Streamlit code
2. On Streamlit Cloud, grant access to the repo, connect it, and deploy

Week 2 - Project draft

1. Creating the Streamlit project

Python environment
- pyenv install 3.9.18
- poetry

2. streamlit - Uploading and saving a document

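import os

import streamlit as st

pdf_file = st.file_uploader("Upload Files", type=['pdf'])

if pdf_file:
    bytes_data = pdf_file.getvalue()
    # Make sure the target directory exists before writing
    os.makedirs("files", exist_ok=True)
    file_url = f"files/{pdf_file.name}"
    with open(file_url, "wb") as f:
        f.write(bytes_data)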
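
The save path above is built directly from the uploaded file's name. As an optional hardening step, the sketch below keeps only the base name so that a name containing directory parts cannot write outside `files/`. `upload_path` is a hypothetical helper and is not part of the original code.

```python
from pathlib import Path


def upload_path(filename: str, base_dir: str = "files") -> Path:
    """Build a save path inside base_dir using only the file's base name."""
    name = Path(filename).name  # drops any directory components
    if name in ("", ".", ".."):
        raise ValueError(f"unusable file name: {filename!r}")
    return Path(base_dir) / name


target = upload_path("lecture/notes.pdf")
```

In the upload step this would replace the f-string, for example `file_url = str(upload_path(pdf_file.name))`.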

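3. langchain - Loading the document

from langchain.document_loaders import PyPDFLoader

# Use only a slice of pages (0-based, end exclusive) while testing
test_scope = (6, 12)

docs = PyPDFLoader(file_url).load()
doc = '\n'.join(page.page_content for page in docs[test_scope[0]:test_scope[1]])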
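
`PyPDFLoader.load()` returns one document per PDF page, so `test_scope = (6, 12)` selects list indices 6 through 11, which are pages 7 to 12 of the file. The sketch below reproduces that slicing with a stand-in `Page` class so it runs without LangChain. `Page` and `join_pages` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Stand-in for a loaded page; only page_content matters here."""
    page_content: str


def join_pages(pages: List[Page], scope: Tuple[int, int]) -> str:
    """Join the text of pages[start:end] with newlines (end is exclusive)."""
    start, end = scope
    return "\n".join(p.page_content for p in pages[start:end])


pages = [Page(f"page {i}") for i in range(20)]
doc = join_pages(pages, (6, 12))
```

Because Python slices clamp to the list length, a scope that runs past the last page returns the pages that exist instead of raising an error.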

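4. langchain - Creating the LLM

from langchain.chat_models import ChatOpenAI

from config import OPENAI_API_KEY

# ChatOpenAI is the chat-model wrapper; OpenAIChat is deprecated in its favor
gpt4 = ChatOpenAI(
    model_name="gpt-4",
    temperature=0,
    max_tokens=512,
    openai_api_key=OPENAI_API_KEY
)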

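Model choice
- Use the OpenAI API
- Use the GPT-4 family, since the documents are long and several questions have to be generated
- Azure OpenAI is not available to individual users
- GPT-4-32K is not available through the regular OpenAI API
- Chose GPT-4
Set temperature=0 for consistent output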
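
Since document length drove the model choice, it helps to see how much room the prompt actually has. The base `gpt-4` model has an 8,192-token context window that the prompt and the completion share, so reserving `max_tokens=512` for the output leaves the rest for the instructions plus the document text. The arithmetic is sketched below. The function names are hypothetical, the per-message overhead of the chat format is ignored, and counting the prompt's tokens (for example with a tokenizer library such as `tiktoken`) is left to the caller.

```python
GPT4_CONTEXT_WINDOW = 8192  # tokens, for the base gpt-4 model


def max_prompt_tokens(max_completion_tokens: int,
                      context_window: int = GPT4_CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving room for the completion."""
    if not 0 <= max_completion_tokens <= context_window:
        raise ValueError("max_completion_tokens must fit in the context window")
    return context_window - max_completion_tokens


def fits(prompt_tokens: int, max_completion_tokens: int) -> bool:
    """True if the prompt plus the reserved completion fit in the window."""
    return prompt_tokens <= max_prompt_tokens(max_completion_tokens)


budget = max_prompt_tokens(512)
```

The completion side is a limit too: if the requested questions need more than 512 tokens, the output is cut off at that point.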

5. langchain - Creating the prompt and chain

prompt

prompt_template = """
You are a teacher. You are teaching a class of students. You are teaching them about the importance of BASIC.
Make Test {num} Questions for the following text

---
Question Info
Types: 4-choice (sentence or short words options)
Language: {language}
---

TEXT:
{text}
"""

from langchain.prompts.chat import ChatPromptTemplate, SystemMessagePromptTemplate

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_template)
])
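
The template's `{num}`, `{language}`, and `{text}` placeholders are filled in when the chain runs, using Python's f-string-style substitution. The sketch below shows the same substitution with plain `str.format` on a shortened copy of the template. The shortened wording and the sample text are for illustration only.

```python
# Shortened copy of the prompt template, keeping the three placeholders
template = (
    "Make Test {num} Questions for the following text\n"
    "Language: {language}\n"
    "TEXT:\n"
    "{text}\n"
)

filled = template.format(num=3, language="Korean", text="A stack is LIFO.")
```

Braces inside the document text are safe because the text is passed in as a value, but a literal brace written in the template itself would have to be doubled (`{{` and `}}`).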

chain

from langchain.chains import LLMChain

test_generator = LLMChain(
    llm=gpt4,
    prompt=prompt,
    verbose=True
)

6. langchain, streamlit - Generating and displaying the questions

test = test_generator.run(text=doc, num=10, language='Korean')
st.markdown(test)

References:

langchain: https://python.langchain.com/docs/get_started/introduction
streamlit: https://docs.streamlit.io/
- login Auth
  Streamlit-Authenticator, Part 1: Adding an authentication component to your app
  Streamlit-Authenticator, Part 2: Adding advanced features to your authentication component
- google Auth
  Google Authentication in a streamlit app
openai doc: https://platform.openai.com/docs/introduction/overview
openai-cookbook: https://github.com/openai/openai-cookbook/tree/main/examples
Prompt Engineering Guide: https://www.promptingguide.ai/kr
