Splitting shared living expenses N ways among housemates (feat. Streamlit)

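Overview / Goal

I currently live with my older sister and her husband, so there are three of us.
Whoever is available pays for shared expenses as they come up, and at the end of each month we settle up and split the total evenly (N빵, an N-way split).
The existing process is too cumbersome, so I want to build a tool that pulls the relevant entries out of our KakaoTalk chat room using a keyword such as #정산 ("settlement").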

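The old way

Each of us pays and writes every expense on a whiteboard.
At the end of the month we get together, enter each item into a spreadsheet one by one, sum them with formulas, and split the total N ways.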

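The new way

Every time one of us spends money, we post the item and the amount in KakaoTalk with the #정산 hashtag.
Then we export the KakaoTalk chat history as a CSV file and run code that extracts and sums the settlement entries for a given period. Let's build it!
To be honest, I copied the way Nuri (누리님) counts posts in the bootcamp challenge room. Idea borrowed, haha: [7기 API방] 지피터스 커뮤니티 리더의 아주 소소한 업무 자동화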

Tools used

ChatGPT (GPT-4), Streamlit

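1. Post to KakaoTalk in the agreed format, then export the conversation

Rules we always followed: 1) add the #정산 tag; 2) separate the item and the amount with a slash (/), with or without spaces.
I tried posting in as many different forms as possible, because I wanted everyone to be able to post casually without having to think about it.
I put two entries in one message, added line breaks, and added spaces.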
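To make the posting rules concrete, here is a minimal sketch of how one message in this format could be parsed with a regular expression. It is not the code GPT produced; the function name `parse_entries`, the pattern, the English sample items, and the optional trailing 원 (won) are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: an item (no slash or newline), optional spaces around '/',
# an amount made of digits with optional thousands separators, and an optional '원'.
ENTRY_PATTERN = re.compile(r"([^/\n]+?)\s*/\s*(\d[\d,]*)\s*원?")


def parse_entries(message: str) -> list[tuple[str, int]]:
    """Return (item, amount) pairs found in a single chat message."""
    # Remove the tag itself so it does not end up inside an item name.
    text = message.replace("#정산", " ")
    pairs = []
    for item, amount in ENTRY_PATTERN.findall(text):
        pairs.append((item.strip(), int(amount.replace(",", ""))))
    return pairs


# Assumed sample messages: one entry, two entries on one line, entries on separate lines.
print(parse_entries("#정산 coffee/5000"))
print(parse_entries("#정산 groceries/32,000 taxi / 8500"))
print(parse_entries("#정산\nchicken/20000\npizza/15000"))
```

Unlike splitting on whitespace, this variant tolerates spaces around the slash, but it treats any free text before an entry on the same line as part of the item name.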

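2. Give GPT the file and tell it what I want

Explain what the file is and what I am trying to do

This is a KakaoTalk chat history file of a conversation between me, my older sister, and my brother-in-law.
Every month we want to add up what each of us spent and then split the total N ways.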

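Break the task into steps when asking

Please do the following

1. Find every Message that contains the word '#정산'.
2. Those messages contain 'item/amount'. Split them on '/' into the item ('사용내역') and the amount ('금액').
3. I also need the person who wrote the message, as a '정산자' (settler) field.
4. I only need messages from a specific period: use only the messages from February 1 up to now.
5. Then export a CSV table with the columns '날짜' (date), '정산자' (settler), '사용내역' (item), '금액' (amount).

Note that a single message may contain two or more settlement entries. An entry is always written as 'item/amount'.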

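3. Check the result

Trial and error at this stage

At first it generated a CSV with no data, so to check that it was reading my uploaded CSV properly I asked it once to "read this file and show me the first 5 rows".
It hit several errors on its own. The field names in the uploaded file (User, Date, Message) differ from the ones I asked for (정산자, 날짜, and so on), so GPT probably needed some time to work that out by itself. Somehow, to its credit, it produced a result in the end.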
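The two checks above (looking at the first rows, and reconciling the file's column names with the requested ones) can be sketched locally with pandas. The sample rows are made up; only the column names Date, User, Message come from the export described in this post, and the explicit `column_map` is an assumption about how one might make the mapping unambiguous.

```python
import io

import pandas as pd

# Made-up sample standing in for the KakaoTalk export (columns: Date, User, Message).
sample_csv = io.StringIO(
    "Date,User,Message\n"
    "2024-02-03 12:30:00,Alice,#정산 coffee/5000\n"
    "2024-02-05 19:10:00,Bob,see you tonight\n"
)

chat_data = pd.read_csv(sample_csv)

# Check 1: confirm the file is actually read, and see the real column names.
print(chat_data.head())
print(chat_data.columns.tolist())

# Check 2: map the file's column names to the requested output names explicitly.
column_map = {"Date": "날짜", "User": "정산자"}
renamed = chat_data.rename(columns=column_map)
print(renamed.columns.tolist())
```

Stating the mapping in the prompt (for example "User is the 정산자, Date is the 날짜") would probably have saved GPT a few of its retries.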

The Python code GPT ran (no need to read it; attached in case anyone is curious)

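import pandas as pd
from datetime import datetime

# NOTE: chat_data_corrected is presumably the uploaded chat CSV, already loaded
# as a DataFrame earlier in the GPT session (that step is not shown here)

# Redefine the start date for filtering the messages
start_date_corrected = datetime(2024, 2, 1)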

# Convert the 'Date' column to datetime objects and filter messages from February 1st to the current date
chat_data_corrected['Date'] = pd.to_datetime(chat_data_corrected['Date'], format='%Y-%m-%d %H:%M:%S')
filtered_data_corrected = chat_data_corrected[(chat_data_corrected['Date'] >= start_date_corrected)]

# Find messages that contain the keyword '#정산'
settlement_messages_corrected = filtered_data_corrected[filtered_data_corrected['Message'].str.contains('#정산', na=False)]

# Split the messages into '사용내역' and '금액' based on '/'
# Also, prepare the '정산자' field
settlement_data_corrected = []
for index, row in settlement_messages_corrected.iterrows():
    # Splitting by space to handle multiple entries in one message
    entries = row['Message'].split()
    for entry in entries:
        if '/' in entry:
            item, amount = entry.split('/', 1)
            # Only include entries that have both item and amount after split
            if item and amount:
                settlement_data_corrected.append({
                    '날짜': row['Date'],
                    '정산자': row['User'],
                    '사용내역': item.replace('#정산', '').strip(),
                    '금액': amount.strip()
                })

# Convert the settlement data to a DataFrame
settlement_df_corrected = pd.DataFrame(settlement_data_corrected)

# Export the DataFrame to a CSV file
csv_file_path_corrected = '/mnt/data/settlement_data_corrected.csv'
settlement_df_corrected.to_csv(csv_file_path_corrected, index=False)

csv_file_path_corrected, settlement_df_corrected.head()

Check the resulting CSV file

Two-line messages, one-line messages, line breaks, spaces: it handled them all the way I wanted.

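4. Ask for code to implement this in Streamlit

I listed the input fields I need (file upload, start date, end date).
I told it which values to output (amount spent per settler, grand total).

Oh, thanks! I want to implement this with Streamlit.

For input fields I need a CSV file upload plus a start date and an end date for the calculation, because the messages from that period have to be extracted. As output, it should tell me how much each settler spent in total, and also the grand total. Please write the code to implement this in Streamlit!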

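GPT's answer

It breaks my request into steps, tells me how it will carry out each one, and then writes the code.

But the code threw an error, so I asked about the error too. (Errors don't scare me! Just ask GPT!)

There seems to have been a small mistake with the date type. (Even GPT slips up once in a while..)

That's okay! You'll do it again for me anyway, right?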

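The error message "TypeError: Invalid comparison between dtype=datetime64[ns] and date" indicates a type mismatch when comparing dates. It is probably raised because the start_date and end_date obtained from Streamlit's date_input are date objects, not datetime objects, and the code tries to compare them with the DataFrame's datetime64[ns] column.

To fix this, start_date and end_date need to be converted to datetime objects, with the time component set so that the comparison works correctly. Here is an updated version of the code snippet that includes the conversion:

GPT answered again but gave me only the modified part, so I asked for the full code again.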
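GPT's own snippet is not reproduced here, so the following is only a minimal sketch of the kind of conversion it describes. The two `date` values stand in for what `st.date_input` returns, and the sample timestamps are made up.

```python
from datetime import date

import pandas as pd

# Made-up chat timestamps, parsed into a datetime64[ns] column.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-02-01 09:00:00",
        "2024-02-29 23:30:00",
        "2024-03-01 00:00:00",
    ])
})

# Stand-ins for the values returned by st.date_input (plain date objects).
start_date = date(2024, 2, 1)
end_date = date(2024, 2, 29)

# Convert to pandas Timestamps so both sides of the comparison have matching types.
start_ts = pd.Timestamp(start_date)
# Use an exclusive upper bound at midnight of the following day,
# so every message sent during end_date itself is still included.
end_exclusive = pd.Timestamp(end_date) + pd.Timedelta(days=1)

mask = (df["Date"] >= start_ts) & (df["Date"] < end_exclusive)
filtered = df[mask]
print(filtered)
```

The final code in this post takes a slightly different route to the same result: it adds one day minus one second to the end date and uses an inclusive comparison.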

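5. It somehow felt unfinished, so I got greedy: how much more do I actually have to pay?

Let's write code that splits the grand total N ways by the number of people and shows how much each person still has to pay (+) or should get back (-).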

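Thanks. But I want to add one more thing.
We now have the total each of the three of us spent and the grand total, and next we need to split it N ways.

Calculate the grand total (grand_total) divided by the total number of settlers.
Then, taking into account what each settler has already paid, put how much more each one needs to pay in a table.
If the amount already paid is less than the N-way share, show the amount with +; if it is more, show it with -.

Please give me the full code with this added.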
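The arithmetic requested above is small enough to check by hand. Here is a minimal sketch with made-up amounts; `split_evenly` is a hypothetical helper, not part of the final app.

```python
def split_evenly(paid: dict[str, float]) -> dict[str, float]:
    """Return each person's balance: positive = still owes, negative = gets money back."""
    total = sum(paid.values())
    share = total / len(paid)  # the equal N-way share
    return {name: share - amount for name, amount in paid.items()}


# Made-up example: three housemates, grand total 120000, equal share 40000.
balances = split_evenly({"A": 90000, "B": 30000, "C": 0})
for name, balance in balances.items():
    print(f"{name}: {balance:+.0f}")
```

Note that the division is by the number of names passed in. The final code below divides by the number of people who actually posted an expense, so a housemate with no #정산 message in the period would be left out of the split unless they are added with an amount of 0.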

Final Streamlit code

import streamlit as st
import pandas as pd
from datetime import datetime

# Function to process the CSV data
def process_data(file, start_date, end_date):
    # Read the CSV data
    chat_data = pd.read_csv(file)
    
    # Convert date columns to datetime
    chat_data['Date'] = pd.to_datetime(chat_data['Date'], format='%Y-%m-%d %H:%M:%S')
    
    # Ensure start_date and end_date are datetime objects at the beginning of the day for start and end of the day for end
    start_datetime = pd.to_datetime(start_date)
    end_datetime = pd.to_datetime(end_date) + pd.Timedelta(days=1, seconds=-1)
    
    # Filter data for the given date range
    mask = (chat_data['Date'] >= start_datetime) & (chat_data['Date'] <= end_datetime)
    filtered_data = chat_data[mask]
    
    # Find messages that contain the keyword '#정산'
    settlement_messages = filtered_data[filtered_data['Message'].str.contains('#정산', na=False)]
    
    # Process messages to extract settlement data
    settlement_data = []
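    for _, row in settlement_messages.iterrows():
        # Split on whitespace so that one message can carry several 'item/amount' entries
        entries = row['Message'].split()
        for entry in entries:
            if '/' in entry:
                # Split on the first '/' only, so an extra slash cannot break unpacking
                item, amount = entry.split('/', 1)
                if item and amount:
                    try:
                        # Drop thousands separators such as '32,000' before converting
                        value = float(amount.replace(',', '').strip())
                    except ValueError:
                        # Skip entries whose amount is not a plain number
                        continue
                    settlement_data.append({
                        'Date': row['Date'],
                        'Settler': row['User'],
                        'Usage': item.replace('#정산', '').strip(),
                        'Amount': value
                    })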
    
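    # Convert to DataFrame
    if not settlement_data:
        # No '#정산' entries in the selected range: return an empty summary instead of failing in groupby
        return pd.DataFrame(columns=['Settler', 'Amount', 'Settlement']), 0.0, 0.0
    settlement_df = pd.DataFrame(settlement_data)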
    
    # Summarize data by settler
    summary_by_settler = settlement_df.groupby('Settler')['Amount'].sum().reset_index()
    grand_total = settlement_df['Amount'].sum()
    
    # Calculate the amount each person should pay (N빵)
    equal_share = grand_total / len(summary_by_settler)
    
    # Calculate how much each settler needs to pay or receive
    summary_by_settler['Settlement'] = equal_share - summary_by_settler['Amount']
    
    # Format the Settlement column to show + for amounts to pay, and - for amounts to receive
    summary_by_settler['Settlement'] = summary_by_settler['Settlement'].apply(
        lambda x: f"+{x:.2f}" if x > 0 else f"{x:.2f}")
    
    return summary_by_settler, grand_total, equal_share

# Streamlit interface
st.title('Monthly Settlement Calculator')

# File upload widget
uploaded_file = st.file_uploader("Choose a CSV file", type="csv")

# Date input fields
start_date = st.date_input('Start Date', datetime.today())
end_date = st.date_input('End Date', datetime.today())

# Button to process data
if st.button('Calculate Settlements'):
    if uploaded_file is not None and start_date <= end_date:
        # Process the uploaded CSV file
        summary_by_settler, grand_total, equal_share = process_data(uploaded_file, start_date, end_date)
        
        # Display results
        st.write('Total amount spent by each settler:')
        st.dataframe(summary_by_settler)
        
        st.write(f'Grand Total of all expenses: {grand_total:.2f}')
        st.write(f'Each person\'s share: {equal_share:.2f}')
        
        # Show how much each settler needs to pay or receive
        st.write('Settlement per settler:')
        st.dataframe(summary_by_settler[['Settler', 'Settlement']])
    else:
        st.error('Please upload a file and select a valid date range.')

# Run the Streamlit app from the command line using: streamlit run your_script_name.py

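6. Put it into Streamlit and run it + the result

Conclusion

I'll use this app for next month's settlement.
https://ningcalculate.streamlit.app/ ← Give it a try if you need to settle up! (I don't know how long the link will stay valid, so I recommend copy-pasting the code above instead.)
~~And 송다혜 has to pay 260,000 won more……~~

The end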

#문과생도AI
