Document Loader
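A Document Loader reads documents in many formats and converts them into text that is easy to plug into LangChain: each loader returns a list of Document objects, each holding the extracted text (page_content) and information about its origin (metadata). This lets you build LLM applications on top of almost any kind of file, including PDF, Word, PowerPoint, Excel (xlsx), and CSV.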
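To make the loader contract concrete, here is a minimal sketch of the shape every loader shares: a load() method that returns a list of Document objects. The Document dataclass and TextFileLoader below are simplified, hypothetical stand-ins written with the standard library, not LangChain's own classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Simplified stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class TextFileLoader:
    """Hypothetical loader: reads one text file and wraps it in a Document."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        self.path = path
        self.encoding = encoding

    def load(self) -> List[Document]:
        with open(self.path, encoding=self.encoding) as f:
            text = f.read()
        # Loaders return a list of Documents; metadata records where the text came from.
        return [Document(page_content=text, metadata={"source": self.path})]
```

Usage (with a hypothetical file name): TextFileLoader("notes.txt").load()[0].page_content returns the whole file as one string.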
URL Document Loader
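The two most common URL loaders are WebBaseLoader and UnstructuredURLLoader. WebBaseLoader downloads a page and extracts its text with BeautifulSoup, while UnstructuredURLLoader hands each URL to the unstructured library, which partitions the page into text elements.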
1.1 WebBaseLoader
!pip install openai
!pip install langchain beautifulsoup4
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://n.news.naver.com/mnews/article/092/0002307222?sid=105")
data = loader.load()  # list of Document objects (page_content + metadata)
print(data[0].page_content)
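WebBaseLoader roughly works by downloading the page and then stripping the HTML markup with BeautifulSoup. The sketch below approximates only the second step, using the standard library's html.parser instead of BeautifulSoup and a hard-coded HTML string instead of a network request; TextExtractor and html_to_text are hypothetical names. It also shows why script and style contents need special handling: they are text nodes in the HTML but not readable content.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = ("<html><head><style>p{}</style></head><body><h1>Title</h1>"
          "<p>Body text</p><script>x=1</script></body></html>")
print(html_to_text(sample))  # prints "Title" and "Body text" on separate lines
```

On a real news page the extracted text will also include menus, footers, and other page furniture, which is why the output of print(data[0].page_content) is usually longer than the article itself.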
1.2 UnstructuredURLLoader
!pip install unstructured
from langchain.document_loaders import UnstructuredURLLoader
urls = [
"https://n.news.naver.com/mnews/article/092/0002307222?sid=105",
"https://n.news.naver.com/mnews/article/052/0001944792?sid=105"
]
loader = UnstructuredURLLoader(urls=urls)
data = loader.load()
data
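UnstructuredURLLoader takes a list of URLs and, in its default mode, returns one Document per URL with the URL recorded in metadata["source"]. The sketch below illustrates only that per-URL loop; load_urls is a hypothetical helper, and a stub fetch function replaces the real download-and-partition step so the example runs offline. Skipping a failing URL rather than aborting the batch is a design choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def load_urls(urls: List[str], fetch_text: Callable[[str], str]) -> List[Document]:
    """Hypothetical helper: one Document per URL, tagged with its source."""
    docs = []
    for url in urls:
        try:
            text = fetch_text(url)
        except Exception as exc:  # skip pages that fail instead of aborting the batch
            print(f"skipping {url}: {exc!r}")
            continue
        docs.append(Document(page_content=text, metadata={"source": url}))
    return docs


# Stub fetcher so the sketch runs offline; a real one would download and parse the page.
fake_pages = {
    "https://example.com/a": "first article",
    "https://example.com/b": "second article",
}
docs = load_urls(list(fake_pages) + ["https://example.com/missing"], lambda u: fake_pages[u])
for d in docs:
    print(d.metadata["source"], len(d.page_content))
```

Because each Document keeps its source URL, you can tell which article a piece of text came from after the documents are mixed together later in a pipeline.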
PDF Document Loader
!pip install pypdf
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/drive/MyDrive/0_스마트폰출결.pdf")
pages = loader.load_and_split()  # load the PDF page by page, then split the pages into smaller chunks
pages[0]
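PyPDFLoader.load() returns one Document per page, and load_and_split() additionally runs a text splitter (RecursiveCharacterTextSplitter by default), so pages[0] can be just the first chunk of the first page. The sketch below uses a simpler fixed-size character splitter with overlap, not RecursiveCharacterTextSplitter, to show the idea; split_text and split_pages are hypothetical names, and the chunk sizes are arbitrary illustration values.

```python
from typing import Dict, List, Tuple


def split_text(text: str, chunk_size: int = 20, overlap: int = 5) -> List[str]:
    """Fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last chunk reached the end of the text
            break
    return chunks


def split_pages(pages: List[Tuple[str, Dict[str, int]]],
                chunk_size: int = 20, overlap: int = 5) -> List[Tuple[str, Dict[str, int]]]:
    """Split each page separately so every chunk keeps its page metadata."""
    out = []
    for text, meta in pages:
        for chunk in split_text(text, chunk_size, overlap):
            out.append((chunk, dict(meta)))
    return out


pages = [("a" * 30, {"page": 0}), ("short page", {"page": 1})]
for chunk, meta in split_pages(pages):
    print(meta["page"], repr(chunk))
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, which helps later retrieval steps.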
Word Document Loader
!pip install docx2txt
from langchain.document_loaders import Docx2txtLoader
loader = Docx2txtLoader("/content/drive/MyDrive/0_A Prompt Pattern Catalog.docx")
data = loader.load()
data
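A .docx file is a ZIP archive whose body text lives in word/document.xml, and Docx2txtLoader relies on the docx2txt package to read that XML. The sketch below is a rough standard-library approximation covering only body paragraphs (headers, footers, and images are ignored); docx_text is a hypothetical name, and the in-memory archive contains only document.xml, which is enough for this parser but is not a file Word itself would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(data: bytes) -> str:
    """Return paragraph text from a .docx, joining the <w:t> runs of each <w:p>."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):
        # A paragraph is split into runs (<w:r>), each holding text in <w:t>.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{W_NS}t")))
    return "\n".join(paragraphs)


# Build a tiny in-memory archive containing only word/document.xml.
xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", xml)
print(docx_text(buf.getvalue()))
```

Joining runs matters because Word often splits a single visible sentence into several runs whenever formatting changes.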
CSV Document Loader
from langchain.document_loaders.csv_loader import CSVLoader
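# fieldnames replaces the header row; if the file already has a header line, it is loaded as an ordinary data row
loader = CSVLoader(file_path='/content/drive/MyDrive/0.블로플로_키워드_조사결과.csv', csv_args={
'delimiter': ',',
'quotechar': '"',
'fieldnames': ['keyword', 'volume', 'competition', 'competition_index', 'high_bid', 'low_bid', 'yoy']
})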
data = loader.load()
data
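CSVLoader creates one Document per row, writing each row as "column: value" lines and passing csv_args on to Python's csv.DictReader. The sketch below reproduces that row formatting with csv.DictReader directly; rows_to_texts is a hypothetical name and the sample data is made up. It also demonstrates the fieldnames behavior: when fieldnames is given, DictReader treats the file's first line as data, so an existing header becomes an extra row.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple


def rows_to_texts(csv_text: str,
                  fieldnames: Optional[List[str]] = None) -> List[Tuple[str, Dict[str, int]]]:
    """One (text, metadata) pair per row, formatted as 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames,
                            delimiter=",", quotechar='"')
    out = []
    for i, row in enumerate(reader):
        text = "\n".join(f"{k}: {v}" for k, v in row.items())
        out.append((text, {"row": i}))
    return out


# The quoted first field contains a comma, which quotechar keeps intact.
sample = 'kw,vol\n"langchain, loader",1200\npdf,300\n'
for text, meta in rows_to_texts(sample):
    print(meta, text.replace("\n", " | "))

# With explicit fieldnames, the header line "kw,vol" is read as a data row.
print(len(rows_to_texts(sample)), len(rows_to_texts(sample, ["keyword", "volume"])))
```

If the keyword file already has its own header, either drop the fieldnames argument or skip the first Document in data.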
https://www.youtube.com/watch?v=tIU2tw3PMUE&list=PLQIgLu3Wf-q_Ne8vv-ZXuJ4mztHJaQb_v