LangChain 2-1. Similarity Search with a Vector Store
vector_store_search.py
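Perhaps because I had already done the first assignment, I could find the modules I needed without searching separately: typing "langchain." or "langchain_community." in VSCode and browsing the completions was fun in itself.
For how to actually use them, of course, I looked up examples.
There was not much data, so I simply fetched the results and sorted them by score afterwards. One caveat: the score Chroma returns is a distance, so a lower value means a closer match, and the descending sort in the script puts the least similar chunk first.
Ideally I should have given the sort condition to the DB search itself, but this was an assignment meant for understanding, after all :)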
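Because `similarity_search_with_score` on Chroma reports a distance, picking the best match means sorting in ascending order. Below is a minimal sketch of that ordering. The `(label, distance)` pairs are hypothetical stand-ins for the real `(Document, score)` tuples: the chunk labels are invented, and the numbers are the four scores from the run shown in the result section.

```python
# Hypothetical stand-ins for the (Document, score) tuples returned by
# similarity_search_with_score; the chunk labels are invented for illustration.
results = [
    ("chunk B", 1.3706949949264526),
    ("chunk D", 1.3989405632019043),
    ("chunk A", 1.1704542636871338),
    ("chunk C", 1.3989368677139282),
]


def best_first(pairs):
    """Order (item, distance) pairs so the closest match comes first."""
    return sorted(pairs, key=lambda pair: pair[1])


ranked = best_first(results)
best_label, best_distance = ranked[0]
print(best_label, best_distance)
```

With a similarity score (higher is better) the same helper would need `reverse=True`, so it is worth checking which kind of score a given store returns before sorting.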
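import os

# Assignment 2 - Part 1
# One of the most common ways to store and search unstructured data is to embed it and store the resulting vectors,
# then embed the query at search time and retrieve the embedding vectors that are "most similar". This exercise covers storing embedded data in a vector store and running vector searches against it.
#
# 1. The first step loads the document (state_of_the_union.txt) and splits it into chunks of a suitable size,
#    then embeds the chunks and loads them into the vector store.
#    The second step runs a similarity search for a given query and prints the "most similar" results.
# 2. For example, for the query "What did the president say about Ketanji Brown Jackson", first run a **similarity search (similarity_search)**,
#    then run a **maximal marginal relevance (MMR) search** and print the contents of the 2 retrieved documents.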
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
# from langchain_community.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
load_dotenv()
# Load the text document
text_loader = TextLoader("./state_of_the_union.txt")
# Split the text into chunks (default splitter settings)
text_splitter = CharacterTextSplitter()
split_documents = text_loader.load_and_split(text_splitter=text_splitter)
db = Chroma.from_documents(split_documents, OpenAIEmbeddings(model="text-embedding-3-small"))
query_text = "What did the president say about Ketanji Brown Jackson"
print("--------------similarity_search-----------------")
# Each result is a (Document, score) tuple; Chroma's score is a distance, so lower means more similar
search_result_docs = db.similarity_search_with_score(query=query_text)
for doc in search_result_docs:
    print(doc[1])
# print(search_result_docs)
# Sort by score in descending order (with a distance score this puts the least similar chunk first)
search_result_docs.sort(key=lambda x: x[1], reverse=True)
print("--------------sort-----------------")
# print(search_result_docs)
for doc in search_result_docs:
    print(doc[1])
print("--------------result-----------------")
# Print the first result after sorting
print(search_result_docs[0][0].page_content)
# MMR search
print("--------------mmr_search-----------------")
# fetch_k=2 limits the candidate pool to 2 chunks, so at most 2 documents are returned
search_result_docs = db.max_marginal_relevance_search(query=query_text, fetch_k=2)
for i, doc in enumerate(search_result_docs, start=1):
    print(f"### Document No.{i} ###\n")
    print(doc.page_content)
    print("\n")
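To make the "embed the query, then find the most similar vectors" idea concrete, here is a toy nearest-neighbour search written from scratch. It is only a sketch under stated assumptions: the 3-dimensional vectors and their labels are hypothetical, and the distance is squared Euclidean distance on unit-length vectors, which is assumed here rather than verified against the store used in the script. Under that assumption the distance equals `2 - 2 * cosine_similarity`, so a score of about 1.17 would correspond to a cosine similarity of roughly 0.41.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, store, k=2):
    """Return the k (label, distance) pairs closest to the query, nearest first."""
    scored = [(label, squared_l2(query, vec)) for label, vec in store.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
store = {
    "court": normalize([0.9, 0.1, 0.0]),
    "economy": normalize([0.1, 0.9, 0.2]),
    "covid": normalize([0.0, 0.2, 0.9]),
}
query = normalize([1.0, 0.2, 0.0])

hits = top_k(query, store, k=2)
for label, distance in hits:
    print(label, round(distance, 4))
```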
Result
jihongkim@MacBook-Pro TestProject % python3 /Users/jihongkim/Desktop/PythonWorkspace/TestProject/vector_store_search.py
--------------similarity_search-----------------
1.1704542636871338
1.3706949949264526
1.3989368677139282
1.3989405632019043
--------------sort-----------------
1.3989405632019043
1.3989368677139282
1.3706949949264526
1.1704542636871338
--------------result-----------------
That’s why I’ve proposed closing loopholes so the very wealthy don’t pay a lower tax rate than a teacher or a firefighter.
So that’s my plan. It will grow the economy and lower costs for families.
So what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.
My plan will not only lower costs to give families a fair shot, it will lower the deficit.
The previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted.
But in my administration, the watchdogs have been welcomed back.
We’re going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.
And tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism.
It’s exploitation—and it drives up prices.
When corporations don’t have to compete, their profits go up, your prices go up, and small businesses and family farmers and ranchers go under.
We see it happening with ocean carriers moving goods in and out of America.
During the pandemic, these foreign-owned companies raised prices by as much as 1,000% and made record profits.
Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers.
And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.
That ends on my watch.
Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect.
We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees.
Let’s pass the Paycheck Fairness Act and paid leave.
Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.
Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.
And let’s pass the PRO Act when a majority of workers want to form a union—they shouldn’t be stopped.
When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America.
For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation.
And I know you’re tired, frustrated, and exhausted.
But I also know this.
Because of the progress we’ve made, because of your resilience and the tools we have, tonight I can say
we are moving forward safely, back to more normal routines.
We’ve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July.
Just a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines.
Under these new guidelines, most Americans in most of the country can now be mask free.
And based on the projections, more of the country will reach that point across the next couple of weeks.
Thanks to the progress we have made this past year, COVID-19 need no longer control our lives.
I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19.
--------------mmr_search-----------------
### Document No.1 ###
They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.
Officer Mora was 27 years old.
Officer Rivera was 22.
Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers.
I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.
So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.
That’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.
We should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities.
I ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.
And I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and can’t be traced.
And I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon?
Ban assault weapons and high-capacity magazines.
Repeal the liability shield that makes gun manufacturers the only industry in America that can’t be sued.
These laws don’t infringe on the Second Amendment. They save lives.
The most fundamental right in America is the right to vote – and to have it counted. And it’s under assault.
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.
We cannot let this happen.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.
We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
### Document No.2 ###
We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
We can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours.
Provide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers.
Revise our laws so businesses have the workers they need and families don’t wait decades to reunite.
It’s not only the right thing to do—it’s the economically smart thing to do.
That’s why immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce.
Let’s get it done once and for all.
Advancing liberty and justice also requires protecting the rights of women.
The constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before.
If we want to go forward—not backward—we must protect access to health care. Preserve a woman’s right to choose. And let’s continue to advance maternal health care in America.
And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong.
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.
While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.
And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things.
So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.
First, beat the opioid epidemic.
There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.
Get rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers.
If you’re suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery.
Second, let’s take on mental health. Especially among our children, whose lives and education have been turned upside down.
The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.
I urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor.
Children were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media.
As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit.
It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.
And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.
Third, support our veterans.
Veterans are the best of us.
I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.
My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.
Our troops in Iraq and Afghanistan faced many dangers.
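The MMR call in the script trades relevance to the query against redundancy among the documents already picked. The sketch below is a from-scratch toy of that greedy selection, not LangChain's or Chroma's implementation. The 2-dimensional vectors are hypothetical, and `lambda_mult` follows the usual convention that 1.0 means relevance only and smaller values weight diversity more.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Greedy MMR: return the indices of up to k candidates, in pick order."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy vectors.
query = [1.0, 0.0]
candidates = [
    [0.9, 0.1],    # 0: very relevant
    [0.88, 0.12],  # 1: near-duplicate of candidate 0
    [0.6, 0.8],    # 2: less relevant, but different from candidate 0
]

relevance_only = mmr(query, candidates, k=2, lambda_mult=1.0)
diversity_weighted = mmr(query, candidates, k=2, lambda_mult=0.3)
print(relevance_only, diversity_weighted)
```

With relevance only, the near-duplicate is picked second; with diversity weighted more heavily, the different candidate takes its place. MMR can only diversify within its candidate pool, so a pool of two (as with `fetch_k=2` in the script) leaves it nothing to choose between.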
LangChain 2-2. Building a Retrieval-Augmented Generation Chain
retrieval_augmented_chain.py
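As I understand it, the context returned by the basic retrieval step is passed straight into the prompt chain, so the model's answer is augmented with what was retrieved.
I think I see how to use it, but I still don't have a feel for what kinds of situations it applies to in practice; seeing it used in a somewhat more complex, real-world application would help a lot.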
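Stripped of the library, the retrieve-then-prompt flow can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in: `retrieve` ranks documents by shared words instead of embeddings, the second document is invented as a distractor, and no model is called, so the sketch stops at the prompt that would be sent.

```python
def retrieve(question, documents, k=1):
    """Hypothetical retriever: rank documents by words shared with the question."""
    question_words = set(question.lower().replace("?", "").split())

    def overlap(doc):
        return len(question_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(question, context_docs):
    """Fill a prompt template with the question and the retrieved context."""
    context = "\n".join(context_docs)
    return (
        "Answer the question based on the context.\n"
        f"Question : {question}\n"
        f"Context : {context}\n"
    )


# The second document is an invented distractor.
documents = ["harrison worked at kensho", "bears like to eat honey"]
question = "where did harrison work?"

prompt_text = build_prompt(question, retrieve(question, documents))
print(prompt_text)
```

The model only sees the retrieved text, which is the point of the pattern: it can answer about material it was never trained on, as long as the retriever finds the right passage.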
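import os

# Assignment 2 - Part 2
# Learn how to use a "retrieval-augmented generation" chain to retrieve context for a specific question and answer the question based on it.
# Along the way, a vector store is used to retrieve the relevant context, and the model answers the question using the retrieved context.
#
# 1. Store the sentence "harrison worked at kensho" in FAISS.
# 2. Print the answer to the question "where did harrison work?".
# 3. In the second chain, print the answer to the question "where did harrison work?" in Italian.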
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
# from langchain_community.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_core.runnables import RunnablePassthrough
load_dotenv()
# 1. Store the sentence "harrison worked at kensho" in FAISS.
db_faiss = FAISS.from_texts(["harrison worked at kensho"], embedding=OpenAIEmbeddings())
query_text = "where did harrison work?"
# 2. Print the answer to the question "where did harrison work?".
# docs = db_faiss.similarity_search(query=query_text)
# print (docs[0].page_content)
retriever = db_faiss.as_retriever(search_type="similarity")
docs = retriever.invoke(query_text)
print (docs[0].page_content)
prompt = PromptTemplate.from_template(
"""
Answer the question based on the context in italian.
Question : {question}
Context : {context}
"""
)
llm = ChatOpenAI()
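# Add an output parser to convert the model output to a plain string
output = StrOutputParser()
# The dict passes the question through unchanged and fills "context" with the retriever's documents
chain = {"question": RunnablePassthrough(), "context": retriever} | prompt | llm | output
result = chain.invoke(query_text)
print(result)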
Result
jihongkim@MacBook-Pro TestProject % python3 /Users/jihongkim/Desktop/PythonWorkspace/TestProject/retrieval_augmented_chain.py
harrison worked at kensho
Harrison ha lavorato a Kensho.