논문 요약 삽질기

챗GPT에서 Scholar Assist라는 플러그인을 활용해서 관심 있는 논문을 불러오고, 이를 요약해봤습니다. 아래 4가지 정도로 테스트를 해봤는데, 별로 마음에 들지 않았습니다.

6학년에게 설명하는 식으로 리뷰
배경지식이 전혀 없는 사람에게 설명하는 식으로 논문을 설명
배경지식이 있는 전문가에게 설명하는 식으로 논문을 설명
배경지식이 전혀 없는 사람을 위한 유튜브 스크립트 작성
배경지식이 있는 전문가에게 설명하는 식으로 유튜브 스크립트 작성

https://chat.openai.com/share/cb8b5a64-6f3d-4b1c-838e-ebb1bc569127

https://chat.openai.com/share/9707d9d9-0a0b-4d15-96a6-aae19ef95f85

플러그인의 문제인가 싶어서 인기 있는 플러그인인 Scholar AI로도 시도를 해봤습니다. Custom instruction을 아래와 같이 추가했더니 논문을 찾자마자 바로 요약을 해주었습니다.

User Custom Instruction

: I'm a NLP Engineer who is working for 6 years on it. I'd like to read a paper fast by reading well-compressed version of a paper.

System Custom Instruction

: You are the professional research scientist major in NLP. You will write long well-compressed summary of a paper.

바로 작성해준 요약이 별로여서 추가로 항목별로 요약을 해달라고도 요청을 해봤지만, 이 역시 크게 다르지 않았습니다. 요약 부분 있는 내용만 가져오는 느낌을 받았습니다. PDF 전체 내용을 가져오는 게 아닌 것 같아 AskYourPDF 플러그인을 써보기로 했습니다.

https://chat.openai.com/share/c70eab04-6f4e-4d22-a145-48b3d7d3c610

AskYourPDF는 조금 더 내용이 추가된 것을 볼 수 있었으나, 이 마저도 각 항목별 내용이 적어서 크게 도움이 되진 않았습니다.

https://chat.openai.com/share/3c4189a6-e79c-4563-9256-4d99e0ab8958

다른 괜찮은 서비스는 없는지 궁금해서 구현된 서비스를 찾아 보니 Scholary와 OpenRead 서비스를 찾을 수 있었습니다.

아래는 Scholary 결과입니다.

Introduction

Building generally capable embodied agents that continuously explore, plan, and develop new skills in open-ended worlds is a grand challenge for the AI community [1,2,3,4,5].

Recent advances in large language model (LLM) based agents harness the world knowledge encapsulated in pre-trained LLMs to generate consistent action plans or executable policies [16, 22, 19].
They are applied to embodied tasks like games and robotics [23,24,25,26,27], as well as NLP tasks without embodiment [28,29,30].
We argue that an effective lifelong learning agent should have similar capabilities as human players: (1) propose suitable tasks based on its current skill level and world state, e.g., learn to harvest sand and cactus before iron if it finds itself in a desert rather than a forest; (2) refine skills based on environmental feedback and commit mastered skills to memory for future reuse in similar situations; (3) continually explore the world and seek out new tasks in a self-driven manner
MethodsVOYAGER consists of three novel components: (1) an automatic curriculum (Sec. 2.1) that suggests objectives for open-ended exploration, (2) a skill library (Sec. 2.2) for developing increasingly complex behaviors, and (3) an iterative prompting mechanism (Sec. 2.3) that generates executable code for embodied control.
Embodied agents encounter a variety of objectives with different complexity levels in open-ended environments.
An automatic curriculum offers numerous benefits for open-ended exploration, ensuring a challenging but manageable learning process, fostering curiosity-driven intrinsic motivation for agents to learn and explore, and encouraging the development of general and flexible problemsolving strategies [42,43,44].
Our automatic curriculum capitalizes on the internet-scale knowledge contained within GPT-4 by prompting it to provide a steady stream of new tasks or challenges.
Instead of engaging in multi-round conversations, we concatenate a system prompt and a user prompt to obtain each assistant’s response
ResultsWe systematically evaluate VOYAGER and baselines on their exploration performance, tech tree mastery, map coverage, and zero-shot generalization capability to novel tasks in a new world.

Significantly better exploration.
We systematically evaluate VOYAGER and baselines on their exploration performance, tech tree mastery, map coverage, and zero-shot generalization capability to novel tasks in a new world.
VOYAGER’s superiority is evident in its ability to consistently make new strides, discovering 63 unique items within 160 prompting iterations, 3.3× many novel items compared to its counterparts.
The Minecraft tech tree tests the agent’s ability to craft and use a hierarchy of tools.
Progressing through this tree requires the agent to master systematic and compositional skills.
This underscores the effectiveness of the automatic curriculum, which consistently presents challenges of suitable complexity to facilitate the agent’s progress
ConclusionWe introduce VOYAGER, the first LLM-powered embodied lifelong learning agent, which leverages GPT-4 to explore the world continuously, develop increasingly sophisticated skills, and make new discoveries consistently without human intervention.
VOYAGER exhibits superior performance in discovering novel items, unlocking the Minecraft tech tree, traversing diverse terrains, and applying its learned skill library to unseen tasks in a newly instantiated world.
VOYAGER serves as a starting point to develop powerful generalist agents without tuning the model parameters

아래는 OpenRead 결과입니다.

Full Paper Espresso

The text discusses the development of an AI agent called Voyager that can autonomously explore and learn in the game Minecraft. Voyager uses a large language model, an automatic curriculum, a skill library, and an iterative prompting mechanism to improve its performance and acquire new skills without human intervention. The agent outperforms other baselines in exploration, mastery of game mechanics, map coverage, and generalization to new tasks. The text describes the design choices and performance of the Voyager learning algorithm, which uses GPT-4 for code generation and explores open-ended tasks in Minecraft. It emphasizes the importance of automatic curriculum, skill library, and self-verification for the agent's progress, and highlights the potential for enhancing Voyager with multimodal perception models. The text also mentions limitations of the GPT-4 API and the use of human feedback to improve Voyager's performance. The text discusses the use of GPT-3.5 in the Voyager project for automating tasks in Minecraft. It explains the warm-up schedule used to train the agent and describes the input prompt for GPT-4, including control primitive APIs. The text also mentions software tools like ReAct, Reflexion, and AutoGPT that automate tasks in the game. It analyzes the impact of removing certain design choices in the Voyager project and presents the results in Figure 1.

Espresso By Paragraph

1 Introduction: The text discusses the challenge of building embodied agents that can continuously explore, plan, and develop new skills in open-ended worlds. The authors propose Voyager, an agent powered by a large language model, that can autonomously explore and learn in the game Minecraft without human intervention.

2 Method: The Voyager AI agent uses a curriculum generated by GPT-4 to learn progressively harder tasks in Minecraft. It builds a skill library of action programs that can be reused and applied to new tasks, leading to better performance compared to other AI agents.

2.1 Automatic Curriculum: The Voyager system is made up of three parts: an automatic curriculum that suggests goals, a skill library for developing more advanced behaviors, and an iterative prompting mechanism that generates code for controlling the system. Details of these components can be found in section 2.1, 2.2, and 2.3 of the text.

2.2 Skill Library: The text describes the use of an automatic curriculum that leverages GPT-4 and GPT-3.5 to provide a steady stream of new tasks and challenges for embodied agents in open-ended environments. The curriculum adapts based on the agent's progress and current state, fostering curiosity-driven intrinsic motivation and the development of problem-solving strategies.

2.3 Iterative Prompting Mechanism: The text discusses the importance of a skill library in an automatic curriculum and how it can be used to improve and refine the performance of a AI model called GPT-4. The skill library contains executable code that helps complete specific tasks, and the AI model uses input prompts, control primitive APIs, and feedback to generate code and improve its performance.

3.1 Experimental Setup: The text describes an iterative prompting mechanism for self-improvement in program synthesis. The mechanism uses environment feedback, execution errors, and self-verification to refine code generation and acquire new skills without human intervention.

3.2 Baselines: The text describes the use of OpenAI's GPT-4 and GPT-3.5-turbo APIs for text completion and the text-embedding-ada-002 API for text embedding. They use a temperature of 0 except for the automatic curriculum, which uses a temperature of 0.1. They also use MineDojo and Mineflayer JavaScript APIs for motor controls in their simulation environment. Additional details can be found in the Appendix, Section B.1.

3.3 Evaluation Results: The authors of the text explain that they selected three representative algorithms as baselines for their Minecraft agent, which are ReAct, Reflexion, and AutoGPT. They also clarify that their work focuses on pushing the limits of GPT-4 for lifelong embodied agent learning, rather than solving 3D perception or sensorimotor control problems.

3.4 Ablation Studies: The article discusses an evaluation of the Voyager agent compared to other baselines in terms of exploration performance, tech tree mastery, map coverage, and zero-shot generalization to new tasks. Voyager demonstrates superior exploration, tech tree mastery, extensive map traversal, and efficient zero-shot generalization, with the help of a skill library.

3.5 Multimodal Feedback from Humans: The text discusses the impact of six design choices on the performance of Voyager, including automatic curriculum, skill library, environment feedback, execution errors, self-verification, and GPT-4 for code generation. The results show that automatic curriculum, skill library, and self-verification are crucial for the agent's progress, while GPT-4 significantly outperforms GPT-3.5 in code generation.

4 Limitations and Future Work: The current version of Voyager does not have visual perception capabilities, but it has the potential to be enhanced with multimodal perception models. The use of human feedback can help Voyager construct complex 3D structures in Minecraft by providing visual critique and breaking down tasks into smaller steps.

Decision-making Agents in Minecraft.: The GPT-4 API is significantly more expensive than GPT-3.5 and has some limitations, such as inaccuracies in generating correct skills and occasional hallucinations in proposing unachievable tasks or using invalid inputs. However, the authors express confidence that future improvements in GPT API models and open-source LLM finetuning techniques will address these limitations.

Large Language Models for Agent Planning.: Minecraft is a 3D world with versatile game mechanics that supports various activities. In this paper, the authors present Voyager, a learning algorithm that combines low-level controllers and high-level planners to enable open-ended exploration in Minecraft, using a bottom-up automatic curriculum driven by curiosity.

Code Generation with Execution.: The text discusses the increasing use of large language models (LLMs) in embodied agent research for planning purposes, specifically in the areas of robot learning and text agents. It identifies different approaches and tools that leverage LLMs for generating subgoals, executing policies, and enhancing reasoning, but highlights the need for a skill library to develop more complex behaviors in lifelong learning scenarios.

6 Conclusion: Code generation has long been a challenge in NLP, with various approaches using execution results to improve program synthesis. Voyager differentiates itself by integrating environment feedback, execution errors, and self-verification into an iterative prompting mechanism for embodied control.

7 Acknowledgements: The article introduces Voyager, an advanced learning agent that uses GPT-4 to explore and learn in the world autonomously. It performs exceptionally well in discovering new things, navigating different terrains, and applying its learned skills to new tasks.

A.2 Prompting: The authors express gratitude to their colleagues and friends for their helpful feedback and discussions during the completion of this work, which was done during Guanzhi Wang's internship at NVIDIA. Guanzhi Wang is supported by the Kortschak fellowship at Caltech.

A.3.1 Components in the Prompt: GPT-4 and GPT-3.5 allow users to assign roles to each prompt message, such as a high-level instruction for the system, a detailed instruction for the user, and a response message generated by the model. This approach helps save token usage and replaces the need for multi-round conversations.

A.3.2 Additional Context: The input prompt for GPT-4 includes various components such as directives, the agent's state, nearby blocks and entities, chests, biome, time, health and hunger bars, position, and previous tasks. The chain-of-thought prompting is used to ask GPT-4 to assess the current progress and provide suggestions for the next task.

A.3.3 Warm-up Schedule: The text explains that the authors utilize GPT-3.5 to ask questions and retrieve relevant information from a wiki knowledge base. This external knowledge base is beneficial when GPT-3.5 is not pre-trained in a specific domain, such as Minecraft.

A.4.1 Components in the Prompt: The text describes a warm-up schedule used in training an agent, where the agent's prompt is gradually exposed to more information over time, starting with basic skills and progressing to more complex ones. The warm-up setting used in the experiments is presented in Table A.1.

A.5.1 Components in the Prompt: The text describes the input prompt for GPT-4, including guidelines for code generation and control primitive APIs implemented by the system. The control primitive APIs include functions for exploration, mining, crafting, placing blocks, smelting, attacking mobs, interacting with chests, and various actions related to movement and manipulation of objects in the game.

B.1 Experimental Setup: The text provides a summary of different functions and features of a bot, including activating items and using items on entities in the game. It also mentions various aspects of the agent's state, execution errors, and the task context, as well as a chain-of-thought prompting method used with GPT-4.

B.2 Baselines: The input prompt for GPT-4 includes the agent's state, the task proposed by the automatic curriculum, the task context, chain-of-thought prompting, and few-shot examples for in-context learning.

B.3 Ablations: The simulation environment is based on MineDojo and uses Mineflayer JavaScript APIs for motor controls. The bot is designed to provide environment feedback, handle exceptions, and continue execution even after dying, preserving its inventory and recycling certain objects. For more information, refer to the codebase.

B.4.1 Significantly Better Exploration: ReAct, Reflexion, and AutoGPT are software tools that use code generation and refinement to automate tasks. ReAct and Reflexion focus on generating actions based on observations, while AutoGPT decomposes tasks into subgoals and executes them. All three tools aim to explore the world and collect as many items as possible.

B.4.2 Extensive Map Traversal: The text discusses the impact of removing certain design choices in the Voyager project, such as manual curriculum, skill library, environment feedback, execution errors, self-verification, and using GPT-3.5 instead of GPT-4 for code generation. The study analyzes how these changes affect the performance of exploration in Voyager.

B.4.3 Efficient Zero-Shot Generalization to Unseen Tasks: In Figure 1, the meaning of each icon is explained in Figure A.1. There were three trials for each method, and the items collected by Voyager in each trial are listed, followed by the items collected by ReAct and Reflexion in their respective trials.

결과적으로 제가 직접 논문을 읽었을 때보다 논문에 대해 얻게 되는 정보가 한정적이어서 본문을 읽는 게 낫겠단 생각이 계속 들었습니다. 좀 더 발전시켜볼까란 생각이 들지만, 표나 그림이 없으니 글만으로는 온전히 이해하기가 힘들었습니다. 이 정도 수준이면 차라리 perplexity.ai 검색 결과가 더 낫겠단 생각도 들었네요.

⏰ 가장 빠르게 AI를 배우는 곳 | 지피터스 AI스터디 17기 🚀

논문 요약 삽질기

Introduction

👉 이 게시글도 읽어보세요