Recently, I came across a fascinating paper titled “Automatic Chain-of-Thought Prompting”. You can read it here. The work explores how large language models can be guided to reason step by step automatically, without requiring manually crafted examples. This is an exciting development because it makes structured reasoning more accessible and scalable, moving us closer to AI systems that can solve complex problems with minimal human intervention.
There are two common CoT paradigms. One uses a short trigger (for example, “Let’s think step by step”) to nudge the model into a stepwise reasoning mode before answering a question. The other uses a few manual demonstrations, each composed of a question and a reasoning chain that leads to an answer. The former is easy to use, but the latter tends to be more effective, though it requires manual effort to craft high-quality examples. To eliminate this manual effort, LLMs can be used to generate the reasoning chains themselves. However, these generated chains often contain mistakes. To reduce the impact of such mistakes, the paper proposes a technique called Auto-CoT.
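To make the contrast concrete, here is a minimal sketch of the two prompt styles in Python; the question and the demonstration text are invented for illustration and are not taken from the paper.

```python
# Illustrative prompts only; the question and demonstration below are made-up examples.
question = "A store sold 24 apples in the morning and 37 in the afternoon. How many apples did it sell in total?"

# Paradigm 1: Zero-Shot-CoT -- a single trigger phrase, no examples.
zero_shot_prompt = f"Q: {question}\nA: Let's think step by step."

# Paradigm 2: Manual Few-Shot CoT -- hand-crafted demonstrations, each with a reasoning chain.
demonstration = (
    "Q: Tom has 3 boxes with 5 pens in each box. How many pens does he have?\n"
    "A: There are 3 boxes and each box has 5 pens, so 3 * 5 = 15. The answer is 15.\n\n"
)
few_shot_prompt = demonstration + f"Q: {question}\nA:"
```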
Auto-CoT consists of two main stages:
Question Clustering: A question bank is sampled that contains a diverse set of questions, each with a single correct answer. Each question is encoded with Sentence-BERT, and the resulting representations are partitioned by the k-means clustering algorithm into k clusters. For each cluster i, the questions are sorted into a list in ascending order of their distance from the cluster center.
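Below is a rough sketch of this clustering stage, assuming the sentence-transformers and scikit-learn packages; the specific Sentence-BERT checkpoint ("all-MiniLM-L6-v2") and the toy questions are my own placeholders, not the paper's.

```python
# Sketch of the question-clustering stage, assuming the `sentence-transformers`
# and `scikit-learn` packages. Checkpoint name and questions are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

question_bank = [
    "If a train travels 60 miles in 1.5 hours, what is its average speed?",
    "A farmer has 17 sheep and buys 5 more. How many sheep does he have now?",
    "Sarah reads 12 pages a day. How many pages does she read in a week?",
    "A rectangle is 4 cm wide and 9 cm long. What is its area?",
]

k = 2  # number of clusters = number of demonstrations to construct

# Encode each question with a Sentence-BERT model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(question_bank)

# Partition the questions into k clusters.
kmeans = KMeans(n_clusters=k, random_state=0).fit(embeddings)

# For each cluster, sort its questions in ascending order of distance to the center.
clusters = {i: [] for i in range(k)}
for idx, label in enumerate(kmeans.labels_):
    dist = np.linalg.norm(embeddings[idx] - kmeans.cluster_centers_[label])
    clusters[label].append((dist, question_bank[idx]))
for label in clusters:
    clusters[label].sort(key=lambda pair: pair[0])
```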
Demonstration Sampling: A representative question is selected from each cluster using simple heuristics, and its reasoning chain is generated with Zero-Shot-CoT. For example, a heuristic might prioritize shorter questions with shorter reasoning chains. Once this step is finished, there will be k constructed demonstrations. Each demonstration is a tuple consisting of a question, its reasoning chain, and the corresponding answer. These constructed demonstrations are then used for in-context learning and fed to LLMs to obtain reasoning chains with answers for a given question.
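Continuing the sketch above, this is roughly how the sampling stage could look. Here call_llm is a hypothetical stub for whatever LLM client is available, and the thresholds are illustrative heuristics in the spirit of the ones described (shorter questions, shorter reasoning chains), not the paper's exact procedure.

```python
# Sketch of the demonstration-sampling stage, continuing from `clusters` above.
# `call_llm` is a hypothetical stub; the thresholds are illustrative heuristics.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

MAX_QUESTION_WORDS = 60   # prefer shorter questions
MAX_REASONING_STEPS = 5   # prefer shorter reasoning chains

demonstrations = []
for label in sorted(clusters):
    # Walk the cluster's questions from closest to the center outward.
    for _, question in clusters[label]:
        if len(question.split()) > MAX_QUESTION_WORDS:
            continue
        # Generate the reasoning chain with Zero-Shot-CoT.
        rationale = call_llm(f"Q: {question}\nA: Let's think step by step.")
        steps = [s for s in rationale.split(".") if s.strip()]  # rough step count
        if len(steps) > MAX_REASONING_STEPS:
            continue
        answer = call_llm(f"Q: {question}\nA: {rationale} Therefore, the answer is")
        demonstrations.append((question, rationale, answer))
        break  # keep one representative question per cluster

# `demonstrations` now holds k (question, reasoning chain, answer) tuples that can
# be prepended to a new question as in-context examples.
```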
Points to Keep in Mind:
1. With the rapid development of foundational models, CoT or its variations may not be necessary in the future.
2. There is maintenance overhead because the pipeline has many moving parts (an encoder, a clustering step, and demonstration generation).
3. This clustering-based sampling method can be considered diversity-based, in sharp contrast to similarity-based sampling. If we view each demonstration as a kind of skill, diverse demonstrations seem to cover more alternative skills for solving target questions.
4. Auto-CoT may not work in situations where the required logic is not sequential.
5. This technique may be more helpful for smaller, less powerful domain-specific models.
Exploring DeepMind’s Tree of Thoughts: A New Approach to LLM Reasoning
Recently, I came across an interesting paper from Google DeepMind introducing the concept of Tree of Thoughts (ToT). You can read the full paper here. Unlike the traditional Chain of Thought approach, which follows a single linear reasoning path, the Tree of Thoughts framework treats problem-solving as a search through multiple possible reasoning paths. This opens up the ability to explore alternatives, evaluate them, and backtrack when necessary—bringing the strengths of classical search algorithms like A*, Beam Search, and even combinations of BFS/DFS into the world of large language models.
ToT frames any problem as a search over a tree and consists of four components:
1. Thought decomposition:
ToT leverages the properties of the problem to decompose it into intermediate thought steps. This sets the boundaries for the thoughts that the generator produces.
2. Thought generation:
Explore multiple candidates at each step. When generating a new thought for the current state, you generally provide the model with all previous thoughts along the path leading to the current state.
3. State evaluator:
Each candidate path is scored by the LLM based on how promising it is. We can use the LLM as a judge here!
4. Search algorithm:
At each step, the search keeps only the top-k most promising candidates based on their evaluation scores; a heap-like data structure can be used to select them. A sketch combining these four components follows below.
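Putting the four components together, here is a minimal breadth-first sketch of ToT. The functions generate_thoughts and score_state are hypothetical stubs around an LLM, and the beam-style top-k pruning is just one of the search strategies the framework allows.

```python
# Minimal breadth-first Tree-of-Thoughts sketch. `generate_thoughts` and
# `score_state` are hypothetical stubs around an LLM; the prompts are up to you.
import heapq


def generate_thoughts(state: str, n: int) -> list[str]:
    """Ask the LLM to propose n candidate next thoughts given the path so far."""
    raise NotImplementedError("call your LLM with a 'propose the next step' prompt")


def score_state(state: str) -> float:
    """Ask the LLM (as a judge) how promising a partial reasoning path is."""
    raise NotImplementedError("call your LLM with an evaluation prompt")


def tree_of_thoughts(problem: str, depth: int = 3, breadth: int = 3, top_k: int = 2) -> str:
    # A state is the problem plus every thought generated along the path so far.
    frontier = [problem]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for thought in generate_thoughts(state, breadth):
                new_state = state + "\n" + thought
                candidates.append((score_state(new_state), new_state))
        # Keep only the top-k most promising partial paths (beam-style pruning).
        best = heapq.nlargest(top_k, candidates, key=lambda pair: pair[0])
        frontier = [state for _, state in best]
    # Return the highest-scoring complete path.
    return max(frontier, key=score_state)
```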
For some problems, the search space can grow exponentially, which can lead to high latency and cost because of the many third-party LLM requests. Implementing ToT reliably is a challenge in itself, as there are many moving parts.
Reference: https://lnkd.in/d8WFQ8TJ