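Visualization-of-Thought (VoT) prompting elicits spatial reasoning in LLMs by having the model visualize the state after each reasoning step and use that visualization to inform the next step. It is reported to significantly outperform other prompting techniques on spatial reasoning tasks, suggesting that LLMs can develop a "mind's eye".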
VoT is used in PyWinAssistant, an open-source "large action model" that controls Windows user interfaces through natural language. This is similar to the ability of Anthropic's closed-source Claude model to operate software via language.
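The VoT idea summarized above can be illustrated with a minimal sketch of how such a prompt might be assembled for a small grid-navigation task. The helper names `render_grid` and `build_vot_prompt`, the grid symbols, and the exact instruction wording are assumptions made for illustration; they are not taken from the paper's code or from PyWinAssistant, and no particular LLM API is assumed.

```python
# Hypothetical sketch: build a Visualization-of-Thought style prompt.
# The instruction wording below is an assumption, not a confirmed quote.
VOT_INSTRUCTION = "Visualize the state after each reasoning step."


def render_grid(rows, cols, start, goal, walls=()):
    """Draw a grid as text: S = start, G = goal, # = wall, . = free cell."""
    cells = [["." for _ in range(cols)] for _ in range(rows)]
    for r, c in walls:
        cells[r][c] = "#"
    cells[start[0]][start[1]] = "S"
    cells[goal[0]][goal[1]] = "G"
    return "\n".join(" ".join(row) for row in cells)


def build_vot_prompt(task, grid_text):
    """Combine the task, the text grid, and the VoT instruction into one prompt."""
    return f"{task}\n\n{grid_text}\n\n{VOT_INSTRUCTION}"


if __name__ == "__main__":
    grid = render_grid(3, 3, start=(0, 0), goal=(2, 2), walls=[(1, 1)])
    task = "Navigate from S to G, avoiding # cells. List the moves."
    print(build_vot_prompt(task, grid))
```

The resulting string would be sent to whichever model is in use; the point of the trailing instruction is that the model is asked to redraw the grid after every move rather than answer in one step.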
Key quotes
"Visualization of thought prompting to elicit the mind's eye of LLMs for spatial reasoning"
"VoT prompting proposed in this paper consistently induces LLMs to visualize the reasoning steps and inform subsequent steps"
"The first open source large action model/generalist artificial narrow intelligence that controls complete human user interfaces only by using natural language"