Recursive Knowledge Crystallization: A Framework for Persistent Autonomous Agent Self-Evolution


fig1a

Abstract

In the development of autonomous agents using Large Language Models (LLMs), restrictions such as context window limits and session fragmentation pose significant barriers to the long-term accumulation of knowledge. This study proposes a “self-evolving framework” where an agent continuously records and refines its operational guidelines and technical knowledge—referred to as its SKILL—directly onto a local filesystem in a universally readable format (Markdown). By conducting experiments across two distinct environments featuring opaque constraints and complex legacy server rules using Google’s Antigravity and Gemini CLI, we demonstrate the efficacy of this framework. Our findings reveal that the agent effectively evolves its SKILL through iterative cycles of trial and error, ultimately saturating its learning. Furthermore, by transferring this evolved SKILL to a completely clean environment, we verify that the agent can successfully implement complete, flawless client applications in a single attempt (zero-shot generation). This methodology not only circumvents the limitations of short-term memory dependency but also pioneers a new paradigm for cross-environment knowledge portability and automated system analysis.

The following infographic summarizes this article.

fig1d

1. Introduction

One of the primary challenges in the deployment of autonomous AI agents is the persistence of learning. Traditionally, when an agent’s execution environment is reset or its context window is cleared, it is forced to relearn task specifications from scratch.

In the realm of autonomous agents, enabling long-term memory and self-correction has been a focal point of recent research. Notable frameworks such as Reflexion (Shinn et al., 2023) endow agents with dynamic memory to self-reflect and refine reasoning through trial and error. Similarly, Voyager (Wang et al., 2023) introduces an open-ended embodied agent in Minecraft that utilizes an ever-growing executable code skill library for lifelong learning. Another significant approach is MemGPT (Packer et al., 2023), which treats LLMs as operating systems, employing hierarchical memory management to bypass context window limitations.

However, these existing methodologies predominantly rely on in-memory contexts, specialized vector databases, or internal virtual memory abstractions. This makes the acquired knowledge difficult to decouple from the specific agent instance and challenging for human developers to read, audit, or directly manage.

In contrast, our proposed framework introduces a paradigm shift: the continuous self-rewriting of an agent’s “SKILL” residing entirely on a local filesystem (SKILL.md). By utilizing standard development toolchains (Antigravity and Gemini CLI), the agent investigates unknown environments and automatically maintains its own operational manual. This approach offers a distinct advantage—the physical persistence of knowledge in a universally readable Markdown format. It enables not only continuous learning but also “Zero-Shot Knowledge Transfer.” We empirically prove that once the SKILL evolves and saturates in one environment (Antigravity), it can be loaded into a completely clean environment (Gemini CLI) to synthesize complex code perfectly on the first attempt without any trial and error.

2. Experimental Procedure

In this framework, SKILL refers to Agent Skill, a standardized set of executable capabilities and knowledge markers that define an agent’s operational boundaries. As defined by the Agent Skill community (https://agentskills.io/home), these skills allow for modular, portable, and verifiable agentic functions. Our framework treats the SKILL.md file as a living repository of these Agent Skills.
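To make the idea concrete, a minimal SKILL.md might look as follows. This layout is an illustrative assumption, not the exact file used in the experiments (the actual files are linked in the Appendix); it only shows the three kinds of content the framework accumulates.

```markdown
<!-- Hypothetical minimal SKILL.md layout; section names are assumptions. -->
# SKILL: Legacy API Client

## Discovered Rules
- All POST requests require the header `Content-Type: application/json`.

## Architectural Guidelines
- Build every request header through a single factory function.

## Appendix: Evolution Instructions
- After each successful run, append newly discovered rules above and
  remove entries the run proved obsolete.
```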

To demonstrate the continuous evolution of the SKILL and its application, we designed two distinct experiments. In both cases, Antigravity was utilized for the iterative evolution of the SKILL, followed by a demonstration using the Gemini CLI for zero-shot script completion.

2.1. Experiment 1: Blind Incremental Evolution on the “Silent Bureaucrat” Server

Workflow

  1. Start the Server.
  2. Launch the Antigravity agent.
  3. Execute the prompt to develop the Client script.
  4. The AI agent updates the SKILL based on failures or newly discovered rules.
  5. The Client script is reset to its initial blank state (tabula rasa). With the exact same prompt, steps 3 and 4 are repeated. When the agent hits the “Limit reached” block, the cycle is forcibly terminated, but its SKILL persists. We ran 10 cycles, by which point the evolution of the SKILL had saturated and the task succeeded.
  6. The fully evolved SKILL is transferred to a clean environment. Using the Gemini CLI, we confirm that the Client script is completed perfectly in a single prompt (zero-shot).

fig1b


2.2. Experiment 2: Chaos Server Constraints & Architectural Pattern Acquisition

Workflow

  1. Start the Server.
  2. Launch the Antigravity agent.
  3. Execute the prompt to develop specific features of the Client script.
  4. The AI agent updates the SKILL, documenting both the discovered constraints and higher-level architectural implementation guidelines.
  5. The Client script is not reset. Instead, the prompt is progressively changed in each cycle to command the development of new features, building upon the existing script. This process continues until the planned evolution of the application is complete (5 cycles).
  6. The fully evolved SKILL is transferred to a clean environment. Using the Gemini CLI, the client-app.js is reset to a blank state, and a single, comprehensive prompt is given to build the entire SDK. We confirm that the complete application is generated flawlessly on the first attempt (zero-shot).

fig1c


3. Results and Discussions

3.1. Experiment 1 Results & Discussion

Table 1: Experiment 1 Data Summary

| Cycle | Success | Mod. Count | Interaction Count | Skill Size (Bytes) | State |
|---|---|---|---|---|---|
| 1 | False | 5 | 5 | 2666 | Exploration (Fail) |
| 2 | False | 6 | 6 | 2963 | Exploration (Fail) |
| 3 | False | 3 | 4 | 2963 | Exploration (Fail) |
| 4 | False | 4 | 5 | 3474 | Exploration (Fail) |
| 5 | True | 3 | 3 | 3654 | Breakthrough |
| 6 | True | 1 | 1 | 3654 | Convergence |
| 7 | True | 2 | 2 | 3654 | Stable |
| 8 | True | 2 | 2 | 3728 | Stable |
| 9 | True | 1 | 1 | 3728 | Full Convergence |
| 10 | True | 1 | 1 | 3728 | Full Convergence |

fig2a

Figure 1: The inverse relationship between trial-and-error reduction and knowledge accumulation for Experiment 1

Zero-Shot Verification: Utilizing the completely evolved SKILL from cycle 10, the Gemini CLI successfully generated the functional client code on the very first attempt.

fig2b

Figure 2: Zero-shot script completion in Gemini CLI using the evolved SKILL from Experiment 1.

Discussion:

In Experiment 1, the agent operated under a severe “blind” constraint. During cycles 1 through 4, the agent repeatedly hit the hidden error limit and was forcefully blocked. In traditional LLM agent execution, resetting the script here would mean starting from zero. However, because the agent accurately recorded the fragments of rules it discovered into SKILL.md before “dying” in each cycle, the subsequent generations inherited this knowledge. Cycle 5 represents the critical threshold where the accumulated knowledge was sufficient to bypass all 20 rules without hitting the error limit. The convergence to a modification count of 1 in later cycles, and the successful zero-shot execution in Gemini CLI, definitively prove that the agent effectively offloaded its working memory to the local filesystem, rendering the task independent of the model’s immediate context window.

3.2. Experiment 2 Results & Discussion

Table 2: Experiment 2 Data Summary

| Cycle | Success | Mod. Count | Interaction Count | Skill Size (Bytes) | Learning Content |
|---|---|---|---|---|---|
| 1 | True | 1 | 2 | 2013 | Identity, Content-Type identification |
| 2 | True | 3 | 4 | 2407 | Secure/Admin constraints (Riddles) |
| 3 | True | 2 | 1 | 2634 | Audit (Base64 signature) |
| 4 | True | 1 | 0 | 2634 | Avoid Session Block |
| 5 | True | 2 | 0 | 2957 | Refactoring & Guidelines |

fig3a

Figure 3: The learning curve for Experiment 2

Zero-Shot Verification: Utilizing the evolved SKILL from cycle 5, the Gemini CLI successfully built the entire, complex SDK from a blank file in a single pass without prior context.

fig3b

Figure 4: Zero-shot script completion in Gemini CLI using the evolved SKILL from Experiment 2.

Discussion:

While Experiment 1 showcased constraint discovery, Experiment 2 highlighted the agent’s capability for architectural self-organization. The server presented not just rigid rules, but semantic puzzles (e.g., reversing strings, encoding to Base64). The agent successfully decoded these and recorded the solutions. More importantly, when faced with the “Session Block” error in Cycle 4, the agent realized that static headers were insufficient. By Cycle 5, the agent did not just fix the error; it abstracted the solution into a “Centralized Header Factory” pattern. This transition from reactive error-fixing to proactive software architecture design is clearly reflected in the drop of interaction counts to 0 in later cycles. The zero-shot success in Gemini CLI demonstrates that the agent can extract best practices from messy legacy systems and output clean, robust SDKs instantly.
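A sketch of what a “Centralized Header Factory” might look like is shown below. The specific header names and the reversed-string and Base64 puzzles are assumptions modeled on the puzzle types described above, not the actual Chaos Server protocol; the point is the pattern itself, where every request obtains its headers from one closure instead of scattering static header literals.

```javascript
// Illustrative "Centralized Header Factory": one closure owns all header
// logic, so puzzle solutions live in exactly one place and per-request
// variation (which avoided the Session Block) comes for free.
function createHeaderFactory(sessionSeed) {
  let requestCount = 0;
  return function buildHeaders(path) {
    requestCount++;
    return {
      "Content-Type": "application/json",
      // Hypothetical Base64 "audit signature" derived from seed and path:
      "X-Audit": Buffer.from(`${sessionSeed}:${path}`).toString("base64"),
      // Hypothetical reversed-string riddle, solved once and reused:
      "X-Riddle": "terces".split("").reverse().join(""), // "secret"
      // Varies on every call, so no two requests look identical:
      "X-Seq": String(requestCount),
    };
  };
}

const buildHeaders = createHeaderFactory("demo-seed");
console.log(buildHeaders("/secure"));
```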

3.3. Analysis of SKILL.md Evolution Before and After

Analyzing the physical changes in the SKILL.md files provides deep insight into the agent’s cognitive process. For the full, unredacted text of the SKILL.md files and client scripts before and after evolution for both experiments, please refer to Appendix A and Appendix B.

Experiment 1 (Silent Bureaucrat)

Experiment 2 (Chaos Server)

3.4. Comprehensive Discussion on Significance and Novelty

The synthesis of these two experiments reveals a highly potent framework for AI-driven software engineering. The traditional dependency on a single continuous context window is fundamentally fragile: when the session ends, the learning dies (catastrophic forgetting).

By enforcing the physical writing of knowledge to a Markdown file (SKILL.md), we achieved persistent meta-learning. The significance of using Markdown over specialized vector databases is twofold:

  1. Human Readability and Auditability: Developers can read, review, and manually correct the SKILL.md file, fostering a true Human-AI collaborative environment.
  2. Cross-Platform Portability (Zero-Shot Knowledge Transfer): As demonstrated by successfully moving the evolved SKILL from Antigravity to Gemini CLI, knowledge is no longer locked into a specific agent instance. An agent can spend days investigating a system, and the resulting SKILL can be instantly deployed to a fleet of worker agents, granting them immediate “senior-level” expertise without any required training or trial-and-error phases.

4. Specific Applications Utilizing This Approach

Based on our findings, this framework can be highly impactful in the following real-world scenarios:

  1. Automated Reverse Engineering & SDK Generation: When integrating with undocumented or chaotic legacy API systems, an agent can be deployed to blindly probe the endpoints. It will autonomously decode the hidden rules, generating both a human-readable API specification (the SKILL) and robust client SDKs.
  2. Organizational Tacit Knowledge Documentation: In many organizations, “tribal knowledge” dictates how to deploy or test specific systems (e.g., “always wait 5 seconds before hitting this endpoint”). By having an agent execute tasks and fail, it can discover and formalize this tacit knowledge into universally accessible Markdown files, acting as an automated technical writer.
  3. Robust E2E Automated Test Suites: UI and API specifications change frequently, breaking tests. An agent equipped with this framework can dynamically update testing protocols in its SKILL as it encounters new failures, acting as a self-healing testing mechanism that minimizes maintenance overhead.
  4. AI Onboarding and Scaling: A mature SKILL file cultivated by an exploratory agent can be instantly copied to the local workspaces of newly deployed AI agents (or human engineers). This instantly transfers “experience,” allowing for rapid scaling of development resources.

5. Strategies for Agent Skill Evolution

To ensure the continuous enhancement of Agent Skills within this framework, we identify two primary evolutionary paths:

  1. Targeted Skill Evolution: When the goal is to refine a specific capability, the agent is instructed via an Appendix in the SKILL.md file. Upon successful problem resolution, the agent triggers a targeted update—adding, deleting, or modifying entries—to that specific skill definition.
  2. Holistic Skill Evolution: To evolve the agent’s entire skill set simultaneously, evolution instructions are embedded directly into global context files (e.g., GEMINI.md). This compels the agent to evaluate and update the relevant SKILL.md files immediately after any successful task execution across its entire operational spectrum.
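An embedded evolution instruction of the second kind might read as follows. The exact wording is a hypothetical excerpt, assuming a GEMINI.md global context file as described above.

```markdown
<!-- Hypothetical excerpt from GEMINI.md; the wording is an assumption. -->
## Skill Evolution Policy
After every successfully completed task:
1. Re-read the relevant SKILL.md files in this workspace.
2. Append any newly discovered rule, constraint, or architectural pattern.
3. Remove or rewrite entries that the latest run proved wrong or obsolete.
```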

6. Summary

7. Appendix

The sample scripts, SKILL.md, and prompts can be seen at https://gist.github.com/tanaikech/966f83cc438b6077b05b9843be09e930.
