Visual Causal Chain Bookmarking
[MSc Dissertation, In Progress] A novel credit assignment method for long-horizon agentic RL. VCBM automatically identifies causal "bookmarks" from HTTP state-change events and redistributes gradient credit to causally significant steps. Timing analysis attributes 98% of VCBM's overhead over LOOP to rollout collection, which an inline bookmark detector is projected to cut by about 93%.
Key Features
Core technologies and system features.
Causal Bookmark Detection
Automatically identifies causally significant steps via HTTP state-change events (POST/PUT/DELETE/PATCH with status < 400). Mathematically equivalent to full environment state-diff detection under REST semantics — zero accuracy loss.
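The detection rule can be sketched as a standalone predicate over logged requests. This is a minimal illustration, not the project's code: `LoggedRequest`, `is_bookmark`, and `bookmark_steps` are hypothetical names, and the trajectory is made-up data chosen to exercise each case.

```python
from dataclasses import dataclass
from typing import Optional

# HTTP methods treated as state-mutating by the rule described above.
STATE_MUTATING_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


@dataclass
class LoggedRequest:
    """Hypothetical record of one HTTP call made during an agent step."""
    method: str
    status_code: Optional[int]  # None if no response was recorded


def is_bookmark(requests: list[LoggedRequest]) -> bool:
    """True iff any request is a state-mutating call with status < 400."""
    return any(
        r.method in STATE_MUTATING_METHODS
        and r.status_code is not None
        and r.status_code < 400
        for r in requests
    )


def bookmark_steps(trajectory: list[list[LoggedRequest]]) -> set[int]:
    """Indices of steps whose requests include a successful mutation."""
    return {t for t, reqs in enumerate(trajectory) if is_bookmark(reqs)}


# Illustrative trajectory: one list of logged requests per agent step.
trajectory = [
    [LoggedRequest("GET", 200)],                              # read only
    [LoggedRequest("POST", 201)],                             # successful write
    [LoggedRequest("DELETE", 404)],                           # failed write
    [],                                                       # no API calls
    [LoggedRequest("GET", 200), LoggedRequest("PATCH", 204)], # read, then write
]
B = bookmark_steps(trajectory)
```

In this toy trajectory only steps 1 and 4 end up in the bookmark set: the failed DELETE and the read-only steps do not count.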
VCC Credit Redistribution
Loss weighting: L_VCC = -(Σ_{t∈B} L_t^PPO + α·Σ_{t∉B} L_t^PPO) / (|B| + α(T−|B|)). Setting α=1 recovers standard LOOP exactly. α=0.1 used in experiments. Implemented on top of Apple ML Research's LOOP framework with Qwen2.5-1.5B-Instruct + LoRA rank-8.
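As a rough sketch of the weighting formula, the function below applies weight 1 to bookmark steps and weight α to all others, then normalises by the sum of weights, which equals |B| + α(T−|B|). The name `vcc_loss` and the plain-list inputs are assumptions for illustration; the actual implementation operates inside the LOOP training loop and is not shown here.

```python
def vcc_loss(step_objectives: list[float],
             bookmark_mask: list[bool],
             alpha: float = 0.1) -> float:
    """Weighted loss following the L_VCC formula above.

    step_objectives: per-step PPO terms L_t^PPO (illustrative floats).
    bookmark_mask:   True where step t is in the bookmark set B.
    alpha:           weight for non-bookmark steps.
    """
    if len(step_objectives) != len(bookmark_mask):
        raise ValueError("step_objectives and bookmark_mask must align")

    # Weight 1 for bookmark steps, alpha for the rest.
    weights = [1.0 if is_bm else alpha for is_bm in bookmark_mask]

    # Sum of weights equals |B| + alpha * (T - |B|).
    denom = sum(weights)
    if denom == 0:
        raise ValueError("all weights are zero (alpha=0 with no bookmarks)")

    weighted_sum = sum(w * obj for w, obj in zip(weights, step_objectives))
    return -weighted_sum / denom


# Made-up per-step values; steps 1 and 3 are bookmarks.
objectives = [0.2, 0.5, 0.1, 0.4]
mask = [False, True, False, True]

loss_uniform = vcc_loss(objectives, mask, alpha=1.0)  # plain negative mean
loss_vcc = vcc_loss(objectives, mask, alpha=0.1)      # bookmark-weighted
```

With α=1 every weight is 1, so the result reduces to the negative mean over all T steps, which is the uniform weighting that the text identifies with standard LOOP.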
AppWorld Benchmark
Evaluated on AppWorld (ACL 2024) — a multi-app digital assistant benchmark with 750+ tasks across Spotify, Gmail, Calendar and 10+ APIs. Agent generates Python code to interact with a sandboxed app environment. Tasks require 5–20+ API calls with sparse rewards.
Hardware & Scale
Training on NVIDIA RTX 4080 (15.57 GiB VRAM), 1.55B parameter model (Qwen2.5-1.5B-Instruct), LoRA fine-tuning (9.23M trainable params, 0.59%). vLLM V1 inference engine, FSDP2 single-GPU training. 100-iteration VCBM run completed; scaling to 3.5B on UoM CSF cluster (ticket RITM0104892 approved).
Timing Breakthrough
Root cause analysis showed that rollout collection accounts for 98% of VCBM's overhead relative to LOOP (1500s vs 88s per 50 steps). Fix 1, an HTTP bookmark detector added inline to execute_with_bookmark(), is projected to cut get_rollouts from 1500s to roughly 75–100s, which would make VCBM about 8% faster than LOOP overall.
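The projected rollout saving can be checked with simple arithmetic. The snippet below only restates the figures quoted above (1500s before, 75–100s after); the helper name `percent_reduction` is illustrative, and the projection itself remains an estimate rather than a measured result.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Percentage by which a duration shrinks from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s


ROLLOUT_BEFORE_S = 1500.0       # get_rollouts time per 50 steps, as quoted
ROLLOUT_AFTER_LOW_S = 75.0      # optimistic end of the projected range
ROLLOUT_AFTER_HIGH_S = 100.0    # conservative end of the projected range

best_case = percent_reduction(ROLLOUT_BEFORE_S, ROLLOUT_AFTER_LOW_S)
worst_case = percent_reduction(ROLLOUT_BEFORE_S, ROLLOUT_AFTER_HIGH_S)

print(f"projected rollout reduction: {worst_case:.1f}% to {best_case:.1f}%")
```

The range works out to roughly 93% to 95%, so the conservative end matches the 93% rollout reduction quoted in the timing analysis.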
Performance Graphs
Visualizations of model performance and results across experiments.

avg_return — Learning Curve
Average episode return across 100 training iterations on AppWorld. Starts ~0.10, trends upward with high variance (0.05–0.45). Mean ≈ 0.27 — consistent with sparse-reward long-horizon task difficulty on Qwen2.5-1.5B + LoRA rank-8.

grad_norm — Training Stability
Per-step gradient norms (spiky, 0–0.25) vs smoothed mean (stabilising from 0.13 → 0.10). The decreasing mean is consistent with stable PPO training. Clipping at max_grad_norm=0.1 guards against divergence. Cumulative n_high_kl: only 2 over 200 steps.
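For readers unfamiliar with norm clipping, the sketch below shows the common global-norm rule with max_norm=0.1: gradients whose combined L2 norm exceeds the threshold are rescaled, and smaller gradients pass through unchanged. It is a pure-Python illustration on a flat list; whether the training code applies exactly this rule is an assumption, and `clip_by_global_norm` is a hypothetical name.

```python
import math


def clip_by_global_norm(grads: list[float],
                        max_norm: float = 0.1,
                        eps: float = 1e-6) -> tuple[list[float], float]:
    """Rescale grads so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale factor is capped at 1 so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# A spike with norm 0.25, the top of the range reported in the caption.
clipped, norm_before = clip_by_global_norm([0.15, -0.20])
norm_after = math.sqrt(sum(g * g for g in clipped))

# A small gradient with norm 0.05 is below the threshold and is unchanged.
unchanged, small_norm = clip_by_global_norm([0.03, 0.04])
```

Under this rule a spike at 0.25 is scaled down to just under 0.1, which is how per-step norms can be spiky while the applied update stays bounded.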

VCBM vs LOOP — Timing Analysis
Root cause analysis: get_rollouts accounts for 98% of VCBM's overhead over LOOP (1500s vs 88s per 50 steps). All other phases are 8–16% faster in VCBM. Fix 1 (inline HTTP bookmark detector) projects 93% rollout reduction → VCBM ~8% faster than LOOP overall.

GPU Memory Profile — RTX 4080
Peak VRAM per training phase. model_backward hits 14.11 GB — within the 15.57 GB RTX 4080 limit with headroom for inference. FSDP2 single-GPU mode with bf16 precision. LoRA rank-8 keeps trainable params to 9.23M (0.59%) enabling dual inference+training on one GPU.
Project Source Code
Explore the primary logical modules.
# phi_agents/appworld/interface.py
def execute_with_bookmark(self, code: str) -> tuple[str, bool]:
    """
    Wraps execute() to detect causal bookmarks via HTTP state changes.
    A step is a bookmark iff it triggered a successful state-mutating
    API call (POST/PUT/DELETE/PATCH with status < 400).

    Mathematically equivalent to full state-diff detection under
    REST semantics (Proposition 2 in dissertation).
    Zero accuracy loss. Zero additional API calls.
    """
    n_before = len(self.requester.request_tracker.requests)
    result = self.execute(code)
    new_requests = self.requester.request_tracker.requests[n_before:]

    is_bookmark = any(
        req.method in ("POST", "PUT", "DELETE", "PATCH")
        and req.status_code is not None
        and req.status_code < 400
        for req in new_requests
    )
    return result, is_bookmark

Live Simulation Output
Simulated console execution.
Source Code
GitHub repositories for this project.