Visual Causal Chain Bookmarking

[MSc Dissertation: In Progress] A novel credit assignment method for long-horizon agentic RL. VCBM automatically identifies causal "bookmarks" from HTTP state-change events and redistributes gradient credit toward causally significant steps. Profiling traced 98% of VCBM's overhead relative to the LOOP baseline to rollout collection, and an inline HTTP bookmark detector is projected to cut that phase by about 93%.

RLDissertation
# features

Key Features

Core technologies and system features.

Causal Bookmark Detection

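Automatically identifies causally significant steps from HTTP state-change events: a step is a bookmark iff it triggers at least one successful state-mutating API call (POST/PUT/DELETE/PATCH with status < 400). Under REST semantics this is mathematically equivalent to full environment state-diff detection (Proposition 2 in the dissertation), so it incurs no accuracy loss and needs no additional API calls.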
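The rule above can be sketched as a pure function over a trajectory's per-step request logs, producing the bookmark set B that the loss below consumes. This is a minimal illustration under assumptions, not the project's code: the `Request` record, `is_bookmark`, `bookmark_steps` and the list-of-lists trajectory layout are hypothetical stand-ins for whatever the request tracker actually stores.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Methods treated as state-mutating, following the rule stated above.
MUTATING_METHODS = frozenset({"POST", "PUT", "DELETE", "PATCH"})


# Hypothetical record of one HTTP call issued by the agent's code.
@dataclass(frozen=True)
class Request:
    method: str
    status_code: Optional[int]  # None if no response was recorded


def is_bookmark(requests: Sequence[Request]) -> bool:
    """True iff the step issued at least one successful mutating request."""
    return any(
        r.method in MUTATING_METHODS
        and r.status_code is not None
        and r.status_code < 400
        for r in requests
    )


def bookmark_steps(trajectory: Sequence[Sequence[Request]]) -> set[int]:
    """Indices t of the steps that form the bookmark set B."""
    return {t for t, requests in enumerate(trajectory) if is_bookmark(requests)}


if __name__ == "__main__":
    trajectory = [
        [Request("GET", 200)],                       # read-only: not a bookmark
        [Request("POST", 201)],                      # successful write: bookmark
        [Request("DELETE", 404)],                    # failed write: not a bookmark
        [Request("GET", 200), Request("PUT", 200)],  # contains a write: bookmark
        [],                                          # no API calls at all
    ]
    print(sorted(bookmark_steps(trajectory)))  # [1, 3]
```

Reads (GET) and failed writes never create bookmarks, so B contains only steps that actually changed environment state.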

VCC Credit Redistribution

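Loss weighting:

$$\mathcal{L}_{\mathrm{VCC}} = -\frac{\sum_{t \in B} L_t^{\mathrm{PPO}} + \alpha \sum_{t \notin B} L_t^{\mathrm{PPO}}}{|B| + \alpha\,(T - |B|)}$$

where B is the set of bookmarked steps in a trajectory of T steps and α scales the contribution of non-bookmark steps. Setting α = 1 recovers standard LOOP exactly; experiments use α = 0.1. Implemented on top of Apple ML Research's LOOP framework with Qwen2.5-1.5B-Instruct and LoRA rank 8.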
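A minimal sketch of this weighting for a single trajectory, in plain Python. It assumes one scalar surrogate value per step (real training operates on per-token tensors) and that L_t^PPO is the clipped PPO objective to be maximised, hence the leading minus sign. `vcc_loss` is a hypothetical name, not a function from LOOP.

```python
from typing import Iterable, Sequence


def vcc_loss(
    ppo_terms: Sequence[float], bookmarks: Iterable[int], alpha: float = 0.1
) -> float:
    """Bookmark-weighted PPO loss L_VCC for one trajectory of T steps.

    ppo_terms[t] is L_t^PPO; bookmarks holds the step indices in B.
    """
    T = len(ppo_terms)
    B = {t for t in bookmarks if 0 <= t < T}  # ignore out-of-range indices
    # Bookmark steps keep full weight; all other steps are scaled by alpha.
    numerator = sum(
        term if t in B else alpha * term for t, term in enumerate(ppo_terms)
    )
    denominator = len(B) + alpha * (T - len(B))
    if denominator == 0:
        # Empty trajectory, or alpha == 0 with no bookmarks: nothing to weight.
        return 0.0
    return -numerator / denominator


if __name__ == "__main__":
    terms = [0.2, -0.1, 0.5, 0.3]
    print(vcc_loss(terms, {1}, alpha=1.0))  # plain per-step mean, negated
    print(vcc_loss(terms, {2}, alpha=0.1))  # step 2 dominates the loss
```

As a worked example, with T = 10, |B| = 2 and α = 0.1 the denominator is 2 + 0.1 × 8 = 2.8, so each bookmark step carries weight 1/2.8 ≈ 0.357 and every other step 0.1/2.8 ≈ 0.036. A bookmark step therefore counts ten times as much as an ordinary step, while the per-step weights still sum to 1; with α = 1 every step gets 1/T, which is the unweighted mean.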

AppWorld Benchmark

Evaluated on AppWorld (ACL 2024), a multi-app digital-assistant benchmark of 750 tasks spanning nine day-to-day apps (including Spotify, Gmail and Venmo) operated through 457 APIs. The agent generates Python code to interact with a sandboxed app environment; tasks require 5–20+ API calls and give sparse rewards.

Hardware & Scale

Training on NVIDIA RTX 4080 (15.57 GiB VRAM), 1.55B parameter model (Qwen2.5-1.5B-Instruct), LoRA fine-tuning (9.23M trainable params, 0.59%). vLLM V1 inference engine, FSDP2 single-GPU training. 100-iteration VCBM run completed; scaling to 3.5B on UoM CSF cluster (ticket RITM0104892 approved).
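The 9.23M trainable-parameter figure can be checked by hand. This sketch assumes LoRA rank 8 is applied to all seven linear projections in every decoder layer (q/k/v/o attention and gate/up/down MLP) and uses layer sizes taken from the public Qwen2.5-1.5B configuration; both are assumptions about the setup rather than values stated on this page.

```python
# Qwen2.5-1.5B layer sizes (assumed from the public model config).
HIDDEN = 1536
INTERMEDIATE = 8960
LAYERS = 28
KV_DIM = 2 * 128  # 2 key/value heads x head_dim 128 (grouped-query attention)


def lora_params(rank: int, in_features: int, out_features: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to one linear layer."""
    return rank * (in_features + out_features)


def qwen_lora_trainable(rank: int = 8) -> int:
    """Trainable LoRA parameters when all seven projections are adapted."""
    per_layer = (
        lora_params(rank, HIDDEN, HIDDEN)          # q_proj
        + lora_params(rank, HIDDEN, KV_DIM)        # k_proj
        + lora_params(rank, HIDDEN, KV_DIM)        # v_proj
        + lora_params(rank, HIDDEN, HIDDEN)        # o_proj
        + lora_params(rank, HIDDEN, INTERMEDIATE)  # gate_proj
        + lora_params(rank, HIDDEN, INTERMEDIATE)  # up_proj
        + lora_params(rank, INTERMEDIATE, HIDDEN)  # down_proj
    )
    return LAYERS * per_layer


if __name__ == "__main__":
    print(f"{qwen_lora_trainable(8):,}")  # 9,232,384
```

Under these assumptions the total is 9,232,384, matching the reported 9.23M. Because each adapter costs r × (d_in + d_out), the count scales linearly with the rank.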

Timing Breakthrough

Root-cause analysis showed that rollout collection accounts for 98% of VCBM's overhead relative to LOOP (1500 s vs 88 s per 50 steps). Fix 1, an HTTP bookmark detector added inline to execute_with_bookmark(), is projected to reduce get_rollouts from 1500 s to roughly 75–100 s, which would make VCBM about 8% faster than LOOP overall.
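Phase-level wall-clock timings of this kind can be collected with a small accumulator and compared between runs. The sketch below is illustrative only: `PhaseTimer` and `compare_phases` are hypothetical helpers, not part of LOOP or AppWorld, and the numbers in the usage example are simply the get_rollouts timings quoted above. It reports per-phase extra seconds and slowdown ratios; it does not reproduce how the 98% share was computed.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple


class PhaseTimer:
    """Accumulates wall-clock seconds per named training phase."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the phase raises, so failed steps still count.
            self.totals[name] += time.perf_counter() - start


def compare_phases(
    candidate: Dict[str, float], baseline: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Map each phase to (extra seconds, slowdown ratio) of candidate vs baseline."""
    report = {}
    for name in sorted(set(candidate) | set(baseline)):
        c, b = candidate.get(name, 0.0), baseline.get(name, 0.0)
        # Ratio is inf when the baseline recorded no time for this phase.
        ratio = c / b if b > 0 else float("inf")
        report[name] = (c - b, ratio)
    return report


if __name__ == "__main__":
    timer = PhaseTimer()
    with timer.phase("get_rollouts"):
        time.sleep(0.01)  # stand-in for real rollout collection
    print(dict(timer.totals))

    report = compare_phases({"get_rollouts": 1500.0}, {"get_rollouts": 88.0})
    print(report["get_rollouts"])  # extra seconds and slowdown ratio
```

For the quoted figures, get_rollouts takes 1412 s longer in VCBM, roughly a 17× slowdown for that phase alone.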

# graphs

Performance Graphs

Visualizations of model performance and results across experiments.

[Figure: avg_return over 100 training iterations]

avg_return — Learning Curve

Average episode return across 100 training iterations on AppWorld. It starts near 0.10 and trends upward with high variance (range 0.05–0.45); the mean of ≈ 0.27 is consistent with the difficulty of sparse-reward, long-horizon tasks for Qwen2.5-1.5B with LoRA rank 8.

[Figure: gradient norm and gradient norm mean over training]

grad_norm — Training Stability

Per-step gradient norms (spiky, 0–0.25) versus their smoothed mean, which settles from 0.13 to 0.10. The decreasing mean is consistent with healthy PPO training, and clipping at max_grad_norm=0.1 guards against divergence. The cumulative n_high_kl count is only 2 over 200 steps.
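A plain-Python sketch of global-norm clipping at max_grad_norm=0.1, shown only to make the rule concrete; it is not the training code. PyTorch's `torch.nn.utils.clip_grad_norm_` applies the same scaling coefficient and returns the norm measured before clipping. If the logged grad_norm is that pre-clip value (an assumption, since the logging code is not shown here), per-step norms of up to 0.25 are compatible with a 0.1 clip.

```python
import math
from typing import List, Sequence, Tuple


def clip_by_global_norm(
    grads: Sequence[Sequence[float]], max_norm: float, eps: float = 1e-6
) -> Tuple[List[List[float]], float]:
    """Rescale all gradients jointly so their global L2 norm is at most max_norm.

    Returns the clipped gradients and the total norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Coefficient capped at 1.0: gradients are only ever scaled down.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm


if __name__ == "__main__":
    grads = [[0.15, 0.0], [0.2]]  # global norm 0.25, like the largest spikes
    clipped, pre_norm = clip_by_global_norm(grads, max_norm=0.1)
    post_norm = math.sqrt(sum(g * g for tensor in clipped for g in tensor))
    print(pre_norm, post_norm)  # roughly 0.25 before, 0.1 after
```

Scaling every tensor by one shared coefficient preserves the direction of the full update, which is why clipping bounds the step size without biasing which parameters move.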

[Figure: VCBM vs LOOP phase-by-phase timing comparison]

VCBM vs LOOP — Timing Analysis

Root-cause analysis: get_rollouts accounts for 98% of VCBM's overhead over LOOP (1500 s vs 88 s per 50 steps), while all other phases are 8–16% faster in VCBM. Fix 1 (the inline HTTP bookmark detector) is projected to cut rollout time by about 93%, which would make VCBM ~8% faster than LOOP overall.

[Figure: peak GPU memory per training phase on RTX 4080]

GPU Memory Profile — RTX 4080

Peak VRAM per training phase. model_backward peaks at 14.11 GiB, within the RTX 4080's 15.57 GiB limit and leaving headroom for inference. Training uses FSDP2 in single-GPU mode with bf16 precision; LoRA rank 8 limits trainable parameters to 9.23M (0.59%), which makes it possible to run inference and training on one GPU.

# source

Project Source Code

Explore the primary logical modules.

HTTP Bookmark Detector (Fix 1)
```python
# phi_agents/appworld/interface.py
def execute_with_bookmark(self, code: str) -> tuple[str, bool]:
    """
    Wraps execute() to detect causal bookmarks via HTTP state changes.
    A step is a bookmark iff it triggered a successful state-mutating
    API call (POST/PUT/DELETE/PATCH with status < 400).

    Mathematically equivalent to full state-diff detection under
    REST semantics (Proposition 2 in dissertation).
    Zero accuracy loss. Zero additional API calls.
    """
    # Remember how many requests were logged before this step runs,
    # so that only the requests issued by `code` are inspected below.
    n_before = len(self.requester.request_tracker.requests)
    result = self.execute(code)
    new_requests = self.requester.request_tracker.requests[n_before:]

    # Bookmark iff any new request used a state-mutating method and
    # received a response with a non-error status code (< 400).
    is_bookmark = any(
        req.method in ("POST", "PUT", "DELETE", "PATCH")
        and req.status_code is not None
        and req.status_code < 400
        for req in new_requests
    )
    return result, is_bookmark
```
# simulation

Live Simulation Output

Simulated console execution.

Training Run — Iteration 194–200
# repositories

Source Code

GitHub repositories for this project.