Nirav-Madhani · April 3, 2026 03:42
diff --git a/task_rl_colab.ipynb b/task_rl_colab.ipynb
 {
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "provenance": [],
   "gpuType": "A100"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3"
  },
  "language_info": {
   "name": "python"
  },
  "accelerator": "GPU"
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Task RL Training (GRPO on GSM8K)\n",
    "\n",
    "**Step 8** of the meditation training pipeline. Trains the model to solve math problems using GRPO with a binary correctness reward.\n",
    "\n",
    "The model still uses its meditation ability (from Step 7): it meditates, then solves.\n",
    "\n",
    "## Setup\n",
    "1. **GPU**: Change runtime to GPU (A100 recommended) via Runtime > Change runtime type\n",
    "2. **Secrets**: Add these in the Secrets panel (left sidebar):\n",
    "   - `HF_TOKEN`: HuggingFace token (read/write)\n",
    "3. **Run all cells** (Ctrl+F9)\n",
    "\n",
    "**No judge needed**: the reward is purely programmatic (the answer matches the GSM8K ground truth).\n",
    "\n",
    "Auto-resumes from the latest HF checkpoint. Checkpoints upload to HF every 10 steps.\n",
    "\n",
    "- **Model**: LFM2.5-1.2B-Thinking (SFT + Meditation RL + fresh LoRA for Task RL)\n",
    "- **Dataset**: GSM8K train (7473 problems)\n",
    "- **Reward**: Binary correctness (1.0 if correct, 0.0 if wrong)\n",
    "- **Repo**: Nirav-Madhani/LFM2.5-1.2B-Meditation"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Cell 1: Load secrets\n",
    "from google.colab import userdata\n",
    "import os\n",
    "\n",
    "os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')\n",
    "\n",
    "# Optional: Gemini key (not needed for Task RL, but set if available)\n",
    "try:\n",
    "    os.environ['GEMINI_PAID_KEY'] = userdata.get('GEMINI_PAID_KEY')\n",
    "except Exception:\n",
    "    pass\n",
    "\n",
    "print('Secrets loaded')\n",
    "print(f'HF_TOKEN: {os.environ[\"HF_TOKEN\"][:8]}...')"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Cell 2: Check GPU\n",
    "import torch\n",
    "if torch.cuda.is_available():\n",
    "    name = torch.cuda.get_device_name(0)\n",
    "    vram = torch.cuda.get_device_properties(0).total_memory / 1024**3\n",
    "    print(f'GPU: {name} ({vram:.1f} GB)')\n",
    "else:\n",
    "    raise RuntimeError('No GPU! Change runtime type: Runtime -> Change runtime type -> GPU')"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Cell 3: Download training script from HuggingFace\n",
    "!pip install -q huggingface_hub\n",
    "\n",
    "from huggingface_hub import hf_hub_download\n",
    "from pathlib import Path\n",
    "\n",
    "WORK_DIR = Path('/content/meditation')\n",
    "WORK_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "HF_REPO = 'Nirav-Madhani/LFM2.5-1.2B-Meditation'\n",
    "HF_TOKEN = os.environ['HF_TOKEN']\n",
    "\n",
    "script_path = hf_hub_download(\n",
    "    repo_id=HF_REPO,\n",
    "    filename='task-rl-training.py',\n",
    "    local_dir=WORK_DIR,\n",
    "    token=HF_TOKEN,\n",
    ")\n",
    "print(f'Training script: {script_path}')\n",
    "\n",
    "print('\\nFiles in work dir:')\n",
    "for f in sorted(WORK_DIR.rglob('*')):\n",
    "    if f.is_file():\n",
    "        print(f'  {f.relative_to(WORK_DIR)} ({f.stat().st_size/1024:.0f} KB)')"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Cell 4: Run Task RL training\n",
    "# Downloads SFT + meditation RL checkpoints from HF automatically\n",
    "# Dataset (GSM8K) is downloaded from HuggingFace Datasets\n",
    "# No judge API needed — reward is binary correctness\n",
    "\n",
    "!cd /content/meditation && python -u task-rl-training.py"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Cell 5 (optional): Check GPU memory\n",
    "# Training (Cell 4) runs in a separate process, so report device-wide usage\n",
    "# instead of this kernel's own allocations (torch.cuda.memory_allocated).\n",
    "import torch\n",
    "if torch.cuda.is_available():\n",
    "    free, total = torch.cuda.mem_get_info()  # bytes, whole device\n",
    "    used = (total - free) / 1024**3\n",
    "    total = total / 1024**3\n",
    "    print(f'GPU Memory: {used:.1f} / {total:.1f} GB ({used/total*100:.0f}%)')"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Cell 6 (optional): List checkpoints on HuggingFace\n",
    "from huggingface_hub import HfApi\n",
    "api = HfApi(token=os.environ['HF_TOKEN'])\n",
    "files = api.list_repo_files('Nirav-Madhani/LFM2.5-1.2B-Meditation')\n",
    "ckpts = sorted([f for f in files if 'task_rl' in f and 'checkpoint' in f])\n",
    "print(f'Task RL checkpoints on HF ({len(ckpts)}):')\n",
    "for c in ckpts:\n",
    "    print(f'  {c}')"
   ],
   "execution_count": null,
   "outputs": []
  }
 ]
 }
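
Note for reviewers: `task-rl-training.py` is not part of this diff, so the reward logic the notebook refers to is not visible here. Below is a minimal sketch of what a binary correctness reward on GSM8K and a GRPO-style group-relative advantage could look like. It relies on the fact that GSM8K reference solutions end with `#### <answer>`. The function names (`gsm8k_gold_answer`, `extract_last_number`, `binary_reward`, `group_relative_advantages`) and the "last number in the completion" parsing rule are assumptions made for illustration, not the script's actual code.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def gsm8k_gold_answer(solution: str) -> str:
    """Return the final answer from a GSM8K reference solution ('... #### 72')."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_last_number(completion: str) -> Optional[str]:
    """Hypothetical parser: treat the last number in the model output as its answer."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return matches[-1].replace(",", "") if matches else None


def binary_reward(completion: str, solution: str) -> float:
    """1.0 if the parsed answer equals the ground truth numerically, else 0.0."""
    predicted = extract_last_number(completion)
    if predicted is None:
        return 0.0
    try:
        return 1.0 if float(predicted) == float(gsm8k_gold_answer(solution)) else 0.0
    except ValueError:
        return 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gold = "Natalia sold 48/2 = 24 clips in May.\n#### 72"
    group = ["So the answer is 72.", "I get 70 clips.", "No idea.", "Total: 72"]
    rewards = [binary_reward(c, gold) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

With a binary reward, a group in which every completion is correct (or every completion is wrong) gets all-zero advantages under this normalization, so only prompts with mixed outcomes contribute a learning signal.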