PI-0.5
1. What is PI0.5 / PI05
PI0.5 (PI05) extends PI0 toward a Vision-Language-Action (VLA) policy:
visual backbone + language component (LeRobot variant uses PaliGemma),
input: natural-language instruction (e.g., “put the yellow book on the wooden tray”) + observation history,
output: low-level manipulator actions.
Hugging Face:
lerobot/pi05_libero_finetuned: PI05 fine-tuned on LIBERO tasks (table/books/shelves, etc.).
For KNX, this is closer to a single policy handling multiple tasks in a rich scene via natural language.
2. Environment prep (same base as PI0)
If you’ve followed PI0 setup:
conda env lerobot ready,
LeRobot installed.
Additional packages:
EGL/GL for headless rendering:
sudo apt install -y \
  libgl1-mesa-glx \
  libglib2.0-0 \
  libosmesa6 \
  xvfb
CMake (required by hf-egl-probe).
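Installing the system libraries is only half of headless rendering: MuJoCo and PyOpenGL also need to be told which off-screen backend to use. The sketch below assumes the EGL backend is wanted (with OSMesa as a software fallback) and sets the MUJOCO_GL and PYOPENGL_PLATFORM environment variables. The helper name configure_headless_gl is illustrative only and is not part of LeRobot.

```python
import os


def configure_headless_gl(backend="egl", environ=None):
    """Select an off-screen OpenGL backend for MuJoCo and PyOpenGL.

    Safest when called before importing mujoco, robosuite, or lerobot,
    because the variables are read when rendering is initialised.
    """
    if backend not in ("egl", "osmesa"):
        raise ValueError(f"unsupported headless backend: {backend!r}")
    env = os.environ if environ is None else environ
    env["MUJOCO_GL"] = backend          # read by MuJoCo
    env["PYOPENGL_PLATFORM"] = backend  # read by PyOpenGL
    return {key: env[key] for key in ("MUJOCO_GL", "PYOPENGL_PLATFORM")}
```

Calling configure_headless_gl("egl") at the very top of a script, before any simulator import, is one way to use it. Exporting the same two variables in the shell has the same effect.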
3. LIBERO assets
LeRobot can create LIBERO environments through its environment factory; assets are downloaded on demand from Hugging Face. Quick check:
You should see asset download logs and env init messages.
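If environment creation fails before any download starts, a missing Python package is a common cause. The following sketch reports which parts of the stack are not importable. It assumes the top-level module names lerobot, libero, robosuite, and mujoco, and it does not create an environment or download assets.

```python
import importlib.util

# Assumed top-level module names for the stack described in this guide.
REQUIRED_MODULES = ("lerobot", "libero", "robosuite", "mujoco")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found on the current import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules()
    print("all modules found" if not gaps else f"missing: {gaps}")
```

An empty result only shows that the packages are installed; it says nothing about whether rendering or asset downloads will work.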
4. PaliGemma access (Hugging Face token)
PI05 uses PaliGemma (google/paligemma-3b-pt-224), a gated repo. Without a token, you’ll get:
GatedRepoError: 401 Client Error. Cannot access gated repo google/paligemma-3b-pt-224
Steps:
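Request access to google/paligemma-3b-pt-224 on its Hugging Face model page if you have not already; a token alone does not unlock a gated repo.
Create a fine-grained HF token with “Read” permissions.
Log in on the server, for example with huggingface-cli login.
Paste the token when prompted and confirm saving. After this, PI05 and PaliGemma can be fetched.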
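To confirm that a token is visible to the process that will run the scripts, a small preflight check can help. The sketch below assumes the default huggingface_hub lookup locations: the HF_TOKEN environment variable, then a token file under HF_HOME (~/.cache/huggingface/token by default). It is simplified and ignores XDG_CACHE_HOME. The function name find_hf_token is illustrative only.

```python
import os
from pathlib import Path


def find_hf_token(environ=None, home=None):
    """Return where a token was found: 'env', 'file', or None.

    The token value itself is never returned or printed.
    """
    env = os.environ if environ is None else environ
    if env.get("HF_TOKEN", "").strip():
        return "env"
    home = Path.home() if home is None else Path(home)
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    token_file = Path(env.get("HF_TOKEN_PATH", hf_home / "token"))
    if token_file.is_file() and token_file.read_text().strip():
        return "file"
    return None


if __name__ == "__main__":
    print("token source:", find_hf_token())
```

A token being present does not prove that access to the gated repo has been granted; the GatedRepoError above can still occur until the access request is approved.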
5. Smoke test: eval_pi05_libero_live.py
Create /data/lerobot/eval_pi05_libero_live.py:
Run:
Expected:
lerobot/pi05_libero_finetuned downloads,
PaliGemma downloads (with token),
the LIBERO scene runs and each episode prints a success flag.
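The shape of such a smoke test can be illustrated without the real stack. The sketch below is not the LeRobot API: it assumes a Gymnasium-style environment (reset() returning (obs, info), step() returning a 5-tuple), assumes success is reported as info["is_success"], and uses a stub environment and a stub policy in place of LIBERO and PI05.

```python
def run_episode(env, policy, instruction, max_steps=300):
    """Roll out one episode; return True if the env reports success."""
    obs, info = env.reset()
    for _ in range(max_steps):
        action = policy(obs, instruction)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    return bool(info.get("is_success", False))


class StubEnv:
    """Toy stand-in for a LIBERO env: reports success after `goal` steps."""

    def __init__(self, goal=3):
        self.goal = goal
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": 0}, {}

    def step(self, action):
        self.t += 1
        done = self.t >= self.goal
        return {"t": self.t}, float(done), done, False, {"is_success": done}


def stub_policy(obs, instruction):
    """Placeholder for the PI05 policy call; returns a zero 7-dim action."""
    return [0.0] * 7


flags = [
    run_episode(StubEnv(goal=3), stub_policy, "put the yellow book on the wooden tray")
    for _ in range(2)
]
print("success flags:", flags)
```

In a real script, StubEnv and stub_policy would be replaced by the LIBERO environment and the loaded PI05 policy; the loop and the per-episode success flag stay the same.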
6. KNX demo script: PI05 + LIBERO + record video
Save /data/lerobot/pi05_libero_run_and_record.py:
Run:
Use the resulting pi05_libero_task_<id>.mp4 in KNX docs/demos for Text2Action.
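The recording step can be sketched as follows. It assumes that frames are collected as HxWx3 uint8 arrays during the rollout, that imageio with its ffmpeg plugin (imageio-ffmpeg) is installed, and that 20 fps is a suitable playback rate. The helper names and the fps value are assumptions, not taken from the original script.

```python
from pathlib import Path


def video_path(task_id, out_dir="."):
    """Build the output name used in this section: pi05_libero_task_<id>.mp4."""
    return Path(out_dir) / f"pi05_libero_task_{task_id}.mp4"


def save_video(frames, task_id, out_dir=".", fps=20):
    """Write a sequence of HxWx3 uint8 frames to an MP4 file.

    The import is deferred so that video_path() works without imageio.
    """
    import imageio  # third-party; assumed to be available in the lerobot env

    path = video_path(task_id, out_dir)
    imageio.mimsave(str(path), frames, fps=fps)
    return path
```

Collecting frames in a list keeps the example simple; for long episodes, writing frames incrementally would use less memory.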
7. Live viewing via VNC/desktop (optional)
If you prefer live viewing in a headless environment, run an X server + VNC (pattern below), then launch your scripts inside the desktop session:
Connect via VNC to server_ip:5900 and run your eval scripts in a terminal inside the desktop.
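If the VNC client cannot connect, it is worth checking from the client machine whether the port is reachable at all. A minimal sketch using only the standard library; port 5900 comes from the text above, while the function name port_open is illustrative:

```python
import socket


def port_open(host, port=5900, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result usually points to a firewall rule, a VNC server bound to localhost only, or a server that is not running. A True result only shows that something is listening on that port.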
8. Pros and cons for KNX
Pros:
True VLA: natural-language instruction + visual observation → actions.
Ready-to-use pi05_libero_finetuned for multi-step object-centric LIBERO tasks.
Aligns with KNX goals: one generalist policy across tasks; can adapt via fine-tuning or action-space constraints.
Cons:
Requires gated PaliGemma access (HF token).
Heavier than PI0; A10/A100 preferred over smaller GPUs.
Depends on robust MuJoCo + robosuite + LIBERO + EGL setup; headless can be finicky.
Optimized for LIBERO formats; porting to custom KNX scenes requires observation mapping and careful instruction design.
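The observation mapping mentioned in the last point can be sketched as a simple key translation. All key names below are placeholders: the left-hand names stand for a hypothetical KNX scene, and the right-hand names are assumed policy input names that should be checked against the checkpoint's configuration.

```python
# Hypothetical mapping: custom scene key -> key expected by the policy.
KEY_MAP = {
    "front_camera_rgb": "observation.images.image",
    "wrist_camera_rgb": "observation.images.image2",
    "joint_state": "observation.state",
}


def to_policy_inputs(raw_obs, instruction, key_map=KEY_MAP):
    """Rename observation keys and attach the language instruction."""
    missing = [key for key in key_map if key not in raw_obs]
    if missing:
        raise KeyError(f"observation is missing keys: {missing}")
    batch = {dst: raw_obs[src] for src, dst in key_map.items()}
    batch["task"] = instruction  # assumed name of the instruction field
    return batch
```

Renaming keys is only the first step: image resolution, camera viewpoint, state layout, and action scaling also have to match what the checkpoint saw during fine-tuning.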