🧪 PI0.5

1. What is PI0.5 / PI05

PI0.5 (PI05) extends PI0 toward a Vision-Language-Action (VLA) policy:

  • a visual backbone plus a language component (the LeRobot variant uses PaliGemma),

  • input: natural-language instruction (e.g., “put the yellow book on the wooden tray”) + observation history,

  • output: low-level manipulator actions.

Hugging Face:

  • lerobot/pi05_libero_finetuned — PI05 fine-tuned on LIBERO tasks (table/books/shelves, etc.).

For KNX, this brings us closer to a single policy that handles multiple tasks in a rich scene, driven by natural-language instructions.


2. Environment prep (same base as PI0)

If you followed the PI0 setup, you already have:

  • the lerobot conda env,

  • LeRobot installed.

Additional packages:

  1. EGL/GL for headless rendering:

sudo apt install -y \
  libgl1-mesa-glx \
  libglib2.0-0 \
  libosmesa6 \
  xvfb

  2. CMake, required to build hf-egl-probe (available from apt as the cmake package).
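
Installing the packages is only part of headless rendering: MuJoCo also has to be told which rendering backend to use. The following is a minimal Python sketch, assuming the scripts in this guide render through MuJoCo and PyOpenGL. It builds the MUJOCO_GL and PYOPENGL_PLATFORM settings for a chosen backend. The helper name headless_render_env is illustrative and is not part of LeRobot or LIBERO.

```python
import os

# Backends that MuJoCo accepts through the MUJOCO_GL environment variable.
VALID_BACKENDS = ("egl", "osmesa", "glfw")


def headless_render_env(backend="egl"):
    """Return the environment variables for the chosen rendering backend.

    Hypothetical helper for this guide. "egl" and "osmesa" work without a
    display; "glfw" needs a real or virtual X server (for example Xvfb).
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}, expected one of {VALID_BACKENDS}")
    env = {"MUJOCO_GL": backend}
    if backend in ("egl", "osmesa"):
        # PyOpenGL picks its platform from this variable.
        env["PYOPENGL_PLATFORM"] = backend
    return env


if __name__ == "__main__":
    # Must run before mujoco / robosuite / LIBERO are imported.
    os.environ.update(headless_render_env("egl"))
    print({k: os.environ[k] for k in ("MUJOCO_GL", "PYOPENGL_PLATFORM")})
```

Apply the settings before importing mujoco, robosuite or LIBERO. If EGL is not available on the machine, "osmesa" (software rendering via the libosmesa6 package above) is a slower fallback.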


3. LIBERO assets

LeRobot can create LIBERO envs through its env factory, and the assets download on demand from Hugging Face. As a quick check, create one LIBERO env; you should see asset download logs followed by env init messages.


4. PaliGemma access (Hugging Face token)

PI05 uses PaliGemma (google/paligemma-3b-pt-224), which is a gated repo: you must accept its license on the model page and authenticate with a token. Without a token, you’ll get:

GatedRepoError: 401 Client Error. Cannot access gated repo google/paligemma-3b-pt-224

Steps:

  1. Create a fine-grained HF token with “Read” permissions.

  2. Log in on the server with huggingface-cli login.

Paste the token when prompted and confirm saving it. After this, both the PI05 checkpoint and PaliGemma can be downloaded.
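
To confirm that the login took effect before starting a long download, a script can look for the saved token. This is a minimal sketch under the assumption that the token lives either in the HF_TOKEN environment variable or in the token file under HF_HOME (by default ~/.cache/huggingface/token). find_hf_token is a hypothetical helper, not a huggingface_hub function.

```python
import os
from pathlib import Path


def find_hf_token(environ=None):
    """Return the locally stored Hugging Face token, or None if there is none.

    Hypothetical helper: checks HF_TOKEN first, then the token file that
    the login command writes under HF_HOME.
    """
    environ = os.environ if environ is None else environ

    token = (environ.get("HF_TOKEN") or "").strip()
    if token:
        return token

    # Assumed default cache location when HF_HOME is not set.
    hf_home = environ.get("HF_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface"
    )
    token_file = Path(hf_home) / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    if find_hf_token() is None:
        print("No Hugging Face token found; log in before fetching PaliGemma.")
    else:
        print("Hugging Face token found.")
```

A token being present does not prove that it has access to the gated repo; a GatedRepoError can still appear if the license has not been accepted or the token lacks read access.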


5. Smoke test: eval_pi05_libero_live.py

Create /data/lerobot/eval_pi05_libero_live.py and run it with python from the lerobot conda env.

Expected:

  • lerobot/pi05_libero_finetuned downloads,

  • PaliGemma downloads (requires the token),

  • LIBERO scene runs; episodes print success flags.
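
The per-episode success flags can be reduced to a single success rate, which is easier to compare across runs. This is a small sketch, assuming the eval script yields one boolean flag per episode. summarize_successes is an illustrative helper name, not something LeRobot provides.

```python
def summarize_successes(flags):
    """Aggregate per-episode success flags into counts and a success rate."""
    flags = [bool(f) for f in flags]
    if not flags:
        raise ValueError("no episodes to summarize")
    successes = sum(flags)
    return {
        "episodes": len(flags),
        "successes": successes,
        "success_rate": successes / len(flags),
    }


if __name__ == "__main__":
    # Example flags only; real values come from the eval run.
    summary = summarize_successes([True, False, True, True])
    print(f"{summary['successes']}/{summary['episodes']} episodes succeeded "
          f"({summary['success_rate']:.0%})")
```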


6. KNX demo script: PI05 + LIBERO + record video

Save the script as /data/lerobot/pi05_libero_run_and_record.py and run it with python from the lerobot conda env.

Use the resulting pi05_libero_task_<id>.mp4 in KNX docs and demos for Text2Action.
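
When recording several tasks, it helps to build the output names in one place and to list which tasks already have a video. This is a minimal sketch that follows the pi05_libero_task_<id>.mp4 naming above, assuming <id> is a non-negative integer task index. Both helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches the naming scheme pi05_libero_task_<id>.mp4 with an integer id (assumption).
_VIDEO_NAME = re.compile(r"pi05_libero_task_(\d+)\.mp4")


def video_path(out_dir, task_id):
    """Return the output path for one task's recording."""
    if task_id < 0:
        raise ValueError("task_id must be non-negative")
    return Path(out_dir) / f"pi05_libero_task_{task_id}.mp4"


def recorded_task_ids(out_dir):
    """Return the sorted task ids that already have a recording in out_dir."""
    ids = []
    for path in Path(out_dir).iterdir():
        match = _VIDEO_NAME.fullmatch(path.name)
        if match and path.is_file():
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    print(video_path("/data/lerobot", 0))
```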


7. Live viewing via VNC/desktop (optional)

If you prefer to watch runs live on a headless server, start an X server with a VNC server attached to it, then launch your scripts inside that desktop session.

Connect a VNC client to server_ip:5900 (5900 is the default port for VNC display :0) and run your eval scripts from a terminal inside the desktop.
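
If the VNC client cannot connect, it is worth checking first whether the port is reachable at all, since a firewall or a VNC server that did not start are common causes. This is a small sketch using only the Python standard library. vnc_port_open is an illustrative helper, and server_ip is the same placeholder as above.

```python
import socket


def vnc_port_open(host, port=5900, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # replace with server_ip when checking from your workstation
    state = "reachable" if vnc_port_open(host) else "not reachable"
    print(f"VNC port 5900 on {host} is {state}")
```

A reachable port only shows that something is listening; it does not confirm that the listener is the VNC server or that the desktop session is usable.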


8. Pros and cons for KNX

Pros:

  • True VLA: natural-language instruction + visual observation → actions.

  • A ready-made checkpoint, pi05_libero_finetuned, for multi-step, object-centric LIBERO tasks.

  • Aligns with KNX goals: one generalist policy across tasks; can adapt via fine-tuning or action-space constraints.

Cons:

  • Requires access to the gated PaliGemma repo (HF token).

  • Heavier than PI0; an A10- or A100-class GPU is preferred over smaller cards.

  • Depends on a working MuJoCo + robosuite + LIBERO + EGL stack; headless setups can be finicky.

  • The checkpoint is tuned to LIBERO formats; porting to custom KNX scenes requires mapping observations and designing instructions carefully.
