🧪PI‑0.5

1. What is PI0.5 / PI05

PI0.5 (PI05) extends PI0 toward a Vision-Language-Action (VLA) policy:

visual backbone + language component (the LeRobot variant uses PaliGemma),
input: natural-language instruction (e.g., “put the yellow book on the wooden tray”) + observation history,
output: low-level manipulator actions.

Hugging Face:

lerobot/pi05_libero_finetuned — PI05 fine-tuned on LIBERO tasks (table/books/shelves, etc.).

For KNX, this comes closer than PI0 to a single policy that handles multiple tasks in a rich scene through natural-language instructions.

2. Environment prep (same base as PI0)

If you followed the PI0 setup, you already have:

the conda env lerobot,
LeRobot installed.

Additional packages:

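EGL/GL libraries for headless rendering (on newer releases such as Ubuntu 24.04, libgl1-mesa-glx is no longer packaged; install libgl1 instead):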

sudo apt install -y \
  libgl1-mesa-glx \
  libglib2.0-0 \
  libosmesa6 \
  xvfb

CMake, which hf-egl-probe needs in order to build:

conda activate lerobot
pip install cmake
pip install hf-egl-probe
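
Installing the libraries is usually not enough on its own: MuJoCo selects its OpenGL backend from the MUJOCO_GL environment variable, and PyOpenGL reads PYOPENGL_PLATFORM. A minimal sketch, assuming EGL is the backend you want (pass "osmesa" for software rendering); the helper names configure_headless_rendering and missing_modules are illustrative and not part of LeRobot:

```python
import importlib.util
import os


def configure_headless_rendering(backend: str = "egl") -> dict:
    """Select an off-screen OpenGL backend unless one is already set."""
    # Call this before importing mujoco/robosuite/libero so they see the setting.
    os.environ.setdefault("MUJOCO_GL", backend)
    os.environ.setdefault("PYOPENGL_PLATFORM", backend)
    return {k: os.environ[k] for k in ("MUJOCO_GL", "PYOPENGL_PLATFORM")}


def missing_modules(names):
    """Return the top-level module names that cannot be found in this env."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print(configure_headless_rendering())
    print("missing:", missing_modules(["torch", "lerobot", "libero", "cv2"]))
```

Because the helper uses setdefault, a value you exported in the shell (for example MUJOCO_GL=osmesa) takes precedence over the default.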

3. LIBERO assets

LeRobot can create LIBERO envs via its factory, and assets are downloaded on demand from Hugging Face. Quick check:

conda activate lerobot
python - << 'EOF'
from libero.libero.envs import make as make_libero
env = make_libero(
    suite_names=["libero_spatial"],
    task_ids=[0],
    obs_type="image",
)
obs = env.reset()
print("Libero env ok, obs keys:", obs.keys())
EOF

You should see asset download logs and env init messages.

4. PaliGemma access (Hugging Face token)

PI05 uses PaliGemma (google/paligemma-3b-pt-224), a gated repo. Without a token, you’ll get:

GatedRepoError: 401 Client Error. Cannot access gated repo google/paligemma-3b-pt-224

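Steps:

Accept the PaliGemma license on the google/paligemma-3b-pt-224 model page (gated repos require this once per account).
Create a fine-grained HF token with “Read” permissions that include access to public gated repos.
Log in on the server: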

conda activate lerobot
hf auth login

Paste the token and confirm saving it. After this, PI05 and PaliGemma can be fetched.
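
Before launching a long download, it can help to confirm that a token is actually visible to the process. A minimal sketch using only the standard library, assuming the default huggingface_hub conventions (the HF_TOKEN environment variable first, then a token file under HF_HOME, which defaults to ~/.cache/huggingface). find_hf_token is a hypothetical helper, and it does not verify that the token grants access to the gated repo:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a token from the environment or the saved token file, else None."""
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    token_file = Path(hf_home) / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    token = find_hf_token()
    # Report only whether a token exists; never print the token itself.
    print("HF token found" if token else "No HF token found; run `hf auth login`")
```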

5. Smoke test: `eval_pi05_libero_live.py`

Create /data/lerobot/eval_pi05_libero_live.py:

# file: eval_pi05_libero_live.py
import torch

from lerobot.envs.factory import make_env
from lerobot.policies.pi05.policy import PI05Policy


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    env = make_env(
        env_type="libero",
        suite_names=["libero_spatial"],
        task_ids=[2],  # example task
        obs_type="image",
        render_mode="rgb_array",
    )

    policy = PI05Policy.from_pretrained("lerobot/pi05_libero_finetuned")
    policy.to(device)
    policy.eval()

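    num_episodes = 3
    max_steps = 600  # arbitrary safety cap in case the env never ends the episode
    for ep in range(num_episodes):
        obs, info = env.reset()
        done = False
        step_idx = 0
        # Language goal for this task, if the env exposes it in `info`.
        instruction = info.get("lang_goal")
        print(f"Episode {ep}, instruction: {instruction}")
        while not done and step_idx < max_steps:
            # NOTE: `policy.act` is the call used in this guide; stock LeRobot
            # policies expose `select_action(batch)`, so adapt if `act` is missing.
            with torch.no_grad():
                action = policy.act(
                    obs=obs,
                    instruction=instruction,
                    device=device,
                )
            obs, reward, terminated, truncated, info = env.step(action)
            done = bool(terminated or truncated)
            step_idx += 1
        print(f"Episode {ep} finished, steps={step_idx}, success={info.get('is_success', False)}")

    # Release simulator and rendering resources.
    env.close()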


if __name__ == "__main__":
    main()

Run:

cd /data/lerobot
conda activate lerobot
python eval_pi05_libero_live.py

Expected:

lerobot/pi05_libero_finetuned downloads,
PaliGemma downloads (with token),
LIBERO scene runs; episodes print success flags.
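
The control flow of the smoke test can be checked without a GPU, the checkpoint, or LIBERO by running the same loop against stand-ins. The sketch below assumes a Gymnasium-style env (reset returns (obs, info), step returns a 5-tuple); DummyEnv, dummy_policy, run_episode, the 7-dimensional action, and the max_steps cap are all hypothetical and only mirror the shape of the script above:

```python
from typing import Any, Callable, Dict, Tuple


class DummyEnv:
    """Stand-in env that reports success after `horizon` steps."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        self.t = 0
        return {"t": self.t}, {"lang_goal": "pick up the black bowl"}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        info = {"is_success": terminated}
        return {"t": self.t}, float(terminated), terminated, False, info


def dummy_policy(obs, instruction):
    """Stand-in policy that always returns a zero action."""
    return [0.0] * 7


def run_episode(env, policy: Callable, max_steps: int = 100) -> Dict[str, Any]:
    """Roll out one episode; stop on termination, truncation, or the step cap."""
    obs, info = env.reset()
    instruction = info.get("lang_goal")
    steps, done = 0, False
    while not done and steps < max_steps:
        action = policy(obs, instruction)
        obs, reward, terminated, truncated, info = env.step(action)
        done = bool(terminated or truncated)
        steps += 1
    return {"steps": steps, "success": bool(info.get("is_success", False))}


if __name__ == "__main__":
    print(run_episode(DummyEnv(horizon=5), dummy_policy))
```

The step cap matters in practice: if an env never sets terminated or truncated, an uncapped while loop runs forever.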

6. KNX demo script: PI05 + LIBERO + record video

Save /data/lerobot/pi05_libero_run_and_record.py:

# file: pi05_libero_run_and_record.py
import random

import cv2
import torch

from lerobot.envs.factory import make_env
from lerobot.policies.pi05.policy import PI05Policy


def make_libero_env(task_id: int):
    env = make_env(
        env_type="libero",
        suite_names=["libero_spatial"],
        task_ids=[task_id],
        obs_type="image",
        render_mode="rgb_array",
        fps=30,
    )
    return env


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    task_id = random.randint(0, 9)  # libero_spatial tasks 0..9
    print(f"Using libero_spatial task_id={task_id}")
    env = make_libero_env(task_id)

    policy = PI05Policy.from_pretrained("lerobot/pi05_libero_finetuned")
    policy.to(device)
    policy.eval()

    out_path = f"pi05_libero_task_{task_id}.mp4"
    fps = 30  # keep in sync with the fps passed to make_env
    max_steps = 600  # arbitrary safety cap in case the env never ends the episode
    frames = []

    obs, info = env.reset()
    done = False
    step_idx = 0

    instruction = info.get("lang_goal")
    print(f"Instruction: {instruction}")

    # Record the initial scene before the first action.
    frame = env.render()
    if frame is not None:
        frames.append(frame)

    while not done and step_idx < max_steps:
        with torch.no_grad():
            action = policy.act(
                obs=obs,
                instruction=instruction,
                device=device,
            )
        obs, reward, terminated, truncated, info = env.step(action)
        done = bool(terminated or truncated)
        frame = env.render()
        if frame is not None:
            frames.append(frame)
        step_idx += 1

    env.close()
    print(f"Episode finished, steps={step_idx}, success={info.get('is_success', False)}")

    if not frames:
        print("No frames collected; check env.render() and render_mode.")
        return

    # OpenCV expects the frame size as (width, height) and BGR channel order.
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(
        out_path,
        cv2.VideoWriter_fourcc(*"mp4v"),
        fps,
        (w, h),
    )
    if not writer.isOpened():
        print(f"Could not open {out_path} for writing; check OpenCV codec support.")
        return
    for f in frames:
        writer.write(cv2.cvtColor(f, cv2.COLOR_RGB2BGR))
    writer.release()
    print(f"Saved video to {out_path}")


if __name__ == "__main__":
    main()

Run:

cd /data/lerobot
conda activate lerobot
python pi05_libero_run_and_record.py

Use the resulting pi05_libero_task_<id>.mp4 in KNX docs/demos for Text2Action.
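
The script buffers every frame in a Python list before encoding, so memory grows linearly with episode length. A rough estimate, assuming uint8 RGB frames; the resolutions and frame counts below are illustrative and not measured from LIBERO:

```python
def frame_buffer_bytes(height: int, width: int, num_frames: int, channels: int = 3) -> int:
    """Bytes needed to hold num_frames uint8 frames of shape (height, width, channels)."""
    return height * width * channels * num_frames


def video_seconds(num_frames: int, fps: int = 30) -> float:
    """Playback length of the clip when written at the given frame rate."""
    return num_frames / fps


if __name__ == "__main__":
    for h, w, n in [(256, 256, 300), (480, 640, 600)]:
        mib = frame_buffer_bytes(h, w, n) / 2**20
        print(f"{w}x{h}, {n} frames: {mib:.1f} MiB, {video_seconds(n):.1f} s at 30 fps")
```

If the estimate gets uncomfortable, one option is to open the cv2.VideoWriter before the loop and write each frame as it is rendered instead of buffering.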

7. Live viewing via VNC/desktop (optional)

If you prefer live viewing in a headless environment, run an X server + VNC (pattern below), then launch your scripts inside the desktop session:

#!/usr/bin/env bash
set -e
export DISPLAY=:0
Xvfb :0 -screen 0 1600x900x24 &
sleep 2
su - USERNAME -c "DISPLAY=:0 startxfce4" &
x11vnc -display :0 -forever -rfbport 5900 -shared -noxdamage -passwd YOURPASS &
websockify --web=/usr/share/novnc/ 6080 localhost:5900 &

Connect a VNC client to server_ip:5900, or open http://server_ip:6080/vnc.html in a browser (served by websockify/noVNC), and run your eval scripts in a terminal inside the desktop.
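
If the connection fails, first confirm that x11vnc and websockify are actually listening. A small sketch using only the standard library; port_open is a hypothetical helper, and the ports are the ones from the script above:

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in [("x11vnc", 5900), ("websockify/noVNC", 6080)]:
        state = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

Run it on the server itself first; if both ports are listening locally but unreachable from your machine, the likely culprit is a firewall or security-group rule rather than the VNC setup.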

8. Pros and cons for KNX

Pros:

True VLA: natural-language instruction + visual observation → actions.
Ready-made pi05_libero_finetuned checkpoint for multi-step, object-centric LIBERO tasks.
Aligns with KNX goals: one generalist policy across tasks; can adapt via fine-tuning or action-space constraints.

Cons:

Requires gated PaliGemma access (HF token).
Heavier than PI0; A10/A100 preferred over smaller GPUs.
Depends on robust MuJoCo + robosuite + LIBERO + EGL setup; headless can be finicky.
Optimized for LIBERO formats; porting to custom KNX scenes requires observation mapping and careful instruction design.
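
Observation mapping for a custom scene mostly comes down to renaming and checking keys before the policy sees them. A minimal sketch; remap_observation is a hypothetical helper, and the key names on both sides are placeholders to be replaced with whatever your scene emits and your checkpoint's config expects:

```python
from typing import Any, Dict, Mapping


def remap_observation(obs: Mapping[str, Any], key_map: Mapping[str, str]) -> Dict[str, Any]:
    """Rename scene keys to policy keys; fail loudly if a required key is absent."""
    missing = [src for src in key_map if src not in obs]
    if missing:
        raise KeyError(f"observation is missing keys: {missing}")
    # Keys not listed in key_map are dropped on purpose.
    return {dst: obs[src] for src, dst in key_map.items()}


if __name__ == "__main__":
    # Placeholder names: left = custom scene, right = what the policy expects.
    key_map = {
        "front_cam_rgb": "observation.images.image",
        "joint_positions": "observation.state",
    }
    scene_obs = {"front_cam_rgb": "<HxWx3 array>", "joint_positions": [0.0] * 8, "debug": 1}
    print(sorted(remap_observation(scene_obs, key_map)))
```

Failing on a missing key at mapping time is a deliberate choice here: a silent default tends to surface much later as a shape error inside the policy.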


