# PI-0.5

## 1. What is PI0.5 / PI05

PI0.5 (PI05) is a Vision-Language-Action (VLA) policy that builds on PI0:

* visual backbone + language component (LeRobot variant uses PaliGemma),
* input: natural-language instruction (e.g., “put the yellow book on the wooden tray”) + observation history,
* output: low-level manipulator actions.
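
The input/output contract above can be pictured with a small, purely illustrative interface. The names `VLAObservation`, `VLAPolicy`, and `ZeroPolicy` are hypothetical and not part of LeRobot, and the 7-dimensional action is an assumption (a typical end-effector delta plus gripper command):

```python
from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass
class VLAObservation:
    """Hypothetical container for what a VLA policy consumes at each step."""

    instruction: str                            # natural-language task description
    images: dict = field(default_factory=dict)  # camera name -> image array
    state: Sequence[float] = ()                 # proprioception (joints, gripper)


class VLAPolicy(Protocol):
    def act(self, obs: VLAObservation) -> list:
        """Return one low-level action vector for the current step."""
        ...


class ZeroPolicy:
    """Stand-in policy that always outputs a zero action (for wiring tests only)."""

    def __init__(self, action_dim: int = 7):  # 7 is an assumed action size
        self.action_dim = action_dim

    def act(self, obs: VLAObservation) -> list:
        if not obs.instruction:
            raise ValueError("a VLA policy needs a language instruction")
        return [0.0] * self.action_dim


obs = VLAObservation(instruction="put the yellow book on the wooden tray")
action = ZeroPolicy().act(obs)
print(len(action))
```

A stub like `ZeroPolicy` is only useful for checking that the environment loop runs before the real checkpoint is loaded.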

Hugging Face:

* `lerobot/pi05_libero_finetuned` — PI05 fine-tuned on LIBERO tasks (table/books/shelves, etc.).

For KNX, this comes closer than PI0 to the target setup: a single policy that handles multiple tasks in a rich scene from natural-language instructions.

***

## 2. Environment prep (same base as PI0)

If you have already followed the PI0 setup, you should have:

* `conda` env `lerobot` ready,
* LeRobot installed.

Additional packages:

1. EGL/GL for headless rendering:

```bash
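# On newer Ubuntu releases libgl1-mesa-glx may be unavailable; libgl1 is its replacement.
sudo apt update
sudo apt install -y \
  libgl1-mesa-glx \
  libegl1 \
  libglib2.0-0 \
  libosmesa6 \
  xvfb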
```

2. CMake, which `hf-egl-probe` needs at build time:

```bash
conda activate lerobot
pip install cmake
pip install hf-egl-probe
```
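
MuJoCo-based simulators pick their rendering backend from the `MUJOCO_GL` environment variable, which has to be set before the simulator is imported. A minimal sketch, assuming EGL on GPU servers and OSMesa as a CPU fallback; the helper name `configure_headless_rendering` is hypothetical:

```python
import os

# Backends commonly accepted by MuJoCo's Python rendering stack.
_VALID_BACKENDS = ("egl", "osmesa", "glfw")


def configure_headless_rendering(backend: str = "egl") -> str:
    """Select the OpenGL backend; call before importing mujoco/robosuite/libero."""
    backend = backend.lower()
    if backend not in _VALID_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}, expected one of {_VALID_BACKENDS}")
    os.environ["MUJOCO_GL"] = backend
    if backend in ("egl", "osmesa"):
        # PyOpenGL reads this variable to pick the matching platform.
        os.environ["PYOPENGL_PLATFORM"] = backend
    return backend


configure_headless_rendering("egl")
print(os.environ["MUJOCO_GL"])
```

If EGL initialization fails on a machine without a usable GPU driver, switching the argument to `"osmesa"` is the usual (slower) fallback.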

***

## 3. LIBERO assets

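LeRobot can create LIBERO environments through its environment factory, and the assets download on demand from Hugging Face. For a quick check that the simulator itself works, build one task directly with the LIBERO API (the pattern below follows the LIBERO README):

```bash
conda activate lerobot
python - << 'EOF'
import os

from libero.libero import benchmark, get_libero_path
from libero.libero.envs import OffScreenRenderEnv

# Pick task 0 from the libero_spatial suite and locate its BDDL task file.
suite = benchmark.get_benchmark_dict()["libero_spatial"]()
task = suite.get_task(0)
bddl = os.path.join(get_libero_path("bddl_files"), task.problem_folder, task.bddl_file)

env = OffScreenRenderEnv(bddl_file_name=bddl, camera_heights=128, camera_widths=128)
env.seed(0)
obs = env.reset()
print("Task:", task.language)
print("Libero env ok, obs keys:", obs.keys())
env.close()
EOF
```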

You should see asset download logs and env init messages.

***

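## 4. PaliGemma access (Hugging Face token)

PI05 uses PaliGemma (`google/paligemma-3b-pt-224`), which is a gated repo. Without an authorized token, the download fails with:

> GatedRepoError: 401 Client Error. Cannot access gated repo google/paligemma-3b-pt-224

Steps:

1. Open the `google/paligemma-3b-pt-224` model page on Hugging Face while logged in and accept the license terms.
2. Create a fine-grained HF token with read access to the gated repos your account can access.
3. Log in on the server: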

```bash
conda activate lerobot
hf auth login
```

Paste the token and confirm saving it. After this, PI05 and PaliGemma can be fetched.
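
Before starting a long download it can help to confirm that a token is actually visible to the process. The sketch below only checks two common locations, the `HF_TOKEN` environment variable and the token file under `HF_HOME` (default `~/.cache/huggingface`). The helper name `find_hf_token` is hypothetical, and it does not verify that the token has been granted access to the gated repo:

```python
import os
from pathlib import Path
from typing import Optional


def find_hf_token(env: Optional[dict] = None) -> Optional[str]:
    """Return a token from HF_TOKEN or <HF_HOME>/token, or None if neither exists."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    token_file = Path(hf_home) / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


token = find_hf_token()
print("token found" if token else "no token: run `hf auth login` first")
```

A `GatedRepoError` that persists even though a token is found usually means the license on the model page has not been accepted for that account.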

***

## 5. Smoke test: `eval_pi05_libero_live.py`

Create `/data/lerobot/eval_pi05_libero_live.py`:

```python
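# file: eval_pi05_libero_live.py
# Sketch: the make_env keyword arguments and policy.act(...) used below are this
# guide's assumptions. Verify them against your installed LeRobot release, where
# make_env is config-driven and policies expose select_action(batch).
import torch

from lerobot.envs.factory import make_env
from lerobot.policies.pi05.modeling_pi05 import PI05Policy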


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    env = make_env(
        env_type="libero",
        suite_names=["libero_spatial"],
        task_ids=[2],  # example task
        obs_type="image",
        render_mode="rgb_array",
    )

    policy = PI05Policy.from_pretrained("lerobot/pi05_libero_finetuned")
    policy.to(device)
    policy.eval()

    num_episodes = 3
    max_steps = 600  # arbitrary safety cap in case the env never terminates
    for ep in range(num_episodes):
        obs, info = env.reset()
        done = False
        step_idx = 0
        # "lang_goal" is the key this guide assumes for the task instruction.
        instruction = info.get("lang_goal")
        print(f"Episode {ep}, instruction: {instruction}")
        while not done and step_idx < max_steps:
            with torch.no_grad():
                action = policy.act(
                    obs=obs,
                    instruction=instruction,
                    device=device,
                )
            obs, reward, terminated, truncated, info = env.step(action)
            done = bool(terminated or truncated)
            step_idx += 1
        print(f"Episode {ep} finished, steps={step_idx}, success={info.get('is_success', False)}")

    env.close()


if __name__ == "__main__":
    main()
```

Run:

```bash
cd /data/lerobot
conda activate lerobot
python eval_pi05_libero_live.py
```

Expected:

* `lerobot/pi05_libero_finetuned` downloads,
* PaliGemma downloads (with token),
* LIBERO scene runs; episodes print success flags.
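
Per-episode success flags are easier to compare across runs once they are aggregated. A minimal sketch of such a summary; `EpisodeResult` and `summarize` are hypothetical helpers, and the numbers in the usage lines are placeholders, not measured results:

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    steps: int
    success: bool


def summarize(results):
    """Aggregate episode outcomes into a success rate and mean episode length."""
    n = len(results)
    if n == 0:
        return {"episodes": 0, "success_rate": 0.0, "mean_steps": 0.0}
    return {
        "episodes": n,
        "success_rate": sum(r.success for r in results) / n,
        "mean_steps": sum(r.steps for r in results) / n,
    }


# Placeholder values for illustration only.
results = [EpisodeResult(120, True), EpisodeResult(300, False), EpisodeResult(95, True)]
print(summarize(results))
```

In the smoke-test script, one `EpisodeResult(step_idx, bool(info.get("is_success", False)))` could be appended per episode and summarized after the loop.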

***

## 6. KNX demo script: PI05 + LIBERO + record video

Save `/data/lerobot/pi05_libero_run_and_record.py`:

```python
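# file: pi05_libero_run_and_record.py
# Sketch: the make_env keyword arguments and policy.act(...) used below are this
# guide's assumptions. Verify them against your installed LeRobot release, where
# make_env is config-driven and policies expose select_action(batch).
import random

import cv2
import torch

from lerobot.envs.factory import make_env
from lerobot.policies.pi05.modeling_pi05 import PI05Policy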


def make_libero_env(task_id: int):
    env = make_env(
        env_type="libero",
        suite_names=["libero_spatial"],
        task_ids=[task_id],
        obs_type="image",
        render_mode="rgb_array",
        fps=30,
    )
    return env


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    task_id = random.randint(0, 9)  # libero_spatial tasks 0..9
    print(f"Using libero_spatial task_id={task_id}")
    env = make_libero_env(task_id)

    policy = PI05Policy.from_pretrained("lerobot/pi05_libero_finetuned")
    policy.to(device)
    policy.eval()

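    out_path = f"pi05_libero_task_{task_id}.mp4"
    fps = 30  # keep in sync with the fps passed to make_env
    max_steps = 600  # arbitrary safety cap in case the env never terminates
    frames = []

    obs, info = env.reset()
    done = False
    step_idx = 0

    # "lang_goal" is the key this guide assumes for the task instruction.
    instruction = info.get("lang_goal")
    print(f"Instruction: {instruction}")

    while not done and step_idx < max_steps:
        with torch.no_grad():
            action = policy.act(
                obs=obs,
                instruction=instruction,
                device=device,
            )
        obs, reward, terminated, truncated, info = env.step(action)
        done = bool(terminated or truncated)
        # Collect one RGB frame per step for the video.
        frame = env.render()
        if frame is not None:
            frames.append(frame)
        step_idx += 1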

    print(f"Episode finished, steps={step_idx}, success={info.get('is_success', False)}")

    if len(frames) == 0:
        print("No frames collected, check env.render() and render_mode.")
        return

    h, w, _ = frames[0].shape
    writer = cv2.VideoWriter(
        out_path,
        cv2.VideoWriter_fourcc(*"mp4v"),
        fps,
        (w, h),
    )
    for f in frames:
        bgr = cv2.cvtColor(f, cv2.COLOR_RGB2BGR)
        writer.write(bgr)
    writer.release()
    print(f"Saved video to {out_path}")


if __name__ == "__main__":
    main()
```

Run:

```bash
cd /data/lerobot
conda activate lerobot
python pi05_libero_run_and_record.py
```

Use the resulting `pi05_libero_task_<id>.mp4` in KNX docs/demos for Text2Action.
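
Because the task ID is drawn at random, repeated runs can land on the same ID and overwrite an earlier recording. A small sketch of one way to avoid that; the helper name `unique_video_path` and its suffix scheme are hypothetical:

```python
from pathlib import Path


def unique_video_path(task_id: int, out_dir: str = ".") -> Path:
    """Return pi05_libero_task_<id>.mp4, adding _1, _2, ... if the file exists."""
    base = Path(out_dir) / f"pi05_libero_task_{task_id}.mp4"
    candidate, n = base, 0
    while candidate.exists():
        n += 1
        candidate = base.with_name(f"{base.stem}_{n}{base.suffix}")
    return candidate


print(unique_video_path(3))
```

The returned path can be converted with `str(...)` and used in place of `out_path` when constructing the `cv2.VideoWriter`.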

***

## 7. Live viewing via VNC/desktop (optional)

If you prefer live viewing on a headless server, start a virtual X server with a desktop session and expose it over VNC (pattern below; replace `USERNAME` and `YOURPASS`), then launch your scripts inside that desktop session:

```bash
#!/usr/bin/env bash
set -e
export DISPLAY=:0
Xvfb :0 -screen 0 1600x900x24 &
sleep 2
su - USERNAME -c "DISPLAY=:0 startxfce4" &
x11vnc -display :0 -forever -rfbport 5900 -shared -noxdamage -passwd YOURPASS &
websockify --web=/usr/share/novnc/ 6080 localhost:5900 &
```

Connect a VNC client to `server_ip:5900`, or open noVNC in a browser at `http://server_ip:6080/vnc.html`, and run your eval scripts in a terminal inside the desktop.
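
If the client cannot connect, a quick first check is whether the two ports from the script are listening at all. A minimal sketch using only the standard library; `port_open` is a hypothetical helper and the host is assumed to be the server itself:

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in (("x11vnc", 5900), ("noVNC/websockify", 6080)):
    state = "listening" if port_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

Ports that are listening locally but unreachable from outside usually point to a firewall or security-group rule rather than to the VNC setup.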

***

## 8. Pros and cons for KNX

Pros:

* True VLA: natural-language instruction + visual observation → actions.
* Ready-made `pi05_libero_finetuned` checkpoint for multi-step, object-centric LIBERO tasks.
* Aligns with KNX goals: one generalist policy across tasks; can adapt via fine-tuning or action-space constraints.

Cons:

* Requires gated PaliGemma access (HF token).
* Heavier than PI0; A10/A100 preferred over smaller GPUs.
* Depends on robust MuJoCo + robosuite + LIBERO + EGL setup; headless can be finicky.
* Optimized for LIBERO formats; porting to custom KNX scenes requires observation mapping and careful instruction design.
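
The observation mapping mentioned in the last point can be sketched as a key-renaming step placed in front of the policy. Everything here is an assumption for illustration: the custom scene keys, the target keys, and the `task` field for the instruction must be checked against the checkpoint's actual input features:

```python
# Hypothetical mapping: custom KNX scene keys -> keys the policy was trained on.
KEY_MAP = {
    "front_cam": "observation.images.image",
    "wrist_cam": "observation.images.image2",
    "robot_state": "observation.state",
}


def map_observation(raw: dict, instruction: str, key_map: dict = KEY_MAP) -> dict:
    """Rename custom observation keys and attach a whitespace-normalized instruction."""
    missing = [k for k in key_map if k not in raw]
    if missing:
        raise KeyError(f"custom observation is missing keys: {missing}")
    mapped = {dst: raw[src] for src, dst in key_map.items()}
    mapped["task"] = " ".join(instruction.split())
    return mapped


raw_obs = {"front_cam": "img0", "wrist_cam": "img1", "robot_state": [0.0] * 8}
print(sorted(map_observation(raw_obs, "  put the yellow book   on the wooden tray ")))
```

Failing loudly on missing keys is a deliberate choice here: a silently absent camera stream tends to produce plausible-looking but wrong actions.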


