# OpenVLA

## 1. What is Vision-Language-Action (VLA) and what is OpenVLA?

A Vision-Language-Action (VLA) model is a large multimodal foundation model that takes image(s) of a scene and a text instruction and directly predicts low-level robot actions (e.g., end-effector deltas, gripper open/close, done flag).

Typical pipeline:

1. A vision-language encoder (an LLM with a visual module) ingests camera image(s) plus a text command.
2. The encoder produces a latent representation of the scene and the command.
3. An action decoder converts it into a sequence of tokens, which are then decoded into a continuous action vector (dx, dy, dz, dθ, gripper, done, etc.).
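
The final decoding step can be sketched as follows. This is a minimal illustration, assuming each action dimension is discretized into a fixed number of uniform bins over a per-dimension range. The function name `detokenize_action`, the bin count, and the ranges are hypothetical and are not taken from the OpenVLA code; mapping model vocabulary tokens to bin indices is not shown.

```python
from typing import List, Sequence

N_BINS = 256  # assumed number of uniform bins per action dimension


def detokenize_action(
    bin_ids: Sequence[int],
    low: Sequence[float],
    high: Sequence[float],
    n_bins: int = N_BINS,
) -> List[float]:
    """Map one bin index per action dimension to the center of that bin."""
    if not (len(bin_ids) == len(low) == len(high)):
        raise ValueError("bin_ids, low and high must have the same length")
    action = []
    for bin_id, lo, hi in zip(bin_ids, low, high):
        if not 0 <= bin_id < n_bins:
            raise ValueError(f"bin index {bin_id} outside [0, {n_bins})")
        width = (hi - lo) / n_bins
        # The bin center is used as the continuous value for this dimension.
        action.append(lo + (bin_id + 0.5) * width)
    return action


if __name__ == "__main__":
    # Two dimensions, each assumed to be normalized to [-1, 1].
    print(detokenize_action([0, 255], low=[-1.0, -1.0], high=[1.0, 1.0]))
```

In a real deployment the per-dimension ranges would come from the normalization statistics of the training dataset rather than being hard-coded.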

OpenVLA is an open-source 7B-parameter VLA model from Stanford, UC Berkeley, and collaborators:

* Website: `https://openvla.github.io`
* Code and weights: GitHub + HuggingFace (e.g., `openvla/openvla-7b-…`).

Key properties:

* Trained on ~970k manipulation demonstrations from multiple robots (WidowX, Franka, etc.).
* Works in the format “camera images + text instruction → robot action” without explicit planning.
* Ready-made checkpoints fine-tuned on LIBERO task suites (e.g., `libero_10`, `libero_object`, `libero_spatial`) are available.
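
In code, the "image + instruction → action" call looks roughly like the sketch below, which follows the usage pattern published in the OpenVLA README. The helper names `build_prompt` and `run_openvla_once` are hypothetical. `model_id` and `unnorm_key` are left as required arguments because the right values depend on the checkpoint you load. The heavy imports are deferred so the prompt helper can be used without a GPU.

```python
from typing import Any


def build_prompt(instruction: str) -> str:
    """Wrap a task instruction in the prompt template used in the OpenVLA README."""
    return f"In: What action should the robot take to {instruction}?\nOut:"


def run_openvla_once(
    model_id: str,      # HuggingFace model ID of the checkpoint (caller-supplied)
    image: Any,         # a PIL.Image from your camera or simulator
    instruction: str,
    unnorm_key: str,    # dataset key used to un-normalize the predicted action
    device: str = "cuda:0",
):
    """Load the model, run one forward pass, and return the predicted action.

    Loading on every call is wasteful; a real script would load once and reuse.
    """
    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    vla = AutoModelForVision2Seq.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    ).to(device)

    inputs = processor(build_prompt(instruction), image).to(device, dtype=torch.bfloat16)
    # predict_action is provided by the model's remote code, not by transformers itself.
    return vla.predict_action(**inputs, unnorm_key=unnorm_key, do_sample=False)


if __name__ == "__main__":
    print(build_prompt("pick up the red block"))
```

Note that `trust_remote_code=True` executes Python code shipped with the checkpoint, so only use it with repositories you trust.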

For KNX/Konnex:

* Serves as the “manipulation brain”: turns a text task + image into an action sequence.
* A strong base layer that can be adapted to your robots/sims/tasks.

***

## 2. Hardware and high-level launch diagram

This reflects what we actually ran on a headless server (e.g., vast.ai) with NVIDIA A10 / A6000 / A40 GPUs.

Recommended minimum:

* GPU: 1× A10 (24 GB) or similar. 16 GB is borderline but can work with 8-bit/4-bit loading.
* RAM: 32 GB (64+ preferred).
* Disk: 200+ GB for:
  * repository clones,
  * HuggingFace caches (`HF_HOME`),
  * LIBERO data and generated videos.
* OS: Ubuntu 20.04 / 22.04 with NVIDIA drivers compatible with CUDA 12.1.
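
The GPU guidance above follows from simple arithmetic on the weight size. The sketch below is a rough estimate only, assuming exactly 7 billion parameters and counting weights alone; activations, the CUDA context, and quantization overhead come on top. The function name `weights_gb` is illustrative.

```python
def weights_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    N_PARAMS = 7e9  # assumed round figure for a "7B" model
    for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weights_gb(N_PARAMS, bits):.1f} GB")
```

At 16 bits per parameter the weights alone take about 14 GB, which is why a 16 GB card leaves little headroom unless the model is loaded in 8-bit or 4-bit precision.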

Suggested directory layout:

```
/data
  ├─ openvla/          # OpenVLA repository
  ├─ LIBERO/           # LIBERO repository (tasks/sims)
  ├─ hf_home/          # HuggingFace cache
  └─ vla-scripts/      # your custom scripts (e.g., replay)
```

***

## 3. Installation: conda env and dependencies

### 3.1. Conda environment

```bash
conda create -n openvla python=3.10 -y
conda activate openvla
```

### 3.2. PyTorch with CUDA 12.1

For A10, the official cu121 wheels are convenient:

```bash
pip install --index-url https://download.pytorch.org/whl/cu121 \
  "torch==2.2.0" "torchvision==0.17.0" "torchaudio==2.2.0"
```

Check:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# expected: 2.2.0+cu121 True
```

### 3.3. Clone repositories

```bash
cd /data

# OpenVLA
git clone https://github.com/openvla/openvla.git
cd openvla

# Install package + deps
pip install -e .
```

This brings in the main dependencies: `transformers`, `accelerate`, `einops`, `imageio`, `matplotlib`, etc.

### 3.4. LIBERO and simulators

LIBERO provides:

* task suites (bddl files),
* environments via `robosuite + mujoco` (`OffScreenRenderEnv`).

```bash
cd /data
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
```

Simulators:

```bash
# Mujoco
pip install mujoco==2.3.7

# robosuite (compatible version)
pip install "robosuite<2.0.0"
```

Additional deps commonly needed on first import:

```bash
pip install gym==0.26.2 gym-notices
pip install pyopengl glfw opencv-python
pip install pynput
```

A common dependency conflict we've hit:

* `tensorflow==2.15.0` wants `numpy<2.0.0`
* some `opencv-python` builds want `numpy>=2`

For OpenVLA (without TensorFlow), the simplest fix is pinning NumPy below 2:

```bash
pip install "numpy>=1.23.5,<2.0.0"
```

If a later install bumps NumPy to 2.0.0 or above, rerun the command above to restore the pin.
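
To confirm that the pin is still in effect after installing other packages, a small check like the one below can help. It is a sketch using only the standard library; the helper names are illustrative, and the version parsing handles only plain release strings such as `1.26.4` or `2.0.0rc1`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string; missing patch counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def satisfies_numpy_pin(version: str) -> bool:
    """True if the version lies in the range >=1.23.5,<2.0.0."""
    return (1, 23, 5) <= release_tuple(version) < (2, 0, 0)


def installed_numpy_version() -> Optional[str]:
    """Return the installed NumPy version, or None if NumPy is not installed."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("NumPy is not installed in this environment")
    elif satisfies_numpy_pin(found):
        print(f"NumPy {found} satisfies the pin")
    else:
        print(f"NumPy {found} violates the pin; reinstall with the pinned range")
```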

### 3.5. Extra LIBERO packages

```bash
pip install easydict bddl
```

If you hit `ModuleNotFoundError: No module named 'XXX'`, install it in the same env.
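
Rather than discovering missing modules one import error at a time, you can probe for them up front. The sketch below uses `importlib.util.find_spec`, which locates a top-level module without importing it. The candidate list is an assumption based on the packages installed in this section; note that import names can differ from pip names (`opencv-python` is imported as `cv2`, `pyopengl` as `OpenGL`).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the subset of top-level module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names (not pip names); this list is illustrative, not exhaustive.
CANDIDATES = [
    "torch", "transformers", "numpy", "libero", "robosuite", "mujoco",
    "gym", "cv2", "OpenGL", "glfw", "pynput", "easydict", "bddl",
]

if __name__ == "__main__":
    absent = missing_modules(CANDIDATES)
    print("all modules found" if not absent else f"missing: {', '.join(absent)}")
```

Run it inside the activated `openvla` environment so that it inspects the same interpreter your scripts will use.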

***

## 4. Environment variables

On a headless GPU server, the following settings help:

```bash
# in ~/.bashrc or your session bootstrap

export HF_HOME=/data/hf_home
export HF_HUB_CACHE=/data/hf_home/hub
export TRANSFORMERS_CACHE=/data/hf_home/hub
export HF_MODULES_CACHE=/data/hf_home/modules

# make Python see both LIBERO and OpenVLA
export PYTHONPATH=/data/LIBERO:/data/openvla:$PYTHONPATH

# headless rendering via EGL
export MUJOCO_GL=egl
export MUJOCO_EGL_DEVICE_ID=0

# disable tokenizers parallelism to avoid fork-related warnings
export TOKENIZERS_PARALLELISM=false
```

Then:

```bash
source ~/.bashrc
conda activate openvla
```
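
A quick sanity check that the variables took effect can save a confusing failure later. The sketch below inspects a mapping of environment variables and reports problems; the function name `env_problems` is illustrative, and the default paths assume the `/data` layout suggested earlier.

```python
import os
from typing import List, Mapping, Sequence


def env_problems(
    env: Mapping[str, str],
    required_paths: Sequence[str] = ("/data/LIBERO", "/data/openvla"),
) -> List[str]:
    """Return human-readable problems found in the given environment mapping."""
    problems = []
    if not env.get("HF_HOME"):
        problems.append("HF_HOME is not set")
    if env.get("MUJOCO_GL") != "egl":
        problems.append("MUJOCO_GL is not 'egl'")
    # PYTHONPATH entries are separated by os.pathsep (':' on Linux).
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    for path in required_paths:
        if path not in entries:
            problems.append(f"{path} missing from PYTHONPATH")
    return problems


if __name__ == "__main__":
    found = env_problems(os.environ)
    print("environment looks OK" if not found else "\n".join(found))
```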

***

## 5. Quick test: built-in LIBERO script

… (content unchanged from the original file) …

> For the full, up‑to‑date scripts, see the upstream repository.


