🦾OpenVLA

1. What is Vision-Language-Action (VLA) and what is OpenVLA?

A Vision-Language-Action (VLA) model is a large multimodal foundation model that takes image(s) of a scene and a text instruction and directly predicts low-level robot actions (e.g., end-effector deltas, gripper open/close, done flag).

Typical pipeline:

  1. A vision-language encoder (an LLM with a visual module) ingests camera image(s) plus a text command.

  2. The encoder produces a latent representation of the scene and instruction.

  3. An action decoder converts this representation into a sequence of discrete action tokens, which are then de-tokenized into a continuous action vector (dx, dy, dz, dθ, gripper, done, etc.).

OpenVLA is an open-source 7B-parameter VLA model from Stanford, Berkeley, and collaborators:

  • Website: https://openvla.github.io

  • Code and weights: GitHub + HuggingFace (e.g., openvla/openvla-7b-…).

Key properties:

  • Trained on ~970k manipulation demonstrations from multiple robots (WidowX, Franka, etc.).

  • Works in the format “camera images + text instruction → robot action” without explicit planning.

  • Ready-made weights fine-tuned on the LIBERO task suites (libero_10, libero_object, libero_spatial, etc.).

For KNX/Konnex:

  • Serves as the “manipulation brain”: turns a text task + image into an action sequence.

  • A strong base layer that can be adapted to your robots/sims/tasks.


2. Hardware and high-level launch diagram

This reflects what we actually ran on a headless server (e.g., vast.ai) using NVIDIA A10 / A6000 / A40:

Recommended minimum:

  • GPU: 1× A10 (24 GB) or similar. 16 GB is borderline but can work with 8-bit/4-bit loading.

  • RAM: 32 GB (64+ preferred).

  • Disk: 200+ GB for:

    • repository clones,

    • HuggingFace caches (HF_HOME),

    • LIBERO data and generated videos.

  • OS: Ubuntu 20.04 / 22.04 with NVIDIA drivers compatible with CUDA 12.1.

Suggested directory layout:
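One possible layout, keeping everything on the large data disk (directory names here are illustrative, not prescribed by OpenVLA):

```
/workspace
├── openvla/      # OpenVLA repository clone
├── LIBERO/       # LIBERO repository clone (task suites, environments)
├── hf_cache/     # HuggingFace cache; point HF_HOME here
└── outputs/      # rollout videos, logs, checkpoints
```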


3. Installation: conda env and dependencies

3.1. Conda environment
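A minimal environment sketch. The Python version is our choice here, not a hard requirement from upstream:

```shell
# Create and activate a dedicated env (python=3.10 is an assumption; check the repo's README)
conda create -n openvla python=3.10 -y
conda activate openvla
```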

3.2. PyTorch with CUDA 12.1

For A10, the official cu121 wheels are convenient:
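For example, installing from the official cu121 wheel index (pin exact versions to match the repo's requirements if needed):

```shell
# Install PyTorch built against CUDA 12.1 from the official wheel index
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```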

Check:
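A quick sanity check that the install sees the GPU:

```shell
# Print torch version, CUDA build, and GPU visibility; expect "... 12.1 True" on a working box
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```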

3.3. Clone repositories
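A sketch of the clone-and-install step (the repo URL is from the project's GitHub; the editable install pulls in the Python dependencies):

```shell
# Clone OpenVLA and install it in editable mode into the active env
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .
```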

This brings in the main dependencies: transformers, accelerate, einops, imageio, matplotlib, etc.

3.4. LIBERO and simulators

LIBERO provides:

  • task suites (bddl files),

  • environments via robosuite + mujoco (OffScreenRenderEnv).
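Installing LIBERO the same way (repo URL from the LIBERO project on GitHub):

```shell
# Clone LIBERO and install it into the same env in editable mode
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
```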

Simulators:
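LIBERO's environments sit on top of robosuite and MuJoCo. A sketch, with versions left unpinned (pin to whatever LIBERO's requirements file specifies):

```shell
# Physics and environment backends for LIBERO (versions are assumptions)
pip install mujoco robosuite
```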

Additional deps commonly needed on first import:
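The exact list depends on your base image; these are the packages that surfaced as ImportErrors in our runs:

```shell
# Typical first-import gaps (adjust to whatever errors you actually see)
pip install "imageio[ffmpeg]" opencv-python matplotlib einops
```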

Common dependency conflict we’ve hit:

  • tensorflow==2.15.0 wants numpy<2.0.0

  • some opencv-python builds want numpy>=2

For OpenVLA (without TensorFlow), the simplest fix is pinning NumPy below 2:
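For example:

```shell
# Pin NumPy below 2.0 to stay compatible with the tensorflow-era constraints
pip install "numpy<2.0"
```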

If another package later upgrades NumPy to ≥ 2.0, reinstall with the pin in place.

3.5. Extra LIBERO packages

If you hit ModuleNotFoundError: No module named 'XXX', install it in the same env.
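For example (the package names below are common LIBERO dependencies, shown only as placeholders; substitute whatever module the error names):

```shell
# Activate the env first, then install the missing package into it
conda activate openvla
pip install easydict hydra-core
```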


4. Environment variables

On a headless-GPU server, the following helps:
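A sketch of the variables we set; the cache path is an example, adjust it to your disk layout:

```shell
# Render MuJoCo off-screen via EGL (no X server needed on a headless box)
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
# Keep HuggingFace downloads on the large data disk (path is an example)
export HF_HOME=/workspace/hf_cache
```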

Then:
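A quick smoke test that LIBERO and its rendering stack import cleanly (the OffScreenRenderEnv class is the one mentioned above; the import path is from LIBERO's codebase):

```shell
# If this prints without errors, EGL rendering and the LIBERO install are wired up
python -c "from libero.libero.envs import OffScreenRenderEnv; print('LIBERO import OK')"
```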


5. Quick test: built-in LIBERO script

… (content unchanged from the original file) …

For the full, up‑to‑date scripts, see the upstream repository.
