# OpenVLA · OFT

## 1. What is OpenVLA-OFT?

OpenVLA-OFT is a set of methods and code for fine-tuning the base OpenVLA 7B model with the OFT (Optimized Fine-Tuning) recipe: LoRA-based parameter-efficient updates combined with parallel decoding, action chunking, and a continuous action head (L1 regression or diffusion). The goal is to cheaply adapt the same model to new domains/tasks without retraining all weights.

Official resources:

* Website: `https://openvla-oft.github.io`
* Code: `https://github.com/moojink/openvla-oft`

Conceptually:

* Start with the base 7B OpenVLA model.
* Fine-tune it with LoRA adapters (a PEFT technique) plus a new continuous action head, instead of the original autoregressive action tokens.
* Train on new data (e.g., LIBERO variants or new scenes).
* At inference, the adapted model predicts a chunk of actions in one parallel decoding pass, which makes inference much faster than token-by-token decoding.
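The adapter idea can be illustrated with a toy low-rank update (illustrative only, not the actual OpenVLA-OFT implementation):

```python
# Toy illustration of adapter-style fine-tuning (LoRA-like): the frozen base
# weight W is augmented with a trainable low-rank delta B @ A, so only
# d*r + r*d parameters are learned per adapted layer instead of d*d.
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                          # hidden size, adapter rank (r << d)
W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = rng.normal(size=(d, r)) * 0.01   # trainable up-projection

W_eff = W + B @ A                    # effective weight used at inference
print(W_eff.shape)                   # (8, 8)
```

Here only 32 adapter parameters are trained versus 64 in the full matrix; at 7B scale the savings are what makes adaptation cheap.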

***

## 2. Architecture and differences vs. “vanilla” OpenVLA

Key differences:

* Depends on a custom Transformers fork (`transformers-openvla-oft`) that knows how to load and run the modified models.
* Checkpoints like `moojink/openvla-7b-oft-finetuned-libero-10` bundle the base model plus OFT parameters.
* The `openvla-oft` repository contains:
  * training scripts,
  * evaluation scripts on LIBERO,
  * configs for different modes (e.g., L1 regression vs. diffusion action decoders).

For KNX, this means you can:

* Efficiently adapt model behavior to your sim/robots without full retraining.
* Use the existing OFT checkpoints as strong baselines on LIBERO and extend from there.

***

## 3. Installation: a working pipeline

Below is a setup that ran `run_libero_eval.py` end-to-end without errors.

### 3.1. Conda environment

```bash
conda create -n openvla_oft python=3.10 -y
conda activate openvla_oft
```

### 3.2. PyTorch

```bash
pip install --index-url https://download.pytorch.org/whl/cu121 \
  "torch==2.2.0" "torchvision==0.17.0" "torchaudio==2.2.0"
```

Check:

```bash
python -c "import torch; print('torch:', torch.__version__)"
# torch: 2.2.0+cu121
```

### 3.3. Clone the repo and base install

```bash
cd /data
git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft

pip install -e .
```

This installs part of the dependencies; others remain missing or version-mismatched and are handled in the next section.

***

## 4. Installing additional dependencies

In practice, you’ll need the following (versions chosen for compatibility):

```bash
pip install accelerate==0.28.0 \
  diffusers==0.30.3 \
  einops \
  fastapi \
  huggingface_hub \
  imageio \
  json-numpy \
  jsonlines \
  matplotlib \
  "peft==0.11.1" \
  protobuf \
  rich \
  "sentencepiece==0.1.99" \
  "tensorflow==2.15.0" \
  "tensorflow_datasets==4.9.3" \
  "tensorflow_graphics==2021.12.3" \
  "timm==0.9.10" \
  "tokenizers==0.19.1" \
  uvicorn \
  wandb \
  "draccus==0.8.0"
```

Then install the custom Transformers fork and `dlimp`:

```bash
pip install "transformers @ git+https://github.com/moojink/transformers-openvla-oft.git"
pip install "dlimp @ git+https://github.com/moojink/dlimp_openvla.git"
```

Sanity checks:

```bash
python -c "import transformers, draccus, diffusers; print('OK')"
python -c "import tensorflow as tf; print('tf:', tf.__version__)"
# tf: 2.15.0
```
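A quick way to confirm the whole stack resolves after these installs (a small helper script, not part of the repo):

```python
# Check that the key dependencies can be found on the current interpreter
# (names are import names, not pip package names).
import importlib.util

REQUIRED = ["transformers", "draccus", "diffusers", "tensorflow",
            "tensorflow_datasets", "peft", "timm", "dlimp"]

def missing_modules(names):
    """Return the subset of modules that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", missing or "none")
```

Anything listed as missing points back to a failed step in the install sequence above.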

***

## 5. LIBERO + simulators (robosuite, MuJoCo)

Same as with base OpenVLA:

```bash
cd /data
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
```

Then:

```bash
pip install mujoco==2.3.7
pip install "robosuite<2.0.0"
pip install gym==0.26.2 gym-notices
pip install pyopengl glfw opencv-python pynput easydict bddl
```

NumPy/TF version conflict (same as with base OpenVLA):

```bash
pip install "numpy>=1.23.5,<2.0.0"
```

If `opencv-python` attempts to bump NumPy to 2.x, re-pin NumPy (e.g., `numpy==1.26.4`).
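To catch a bad upgrade early, the pin can be checked programmatically (a hypothetical helper; the bounds mirror the pip constraint above):

```python
def numpy_is_compatible(version: str) -> bool:
    """True iff the version satisfies 1.23.5 <= numpy < 2.0 (what TF 2.15 expects)."""
    parts = [int(p) for p in version.split(".")[:3]]
    major, minor, patch = (parts + [0, 0, 0])[:3]  # pad short versions like "1.26"
    return (1, 23, 5) <= (major, minor, patch) and major < 2

print(numpy_is_compatible("1.26.4"))  # True
print(numpy_is_compatible("2.0.1"))   # False
```

Running it against `numpy.__version__` right after any `pip install` makes silent NumPy bumps visible immediately.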

***

## 6. Environment variables

Use the same headless-friendly configuration; adjust `PYTHONPATH` for `openvla-oft`:

```bash
export HF_HOME=/data/hf_home
export HF_HUB_CACHE=/data/hf_home/hub
export TRANSFORMERS_CACHE=/data/hf_home/hub
export HF_MODULES_CACHE=/data/hf_home/modules

export PYTHONPATH=/data/LIBERO:/data/openvla-oft:$PYTHONPATH

export MUJOCO_GL=egl
export MUJOCO_EGL_DEVICE_ID=0
export TOKENIZERS_PARALLELISM=false
```
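A small sanity check (a hypothetical helper, not shipped with the repo) that the exports above actually took effect in the interpreter you launch:

```python
import os

def check_headless_env(env=None):
    """Return a list of problems with the headless-rendering environment."""
    env = os.environ if env is None else env
    problems = [f"{key} is not set"
                for key in ("HF_HOME", "PYTHONPATH", "MUJOCO_GL")
                if not env.get(key)]
    if env.get("MUJOCO_GL") not in (None, "", "egl"):
        problems.append("MUJOCO_GL should be 'egl' for headless EGL rendering")
    return problems

print(check_headless_env())  # [] once everything above is exported
```

An empty list means the shell exports reached the Python process; a common failure mode is setting the variables in one terminal and launching the eval from another.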

***

## 7. Test: `run_libero_eval.py` with an OFT checkpoint

Run the official evaluation script:

```bash
cd /data/openvla-oft
conda activate openvla_oft

python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-10 \
  --task_suite_name libero_10 \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film False \
  --num_images_in_input 2 \
  --use_proprio True \
  --center_crop True \
  --num_open_loop_steps 8 \
  --num_trials_per_task 1 \
  --env_img_res 256 \
  --local_log_dir ./experiments/logs_oft_libero10
```
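`run_libero_eval.py` parses these flags into a config object (the repo uses draccus over a dataclass). A simplified stdlib sketch of the mapping, with field names mirroring the flags above (illustrative only, not the repo's actual config class):

```python
from dataclasses import dataclass

@dataclass
class EvalConfig:
    model_family: str = "openvla"
    pretrained_checkpoint: str = "moojink/openvla-7b-oft-finetuned-libero-10"
    task_suite_name: str = "libero_10"
    use_l1_regression: bool = True   # continuous action head trained with L1 loss
    use_diffusion: bool = False      # alternative diffusion action head
    use_film: bool = False           # FiLM language conditioning
    num_images_in_input: int = 2     # e.g., third-person + wrist camera
    use_proprio: bool = True         # feed proprioceptive state to the model
    center_crop: bool = True
    num_open_loop_steps: int = 8     # actions executed per model call (chunking)
    num_trials_per_task: int = 1
    env_img_res: int = 256
    local_log_dir: str = "./experiments/logs_oft_libero10"

cfg = EvalConfig()
print(cfg.task_suite_name)  # libero_10
```

Note that `use_l1_regression` and `use_diffusion` select between the two action-decoder modes mentioned in section 2, so at most one of them should be `True`.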

Expected output:

* TensorFlow logs about CPU optimizations (harmless).
* robosuite warnings about a missing “private macro file” (harmless).
* per-episode progress logs, with rollout videos saved under `./rollouts/...`.

If you see:

```
ValueError: Could not find a backend to open `...mp4` with iomode `w?`.
...
FFMPEG: pip install imageio[ffmpeg]
```

Install:

```bash
pip install "imageio[ffmpeg]"
```

***

## 8. Behavior: hi-res vs. low-res and seeds

We’ve observed small discrepancies between low-res control runs and hi-res replays:

* e.g., an object slightly tilted in one video but upright in another, even with the same `actions_log`.

Likely reasons:

* `OffScreenRenderEnv` and robosuite/MuJoCo can produce slightly different numeric trajectories with different resolutions/backends.
* Seeds:
  * we set `env.seed(0)` and global `set_seed_everywhere(cfg.seed)`,
  * but additional randomness in LIBERO/robosuite can exist if not fixed everywhere.
* Over long rollouts, tiny integration differences can accumulate.

Practically:

* For demos, this is acceptable; the intent of the behavior is preserved.
* For strict replication and comparisons:
  * fix seeds everywhere,
  * ensure identical init states and physics params,
  * consider keeping the same resolution for control and replay.
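For the “fix seeds everywhere” point, here is a sketch of a `set_seed_everywhere`-style helper (an assumed shape; the repo's own utility may differ), which also seeds NumPy/PyTorch when they are installed:

```python
import os
import random

def set_seed_everywhere(seed: int) -> None:
    """Seed every RNG we can reach; optional libraries are seeded only if present."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op without a GPU
    except ImportError:
        pass

set_seed_everywhere(0)
a = random.random()
set_seed_everywhere(0)
assert random.random() == a  # same seed -> same draw
```

Even with this, robosuite/MuJoCo internals may draw from their own RNG streams, which is why identical init states and physics parameters also need to be pinned for strict replication.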

***

## 9. Pros and cons for KNX

Pros:

* Parameter-efficient fine-tuning:
  * no need to train all 7B parameters,
  * cheaper and faster adaptation to new tasks.
* Ready OFT checkpoints for LIBERO (e.g., `moojink/openvla-7b-oft-finetuned-libero-10`) with strong performance.
* Great fit for KNX:
  * base OpenVLA = a “universal brain”,
  * OFT fine-tunes = lightweight specializations for specific robots and business use-cases.

Cons:

* Heavier dependency stack than base OpenVLA:
  * TensorFlow + TF-graphics + TFDS,
  * diffusers,
  * custom Transformers fork,
  * `dlimp`, etc.
* Requires careful env assembly, especially NumPy/TF/OpenCV versions.
* Research-oriented code and docs; for production KNX you’ll still need orchestration, safety layers, and integration with real robots and web infra.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.konnex.world/supported-ai-models/openvla-oft.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
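For programmatic use, the query can be issued with Python's standard library (the question string is just an example):

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://docs.konnex.world/supported-ai-models/openvla-oft.md"

def build_ask_url(question: str) -> str:
    """URL-encode a natural-language question into the ?ask= parameter."""
    return f"{BASE}?ask={quote(question)}"

def ask_docs(question: str) -> str:
    """Fetch the answer from the docs endpoint (requires network access)."""
    with urlopen(build_ask_url(question)) as resp:
        return resp.read().decode("utf-8")

print(build_ask_url("Which Transformers fork does OpenVLA-OFT require?"))
```

`quote` handles spaces and punctuation in the question, so multi-word questions are safe to pass as-is.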
