OpenVLA · OFT

1. What is OpenVLA-OFT?

OpenVLA-OFT is a set of methods and code for parameter-efficient fine-tuning (OFT = Optimized Fine-Tuning) on top of the base OpenVLA 7B model. The goal is to adapt the same model to new domains/tasks cheaply by training small add-on parameterizations instead of touching all of the weights.

Official resources:

  • Website: https://openvla-oft.github.io

  • Code: https://github.com/moojink/openvla-oft

Conceptually:

  • Start with the base 7B OpenVLA model.

  • Add small OFT components (akin to PEFT/LoRA-family ideas).

  • Train only those components on new data (e.g., LIBERO variants or new scenes).

  • At inference, the base model + OFT layers together provide adapted behavior.
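As a conceptual sketch, inference with a bundled checkpoint looks roughly like this. The class names follow base OpenVLA usage (AutoModelForVision2Seq with trust_remote_code), and predict_action mirrors the base model's API; treat both as assumptions and defer to the repo's own examples:

```python
# Hedged sketch: loading an OFT checkpoint (base model + OFT parameters in one bundle).
# Assumes the transformers-openvla-oft fork is installed; class/method names follow
# the base OpenVLA usage and may differ in the fork.
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

ckpt = "moojink/openvla-7b-oft-finetuned-libero-10"
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

# Typical usage (prompt/image preparation omitted):
# inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# action = model.predict_action(**inputs, unnorm_key="libero_10", do_sample=False)
```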


2. Architecture and differences vs. “vanilla” OpenVLA

Key differences:

  • Depends on a custom Transformers fork, transformers-openvla-oft, which knows how to load and use the OFT layers.

  • Checkpoints like moojink/openvla-7b-oft-finetuned-libero-10 bundle the base model plus OFT parameters.

  • The openvla-oft repository contains:

    • training scripts,

    • evaluation scripts on LIBERO,

    • configs for different modes (e.g., L1 regression vs. diffusion action decoders).

For KNX, this means you can:

  • Efficiently adapt model behavior to your sim/robots without full retraining.

  • Use the existing OFT checkpoints as strong baselines on LIBERO and extend from there.


3. Installation: a working pipeline

Below is a setup that ran run_libero_eval.py end-to-end without errors.

3.1. Conda environment
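A minimal environment along these lines works; the Python version is an assumption, so check the repo's README for the authoritative one:

```shell
# Fresh conda env for openvla-oft (Python 3.10 is an assumption)
conda create -n openvla-oft python=3.10 -y
conda activate openvla-oft
```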

3.2. PyTorch
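A plausible install step (the CUDA build selection is an assumption; match your driver and the repo's pinned version):

```shell
# Install PyTorch; pick the wheel/CUDA variant matching your system
pip install torch torchvision torchaudio
```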

Check:
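For example, verify that the GPU is visible before going further:

```shell
# Should print the torch version and True if CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```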

3.3. Clone the repo and base install
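The base install amounts to something like this (the repo URL is the one listed above):

```shell
git clone https://github.com/moojink/openvla-oft.git
cd openvla-oft
pip install -e .
```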

This installs some of the dependencies, but others will remain missing or have mismatched versions.


4. Installing additional dependencies

In practice, you’ll need the following (versions chosen for compatibility):
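For illustration, the extra packages look roughly like this; the pins below are assumptions chosen to match TF 2.15-era compatibility, so defer to the repo's requirements files for exact versions:

```shell
# Illustrative pins (assumptions; adjust to the repo's actual requirements)
pip install tensorflow==2.15.0 tensorflow-graphics tensorflow-datasets
pip install diffusers draccus einops
pip install "numpy==1.26.4"   # keep NumPy on 1.x; TF 2.15 breaks with NumPy 2
```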

Then install the custom Transformers fork and dlimp:
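A sketch of that step; both URLs are assumptions inferred from the fork names used in this doc and the base OpenVLA setup, so verify them against the openvla-oft README:

```shell
# Custom Transformers fork that replaces stock transformers (URL assumed from the name)
pip install git+https://github.com/moojink/transformers-openvla-oft.git
# dlimp data-loading library (the OpenVLA ecosystem uses moojink's fork)
pip install git+https://github.com/moojink/dlimp_openvla.git
```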

Sanity checks:
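Quick import checks along these lines confirm the stack is assembled:

```shell
python -c "import transformers; print(transformers.__version__)"
python -c "import tensorflow as tf; print(tf.__version__)"
python -c "import dlimp; print('dlimp OK')"
```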


5. LIBERO + simulators (robosuite, MuJoCo)

Same as with base OpenVLA:
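That is, install LIBERO from source (the Lifelong-Robot-Learning org is the upstream home of LIBERO):

```shell
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
```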

Then:
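Something like the following; the requirements path is an assumption mirroring the base OpenVLA layout:

```shell
# Extra requirements for the LIBERO experiments (path assumed, check the repo)
cd ../openvla-oft
pip install -r experiments/robot/libero/libero_requirements.txt
```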

NumPy/TF conflict (same story):

If opencv-python attempts to bump NumPy to 2.x, re-pin NumPy (e.g., numpy==1.26.4).
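Concretely, the re-pin is just:

```shell
# Force NumPy back to 1.x if a later install bumped it to 2.x
pip install "numpy==1.26.4"
```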


6. Environment variables

Use the same headless-friendly configuration; adjust PYTHONPATH for openvla-oft:
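For example (the openvla-oft checkout location in PYTHONPATH is an assumption; adjust to where you cloned it):

```shell
# Headless-friendly rendering backends for MuJoCo/robosuite
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
# Make the openvla-oft checkout importable (path assumed; adjust to your clone)
export PYTHONPATH="$HOME/openvla-oft:$PYTHONPATH"
```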


7. Test: run_libero_eval.py with an OFT checkpoint

Run the official evaluation script:
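A sketch of the invocation; the script path and flag names are assumptions modeled on the base OpenVLA evaluation script, so check `--help` for the real interface:

```shell
# Evaluate a released OFT checkpoint on LIBERO-10 (flags assumed; verify with --help)
python experiments/robot/libero/run_libero_eval.py \
  --pretrained_checkpoint moojink/openvla-7b-oft-finetuned-libero-10 \
  --task_suite_name libero_10
```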

Expected output:

  • TensorFlow logs about CPU optimizations (fine).

  • robosuite warnings about missing “private macro file” (fine).

  • episode progress and videos being saved under ./rollouts/....

If you see:

Install:


8. Behavior: hi-res vs. low-res and seeds

We’ve observed small discrepancies between low-res control runs and hi-res replays:

  • e.g., an object slightly tilted in one video but upright in another, even with the same actions_log.

Reasons:

  • OffScreenRenderEnv and robosuite/MuJoCo can produce slightly different numeric trajectories with different resolutions/backends.

  • Seeds:

    • we set env.seed(0) and global set_seed_everywhere(cfg.seed),

    • but additional randomness in LIBERO/robosuite can exist if not fixed everywhere.

  • Over long rollouts, tiny integration differences can accumulate.

Practically:

  • For demos, this is acceptable; the intent of the motion is preserved.

  • For strict replication and comparisons:

    • fix seeds everywhere,

    • ensure identical init states and physics params,

    • consider keeping the same resolution for control and replay.
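The seeding advice can be sketched as follows. `set_seed_everywhere` here is a stand-in for the repo's helper, and the NumPy/torch branches are guarded so the snippet stays self-contained:

```python
import random

def set_seed_everywhere(seed: int) -> None:
    """Seed every RNG source in use. Stand-in for the repo's helper:
    in a real run you would also seed the env (env.seed(seed)) and any
    library-specific generators LIBERO/robosuite expose."""
    random.seed(seed)
    try:
        import numpy as np  # optional in this sketch
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional in this sketch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

# Same seed -> same sampled sequence
set_seed_everywhere(0)
a = [random.random() for _ in range(3)]
set_seed_everywhere(0)
b = [random.random() for _ in range(3)]
assert a == b
```

Note this only pins Python-side randomness; physics-level divergence from different renderers/resolutions (as described above) is unaffected by seeding alone.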


9. Pros and cons for KNX

Pros:

  • Parameter-efficient fine-tuning:

    • no need to train all 7B parameters,

    • cheaper and faster adaptation to new tasks.

  • Ready OFT checkpoints for LIBERO (e.g., moojink/openvla-7b-oft-finetuned-libero-10) with strong performance.

  • Great fit for KNX:

    • base OpenVLA = “universal brain”,

    • OFT layers = “stickers” for specific robots/business use-cases.

Cons:

  • Heavier dependency stack than base OpenVLA:

    • TensorFlow + TF-graphics + TFDS,

    • diffusers,

    • custom Transformers fork,

    • dlimp, etc.

  • Requires careful env assembly, especially NumPy/TF/OpenCV versions.

  • Research-oriented code and docs; for production KNX you’ll still need orchestration, safety layers, and integration with real robots and web infra.
