OpenFly & OpenFly-Agent

What it is

OpenFly is a platform and benchmark for outdoor aerial Vision–Language Navigation (VLN): a UAV follows natural-language instructions and uses egocentric vision to decide flight actions. The work (arXiv:2502.18041, ICLR 2026 on OpenReview) provides:

  • A data-generation toolchain (point clouds, semantic segmentation, trajectories, instructions) using multiple simulators and renderers (Unreal / AirSim, GTA V, Google Earth, 3D Gaussian Splatting, etc.).

  • A large-scale aerial VLN dataset (on the order of 100k trajectories, 18 scenes, varied altitude and path length).

  • OpenFly-Agent: a keyframe-aware VLN model (derived from the OpenVLA line) that emphasizes informative frames and reports higher success rates than the baselines in the published evaluation.

For Konnex, this matches the Drone navigation workload: text mission → visual observations → flight decisions that validators can score against a signed task and a PoPW sensor bundle.

OpenFly-Agent (at a glance)

  • Inputs: language instruction, current images, and history keyframes (as in the public architecture).

  • Outputs: action prediction for the VLN head; the published real-robot setup pairs the policy with a separate local planner and MPC for tracking (see the paper for the full stack).

  • OpenVLA: OpenFly-Agent is described as a full fine-tune from an OpenVLA checkpoint for aerial VLN. Use the upstream README for unnorm_key, tokenizer, and weight paths.
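To make the input bundle above concrete, here is a minimal sketch of how a miner might assemble the instruction, current frame, and history keyframes before calling the model. Everything in it is an assumption for illustration: the names KeyframeBuffer, PolicyInput, and build_input are hypothetical, frames are flattened lists of floats rather than real image tensors, and the frame-difference threshold is a naive stand-in, not the keyframe selection used by OpenFly-Agent (see the paper for that).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Sequence

# Assumption: a frame is a flattened list of grayscale values in [0, 1].
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class KeyframeBuffer:
    """Hypothetical buffer: keeps at most `capacity` frames, each of which
    differed from the previously kept frame by at least `threshold`."""
    capacity: int = 4
    threshold: float = 0.1
    frames: Deque[Frame] = field(default_factory=deque)

    def offer(self, frame: Frame) -> bool:
        """Keep `frame` if it is the first one or differs enough; return True if kept."""
        if self.frames and mean_abs_diff(self.frames[-1], frame) < self.threshold:
            return False
        self.frames.append(frame)
        if len(self.frames) > self.capacity:
            self.frames.popleft()  # drop the oldest keyframe
        return True


@dataclass
class PolicyInput:
    """Hypothetical container mirroring the three inputs listed above."""
    instruction: str
    current_frame: Frame
    history_keyframes: List[Frame]


def build_input(instruction: str, frame: Frame, buffer: KeyframeBuffer) -> PolicyInput:
    """Snapshot the history first, then offer the current frame to the buffer."""
    history = list(buffer.frames)
    buffer.offer(frame)
    return PolicyInput(instruction, frame, history)
```

The history is snapshotted before the current frame is offered, so the current image never appears twice in one input. A real integration would replace the difference heuristic with whatever the upstream release provides.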

Official resources

Resource    URL
Weights     Listed on Hugging Face in the repository / model card (e.g. community repos naming openfly-agent; names can change)

Follow the upstream repo for CUDA, flash-attn, dlimp, and licensing. GPU memory requirements are release-specific.
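Before loading weights, it can help to confirm that the dependencies named above are importable in the current environment. The sketch below is a generic preflight check, not part of the upstream tooling. The import names torch, flash_attn, and dlimp are assumptions about how those packages are exposed; confirm them against the upstream install instructions. check_modules is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names; adjust to match the upstream requirements.
    for name, found in check_modules(["torch", "flash_attn", "dlimp"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only reports whether a module is installed. It does not validate versions, CUDA compatibility, or GPU memory, which remain release-specific.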

Integration sketch for miners

  1. Receive a text mission (and any subnet schema for altitude, geofence, or safety).

  2. Run OpenFly-Agent (or a fine-tune) on timestamped camera frames; connect outputs to your planner / FCU bridge as your stack requires.

  3. Submit model outputs and signed telemetry for validator scoring and PoPW per subnet API rules.
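The three steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the subnet API: Mission, Step, run_mission, sign_submission, and verify_submission are hypothetical names, the max_altitude_m field is an invented schema example, the policy is a stub standing in for OpenFly-Agent, and HMAC-SHA256 with a shared key is a stand-in for whatever signing scheme the subnet actually specifies.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Mission:
    mission_id: str
    instruction: str
    max_altitude_m: float  # hypothetical schema field


@dataclass
class Step:
    timestamp_s: float
    action: str


# A policy maps (instruction, frame) to an action label; the real model goes here.
Policy = Callable[[str, Sequence[float]], str]


def run_mission(mission: Mission,
                frames: Sequence[Tuple[float, Sequence[float]]],
                policy: Policy) -> List[Step]:
    """Step 2: run the policy on each timestamped frame."""
    return [Step(ts, policy(mission.instruction, frame)) for ts, frame in frames]


def _canonical(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_submission(mission: Mission, steps: List[Step], key: bytes) -> dict:
    """Step 3: bundle outputs and attach an HMAC-SHA256 tag (stand-in for real signing)."""
    payload = {"mission_id": mission.mission_id, "steps": [asdict(s) for s in steps]}
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}


def verify_submission(submission: dict, key: bytes) -> bool:
    """Validator-side check: recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(submission["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, submission["signature"])
```

A usage example with a stub policy: build a Mission, pass a list of (timestamp, frame) pairs to run_mission, then call sign_submission with a key and check the result with verify_submission. Canonical JSON serialization matters here because any difference in key order or whitespace would change the tag.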
