Recommendations Summary

Design files, chunks, and aggregated chunk manifests around typical use case patterns

The most important design input for a virtual store is a clear understanding of how the data will actually be accessed. Chunk layout and manifest structure should be optimized for the dominant access pattern of the intended user community.

Example: NISAR time series analysis

For NISAR data where time series analysis is the primary use case, a frame-oriented chunk manifest design ensures efficient per-frame loading. This decision should be made deliberately, with awareness of the trade-offs for other access patterns (see Known Limitations).
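As a rough illustration of what a frame-oriented layout could mean in practice, the sketch below groups chunk references by frame so that a time series for one frame is served from a single, time-ordered manifest. The frame identifiers, bucket paths, and the `group_by_frame` helper are all hypothetical and are not taken from any NISAR product specification.

```python
from collections import defaultdict

# Hypothetical chunk references: (frame_id, acquisition date, path, offset, length).
refs = [
    ("frame_071", "2025-01-13", "s3://example-bucket/c.h5", 0, 100),
    ("frame_072", "2025-01-01", "s3://example-bucket/b.h5", 0, 100),
    ("frame_071", "2025-01-01", "s3://example-bucket/a.h5", 0, 100),
]


def group_by_frame(refs):
    """Build one time-ordered list of chunk references per frame."""
    by_frame = defaultdict(list)
    for frame_id, date, path, offset, length in refs:
        by_frame[frame_id].append(
            {"time": date, "path": path, "offset": offset, "length": length}
        )
    for entries in by_frame.values():
        # ISO 8601 date strings sort chronologically.
        entries.sort(key=lambda entry: entry["time"])
    return dict(by_frame)


manifests = group_by_frame(refs)
```

With this layout, reading the full time series for one frame touches only that frame's entries, while a query spanning many frames at a single date has to visit every per-frame manifest. That is the kind of trade-off the paragraph above refers to.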

Adopt Icechunk

Icechunk should be adopted, but with risk mitigation measures. Specifically, the ability to read chunk manifests back out of Icechunk stores and write them to another format should be maintained, so that the manifests do not depend on a single storage format.
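A minimal sketch of the "stored in another format" half of this mitigation, assuming the manifest has already been read out of the store as a mapping from chunk key to path, offset, and length. It deliberately uses no Icechunk API calls; the example entries, bucket name, and helper names are hypothetical.

```python
import json

# Hypothetical manifest: chunk key -> byte range inside an archival file.
manifest = {
    "0.0.0": {"path": "s3://example-bucket/granule_001.h5", "offset": 4096, "length": 1048576},
    "0.0.1": {"path": "s3://example-bucket/granule_001.h5", "offset": 1052672, "length": 1048576},
}

REQUIRED_FIELDS = {"path", "offset", "length"}


def dump_manifest(manifest: dict) -> str:
    """Serialize a chunk manifest to JSON with stable key order."""
    return json.dumps(manifest, sort_keys=True, indent=2)


def load_manifest(text: str) -> dict:
    """Parse a JSON manifest and check that each entry is complete."""
    entries = json.loads(text)
    for key, entry in entries.items():
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"chunk {key!r} is missing {sorted(missing)}")
    return entries


# Round trip: the exported copy carries the same references as the original.
restored = load_manifest(dump_manifest(manifest))
```

JSON is used here only because it is in the standard library; the same round-trip check would apply to any other target format.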

Adopt GeoZarr standards

Adoption of GeoZarr is recommended for future virtual store implementations to ensure interoperability and long-term maintainability.

Consider adopting multiscales

Adoption of multiscales should be considered, especially for high-resolution data.

Engage with the evolving chunk manifest protocol

TBD: The chunk manifest protocol specification is speculative at this time.

Leverage existing tools, services, and available chunk metadata

To build virtual data stores efficiently, leverage the existing open-source ecosystem (Icechunk, Kerchunk, etc.) rather than implementing solutions from scratch. When a collection is consistent and OPeNDAP-supported, prefer DMRPP metadata over reading metadata from the source files: it is faster and can represent various archival formats. For collections lacking DMRPP, fall back to a native metadata parser. DMRPP has caveats, however, which OPeNDAP should address by standardizing DMRPP across all collections, adding checksum validation at generation time, and possibly adopting a lighter schema or Parquet serialization.
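The DMRPP-first policy described above can be sketched as a small selection function. This is an illustration under stated assumptions, not an implementation from this report: the `Granule` record and the two parser callables are hypothetical stand-ins for whatever tooling a DAAC uses, and falling back when the DMRPP parse raises an error is an added assumption that goes beyond the "lacking DMRPP" case in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Granule:
    """Hypothetical record for one archived file."""
    url: str
    dmrpp_url: Optional[str] = None  # None when no DMRPP sidecar exists


def build_references(
    granule: Granule,
    parse_dmrpp: Callable[[str], dict],
    parse_native: Callable[[str], dict],
) -> dict:
    """Prefer DMRPP metadata; fall back to parsing the source file."""
    if granule.dmrpp_url is not None:
        try:
            return parse_dmrpp(granule.dmrpp_url)
        except ValueError:
            # Assumption: a malformed sidecar is treated like a missing one.
            pass
    return parse_native(granule.url)
```

Passing the parsers in as arguments keeps the policy separate from any particular library, so the same function works whichever DMRPP or native-format reader is in use.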

Prioritize EGIS integration planning

Integration with the Earthdata Geographic Information System (EGIS) has been identified as a future priority. Scoping work should begin so that the eventual integration is smooth.

Address governance gaps

The governance decisions identified in the Governance section (metadata placement standards, versioning policies, and stewardship ownership) should be addressed as virtual store technology is deployed more broadly across DAACs.

Streamline end-user experience

The authentication and credential complexity currently required to open a virtual store is a significant barrier to adoption. For virtual stores to see broad use, the path from Earthdata Login to an open xarray Dataset or Datatree should be reduced to a store identifier and authentication — comparable to the experience earthaccess already provides for direct file access.

Documentation and onboarding

Virtual store documentation is already underway — PO.DAAC’s cookbook chapter, ASDC’s demo notebooks, and the Onboarding Guide in this report all represent ongoing efforts. The next step is consolidating and improving these materials to serve two distinct audiences: data providers virtualizing datasets, and data users accessing virtual stores. Provider-facing documentation should develop the existing worked examples into reusable templates covering format-specific considerations, chunking decisions, and validation. User-facing documentation should lower the barrier to working with virtual stores — particularly around authentication setup, available access patterns, and how virtual store access differs from traditional file-based workflows. Coordinating these efforts across DAACs will reduce duplication and help establish consistent guidance as adoption grows.