Dynamic Client: Costs

Overview

The primary categories of costs associated with the dynamic client approach are post-processing costs, data storage costs, and data egress costs.

Post-processing costs

The post-processing costs are associated with the time required to generate pyramids for rendering. As the post-processing costs should be reduced by planned performance improvements to ndpyramid and the generation of pyramids during dataset creation, we do not provide specific estimates for the post-processing costs in this cookbook.

Data storage costs

The data storage costs refer to the increased costs associated with storing pyramids along with the original data. For context, current S3 pricing is detailed at https://aws.amazon.com/s3/pricing/. The rates vary based on the object size and the storage class; for simplicity, we consider the average rate in the widget below.

For datasets with non-dimensional coordinates and a single non-spatial dimension, the cost associated with storing coordinates and metadata is negligible compared to the storage costs of the variable data.

The data type can dramatically affect the storage costs. For example, moving from single precision to double precision floats would double the storage costs, while moving to half precision would halve them, again treating coordinate and metadata storage as negligible. It is important to consider whether the selected data type is supported by all applications that will use the data.
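The scaling with data type can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the cookbook's tooling: the function name and the array shape are hypothetical, and compression, coordinates, and metadata are ignored.

```python
# Bytes per value for the IEEE 754 floating-point types discussed above.
BYTES_PER_VALUE = {"float16": 2, "float32": 4, "float64": 8}


def uncompressed_size_gb(n_values: int, dtype: str) -> float:
    """Return the uncompressed size in GB (10**9 bytes) of n_values of dtype."""
    return n_values * BYTES_PER_VALUE[dtype] / 1e9


# Hypothetical variable: 365 time steps on a 4096 x 4096 grid.
n_values = 365 * 4096 * 4096

for dtype in ("float16", "float32", "float64"):
    print(f"{dtype}: {uncompressed_size_gb(n_values, dtype):.2f} GB")
```

Because storage is typically billed per GB, the relative cost between data types follows the same ratio as the sizes printed here.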

Compression can also dramatically impact the storage costs. The benchmarks in this experiment all used gzip level 1 compression, single precision floats for the variable data and spatial coordinates, and integers for the time dimension. Future work should consider the impact of data type, compression, and bitrounding on performance, as these variables have the potential to dramatically reduce costs.

Data storage costs can also be reduced by leveraging the Zarr feature that skips writing chunks consisting only of the specified fill value. For example, this would allow skipping the chunks that correspond to oceans for a global, land-based dataset, yielding tremendous savings at high zoom levels. The primary downside is less confidence that all non-empty chunks were written, as the reason for not writing specific chunks is not currently stored in Zarr metadata.
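To get a feel for the potential savings, the fraction of chunks that contain only the fill value can be counted before writing. The sketch below is illustrative only: it uses a toy 8 x 8 grid, a hypothetical fill value of -9999.0, and a hypothetical helper `count_chunks`; it does not call Zarr itself. In zarr-python this behavior is, to our understanding, controlled through a `write_empty_chunks` setting, but the exact usage depends on the installed version and should be checked against its documentation.

```python
FILL_VALUE = -9999.0  # hypothetical fill value marking "no data" (e.g. ocean)


def count_chunks(grid, chunk_size, fill_value):
    """Return (total_chunks, non_empty_chunks) for a 2D grid of square chunks.

    A chunk counts as empty when every value in it equals fill_value.
    Note: a NaN fill value would need math.isnan instead of an equality test.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    total = 0
    non_empty = 0
    for r0 in range(0, n_rows, chunk_size):
        for c0 in range(0, n_cols, chunk_size):
            total += 1
            has_data = any(
                grid[r][c] != fill_value
                for r in range(r0, min(r0 + chunk_size, n_rows))
                for c in range(c0, min(c0 + chunk_size, n_cols))
            )
            if has_data:
                non_empty += 1
    return total, non_empty


# Toy dataset: "land" occupies the top-left quadrant, the rest is fill value.
grid = [
    [1.0 if (r < 4 and c < 4) else FILL_VALUE for c in range(8)]
    for r in range(8)
]

total, non_empty = count_chunks(grid, chunk_size=2, fill_value=FILL_VALUE)
print(f"{non_empty} of {total} chunks would need to be written")
```

For real pyramids the same count would be made per zoom level, since the share of empty chunks generally grows as chunks cover smaller areas.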

The small interactive application below shows the storage costs for different data configurations. The tool explores storage costs for global pyramids generated using ndpyramid’s pyramid_reproject or pyramid_regrid methods for the @carbonplan/maps library. Future iterations on this tool should explore costs of coarsened pyramids for use in other visualization libraries. Only a small portion of the parameter space is included in the embedded application; please download this notebook to fully explore the exercise. The application shows the dramatic impact of the number of zoom levels included on storage costs.

import panel as pn
from cost_widgets import (
    calculate_pyramid_cost,
    compression_widget,
    dtype_widget,
    extra_dim_widget,
    price_widget,
    zoom_level_widget,
)

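# Load the Panel extension so the widgets render in the notebook
pn.extension()

# Bind the widget values to the cost calculation so the estimate updates
# whenever a widget changes; pixels_per_tile is fixed at 128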
bound_disp = pn.bind(
    calculate_pyramid_cost,
    number_of_zoom_levels=zoom_level_widget,
    pixels_per_tile=128,
    extra_dimension_length=extra_dim_widget,
    data_dtype=dtype_widget,
    data_compression_ratio=compression_widget,
    price_per_GB=price_widget,
)
pn.Column(
    bound_disp,
    dtype_widget,
    zoom_level_widget,
    extra_dim_widget,
    compression_widget,
    price_widget,
).embed()
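The strong dependence on the number of zoom levels can also be approximated by hand. The sketch below is not the implementation of `calculate_pyramid_cost`; it is a simplified stand-in that assumes zoom level z contains 2**z by 2**z tiles, that every chunk is written, and that coordinates and metadata are negligible. The function name and the price used in the example are placeholders rather than quoted rates.

```python
# Bytes per value for the floating-point types considered in this section.
BYTES_PER_VALUE = {"float16": 2, "float32": 4, "float64": 8}


def estimate_pyramid_storage_cost(
    number_of_zoom_levels,
    price_per_gb,
    pixels_per_tile=128,
    extra_dimension_length=1,
    data_dtype="float32",
    data_compression_ratio=1.0,
):
    """Estimate the monthly storage cost of a global pyramid (hypothetical helper).

    Assumes zoom level z holds 2**z x 2**z tiles of pixels_per_tile**2 pixels.
    """
    n_tiles = sum(4**z for z in range(number_of_zoom_levels))
    n_values = n_tiles * pixels_per_tile**2 * extra_dimension_length
    size_gb = n_values * BYTES_PER_VALUE[data_dtype] / data_compression_ratio / 1e9
    return size_gb * price_per_gb


# Placeholder price per GB per month; substitute the rate for your storage class.
PLACEHOLDER_PRICE_PER_GB = 0.02

for levels in range(1, 11):
    cost = estimate_pyramid_storage_cost(
        levels, PLACEHOLDER_PRICE_PER_GB, extra_dimension_length=365
    )
    print(f"{levels:2d} zoom levels: {cost:12.4f} per month")
```

Under these assumptions each additional zoom level holds four times as many pixels as the previous one, so the highest level accounts for roughly three quarters of the total storage.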

Data request and egress costs

Data egress costs are associated with serving data from a cloud storage location to the client. Typically, the costs are based on the amount of data transferred from the cloud storage location to the internet, the number of requests made against the buckets and objects, and the location that the data will be transferred to. For specific estimates of AWS pricing for data requests and transfer, see https://aws.amazon.com/s3/pricing/. Currently, the first 100 GB each month are covered by AWS’s free tier, with additional GB charged at a per-GB rate that decreases incrementally at several thresholds.
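The tiered structure described above can be expressed as a short calculation. This is a minimal sketch: only the 100 GB free allowance comes from the text, while the tier sizes and per-GB rates are placeholders that must be replaced with the values from the pricing page. Request charges are not included.

```python
def tiered_egress_cost(gb_transferred, free_gb, tiers):
    """Return the monthly transfer cost under a tiered per-GB price schedule.

    tiers is an ordered list of (tier_size_gb, price_per_gb) pairs applied
    after the free allowance; a tier_size_gb of None means "all remaining GB".
    """
    remaining = max(gb_transferred - free_gb, 0.0)
    cost = 0.0
    for tier_size_gb, price_per_gb in tiers:
        if remaining <= 0:
            break
        billed = remaining if tier_size_gb is None else min(remaining, tier_size_gb)
        cost += billed * price_per_gb
        remaining -= billed
    return cost


FREE_GB = 100.0  # free monthly allowance mentioned in the text

# Placeholder schedule (NOT actual AWS rates): the first 1000 GB beyond the
# free allowance at 0.10 per GB, everything after that at 0.05 per GB.
PLACEHOLDER_TIERS = [(1000.0, 0.10), (None, 0.05)]

for gb in (50, 600, 1600):
    cost = tiered_egress_cost(gb, FREE_GB, PLACEHOLDER_TIERS)
    print(f"{gb} GB transferred: {cost:.2f}")
```

Passing the schedule as data keeps the calculation valid when the thresholds or rates change.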