Experiment Configuration with Hydra

In this lesson, we will configure experiments for our PyTorch Lightning training workflow. This will allow us to test various combinations of hyperparameter settings and assess relative model performance.

To set up the Jupyter environment, install Hatch as described in the README, then:

hatch -e nb shell   # enter the `nb` environment shell (created on first run)
jupyter lab         # launch JupyterLab from inside the environment
exit                # leave the environment shell when you are done

Let’s start by importing the packages we will need.

Notice that at this point, the data module and custom model module we created in the prior lessons are abstracted enough that we can import them as regular Python modules!

import os
from pathlib import Path
from typing import List

import hydra
import wandb
from omegaconf import OmegaConf
from lightning.pytorch.callbacks import Callback
from lightning.pytorch.loggers import CSVLogger, WandbLogger

from ml_pipeline.datasets.datamodule import BurnScarsDataModule
from ml_pipeline.model.lightningmodule import BurnScarsSegmentationModel

Now we will configure Hydra, an experiment configuration framework. Hydra composes an experiment's settings from configuration files organized into groups of related, tunable hyperparameters, making it easy to organize, version control, and automate multiple experiment configurations, as sketched below.
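
For orientation, here is a plausible sketch of the top-level file in that config/ directory. The group names match the choices section of the composed output below, but the file contents are an illustrative assumption, not necessarily the repository's actual config:

# config/config.yaml (illustrative sketch; the real file may differ)
defaults:
  - model: burnscars-resnet18   # -> config/model/burnscars-resnet18.yaml
  - datamodule: burnscars       # -> config/datamodule/burnscars.yaml
  - trainer: basic              # -> config/trainer/basic.yaml
  - callbacks: minimal          # -> config/callbacks/minimal.yaml
  - _self_

seed: 42      # example default; overridable at compose time, e.g. seed=0
author: ???   # '???' marks values that must be supplied when composing
name: ???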

with hydra.initialize(config_path="../../config", version_base="1.3.0"):
    cfg = hydra.compose(
        config_name="config",  # the top-level config file inside config_path
        overrides=["seed=0", "author=devseed", "name=test-exp-nb-1"],
        return_hydra_config=True,  # include the hydra: node in the composed config
    )
    print(OmegaConf.to_yaml(cfg))
hydra:
  run:
    dir: logs/runs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: logs/multiruns/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
  launcher:
    _target_: hydra._internal.core_plugins.basic_launcher.BasicLauncher
  sweeper:
    _target_: hydra._internal.core_plugins.basic_sweeper.BasicSweeper
    max_batch_size: null
    params: null
  help:
    app_name: ${hydra.job.name}
    header: '${hydra.help.app_name} is powered by Hydra.

      '
    footer: 'Powered by Hydra (https://hydra.cc)

      Use --hydra-help to view Hydra specific help

      '
    template: '${hydra.help.header}

      == Configuration groups ==

      Compose your configuration from those groups (group=option)


      $APP_CONFIG_GROUPS


      == Config ==

      Override anything in the config (foo.bar=value)


      $CONFIG


      ${hydra.help.footer}

      '
  hydra_help:
    template: 'Hydra (${hydra.runtime.version})

      See https://hydra.cc for more info.


      == Flags ==

      $FLAGS_HELP


      == Configuration groups ==

      Compose your configuration from those groups (For example, append hydra/job_logging=disabled
      to command line)


      $HYDRA_CONFIG_GROUPS


      Use ''--cfg hydra'' to Show the Hydra config.

      '
    hydra_help: ???
  hydra_logging:
    version: 1
    formatters:
      simple:
        format: '[%(asctime)s][HYDRA] %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        formatter: simple
        stream: ext://sys.stdout
    root:
      level: INFO
      handlers:
      - console
    loggers:
      logging_example:
        level: DEBUG
    disable_existing_loggers: false
  job_logging:
    version: 1
    formatters:
      simple:
        format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        formatter: simple
        stream: ext://sys.stdout
      file:
        class: logging.FileHandler
        formatter: simple
        filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
    root:
      level: INFO
      handlers:
      - console
      - file
    disable_existing_loggers: false
  env: {}
  mode: null
  searchpath: []
  callbacks: {}
  output_subdir: .hydra
  overrides:
    hydra: []
    task:
    - seed=0
    - author=devseed
    - name=test-exp-nb-1
  job:
    name: notebook
    chdir: null
    override_dirname: author=devseed,name=test-exp-nb-1,seed=0
    id: ???
    num: ???
    config_name: config
    env_set: {}
    env_copy: []
    config:
      override_dirname:
        kv_sep: '='
        item_sep: ','
        exclude_keys: []
  runtime:
    version: 1.3.2
    version_base: '1.3'
    cwd: /home/runner/work/ml-pipeline/ml-pipeline/jbook/docs
    config_sources:
    - path: hydra.conf
      schema: pkg
      provider: hydra
    - path: /home/runner/work/ml-pipeline/ml-pipeline/config
      schema: file
      provider: main
    - path: ''
      schema: structured
      provider: schema
    output_dir: ???
    choices:
      callbacks: minimal
      trainer: basic
      datamodule: burnscars
      model: burnscars-resnet18
      hydra/env: default
      hydra/callbacks: null
      hydra/job_logging: default
      hydra/hydra_logging: default
      hydra/hydra_help: default
      hydra/help: default
      hydra/sweeper: basic
      hydra/launcher: basic
      hydra/output: default
  verbose: false
seed: 0
author: devseed
name: test-exp-nb-1
work_dir: ${hydra.runtime.cwd}
data_dir: ${hydra.runtime.cwd}/data
model:
  encoder_name: resnet18
  encoder_depth: 5
  encoder_weights: null
  in_channels: 5
  classes: 1
  activation: null
  lr: 0.001
datamodule:
  image_query:
    bbox:
    - -119.1
    - 36.2
    - -118.2
    - 36.9
    datetime:
    - '2021-08-15T00:00:00Z'
    - '2021-09-15T23:59:59Z'
    collections:
    - HLSS30.v2.0
  vector_url: https://gist.githubusercontent.com/weiji14/286032ac2498d10e050ba585257dd50d/raw/c897c7c1b3b8354ec8c6e8327df38fcfee79b4ef/burn_scars.geojson
  batch_size: 4
trainer:
  _target_: lightning.pytorch.Trainer
  accelerator: auto
  min_epochs: 1
  max_epochs: 1
  limit_train_batches: 3
  limit_val_batches: 3
  log_every_n_steps: 1
callbacks:
  model_summary:
    _target_: lightning.pytorch.callbacks.RichModelSummary
    max_depth: 4
  rich_progress_bar:
    _target_: lightning.pytorch.callbacks.RichProgressBar
  model_checkpoint:
    _target_: lightning.pytorch.callbacks.ModelCheckpoint
    monitor: val_loss
    mode: min
    save_top_k: 1
    save_last: true
    verbose: true
    dirpath: ${work_dir}/logs/checkpoint/
    filename: ${name}-epoch={epoch:02d}-val_loss={val_loss:.4f}
    auto_insert_metric_name: false
  learning_rate_monitor:
    _target_: lightning.pytorch.callbacks.LearningRateMonitor
    logging_interval: step
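
The composed cfg is a plain OmegaConf object, so individual values can be read with attribute access, and nested keys can be overridden at compose time using dotted paths. A short sketch (the override values are arbitrary examples):

# reading composed values
print(cfg.model.lr)            # 0.001
print(cfg.trainer.max_epochs)  # 1

# nested keys can also be overridden when composing, e.g.
# overrides=["seed=0", "model.lr=0.01", "trainer.max_epochs=5"]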

We will also authenticate and configure our Weights & Biases (W&B) logger. W&B is an experiment tracking and visualization platform used for examining metric graphs and comparing inputs, intermediate artifacts, and output results across runs.
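
We log offline below; if you instead want live cloud syncing (offline=False), authenticate first. A minimal sketch using W&B's standard login call, which reads WANDB_API_KEY from the environment or prompts for a key:

# optional: only needed for online logging
wandb.login()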

# store experiment logs inside logs/
cwd = os.getcwd()
(Path(cwd) / "logs").mkdir(exist_ok=True)

# loggers
csv_logger = CSVLogger(save_dir="logs", name=cfg.name)
wandb_logger = WandbLogger(
    name=cfg.name,
    save_dir="logs",
    offline=True,  # set to False if logging online
    project="ml-pipeline",
    entity="nasa-impact",
    log_model=False,
)
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id 0orso6nb.
wandb: Tracking run with wandb version 0.15.9
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.

Next, we instantiate the callbacks defined in our Hydra config so they can be passed to the PyTorch Lightning trainer.

# callbacks
callbacks: List[Callback] = []
if "callbacks" in cfg:
    for _, cb_conf in cfg.callbacks.items():
        if "_target_" in cb_conf:
            callbacks.append(hydra.utils.instantiate(cb_conf))
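
To sanity-check what Hydra built, inspect the instantiated objects; with the callbacks config shown earlier, this should print the four callback classes:

for cb in callbacks:
    print(type(cb).__name__)
# RichModelSummary, RichProgressBar, ModelCheckpoint, LearningRateMonitor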

Now let’s put everything together. We construct the datamodule and model directly from their config sections, while Hydra instantiates the trainer from the _target_ class specified in cfg.trainer, passing in our callbacks and loggers.

datamodule = BurnScarsDataModule(**cfg.datamodule)  # datamodule
model = BurnScarsSegmentationModel(**cfg.model)  # model
trainer = hydra.utils.instantiate(
    config=cfg.trainer,
    callbacks=callbacks,
    logger=[csv_logger, wandb_logger],
)  # trainer
Trainer already configured with model summary callbacks: [<class 'lightning.pytorch.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
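
For reference, hydra.utils.instantiate is roughly equivalent to calling the _target_ class by hand with the remaining keys passed as keyword arguments. A sketch using the values from cfg.trainer shown earlier:

# equivalent manual construction (sketch)
from lightning.pytorch import Trainer

trainer = Trainer(
    accelerator="auto",
    min_epochs=1,
    max_epochs=1,
    limit_train_batches=3,
    limit_val_batches=3,
    log_every_n_steps=1,
    callbacks=callbacks,
    logger=[csv_logger, wandb_logger],
)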

Let’s run a training experiment.

# fit
trainer.fit(model=model, datamodule=datamodule)
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃    ┃ Name                               ┃ Type             ┃ Params ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0  │ model                              │ Backbone         │ 14.3 M │
│ 1  │ model.backbone                     │ Unet             │ 14.3 M │
│ 2  │ model.backbone.encoder             │ ResNetEncoder    │ 11.2 M │
│ 3  │ model.backbone.encoder.conv1       │ Conv2d           │ 15.7 K │
│ 4  │ model.backbone.encoder.bn1         │ BatchNorm2d      │    128 │
│ 5  │ model.backbone.encoder.relu        │ ReLU             │      0 │
│ 6  │ model.backbone.encoder.maxpool     │ MaxPool2d        │      0 │
│ 7  │ model.backbone.encoder.layer1      │ Sequential       │  147 K │
│ 8  │ model.backbone.encoder.layer2      │ Sequential       │  525 K │
│ 9  │ model.backbone.encoder.layer3      │ Sequential       │  2.1 M │
│ 10 │ model.backbone.encoder.layer4      │ Sequential       │  8.4 M │
│ 11 │ model.backbone.decoder             │ UnetDecoder      │  3.2 M │
│ 12 │ model.backbone.decoder.center      │ Identity         │      0 │
│ 13 │ model.backbone.decoder.blocks      │ ModuleList       │  3.2 M │
│ 14 │ model.backbone.segmentation_head   │ SegmentationHead │    145 │
│ 15 │ model.backbone.segmentation_head.0 │ Conv2d           │    145 │
│ 16 │ model.backbone.segmentation_head.1 │ Identity         │      0 │
│ 17 │ model.backbone.segmentation_head.2 │ Activation       │      0 │
└────┴────────────────────────────────────┴──────────────────┴────────┘
Trainable params: 14.3 M                                                                                           
Non-trainable params: 0                                                                                            
Total params: 14.3 M                                                                                               
Total estimated model params size (MB): 57                                                                         
/home/runner/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/utilities/data.py:120: 
UserWarning: Your `IterableDataset` has `__len__` defined. In combination with multi-process data loading (when 
num_workers > 1), `__len__` could be inaccurate if each worker is not configured independently to avoid having 
duplicate data.
  rank_zero_warn(


---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
File rasterio/_base.pyx:310, in rasterio._base.DatasetBase.__init__()

CPLE_OpenFailedError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/stackstac/rio_reader.py:326, in AutoParallelRioReader._open(self)

RasterioIOError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif' not recognized as a supported file format.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[6], line 2
      1 # fit
----> 2 trainer.fit(model=model, datamodule=datamodule)

[... intermediate frames through lightning, torch DataLoader, torchdata
datapipes, zen3geo, xbatcher, xarray, dask, and stackstac elided for brevity ...]

RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif': RasterioIOError("'/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif' not recognized as a supported file format.")
This exception is thrown by __iter__ of XbatcherSlicerIterDataPipe(input_dims={'time': 1, 'y': 512, 'x': 512}, kwargs={}, source_datapipe=StackSTACStackerIterDataPipe)
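
The failure above comes from data loading, not the model: these HLS assets live in a protected LP DAAC bucket, and without NASA Earthdata credentials GDAL cannot open the /vsicurl/ URLs, so rasterio reports an unrecognized file format. If you hit this locally, the usual fix is an Earthdata Login entry in ~/.netrc plus GDAL cookie settings; a sketch (the file paths are assumptions, adjust to your setup):

# ~/.netrc (chmod 600) -- replace the placeholders with your Earthdata Login:
# machine urs.earthdata.nasa.gov login <username> password <password>
import os

# point GDAL at a cookie jar so authenticated sessions persist across reads
os.environ["GDAL_HTTP_COOKIEFILE"] = os.path.expanduser("~/cookies.txt")
os.environ["GDAL_HTTP_COOKIEJAR"] = os.path.expanduser("~/cookies.txt")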

Finally, we will test the trained model. The logged results are viewable in the W&B dashboard.

# test
# trainer.test(model=model, datamodule=datamodule)
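
Since the W&B logger ran in offline mode, runs are stored locally under logs/wandb/ and can be uploaded later with the W&B CLI. A sketch, assuming the default offline run directory naming:

wandb sync logs/wandb/offline-run-*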