Experiment Configuration with Hydra
In this lesson, we will configure experiments for our PyTorch Lightning training workflow. This will allow us to test various combinations of hyperparameter settings and assess relative model performance.
To set up the Jupyter environment, install Hatch as described in the README, then run:
hatch -e nb shell
exit
jupyter lab
Let’s start by importing the packages we will use.
Notice that the data module and custom model module we created in the prior lessons are now abstracted enough that we can import them directly as Python modules!
import os
from pathlib import Path
from typing import List
import hydra
import wandb
from omegaconf import OmegaConf
from lightning.pytorch.callbacks import Callback
from lightning.pytorch.loggers import CSVLogger, WandbLogger
from ml_pipeline.datasets.datamodule import BurnScarsDataModule
from ml_pipeline.model.lightningmodule import BurnScarsSegmentationModel
Now we will configure Hydra, an experiment configuration tool. Hydra uses configuration files to define groups of related, tunable hyperparameters, making it easy to organize, version control, and automate multiple experiment configurations.
with hydra.initialize(config_path="../../config", version_base="1.3.0"):
    cfg = hydra.compose(
        config_name="config",
        overrides=["seed=0", "author=devseed", "name=test-exp-nb-1"],
        return_hydra_config=True,
    )
print(OmegaConf.to_yaml(cfg))
hydra:
  run:
    dir: logs/runs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: logs/multiruns/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
  launcher:
    _target_: hydra._internal.core_plugins.basic_launcher.BasicLauncher
  sweeper:
    _target_: hydra._internal.core_plugins.basic_sweeper.BasicSweeper
    max_batch_size: null
    params: null
  help:
    app_name: ${hydra.job.name}
    header: '${hydra.help.app_name} is powered by Hydra.
      '
    footer: 'Powered by Hydra (https://hydra.cc)
      Use --hydra-help to view Hydra specific help
      '
    template: '${hydra.help.header}
      == Configuration groups ==
      Compose your configuration from those groups (group=option)
      $APP_CONFIG_GROUPS
      == Config ==
      Override anything in the config (foo.bar=value)
      $CONFIG
      ${hydra.help.footer}
      '
  hydra_help:
    template: 'Hydra (${hydra.runtime.version})
      See https://hydra.cc for more info.
      == Flags ==
      $FLAGS_HELP
      == Configuration groups ==
      Compose your configuration from those groups (For example, append hydra/job_logging=disabled
      to command line)
      $HYDRA_CONFIG_GROUPS
      Use ''--cfg hydra'' to Show the Hydra config.
      '
    hydra_help: ???
  hydra_logging:
    version: 1
    formatters:
      simple:
        format: '[%(asctime)s][HYDRA] %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        formatter: simple
        stream: ext://sys.stdout
    root:
      level: INFO
      handlers:
      - console
    loggers:
      logging_example:
        level: DEBUG
    disable_existing_loggers: false
  job_logging:
    version: 1
    formatters:
      simple:
        format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        formatter: simple
        stream: ext://sys.stdout
      file:
        class: logging.FileHandler
        formatter: simple
        filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
    root:
      level: INFO
      handlers:
      - console
      - file
    disable_existing_loggers: false
  env: {}
  mode: null
  searchpath: []
  callbacks: {}
  output_subdir: .hydra
  overrides:
    hydra: []
    task:
    - seed=0
    - author=devseed
    - name=test-exp-nb-1
  job:
    name: notebook
    chdir: null
    override_dirname: author=devseed,name=test-exp-nb-1,seed=0
    id: ???
    num: ???
    config_name: config
    env_set: {}
    env_copy: []
    config:
      override_dirname:
        kv_sep: '='
        item_sep: ','
        exclude_keys: []
  runtime:
    version: 1.3.2
    version_base: '1.3'
    cwd: /home/runner/work/ml-pipeline/ml-pipeline/jbook/docs
    config_sources:
    - path: hydra.conf
      schema: pkg
      provider: hydra
    - path: /home/runner/work/ml-pipeline/ml-pipeline/config
      schema: file
      provider: main
    - path: ''
      schema: structured
      provider: schema
    output_dir: ???
    choices:
      callbacks: minimal
      trainer: basic
      datamodule: burnscars
      model: burnscars-resnet18
      hydra/env: default
      hydra/callbacks: null
      hydra/job_logging: default
      hydra/hydra_logging: default
      hydra/hydra_help: default
      hydra/help: default
      hydra/sweeper: basic
      hydra/launcher: basic
      hydra/output: default
  verbose: false
seed: 0
author: devseed
name: test-exp-nb-1
work_dir: ${hydra.runtime.cwd}
data_dir: ${hydra.runtime.cwd}/data
model:
  encoder_name: resnet18
  encoder_depth: 5
  encoder_weights: null
  in_channels: 5
  classes: 1
  activation: null
  lr: 0.001
datamodule:
  image_query:
    bbox:
    - -119.1
    - 36.2
    - -118.2
    - 36.9
    datetime:
    - '2021-08-15T00:00:00Z'
    - '2021-09-15T23:59:59Z'
    collections:
    - HLSS30.v2.0
  vector_url: https://gist.githubusercontent.com/weiji14/286032ac2498d10e050ba585257dd50d/raw/c897c7c1b3b8354ec8c6e8327df38fcfee79b4ef/burn_scars.geojson
  batch_size: 4
trainer:
  _target_: lightning.pytorch.Trainer
  accelerator: auto
  min_epochs: 1
  max_epochs: 1
  limit_train_batches: 3
  limit_val_batches: 3
  log_every_n_steps: 1
callbacks:
  model_summary:
    _target_: lightning.pytorch.callbacks.RichModelSummary
    max_depth: 4
  rich_progress_bar:
    _target_: lightning.pytorch.callbacks.RichProgressBar
  model_checkpoint:
    _target_: lightning.pytorch.callbacks.ModelCheckpoint
    monitor: val_loss
    mode: min
    save_top_k: 1
    save_last: true
    verbose: true
    dirpath: ${work_dir}/logs/checkpoint/
    filename: ${name}-epoch={epoch:02d}-val_loss={val_loss:.4f}
    auto_insert_metric_name: false
  learning_rate_monitor:
    _target_: lightning.pytorch.callbacks.LearningRateMonitor
    logging_interval: step
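The composed cfg behaves like a regular OmegaConf DictConfig, so nested values can be read with attribute access, and a variant experiment can be composed simply by passing different overrides. A minimal sketch (the override values below are illustrative, not part of this lesson's experiment):

# Read nested values from the composed config
print(cfg.model.lr, cfg.datamodule.batch_size)

# Compose a hypothetical variant experiment by overriding individual values
with hydra.initialize(config_path="../../config", version_base="1.3.0"):
    cfg_variant = hydra.compose(
        config_name="config",
        overrides=["seed=1", "author=devseed", "name=test-exp-nb-2", "model.lr=0.0005"],
    )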
We will also authenticate and configure our Weights &amp; Biases (W&amp;B) logger. W&amp;B is an ML experiment tracking and visualization platform used to examine metric curves and to compare inputs, intermediate outputs, and results across runs.
# store experiment logs inside logs/
cwd = os.getcwd()
(Path(cwd) / "logs").mkdir(exist_ok=True)
# loggers
csv_logger = CSVLogger(save_dir="logs", name=cfg.name)
wandb_logger = WandbLogger(
    name=cfg.name,
    save_dir="logs",
    offline=True,  # set to False if logging online
    project="ml-pipeline",
    entity="nasa-impact",
    log_model=False,
)
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id 0orso6nb.
wandb: Tracking run with wandb version 0.15.9
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
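The logger above runs in offline mode, so nothing is sent to the W&amp;B servers. If you want runs synced to the cloud, you would authenticate first and set offline to False. A minimal sketch, assuming a valid WANDB_API_KEY is available in the environment (or wandb login has been run on the machine):

# Optional: authenticate and sync runs online instead of storing them offline
wandb.login()  # reads WANDB_API_KEY from the environment if set
wandb_logger = WandbLogger(
    name=cfg.name,
    save_dir="logs",
    offline=False,
    project="ml-pipeline",
    entity="nasa-impact",
    log_model=False,
)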
Next, we instantiate the callbacks defined in our Hydra config so they can be passed to the PyTorch Lightning Trainer during model training.
# callbacks
callbacks: List[Callback] = []
if "callbacks" in cfg:
    for _, cb_conf in cfg.callbacks.items():
        if "_target_" in cb_conf:
            callbacks.append(hydra.utils.instantiate(cb_conf))
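To see what hydra.utils.instantiate is doing in that loop: each callback entry's _target_ names a class to import, and the remaining keys are passed to its constructor. A quick, illustrative check using one entry from the config above:

# Roughly equivalent to one iteration of the loop above:
# `_target_` selects the class, the other keys become constructor kwargs
checkpoint_cb = hydra.utils.instantiate(cfg.callbacks.model_checkpoint)
print(type(checkpoint_cb))  # a ModelCheckpoint instance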
Now let’s compose everything together. Hydra examines the configuration files in the config_path specified earlier to determine which combination of settings to run.
datamodule = BurnScarsDataModule(**cfg.datamodule)  # datamodule
model = BurnScarsSegmentationModel(**cfg.model)  # model
trainer = hydra.utils.instantiate(
    config=cfg.trainer,
    callbacks=callbacks,
    logger=[csv_logger, wandb_logger],
)  # trainer
Trainer already configured with model summary callbacks: [<class 'lightning.pytorch.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Let’s train some experiments.
# fit
trainer.fit(model=model, datamodule=datamodule)
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃    ┃ Name                               ┃ Type             ┃ Params ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0  │ model                              │ Backbone         │ 14.3 M │
│ 1  │ model.backbone                     │ Unet             │ 14.3 M │
│ 2  │ model.backbone.encoder             │ ResNetEncoder    │ 11.2 M │
│ 3  │ model.backbone.encoder.conv1       │ Conv2d           │ 15.7 K │
│ 4  │ model.backbone.encoder.bn1         │ BatchNorm2d      │ 128    │
│ 5  │ model.backbone.encoder.relu        │ ReLU             │ 0      │
│ 6  │ model.backbone.encoder.maxpool     │ MaxPool2d        │ 0      │
│ 7  │ model.backbone.encoder.layer1      │ Sequential       │ 147 K  │
│ 8  │ model.backbone.encoder.layer2      │ Sequential       │ 525 K  │
│ 9  │ model.backbone.encoder.layer3      │ Sequential       │ 2.1 M  │
│ 10 │ model.backbone.encoder.layer4      │ Sequential       │ 8.4 M  │
│ 11 │ model.backbone.decoder             │ UnetDecoder      │ 3.2 M  │
│ 12 │ model.backbone.decoder.center      │ Identity         │ 0      │
│ 13 │ model.backbone.decoder.blocks      │ ModuleList       │ 3.2 M  │
│ 14 │ model.backbone.segmentation_head   │ SegmentationHead │ 145    │
│ 15 │ model.backbone.segmentation_head.0 │ Conv2d           │ 145    │
│ 16 │ model.backbone.segmentation_head.1 │ Identity         │ 0      │
│ 17 │ model.backbone.segmentation_head.2 │ Activation       │ 0      │
└────┴────────────────────────────────────┴──────────────────┴────────┘
Trainable params: 14.3 M
Non-trainable params: 0
Total params: 14.3 M
Total estimated model params size (MB): 57
/home/runner/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/utilities/data.py:120: UserWarning: Your `IterableDataset` has `__len__` defined. In combination with multi-process data loading (when num_workers > 1), `__len__` could be inaccurate if each worker is not configured independently to avoid having duplicate data. rank_zero_warn(
---------------------------------------------------------------------------
CPLE_OpenFailedError Traceback (most recent call last)
File rasterio/_base.pyx:310, in rasterio._base.DatasetBase.__init__()
File rasterio/_base.pyx:221, in rasterio._base.open_dataset()
File rasterio/_err.pyx:221, in rasterio._err.exc_wrap_pointer()
CPLE_OpenFailedError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif' not recognized as a supported file format.
During handling of the above exception, another exception occurred:
RasterioIOError Traceback (most recent call last)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/stackstac/rio_reader.py:326, in AutoParallelRioReader._open(self)
325 try:
--> 326 ds = SelfCleaningDatasetReader(
327 self.url, sharing=False
328 )
329 except Exception as e:
File rasterio/_base.pyx:312, in rasterio._base.DatasetBase.__init__()
RasterioIOError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif' not recognized as a supported file format.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[6], line 2
1 # fit
----> 2 trainer.fit(model=model, datamodule=datamodule)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py:532, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
530 self.strategy._lightning_module = model
531 _verify_strategy_supports_compile(model, self.strategy)
--> 532 call._call_and_handle_interrupt(
533 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
534 )
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py:43, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
41 if trainer.strategy.launcher is not None:
42 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 43 return trainer_fn(*args, **kwargs)
45 except _TunerExitException:
46 _call_teardown_hook(trainer)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py:571, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
561 self._data_connector.attach_data(
562 model, train_dataloaders=train_dataloaders, val_dataloaders=val_dataloaders, datamodule=datamodule
563 )
565 ckpt_path = self._checkpoint_connector._select_ckpt_path(
566 self.state.fn,
567 ckpt_path,
568 model_provided=True,
569 model_connected=self.lightning_module is not None,
570 )
--> 571 self._run(model, ckpt_path=ckpt_path)
573 assert self.state.stopped
574 self.training = False
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py:980, in Trainer._run(self, model, ckpt_path)
975 self._signal_connector.register_signal_handlers()
977 # ----------------------------
978 # RUN THE TRAINER
979 # ----------------------------
--> 980 results = self._run_stage()
982 # ----------------------------
983 # POST-Training CLEAN UP
984 # ----------------------------
985 log.debug(f"{self.__class__.__name__}: trainer tearing down")
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py:1021, in Trainer._run_stage(self)
1019 if self.training:
1020 with isolate_rng():
-> 1021 self._run_sanity_check()
1022 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
1023 self.fit_loop.run()
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py:1050, in Trainer._run_sanity_check(self)
1047 call._call_callback_hooks(self, "on_sanity_check_start")
1049 # run eval step
-> 1050 val_loop.run()
1052 call._call_callback_hooks(self, "on_sanity_check_end")
1054 # reset logger connector
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/loops/utilities.py:181, in _no_grad_context.<locals>._decorator(self, *args, **kwargs)
179 context_manager = torch.no_grad
180 with context_manager():
--> 181 return loop_run(self, *args, **kwargs)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py:108, in _EvaluationLoop.run(self)
106 while True:
107 try:
--> 108 batch, batch_idx, dataloader_idx = next(data_fetcher)
109 self.batch_progress.is_last_batch = data_fetcher.done
110 if previous_dataloader_idx != dataloader_idx:
111 # the dataloader has changed, notify the logger connector
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/loops/fetchers.py:126, in _PrefetchDataFetcher.__next__(self)
123 self.done = not self.batches
124 elif not self.done:
125 # this will run only when no pre-fetching was done.
--> 126 batch = super().__next__()
127 else:
128 # the iterator is empty
129 raise StopIteration
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/loops/fetchers.py:58, in _DataFetcher.__next__(self)
56 self._start_profiler()
57 try:
---> 58 batch = next(iterator)
59 except StopIteration:
60 self.done = True
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/utilities/combined_loader.py:285, in CombinedLoader.__next__(self)
283 def __next__(self) -> Any:
284 assert self._iterator is not None
--> 285 out = next(self._iterator)
286 if isinstance(self._iterator, _Sequential):
287 return out
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/lightning/pytorch/utilities/combined_loader.py:123, in _Sequential.__next__(self)
120 raise StopIteration
122 try:
--> 123 out = next(self.iterators[0])
124 index = self._idx
125 self._idx += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/dataloader.py:634, in _BaseDataLoaderIter.__next__(self)
631 if self._sampler_iter is None:
632 # TODO(https://github.com/pytorch/pytorch/issues/76750)
633 self._reset() # type: ignore[call-arg]
--> 634 data = self._next_data()
635 self._num_yielded += 1
636 if self._dataset_kind == _DatasetKind.Iterable and \
637 self._IterableDataset_len_called is not None and \
638 self._num_yielded > self._IterableDataset_len_called:
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/dataloader.py:678, in _SingleProcessDataLoaderIter._next_data(self)
676 def _next_data(self):
677 index = self._next_index() # may raise StopIteration
--> 678 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
679 if self._pin_memory:
680 data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py:41, in _IterableDatasetFetcher.fetch(self, possibly_batched_index)
39 raise StopIteration
40 else:
---> 41 data = next(self.dataset_iter)
42 return self.collate_fn(data)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:144, in hook_iterator.<locals>.IteratorDecorator.__next__(self)
142 return self._get_next()
143 else: # Decided against using `contextlib.nullcontext` for performance reasons
--> 144 return self._get_next()
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:132, in hook_iterator.<locals>.IteratorDecorator._get_next(self)
128 r"""
129 Return next with logic related to iterator validity, profiler, and incrementation of samples yielded.
130 """
131 _check_iterator_valid(self.source_dp, self.iterator_id)
--> 132 result = next(self.iterator)
133 if not self.self_and_has_next_method:
134 self.source_dp._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:215, in hook_iterator.<locals>.wrap_next(*args, **kwargs)
213 result = next_func(*args, **kwargs)
214 else:
--> 215 result = next_func(*args, **kwargs)
216 datapipe = args[0]
217 datapipe._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/datapipe.py:369, in _IterDataPipeSerializationWrapper.__next__(self)
367 def __next__(self) -> T_co:
368 assert self._datapipe_iter is not None
--> 369 return next(self._datapipe_iter)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:173, in hook_iterator.<locals>.wrap_generator(*args, **kwargs)
171 response = gen.send(None)
172 else:
--> 173 response = gen.send(None)
175 while True:
176 datapipe._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torchdata/datapipes/iter/util/randomsplitter.py:184, in SplitterIterator.__iter__(self)
182 def __iter__(self):
183 self.main_datapipe.reset()
--> 184 for sample in self.main_datapipe.source_datapipe:
185 if self.main_datapipe.draw() == self.target:
186 yield sample
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:173, in hook_iterator.<locals>.wrap_generator(*args, **kwargs)
171 response = gen.send(None)
172 else:
--> 173 response = gen.send(None)
175 while True:
176 datapipe._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/iter/callable.py:122, in MapperIterDataPipe.__iter__(self)
121 def __iter__(self) -> Iterator[T_co]:
--> 122 for data in self.datapipe:
123 yield self._apply_fn(data)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:173, in hook_iterator.<locals>.wrap_generator(*args, **kwargs)
171 response = gen.send(None)
172 else:
--> 173 response = gen.send(None)
175 while True:
176 datapipe._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/iter/grouping.py:70, in BatcherIterDataPipe.__iter__(self)
68 def __iter__(self) -> Iterator[DataChunk]:
69 batch: List = []
---> 70 for x in self.datapipe:
71 batch.append(x)
72 if len(batch) == self.batch_size:
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:173, in hook_iterator.<locals>.wrap_generator(*args, **kwargs)
171 response = gen.send(None)
172 else:
--> 173 response = gen.send(None)
175 while True:
176 datapipe._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/iter/combining.py:589, in ZipperIterDataPipe.__iter__(self)
587 def __iter__(self) -> Iterator[Tuple[T_co]]:
588 iterators = [iter(datapipe) for datapipe in self.datapipes]
--> 589 yield from zip(*iterators)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:144, in hook_iterator.<locals>.IteratorDecorator.__next__(self)
142 return self._get_next()
143 else: # Decided against using `contextlib.nullcontext` for performance reasons
--> 144 return self._get_next()
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:132, in hook_iterator.<locals>.IteratorDecorator._get_next(self)
128 r"""
129 Return next with logic related to iterator validity, profiler, and incrementation of samples yielded.
130 """
131 _check_iterator_valid(self.source_dp, self.iterator_id)
--> 132 result = next(self.iterator)
133 if not self.self_and_has_next_method:
134 self.source_dp._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/iter/combining.py:163, in _ForkerIterDataPipe.get_next_element_by_instance(self, instance_id)
161 self.leading_ptr = self.child_pointers[instance_id]
162 try:
--> 163 return_val = next(self._datapipe_iterator) # type: ignore[arg-type]
164 self.buffer.append(return_val)
165 except StopIteration:
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/torch/utils/data/datapipes/_hook_iterator.py:173, in hook_iterator.<locals>.wrap_generator(*args, **kwargs)
171 response = gen.send(None)
172 else:
--> 173 response = gen.send(None)
175 while True:
176 datapipe._number_of_samples_yielded += 1
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/zen3geo/datapipes/xbatcher.py:107, in XbatcherSlicerIterDataPipe.__iter__(self)
105 def __iter__(self) -> Iterator[Union[xr.DataArray, xr.Dataset]]:
106 for dataarray in self.source_datapipe:
--> 107 for chip in dataarray.batch.generator(
108 input_dims=self.input_dims, **self.kwargs
109 ):
110 yield chip
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/xbatcher/generators.py:416, in BatchGenerator.__iter__(self)
414 def __iter__(self) -> Iterator[Union[xr.DataArray, xr.Dataset]]:
415 for idx in self._batch_selectors.selectors:
--> 416 yield self[idx]
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/xbatcher/generators.py:461, in BatchGenerator.__getitem__(self, idx)
459 batch_ds = self.ds.isel(self._batch_selectors.selectors[idx][0])
460 if self.preload_batch:
--> 461 batch_ds.load()
462 return _maybe_stack_batch_dims(
463 batch_ds,
464 list(self.input_dims),
465 )
466 else:
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/xarray/core/dataarray.py:1108, in DataArray.load(self, **kwargs)
1090 def load(self: T_DataArray, **kwargs) -> T_DataArray:
1091 """Manually trigger loading of this array's data from disk or a
1092 remote source into memory and return this array.
1093
(...)
1106 dask.compute
1107 """
-> 1108 ds = self._to_temp_dataset().load(**kwargs)
1109 new = self._from_temp_dataset(ds)
1110 self._variable = new._variable
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/xarray/core/dataset.py:825, in Dataset.load(self, **kwargs)
822 chunkmanager = get_chunked_array_type(*lazy_data.values())
824 # evaluate all the chunked arrays simultaneously
--> 825 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
827 for k, data in zip(lazy_data, evaluated_data):
828 self.variables[k].data = data
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs)
67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]:
68 from dask.array import compute
---> 70 return compute(*data, **kwargs)
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
86 elif isinstance(pool, multiprocessing.pool.Pool):
87 pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
90 pool.submit,
91 pool._max_workers,
92 dsk,
93 keys,
94 cache=cache,
95 get_id=_thread_get_id,
96 pack_exception=pack_exception,
97 **kwargs,
98 )
100 # Cleanup pools associated to dead threads
101 with pools_lock:
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
509 _execute_task(task, data) # Re-execute locally
510 else:
--> 511 raise_exception(exc, tb)
512 res, worker_id = loads(res_info)
513 state["cache"][key] = res
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/dask/local.py:319, in reraise(exc, tb)
317 if exc.__traceback__ is not tb:
318 raise exc.with_traceback(tb)
--> 319 raise exc
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
222 try:
223 task, data = loads(task_info)
--> 224 result = _execute_task(task, data)
225 id = get_id()
226 result = dumps((result, id))
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/stackstac/to_dask.py:185, in fetch_raster_window(reader_table, slices, dtype, fill_value)
178 # Only read if the window we're fetching actually overlaps with the asset
179 if windows.intersect(current_window, asset_window):
180 # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
181 # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
182 # would end up copied to even more threads.
183
184 # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 185 data = reader.read(current_window)
187 if all_empty:
188 # Turn `output` from a broadcast-trick array to a real array, so it's writeable
189 if (
190 np.isnan(data)
191 if np.isnan(fill_value)
192 else np.equal(data, fill_value)
193 ).all():
194 # Unless the data we just read is all empty anyway
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/stackstac/rio_reader.py:385, in AutoParallelRioReader.read(self, window, **kwargs)
384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385 reader = self.dataset
386 try:
387 result = reader.read(
388 window=window,
389 masked=True,
(...)
392 **kwargs,
393 )
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/stackstac/rio_reader.py:381, in AutoParallelRioReader.dataset(self)
379 with self._dataset_lock:
380 if self._dataset is None:
--> 381 self._dataset = self._open()
382 return self._dataset
File ~/micromamba-root/envs/mlpipeline/lib/python3.9/site-packages/stackstac/rio_reader.py:337, in AutoParallelRioReader._open(self)
332 warnings.warn(msg)
333 return NodataReader(
334 dtype=self.dtype, fill_value=self.fill_value
335 )
--> 337 raise RuntimeError(msg) from e
338 if ds.count != 1:
339 ds.close()
RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif': RasterioIOError("'/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T11SLA.2021231T183919.v2.0/HLS.S30.T11SLA.2021231T183919.v2.0.B03.tif' not recognized as a supported file format.")
This exception is thrown by __iter__ of XbatcherSlicerIterDataPipe(input_dims={'time': 1, 'y': 512, 'x': 512}, kwargs={}, source_datapipe=StackSTACStackerIterDataPipe)
Finally, we will test the experiments we trained. The logged results are viewable in the W&B dashboard.
# test
# trainer.test(model=model, datamodule=datamodule)
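Beyond this single run, the same compose-and-fit pattern extends to trying several hyperparameter combinations, which is the point of configuring experiments with Hydra. A minimal sketch, assuming the imagery is accessible and reusing the config tree from above (the learning rates and experiment names are illustrative):

# Hypothetical sweep: compose one config per learning rate and fit each model
for lr in [1e-3, 5e-4]:
    with hydra.initialize(config_path="../../config", version_base="1.3.0"):
        sweep_cfg = hydra.compose(
            config_name="config",
            overrides=["seed=0", "author=devseed", f"name=test-exp-lr-{lr}", f"model.lr={lr}"],
        )
    sweep_model = BurnScarsSegmentationModel(**sweep_cfg.model)
    sweep_datamodule = BurnScarsDataModule(**sweep_cfg.datamodule)
    sweep_trainer = hydra.utils.instantiate(
        config=sweep_cfg.trainer,
        logger=[CSVLogger(save_dir="logs", name=sweep_cfg.name)],
    )
    sweep_trainer.fit(model=sweep_model, datamodule=sweep_datamodule)

In a script entry point decorated with @hydra.main, an equivalent sweep could instead be launched from the command line with Hydra's --multirun flag.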