Event Analysis
This guide explains how to use dominosee for analyzing time series events.
Introduction
Event analysis is a fundamental part of understanding hydroclimatic phenomena. dominosee provides tools to select, analyze, and visualize event data across temporal and spatial dimensions.
Basic Event Selection
For NumPy array-based data, the select_first_period function extracts the initial phase of each event based on its duration:
import numpy as np
import matplotlib.pyplot as plt
from dominosee import select_first_period
# Create sample data (20 time steps, 5 events)
events = np.random.randn(20, 5)
# Define event durations
durations = np.array([5, 10, 7, 12, 6])
# Select the first 3 days of each event
first_period = select_first_period(events, durations, days=3)
# Visualize the results
plt.figure(figsize=(10, 6))
for i in range(5):
    plt.plot(first_period[:, i], label=f'Event {i+1}')
plt.legend()
plt.title('First 3 Days of Events')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.grid(True)
plt.show()
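To make the idea of selecting a first period concrete, here is a minimal NumPy sketch of one way such a selection could work. It is not dominosee's implementation. It assumes that each column holds one event starting at row 0 and that values outside the selected window are replaced with NaN; the helper name `first_period_sketch` is hypothetical.

```python
import numpy as np


def first_period_sketch(events, durations, days):
    """Hypothetical helper: keep the first `days` steps of each event.

    Assumes `events` has shape (time, event), every event starts at row 0,
    and values outside the kept window become NaN.
    """
    events = np.asarray(events, dtype=float)
    durations = np.asarray(durations)
    # Window length per event: never longer than the event itself
    window = np.minimum(durations, days)
    # Row index of every time step, shaped (time, 1) for broadcasting
    step = np.arange(events.shape[0])[:, np.newaxis]
    # True where the time step falls inside the event's window
    keep = step < window[np.newaxis, :]
    return np.where(keep, events, np.nan)


rng = np.random.default_rng(0)
events = rng.standard_normal((20, 5))
durations = np.array([5, 10, 7, 12, 6])

first_period = first_period_sketch(events, durations, days=3)
# Number of time steps kept for each event
print(np.isfinite(first_period).sum(axis=0))  # [3 3 3 3 3]
```

Masking with NaN keeps the output the same shape as the input, which is convenient for plotting; the actual function may represent the excluded values differently.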
Working with Multidimensional Data
For gridded climate data and other multidimensional datasets, dominosee provides xarray-compatible functions that maintain dimension information:
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import pandas as pd
from dominosee import select_first_period_xr
# Create a sample xarray DataArray with dimensions (time, event, location)
times = pd.date_range('2025-01-01', periods=30)
data = np.random.randn(30, 4, 3)
da = xr.DataArray(
    data,
    dims=('time', 'event', 'location'),
    coords={
        'time': times,
        'event': np.arange(4),
        'location': ['A', 'B', 'C']
    }
)
# Define durations as an xarray DataArray
durations = xr.DataArray(
    np.array([7, 12, 5, 10]),
    dims=('event',),
    coords={'event': np.arange(4)}
)
# Select the first 5 days of each event
result = select_first_period_xr(da, durations, days=5)
# The result maintains all the dimensions and coordinates of the original DataArray
print(result.dims) # ('time', 'event', 'location')
# Plot the results for one location
result.sel(location='A').plot.line(x='time', hue='event')
plt.title('First 5 Days of Events at Location A')
plt.grid(True)
plt.show()
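The dimension-preserving behaviour described above can be illustrated with plain xarray broadcasting. The following is a sketch under stated assumptions, not the library's code: it assumes events start at the first time step and that values outside the window are masked with NaN. The helper `first_period_xr_sketch` and its `time_dim` parameter are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def first_period_xr_sketch(da, durations, days, time_dim="time"):
    """Hypothetical helper: mask values outside each event's first `days` steps."""
    # Integer position of each time step, labelled with the original time values
    step = xr.DataArray(
        np.arange(da.sizes[time_dim]),
        dims=(time_dim,),
        coords={time_dim: da[time_dim].values},
    )
    # Per-event window length, capped at the event duration
    window = np.minimum(durations, days)
    # Broadcasts to (time, event) using the named dimensions
    keep = step < window
    # where() broadcasts the mask over any remaining dimensions (e.g. location)
    return da.where(keep).transpose(*da.dims)


times = pd.date_range("2025-01-01", periods=30)
da = xr.DataArray(
    np.random.default_rng(0).standard_normal((30, 4, 3)),
    dims=("time", "event", "location"),
    coords={"time": times, "event": np.arange(4), "location": ["A", "B", "C"]},
)
durations = xr.DataArray(
    [7, 12, 5, 10], dims=("event",), coords={"event": np.arange(4)}
)

result = first_period_xr_sketch(da, durations, days=5)
print(result.dims)
# Number of time steps kept per event at location A
print(result.notnull().sum("time").sel(location="A").values)
```

Because the mask is built from named dimensions, no manual reshaping is needed when extra dimensions such as location are present.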
Handling Edge Cases
The event selection functions in dominosee are designed to handle various edge cases:
Events shorter than the requested period: If an event’s duration is less than the requested number of days, the entire event is returned.
Missing values: NaN values in the input data are preserved in the output.
Zero-duration events: These are handled gracefully by returning empty data.
Example with mixed durations:
# Events with varying durations
events = np.random.randn(20, 4)
durations = np.array([2, 10, 1, 5])
# Select first 4 days
result = select_first_period(events, durations, days=4)
# Result will contain:
# - All data for event 0 (2 days)
# - First 4 days for event 1 (10 days)
# - All data for event 2 (1 day)
# - First 4 days for event 3 (5 days)
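The expected window lengths listed above follow from taking the shorter of each event's duration and the requested number of days. This small check uses only NumPy and does not call dominosee; the variable names `kept` and `truncated` are illustrative.

```python
import numpy as np

durations = np.array([2, 10, 1, 5])
days = 4

# Steps kept per event: the shorter of the event duration and the request
kept = np.minimum(durations, days)
print(kept)  # [2 4 1 4]

# Events that are longer than the request and therefore cut short
truncated = durations > days
print(truncated)  # [False  True False  True]
```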
Using dask for Large Datasets
When working with large datasets, you can leverage dask through the xarray integration:
# Aliased as dsa so it does not shadow the DataArray named da in the earlier example
import dask.array as dsa
# Create a large dask array
dask_data = dsa.random.random((1000, 20, 50), chunks=(100, 5, 10))
# Convert to xarray with dask backend
dask_da = xr.DataArray(
    dask_data,
    dims=('time', 'event', 'location'),
    coords={
        'time': pd.date_range('2025-01-01', periods=1000),
        'event': np.arange(20),
        'location': np.arange(50)
    }
)
# Create duration array
durations = xr.DataArray(
    np.random.randint(5, 30, size=20),
    dims=('event',),
    coords={'event': np.arange(20)}
)
# Apply function with dask backend
result = select_first_period_xr(dask_da, durations, days=7)
# Operations are evaluated lazily; nothing is computed until you request results
computed_result = result.compute()
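To confirm that a result is still lazy before calling compute(), you can inspect the type of the underlying array. The sketch below uses a generic xarray masking operation rather than a dominosee function, so it only shows how to check for laziness; whether a given dominosee function keeps its output lazy is not verified here.

```python
import dask.array as dsa
import numpy as np
import xarray as xr

lazy = xr.DataArray(
    dsa.random.random((1000, 20), chunks=(100, 5)),
    dims=("time", "event"),
)

# A plain xarray operation on a dask-backed array only builds a task graph
masked = lazy.where(lazy > 0.5)
print(isinstance(masked.data, dsa.Array))  # True while the result is lazy
print(masked.chunks)  # chunk layout carried over from the input

# compute() runs the task graph and returns a NumPy-backed DataArray
computed = masked.compute()
print(isinstance(computed.data, np.ndarray))
```

If the first check prints False for your own pipeline, the data was loaded into memory at some earlier step, and chunking will no longer limit memory use.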