Event Analysis

This guide explains how to use dominosee for analyzing time series events.

Introduction

Event analysis is a fundamental part of understanding hydroclimatic phenomena. dominosee provides tools to select, analyze, and visualize event data across temporal and spatial dimensions.

Basic Event Selection

For numpy array-based data, the select_first_period function allows you to extract the initial phases of events based on their duration:

import numpy as np
import matplotlib.pyplot as plt
from dominosee import select_first_period

# Create sample data (20 time steps, 5 events)
events = np.random.randn(20, 5)

# Define event durations
durations = np.array([5, 10, 7, 12, 6])

# Select the first 3 days of each event
first_period = select_first_period(events, durations, days=3)

# Visualize the results
plt.figure(figsize=(10, 6))
for i in range(5):
    plt.plot(first_period[:, i], label=f'Event {i+1}')
plt.legend()
plt.title('First 3 Days of Events')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.grid(True)
plt.show()

Working with Multidimensional Data

For gridded climate data and other multidimensional datasets, dominosee provides xarray-compatible functions that maintain dimension information:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
from dominosee import select_first_period_xr

# Create a sample xarray DataArray with dimensions (time, event, location)
times = pd.date_range('2025-01-01', periods=30)
data = np.random.randn(30, 4, 3)

da = xr.DataArray(
    data,
    dims=('time', 'event', 'location'),
    coords={
        'time': times,
        'event': np.arange(4),
        'location': ['A', 'B', 'C']
    }
)

# Define durations as an xarray DataArray
durations = xr.DataArray(
    np.array([7, 12, 5, 10]),
    dims=('event',),
    coords={'event': np.arange(4)}
)

# Select the first 5 days of each event
result = select_first_period_xr(da, durations, days=5)

# The result maintains all the dimensions and coordinates of the original DataArray
print(result.dims)  # ('time', 'event', 'location')

# Plot the results for one location
result.sel(location='A').plot.line(x='time', hue='event')
plt.title('First 5 Days of Events at Location A')
plt.grid(True)
plt.show()
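
To make the dimension handling above concrete, the following is a minimal sketch of how such a selection could be expressed with plain xarray broadcasting. It is not dominosee's implementation: first_period_xr_sketch is a hypothetical name, and the sketch assumes that every event starts at the first time step and that values outside the selected window are replaced with NaN. It uses days=8 rather than 5 so that both short and long events appear in the output.

```python
import numpy as np
import pandas as pd
import xarray as xr


def first_period_xr_sketch(da, durations, days):
    """Hypothetical stand-in, not dominosee's implementation.

    Assumes every event starts at the first time step and that values
    outside the selected window are replaced with NaN.
    """
    # Integer position of each time step, labelled by the time coordinate.
    step = xr.DataArray(
        np.arange(da.sizes["time"]), dims="time", coords={"time": da["time"]}
    )
    # Window length per event: the shorter of its duration and `days`.
    keep = np.minimum(durations, days)
    # (time, event) mask; `where` broadcasts it across the remaining dims.
    return da.where(step < keep)


times = pd.date_range("2025-01-01", periods=30)
da = xr.DataArray(
    np.ones((30, 4, 3)),
    dims=("time", "event", "location"),
    coords={"time": times, "event": np.arange(4), "location": ["A", "B", "C"]},
)
durations = xr.DataArray(
    np.array([7, 12, 5, 10]), dims="event", coords={"event": np.arange(4)}
)

result = first_period_xr_sketch(da, durations, days=8)
print(result.dims)  # ('time', 'event', 'location')

# Number of non-NaN time steps kept per event at one location.
kept = result.sel(location="A").count("time").values
print(kept)  # [7 8 5 8]
```

Because the mask only has the time and event dimensions, the location dimension is never touched, which is why the output keeps the input's dimensions and coordinates.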

Handling Edge Cases

The event selection functions in dominosee are designed to handle various edge cases:

  1. Events shorter than the requested period: If an event’s duration is less than the requested number of days, the entire event is returned.

  2. Missing values: NaN values in the input data are preserved in the output.

  3. Zero-duration events: These are handled gracefully by returning empty data.

Example with mixed durations:

# Events with varying durations
events = np.random.randn(20, 4)
durations = np.array([2, 10, 1, 5])

# Select first 4 days
result = select_first_period(events, durations, days=4)

# Result will contain:
# - All data for event 0 (2 days)
# - First 4 days for event 1 (10 days)
# - All data for event 2 (1 day)
# - First 4 days for event 3 (5 days)
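
The three edge cases listed above can be illustrated with a NumPy-only sketch. It does not call dominosee; first_period_sketch is a hypothetical stand-in that assumes a (time, event) layout, events starting at index 0, and NaN as the marker for "not selected". In particular, treating a zero-duration event as an all-NaN column is an assumption about what "empty data" means.

```python
import numpy as np


def first_period_sketch(events, durations, days):
    """Hypothetical stand-in, not dominosee's implementation.

    Assumes `events` has shape (time, event), every event starts at
    time index 0, and values outside the selected window become NaN.
    """
    events = np.asarray(events, dtype=float)
    durations = np.asarray(durations)
    # Each event keeps min(duration, days) leading time steps.
    keep = np.minimum(durations, days)
    # Boolean mask of shape (time, event): True inside the window.
    time_index = np.arange(events.shape[0])[:, None]
    mask = time_index < keep[None, :]
    return np.where(mask, events, np.nan)


rng = np.random.default_rng(0)
events = rng.standard_normal((20, 4))
events[1, 1] = np.nan  # a missing value inside event 1
durations = np.array([2, 10, 0, 5])  # event 2 has zero duration

result = first_period_sketch(events, durations, days=4)

# Count the non-NaN time steps kept for each event.
kept = np.sum(~np.isnan(result), axis=0)
print(kept)  # [2 3 0 4]
```

Event 0 keeps its full 2 steps, event 1 keeps 4 steps of which one was already NaN, event 2 keeps nothing, and event 3 is cut to 4 steps.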

Using dask for Large Datasets

When working with large datasets, you can leverage dask through the xarray integration:

import dask.array as dsa  # aliased as dsa so it does not shadow the DataArray `da` above

# Create a large dask array
dask_data = dsa.random.random((1000, 20, 50), chunks=(100, 5, 10))

# Convert to xarray with dask backend
dask_da = xr.DataArray(
    dask_data,
    dims=('time', 'event', 'location'),
    coords={
        'time': pd.date_range('2025-01-01', periods=1000),
        'event': np.arange(20),
        'location': np.arange(50)
    }
)

# Create duration array
durations = xr.DataArray(
    np.random.randint(5, 30, size=20),
    dims=('event',),
    coords={'event': np.arange(20)}
)

# Apply function with dask backend
result = select_first_period_xr(dask_da, durations, days=7)

# With a dask backend, operations stay lazy until you explicitly request results
computed_result = result.compute()
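
One way to confirm that a result is still lazy is to inspect the array that backs it. The sketch below uses a plain xarray where call as a stand-in for the dominosee function, so it only demonstrates the checking technique; whether select_first_period_xr itself preserves laziness is an assumption worth verifying the same way.

```python
import dask.array as dsa
import numpy as np
import xarray as xr

# A dask-backed DataArray; nothing is materialised yet.
lazy = xr.DataArray(
    dsa.ones((1000, 20, 50), chunks=(100, 5, 10)),
    dims=("time", "event", "location"),
)

# Any lazy xarray operation can stand in here for the dominosee call.
step = xr.DataArray(np.arange(lazy.sizes["time"]), dims="time")
masked = lazy.where(step < 7)

# The result still wraps a dask array, so no chunk has been computed.
is_lazy = isinstance(masked.data, dsa.Array)
print(is_lazy)  # True

# compute() runs the task graph and returns a NumPy-backed DataArray.
computed = masked.compute()
print(type(computed.data).__name__)  # ndarray

# 7 time steps x 20 events x 50 locations remain non-NaN.
n_kept = int(computed.count())
print(n_kept)  # 7000
```

If the check on the real result reports a NumPy array instead of a dask array, the data was loaded eagerly somewhere along the way and the chunking no longer limits memory use.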