Study notes from material at https://www.vanderschaar-lab.com/synthetic-data-breaking-the-data-logjam-in-machine-learning-for-healthcare/

Researchers are hamstrung by a lack of access to high-quality data.

Private data carries lots of risk but also lots of benefit: we need to find a sweet spot.

Synthetic data could be the solution.

There are several types of synthetic data, but the term essentially refers to the generation of artificial data with the aim of reproducing the statistical properties of an original dataset.
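To make "reproducing the statistical properties" concrete for myself, here is a deliberately naive sketch: fit a mean and covariance to a dataset, then sample new rows from a Gaussian with those parameters. The toy dataset, the Gaussian assumption, and the helper names `fit_gaussian` and `sample_synthetic` are my own illustration, not something from the source material, and a model this simple offers no privacy guarantees by itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an "original" dataset: 5000 rows, two correlated features.
real = rng.multivariate_normal(
    mean=[120.0, 80.0],
    cov=[[100.0, 40.0], [40.0, 64.0]],
    size=5000,
)

def fit_gaussian(data):
    """Estimate the mean vector and covariance matrix of the data."""
    return data.mean(axis=0), np.cov(data, rowvar=False)

def sample_synthetic(mean, cov, n, rng):
    """Draw n artificial rows from a Gaussian with the fitted parameters."""
    return rng.multivariate_normal(mean, cov, size=n)

mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, 5000, rng)

# The synthetic rows are new samples, but their summary statistics
# should be close to those of the original data.
print(real.mean(axis=0), synthetic.mean(axis=0))
print(np.cov(real, rowvar=False))
print(np.cov(synthetic, rowvar=False))
```

Real generators (GANs, flows) replace the Gaussian with a far more flexible model, but the goal is the same: new samples whose distribution matches the original.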

## Fourier Flows

(Generative Time-series Modeling with Fourier Flows, Alaa et al. 2021)

Most recent proposals for generating synthetic time-series rely on implicit likelihood modeling using GANs, but such models can be difficult to train, and may jeopardize privacy by “memorizing” temporal patterns in training data.

In this paper we propose an explicit likelihood model based on a novel class of normalizing flows that view time-series data in the frequency-domain rather than the time-domain.
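As a reminder of what "explicit likelihood" means for a normalizing flow in general: an invertible map $f$ sends data $x$ to a latent $z = f(x)$ with a simple density $p_Z$, and the data density follows from the change-of-variables formula. The symbols $f$, $p_X$ and $p_Z$ are my own notation here, not necessarily the paper's.

$$
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
$$

Training maximizes this log-likelihood directly, whereas a GAN never evaluates a density for the data.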

The method, Fourier flow, uses a discrete Fourier transform (DFT) to convert variable-length time-series with arbitrary sampling periods into fixed-length spectral representations, then applies a (data-dependent) spectral filter to the frequency-transformed time-series.
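A minimal sketch of the first step only (the DFT to a fixed-length representation), assuming NumPy. The zero-padding to a common length `n_fft`, the use of the real-input FFT, and the function names `to_spectral` and `from_spectral` are my own assumptions for illustration; the paper's exact preprocessing may differ, and neither the handling of sampling periods nor the data-dependent spectral filter is shown.

```python
import numpy as np

def to_spectral(x, n_fft):
    """Map a real series of length <= n_fft to a fixed-length spectral array.

    Returns shape (n_fft // 2 + 1, 2): real and imaginary parts of the DFT.
    """
    x = np.asarray(x, dtype=float)
    # rfft zero-pads x to n_fft samples and keeps the non-redundant half
    # of the spectrum (the rest follows from conjugate symmetry).
    spectrum = np.fft.rfft(x, n=n_fft)
    return np.stack([spectrum.real, spectrum.imag], axis=-1)

def from_spectral(spec, n_fft, length):
    """Invert to_spectral and trim the zero-padding."""
    spectrum = spec[..., 0] + 1j * spec[..., 1]
    return np.fft.irfft(spectrum, n=n_fft)[:length]

# Two series of different lengths end up with the same spectral shape.
short = np.sin(np.linspace(0.0, 4.0 * np.pi, 50))
long_ = np.cos(np.linspace(0.0, 6.0 * np.pi, 80))
n_fft = 128

spec_short = to_spectral(short, n_fft)
spec_long = to_spectral(long_, n_fft)
print(spec_short.shape, spec_long.shape)

# The transform is invertible, which is what a flow needs.
recovered = from_spectral(spec_short, n_fft, len(short))
print(np.allclose(recovered, short))
```

Because the DFT is a linear, invertible map, it can sit at the start of a flow without breaking the exact likelihood computation.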

The Fourier flows compare favourably to RealNVP flows, but as the latter are simpler, I’ll take a detour to study them first.

## RealNVP