TimeNet

Download and explore time-series datasets through one standardized format. Ready to be deployed locally or in your Ray cluster.

A Call for Open Collaboration

Join us in building the Data Foundational Layer for
Time-Series Language Models (TSLMs)

Overview

Time-series data is fragmented. TimeNet standardizes it.

Every dataset ships in a different layout, so teams rewrite the loading and conversion code for each one. TimeNet replaces that with a single format and a single ingestion pipeline.

What

One format for every dataset

A Python library and CLI for downloading and exploring time-series datasets in one unified format.

Why

Data fragmentation costs money

Time-series datasets are scattered across the internet, each with its own structure. That fragmentation, not modeling, is where most of the work goes. Standardize the data once and that cost disappears.

How - standardization and infrastructure

Standardization

Write a connector

For each dataset, write a connector. The connector runs on the TimeNet engine and is in charge of downloading, converting, and storing the data in the standard format. Writing connectors is straightforward thanks to our intuitive API and examples.

Infrastructure

Laptop to cluster, same code

An async engine runs connectors over a scheduler, with parallelism built in. The same code scales from a local process pool to a Ray cluster.
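
As a rough illustration of the "same code, different backend" idea, the sketch below keeps the job-running function fixed and lets the caller choose the executor. It uses only the Python standard library; `run_jobs`, `convert`, and `main` are hypothetical names for this example and are not TimeNet's API. A thread pool stands in for the backend so the snippet runs anywhere.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_jobs(fn: Callable[[T], R], items: Iterable[T], executor: Executor) -> list[R]:
    """Submit one job per item to the executor and await all results in input order."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, fn, item) for item in items]
    return list(await asyncio.gather(*futures))


def convert(raw: int) -> int:
    # Stand-in for a per-entry conversion step (hypothetical).
    return raw * 2


def main(executor: Executor) -> list[int]:
    # The caller picks the backend; run_jobs itself never changes.
    with executor:
        return asyncio.run(run_jobs(convert, range(5), executor))


results = main(ThreadPoolExecutor(max_workers=4))
print(results)  # [0, 2, 4, 6, 8]
```

Passing a `concurrent.futures.ProcessPoolExecutor` instead would give a local process pool (the job function must then be picklable). Whether TimeNet exposes its Ray backend through a similar executor interface is an assumption made here for illustration.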

Sources: object stores (S3, GCS), databases (Postgres, MySQL), APIs (REST, gRPC), repositories (PhysioNet, Kaggle)

Custom connectors

TimeNet Engine: download → convert → store

Scheduler: local ↔ Ray. Every phase runs as one or more scheduled jobs, executed on a local process pool or a Ray cluster.

Standardized store: mit_bih, mimic_iv, finance_fx, iot_sensors (one format · one place)

Read: client.read(…) pulls standardized datasets back

Each dataset from a source gets a custom connector. The engine runs download, convert, and store as scheduled jobs. Every dataset is standardized and stored in one place.
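
The flow described above (download, convert, store, then read back) can be sketched in a few lines. This is a toy model under stated assumptions: `InMemoryStore`, `ToyConnector`, and `run_pipeline` are hypothetical names, the store is a plain dictionary, and the phases run sequentially rather than as scheduled jobs.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the standardized store."""
    datasets: dict[str, list[dict]] = field(default_factory=dict)

    def write(self, dataset_id: str, samples: list[dict]) -> None:
        self.datasets[dataset_id] = samples

    def read(self, dataset_id: str) -> list[dict]:
        return self.datasets[dataset_id]


class ToyConnector:
    """Hypothetical connector: knows how to fetch and convert one dataset."""
    dataset_id = "toy_sensor"

    def download(self) -> list[str]:
        # A real connector would fetch files; here the raw entries are inline CSV rows.
        return ["1,2,3", "4,5,6"]

    def convert(self, raw: str) -> dict:
        # Turn one raw entry into one sample in the common layout.
        return {"values": [float(v) for v in raw.split(",")]}


def run_pipeline(connector: ToyConnector, store: InMemoryStore) -> None:
    raw_entries = connector.download()                      # phase 1: download
    samples = [connector.convert(r) for r in raw_entries]   # phase 2: convert
    store.write(connector.dataset_id, samples)              # phase 3: store


store = InMemoryStore()
run_pipeline(ToyConnector(), store)
samples = store.read("toy_sensor")
print(samples)
```

Because every connector writes the same sample layout, the reading side needs no dataset-specific code.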

A standardized format

Whatever the domain is, every dataset is stored in the same format. Here is one small example.

What is stored?

Identity

Dataset id, version, domain, license...

Samples

Every dataset stores one or more samples.

Signals

Each sample contains signals. A signal is a single channel, such as one ECG lead.

Annotations

An annotation can be an interval, a point, or a static value, which covers both events and metadata.

Tasks

What a model should predict from the sample.

YAML
dataset_id: mit_bih_arrhythmia
version: "2.0.0"
domain: cardiology
license: CC BY 4.0

samples:
  - subject_id: patient_42
    view: window

    channels:
      - spec: ecg_lead
        channel: II
        sampling_rate: 360
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0

    annotations:
      - type: interval
        name: artifact
        start_s: 4.2
        end_s: 4.8
      - type: point
        name: r_peak
        t_s: 1.93
      - type: static
        key: age
        value: 67
      - type: static
        key: sex
        value: M

    tasks:
      - type: classification
        label: afib

  - subject_id: patient_77
    view: full

    channels:                       # 12-lead diagnostic ECG
      - spec: ecg_lead
        channel: I
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: II
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: III
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: aVR
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: aVL
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: aVF
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V1
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V2
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V3
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V4
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V5
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V6
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0

    annotations:
      - type: point
        name: r_peak
        t_s: 0.84
      - type: static
        key: age
        value: 58
      - type: static
        key: sex
        value: F

    tasks:
      - type: classification
        label: normal
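
To make the structure of the YAML example concrete, here is a minimal sketch of how the channel and annotation fields could be modeled and checked in Python. The class and function names are hypothetical and do not reflect TimeNet's real types; only the field names are taken from the example above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Channel:
    spec: str
    channel: str
    sampling_rate: float  # in Hz, as in the example
    t_start_s: float
    t_end_s: float

    def expected_points(self) -> int:
        # Duration in seconds times samples per second.
        return round((self.t_end_s - self.t_start_s) * self.sampling_rate)


@dataclass(frozen=True)
class IntervalAnnotation:
    name: str
    start_s: float
    end_s: float


@dataclass(frozen=True)
class PointAnnotation:
    name: str
    t_s: float


@dataclass(frozen=True)
class StaticAnnotation:
    key: str
    value: Union[str, int, float]


Annotation = Union[IntervalAnnotation, PointAnnotation, StaticAnnotation]


def in_range(annotation: Annotation, channel: Channel) -> bool:
    """Check that a timed annotation falls inside the channel's time span."""
    if isinstance(annotation, IntervalAnnotation):
        return channel.t_start_s <= annotation.start_s <= annotation.end_s <= channel.t_end_s
    if isinstance(annotation, PointAnnotation):
        return channel.t_start_s <= annotation.t_s <= channel.t_end_s
    return True  # static annotations carry no timestamp


lead_ii = Channel("ecg_lead", "II", 360.0, 0.0, 10.0)
annotations = [
    IntervalAnnotation("artifact", 4.2, 4.8),
    PointAnnotation("r_peak", 1.93),
    StaticAnnotation("age", 67),
]
n_points = lead_ii.expected_points()
all_valid = all(in_range(a, lead_ii) for a in annotations)
print(n_points, all_valid)  # 3600 True
```

A 10-second window at 360 Hz holds 3600 points, and all three annotations from the first sample pass the range check.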

Create a connector

Connect a Dataset

Subclass TimeNet's base connector or one of the built-in connectors and adapt it to your data. TimeNet handles the rest: validation, parallelization, scheduling, and storage.

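from dataclasses import dataclass
from typing import Literal

# RawEntry, Device, TimeSeriesSpec, Annotation, BaseConnector, and the other
# TimeNet names used below come from the library; their imports are omitted here.


@dataclass(frozen=True)
class Trial(RawEntry):
    # One raw recording, as discovered by download().
    activity: str
    placement: Literal["wrist", "waist", "ankle"]
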

class ActiGraphGT3X(Device):
    device_id = "actigraph_gt3x"
    name = "ActiGraph GT3X"
    manufacturer = "ActiGraph"


class AccelerometerSpec(TimeSeriesSpec):
    spec_id = "accelerometer"
    name = "Accelerometer"
    device = ActiGraphGT3X
    unit_sampling_rate = SamplingRateUnit.HZ
    unit_timestamp = TimestampUnit.SECONDS
    unit_value = ValueUnit.G


class Placement(Annotation):      # static: where the sensor was worn
    key = "placement"
    kind = AnnotationKind.STATIC
    value: Literal["wrist", "waist", "ankle"]


class WISDMConnector(BaseConnector[Trial]):

    def metadata(self) -> DatasetMetadata:
        return DatasetMetadata(
            dataset_id="wisdm_activity",
            version=Version(1, 0, 0),
            description="Wrist accelerometer activity recognition.",
            license=License.CC_BY_4,
            domains=(Domain.HUMAN_ACTIVITY,),
        )

    def download(self) -> list[Trial]:
        files = self.fetch("wisdm.cis.fordham.edu/dataset")
        return [to_trial(p) for p in files]

    @parallelize
    def convert(self, trial: Trial) -> Sample:
        sample = Sample(
            subject_ids=(trial.subject_id,),
            view=View.WINDOW,
            time_series=tuple(
                TimeSeries(
                    spec=AccelerometerSpec(axis=axis),
                    source_id=trial.file_name,
                    sampling_rate=Frequency.Hz(20.0),
                    reader=lambda t=trial, a=axis: read_axis(t.path, a),
                )
                for axis in ("x", "y", "z")
            ),
        )

        sample.add_annotation(Placement(value=trial.placement))
        sample.add_task(ClassificationTask(label=trial.activity))
        return sample

    # store() inherited; TimeNet handles the rest
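
One detail in `convert` is worth a closer look: the reader is written as `lambda t=trial, a=axis: ...` rather than a plain `lambda: ...`. Default arguments are evaluated when the lambda is defined, so each reader keeps its own axis. The standalone snippet below shows the difference in plain Python; it is independent of TimeNet.

```python
axes = ("x", "y", "z")

# Late binding: each lambda looks up `axis` when it is called,
# by which time the generator has already finished at "z".
late = tuple(lambda: axis for axis in axes)

# Early binding: the default argument captures the value of `axis`
# at the moment each lambda is defined.
early = tuple(lambda a=axis: a for axis in axes)

late_values = [f() for f in late]
early_values = [f() for f in early]
print(late_values)   # ['z', 'z', 'z']
print(early_values)  # ['x', 'y', 'z']
```

Without the default arguments, all three readers in the connector would read the same axis once they were eventually called.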

Tasks

TimeNet supports multiple training goals

TimeNet was originally built to feed OpenTSLM. A task is the training goal: what the model should learn to produce from a sample. Each sample can carry one or more tasks, so the same data can train several objectives.

Question answering

qa

Natural-language questions grounded in the sample. Ex: is there atrial fibrillation in this ECG? Yes/No

Reasoning

reasoning

Textual reasoning over temporal patterns. Ex: the sell-off follows a spike in trading volume.

Classification

classification

Assign a label to a window or a whole series. Ex: walking vs. running from the accelerometer.

Forecasting

forecasting

Predict future values from a past signal window. Ex: the next 10s from the previous 10s.
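
The forecasting example ("the next 10s from the previous 10s") comes down to simple index arithmetic once the sampling rate is known. The helper below is a hypothetical sketch and not part of TimeNet; it uses the 20 Hz rate from the connector example.

```python
def split_forecast_window(values, sampling_rate_hz, past_s, future_s):
    """Split a signal into a past (input) window and a future (target) window."""
    n_past = round(past_s * sampling_rate_hz)
    n_future = round(future_s * sampling_rate_hz)
    if len(values) < n_past + n_future:
        raise ValueError("signal is shorter than past + future windows")
    return values[:n_past], values[n_past:n_past + n_future]


# 20 seconds of synthetic data sampled at 20 Hz: 400 points.
signal = [float(i) for i in range(400)]
past, future = split_forecast_window(signal, sampling_rate_hz=20.0, past_s=10.0, future_s=10.0)
print(len(past), len(future), future[0])  # 200 200 200.0
```

At 20 Hz, each 10-second window holds 200 points, so the model would see points 0 to 199 and be asked to predict points 200 to 399.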

UCI

TimeNet

Coming soon

Documentation

Guides, API references, and end-to-end tutorials for building on TimeNet are on the way.