TimeNet

Download and explore time-series datasets through one standardized format, ready to deploy locally or on your Ray cluster.

A Call for Open Collaboration

Join us in building the Data Foundational Layer for
Time-Series Language Models (TSLMs)

Meet Contributors from

ETH
Apple
Stanford
Amazon
Harvard
IBM
Cambridge
CERN
NUS
Citadel
Dartmouth
Optiver
TUM
CDTM
Max Planck
Polimi
Google
Meta

Do you want to contribute datasets or do research with us?

Join Us
Overview

Time-series data is fragmented. TimeNet standardizes it.

Every dataset ships a different layout, and teams need to rewrite the loading and conversion code for each one. TimeNet replaces that with a single format and a single ingestion pipeline.

What

One format for every dataset

A Python library and CLI for downloading and exploring time-series datasets in one unified format.

Why

Data fragmentation costs money

Time-series data is scattered across the internet, and each source has its own structure. That fragmentation, not modeling, is where most of the work goes. Standardize the data once and that cost disappears.

How - standardization and infrastructure

Standardization

Write a connector

For each dataset, write a connector. The connector runs on the TimeNet engine and is in charge of downloading, converting, and storing the data in the standard format. Writing connectors is straightforward thanks to our API and examples.

Infrastructure

Laptop to cluster, same code

An async engine runs connectors over a scheduler, with parallelism built in. The same code scales from a local process pool to a Ray cluster.
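
As a rough illustration of the "same code, different backend" idea, the sketch below submits one job per item to any `concurrent.futures.Executor` from async code. The names `run_jobs` and `convert_one` are hypothetical and not part of TimeNet. A thread pool is used only so the snippet runs anywhere; a process pool or a Ray-backed executor are the assumed substitutes.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_jobs(executor: Executor, job: Callable[[T], R], items: Iterable[T]) -> list[R]:
    """Submit one job per item to the executor and await all results in order."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, job, item) for item in items]
    return list(await asyncio.gather(*futures))


def convert_one(raw: int) -> int:
    """Hypothetical stand-in for a per-entry conversion step."""
    return raw * 2


def main() -> list[int]:
    # Swapping ThreadPoolExecutor for another Executor changes where the jobs
    # run, not how they are submitted.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return asyncio.run(run_jobs(pool, convert_one, range(5)))


results = main()
```

The calling code only depends on the `Executor` interface, which is one plausible way to keep the local and cluster paths identical.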

Sources
Object stores: S3, GCS
Databases: Postgres, MySQL
APIs: REST, gRPC
Repositories: PhysioNet, Kaggle

Custom connectors → TimeNet Engine (download → convert → store) → Scheduler (local ↔ Ray)

Every phase runs as one or more scheduled jobs, executed on a local process pool or a Ray cluster.

Standardized store: mit_bih, mimic_iv, finance_fx, iot_sensors (one format · one place)

Read: client.read() pulls standardized datasets back.

Each dataset from a source gets a custom connector. The engine runs download, convert, and store as scheduled jobs. Every dataset is standardized and stored in one place.
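
The download, convert, store flow can be sketched in plain Python. Everything here (`ToyConnector`, `run_connector`, the in-memory `STORE`, and `read`) is a hypothetical stand-in used for illustration, not TimeNet's actual engine or client API.

```python
from dataclasses import dataclass

# Hypothetical in-memory "standardized store", keyed by dataset id.
STORE: dict[str, list[dict]] = {}


@dataclass(frozen=True)
class RawEntry:
    """One raw item produced by the download phase."""
    name: str
    values: tuple[float, ...]


class ToyConnector:
    """Illustrative connector exposing the three phases as methods."""
    dataset_id = "toy_dataset"

    def download(self) -> list[RawEntry]:
        # A real connector would fetch files; here the data is hard-coded.
        return [RawEntry("a", (1.0, 2.0)), RawEntry("b", (3.0,))]

    def convert(self, entry: RawEntry) -> dict:
        # Map a raw entry to one uniform record layout.
        return {"source_id": entry.name, "n_points": len(entry.values)}

    def store(self, samples: list[dict]) -> None:
        STORE[self.dataset_id] = samples


def run_connector(connector: ToyConnector) -> None:
    """Run the phases in order; an engine could schedule each as separate jobs."""
    entries = connector.download()
    samples = [connector.convert(entry) for entry in entries]
    connector.store(samples)


def read(dataset_id: str) -> list[dict]:
    """Read standardized records back from the store."""
    return STORE[dataset_id]


run_connector(ToyConnector())
records = read("toy_dataset")
```

Because `convert` takes a single entry and has no shared state, it is the natural phase to fan out across workers.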

A standardized format

Whatever the domain is, every dataset is stored in the same format. Here is one small example.

What is stored?
Identity

Dataset id, version, domain, license...

Samples

Every dataset stores one or more samples.

Signals

Samples contain signals, which we define as single channels.

Annotations

An annotation can be an interval, a point, or a static value, so it can represent both events and metadata.

Tasks

What a model should predict from the sample.

YAML
dataset_id: mit_bih_arrhythmia
version: "2.0.0"
domain: cardiology
license: CC BY 4.0

samples:
  - subject_id: patient_42
    view: window

    channels:
      - spec: ecg_lead
        channel: II
        sampling_rate: 360
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0

    annotations:
      - type: interval
        name: artifact
        start_s: 4.2
        end_s: 4.8
      - type: point
        name: r_peak
        t_s: 1.93
      - type: static
        key: age
        value: 67
      - type: static
        key: sex
        value: M

    tasks:
      - type: classification
        label: afib

  - subject_id: patient_77
    view: full

    channels:  # 12-lead diagnostic ECG
      - spec: ecg_lead
        channel: I
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: II
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: III
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: aVR
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: aVL
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: aVF
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V1
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V2
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V3
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V4
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V5
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0
      - spec: ecg_lead
        channel: V6
        sampling_rate: 500
        unit: Hz
        t_start_s: 0.0
        t_end_s: 10.0

    annotations:
      - type: point
        name: r_peak
        t_s: 0.84
      - type: static
        key: age
        value: 58
      - type: static
        key: sex
        value: F

    tasks:
      - type: classification
        label: normal
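
To make the structure of the YAML example concrete, here is a minimal sketch of how its fields could be modeled as Python dataclasses. The class, field, and method names mirror the YAML keys but are assumptions for illustration; they are not TimeNet's actual types.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional, Union


@dataclass(frozen=True)
class Channel:
    spec: str
    channel: str
    sampling_rate: float
    unit: str
    t_start_s: float
    t_end_s: float

    def n_points(self) -> int:
        """Expected number of points, assuming the rate is given in Hz."""
        return round((self.t_end_s - self.t_start_s) * self.sampling_rate)


@dataclass(frozen=True)
class Annotation:
    type: Literal["interval", "point", "static"]
    name: Optional[str] = None        # interval and point annotations
    start_s: Optional[float] = None   # interval only
    end_s: Optional[float] = None     # interval only
    t_s: Optional[float] = None       # point only
    key: Optional[str] = None         # static only
    value: Union[str, int, float, None] = None  # static only


@dataclass(frozen=True)
class Task:
    type: str
    label: str


@dataclass
class Sample:
    subject_id: str
    view: str
    channels: list[Channel] = field(default_factory=list)
    annotations: list[Annotation] = field(default_factory=list)
    tasks: list[Task] = field(default_factory=list)

    def intervals_at(self, t_s: float) -> list[str]:
        """Names of interval annotations that cover time t_s."""
        return [
            a.name
            for a in self.annotations
            if a.type == "interval" and a.start_s <= t_s <= a.end_s
        ]


# The first sample from the YAML example, rebuilt with these classes.
sample = Sample(
    subject_id="patient_42",
    view="window",
    channels=[Channel("ecg_lead", "II", 360, "Hz", 0.0, 10.0)],
    annotations=[
        Annotation(type="interval", name="artifact", start_s=4.2, end_s=4.8),
        Annotation(type="point", name="r_peak", t_s=1.93),
        Annotation(type="static", key="age", value=67),
        Annotation(type="static", key="sex", value="M"),
    ],
    tasks=[Task(type="classification", label="afib")],
)
```

With this layout, `sample.channels[0].n_points()` gives the expected length of the 10-second lead II window, and `sample.intervals_at(4.5)` lists the interval annotations active at that time.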
Create a connector

Connect a Dataset

Subclass TimeNet's base connector or one of the built-in connectors and adapt it to your data. TimeNet will handle the rest: validation, parallelization, scheduling, and storage.

from dataclasses import dataclass
from typing import Literal

# TimeNet imports (BaseConnector, Sample, TimeSeries, ...) are not shown in this excerpt.


@dataclass(frozen=True)
class Trial(RawEntry):
    activity: str
    placement: Literal["wrist", "waist", "ankle"]


class ActiGraphGT3X(Device):
    device_id = "actigraph_gt3x"
    name = "ActiGraph GT3X"
    manufacturer = "ActiGraph"


class AccelerometerSpec(TimeSeriesSpec):
    spec_id = "accelerometer"
    name = "Accelerometer"
    device = ActiGraphGT3X
    unit_sampling_rate = SamplingRateUnit.HZ
    unit_timestamp = TimestampUnit.SECONDS
    unit_value = ValueUnit.G


class Placement(Annotation):  # static: where the sensor was worn
    key = "placement"
    kind = AnnotationKind.STATIC
    value: Literal["wrist", "waist", "ankle"]


class WISDMConnector(BaseConnector[Trial]):

    def metadata(self) -> DatasetMetadata:
        return DatasetMetadata(
            dataset_id="wisdm_activity",
            version=Version(1, 0, 0),
            description="Wrist accelerometer activity recognition.",
            license=License.CC_BY_4,
            domains=(Domain.HUMAN_ACTIVITY,),
        )

    def download(self) -> list[Trial]:
        files = self.fetch("wisdm.cis.fordham.edu/dataset")
        return [to_trial(p) for p in files]

    @parallelize
    def convert(self, trial: Trial) -> Sample:
        sample = Sample(
            subject_ids=(trial.subject_id,),
            view=View.WINDOW,
            time_series=tuple(
                TimeSeries(
                    spec=AccelerometerSpec(axis=axis),
                    source_id=trial.file_name,
                    sampling_rate=Frequency.Hz(20.0),
                    # Default arguments bind the current trial and axis to each reader.
                    reader=lambda t=trial, a=axis: read_axis(t.path, a),
                )
                for axis in ("x", "y", "z")
            ),
        )

        sample.add_annotation(Placement(value=trial.placement))
        sample.add_task(ClassificationTask(label=trial.activity))
        return sample

    # store() inherited; TimeNet handles the rest
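
One detail in `convert` is worth a note: the reader is written as `lambda t=trial, a=axis: ...`. Python closures look up loop variables when they are called, not when they are created, so binding the current value through a default argument keeps each reader tied to its own axis. The standalone snippet below demonstrates this general Python behavior and does not use TimeNet.

```python
axes = ("x", "y", "z")

# Late binding: every lambda reads `axis` when called, after the loop has ended.
late_readers = tuple((lambda: axis) for axis in axes)

# Early binding: the default argument captures the value at creation time.
bound_readers = tuple((lambda a=axis: a) for axis in axes)

late_values = [reader() for reader in late_readers]
bound_values = [reader() for reader in bound_readers]
```

Here `late_values` ends up with the last axis repeated three times, while `bound_values` keeps one entry per axis.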
Tasks

TimeNet supports multiple training goals

TimeNet was originally built to feed OpenTSLM. A task is the training goal: what the model should learn to produce from a sample. Each sample can carry one or more tasks, so the same data can train several objectives.

Question answering

qa

Natural-language questions grounded in the sample. Ex: is there atrial fibrillation in this ECG? Yes/No

Reasoning

reasoning

Textual reasoning over temporal patterns. Ex: the sell-off follows a spike in trading volume.

Classification

classification

Assign a label to a window or a whole series. Ex: walking vs. running from the accelerometer.

Forecasting

forecasting

Predict future values from a past signal window. Ex: the next 10s from the previous 10s.
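
As an illustration of the forecasting example, the sketch below splits a signal into non-overlapping (past, future) pairs, such as the previous 10 s and the next 10 s. The function name `forecasting_pairs` and its parameters are hypothetical and not part of TimeNet.

```python
def forecasting_pairs(values, sampling_rate_hz, history_s, horizon_s):
    """Split a signal into non-overlapping (past, future) windows.

    Window lengths are given in seconds and converted to a number of points
    using the sampling rate. Trailing points that do not fill a complete
    pair are dropped.
    """
    history_n = round(history_s * sampling_rate_hz)
    horizon_n = round(horizon_s * sampling_rate_hz)
    step = history_n + horizon_n
    if history_n <= 0 or horizon_n <= 0:
        raise ValueError("history and horizon must each cover at least one point")

    pairs = []
    for start in range(0, len(values) - step + 1, step):
        past = values[start:start + history_n]
        future = values[start + history_n:start + step]
        pairs.append((past, future))
    return pairs


# 40 seconds of a synthetic signal sampled at 20 Hz.
signal = [float(i) for i in range(800)]
pairs = forecasting_pairs(signal, sampling_rate_hz=20.0, history_s=10.0, horizon_s=10.0)
```

At 20 Hz, each 10-second window holds 200 points, so the 800-point signal yields two complete pairs.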

S3 · Hugging Face · Snowflake · PhysioNet · PostgreSQL · Kaggle · Azure Blob · UCI · Databricks
TimeNet
Coming soon

Documentation

Guides, API references, and end-to-end tutorials for building on TimeNet are on the way.