Yuval Avidani
Author
Key Takeaway
TimesFM is a decoder-only transformer foundation model that delivers state-of-the-art zero-shot forecasting for time-series data across retail, finance, weather, and other domains. Created by Google Research, it eliminates the need for extensive historical data collection and domain-specific model training that traditionally bottleneck forecasting projects.
What is TimesFM?
TimesFM (Time Series Foundation Model) is a 200M-parameter foundation model, pretrained on 100 billion real-world and synthetic time points, that enables zero-shot forecasting on entirely new time-series data. The timesfm project solves the cold-start forecasting problem we all face when working with limited historical data or launching predictions in new domains.
Unlike traditional statistical methods like ARIMA or Prophet that require training on our specific datasets, TimesFM learns universal temporal patterns during pretraining. Think of it like GPT for time-series - just as language models understand grammar and semantics without seeing our specific text, TimesFM understands how things change over time without training on our specific metrics.
The Problem We All Know
We spend weeks or months on every new forecasting project. First, we collect extensive historical data. Then we experiment with different statistical models - ARIMA, exponential smoothing, Prophet. We tune hyperparameters. We validate on holdout sets. And when we move to a different domain or client, we start from scratch.
The retail demand forecasting model we built doesn't help with weather prediction. Our financial market forecaster can't transfer to supply chain optimization. Each new time-series means a new modeling project. This cold start problem - accurately predicting without extensive domain-specific training - has been the fundamental bottleneck in time-series forecasting.
Existing deep learning approaches haven't fully solved this either. Models like N-BEATS or DeepAR require thousands of related time-series for training. When we face a genuinely new forecasting problem with limited history, we're still stuck with manual statistical modeling.
How TimesFM Works
Google Research took the foundation model approach that revolutionized NLP and applied it to time-series. They pretrained a decoder-only transformer architecture on a massive corpus of 100 billion time points spanning diverse domains - retail sales, web traffic, financial metrics, sensor data, weather patterns, and synthetic series.
The architecture uses attention mechanisms - meaning the model learns which past time points are most relevant for predicting future values. The decoder-only design processes sequences left-to-right (or past-to-future), similar to GPT. Depending on the checkpoint, it supports context windows from 512 points (the 1.0 release) up to 16k points in later versions, allowing it to capture long-term temporal patterns when available.
During pretraining, the model learned universal patterns:
- Seasonality - recurring patterns at regular intervals, like daily or weekly cycles
- Trends - persistent upward or downward movements
- Sudden changes and anomalies
- Autocorrelation - how values depend on previous values
- Cross-series relationships in the training data
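The patch-based design is easy to picture with a toy example. This sketch (plain NumPy, not the model's actual preprocessing code) splits a 512-point context into non-overlapping 32-point input patches - the time-series analogue of the tokens a language model attends over:

```python
import numpy as np

# A toy series filling a 512-point context window.
series = np.sin(np.linspace(0, 20, 512))

# Cut the context into non-overlapping input patches - the
# time-series analogue of tokens in a language model.
input_patch_len = 32
patches = series.reshape(-1, input_patch_len)

print(patches.shape)  # (16, 32): 16 patch "tokens" of 32 points each
```

The transformer then predicts the next output patch from the sequence of input patches, just as GPT predicts the next token from preceding tokens.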
Quick Start
Here's how we get started with TimesFM:
```shell
# Installation
pip install timesfm
```

```python
# Import and initialize
import numpy as np
import timesfm

# Load the pretrained model
tfm = timesfm.TimesFm(
    context_len=512,      # historical window
    horizon_len=128,      # forecast horizon
    input_patch_len=32,
    output_patch_len=128,
    num_layers=20,
    model_dims=1280,
)

# Point to the public checkpoint
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

# Make predictions
forecast_input = [
    np.sin(np.linspace(0, 20, 100)),  # your time-series data
]
frequency_input = [0]  # frequency hint (optional)

# Generate the forecast
point_forecast, experimental_quantile = tfm.forecast(
    forecast_input,
    freq=frequency_input,
)

print(point_forecast)  # your predictions
```
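The freq hint deserves a note: as we understand the 1.0 API, it is a coarse category rather than an exact sampling rate - 0 for high-frequency data (daily or finer), 1 for medium (weekly, monthly), 2 for low (quarterly, yearly). The helper below is our own convenience wrapper, not part of the library; double-check the categories against the version you install:

```python
def freq_hint(pandas_alias: str) -> int:
    """Map a pandas-style frequency alias to a TimesFM freq category.

    0 = high frequency (hourly, daily), 1 = medium (weekly, monthly),
    2 = low (quarterly, yearly). Hypothetical helper for illustration.
    """
    alias = pandas_alias.upper()
    if alias in {"T", "H", "D"}:
        return 0
    if alias in {"W", "M"}:
        return 1
    if alias in {"Q", "Y", "A"}:
        return 2
    raise ValueError(f"unknown frequency alias: {pandas_alias}")

print(freq_hint("D"), freq_hint("M"), freq_hint("Q"))  # 0 1 2
```

With a helper like this, the hint for each series can be derived straight from the DataFrame's inferred frequency instead of hard-coding it.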
A Real Example
Let's say we're launching a new product line and need demand forecasts. We only have 3 months of sales history - not enough for traditional methods. Here's how TimesFM helps:
```python
import pandas as pd
import timesfm

# Load our limited historical data
sales_data = pd.read_csv('new_product_sales.csv')
timeseries = sales_data['daily_units'].values

# Initialize TimesFM. context_len must be a multiple of
# input_patch_len; shorter series are padded, so 128 covers
# our ~90 days of daily history.
tfm = timesfm.TimesFm(
    context_len=128,
    horizon_len=30,       # forecast next month
    input_patch_len=32,
    output_patch_len=128,
)
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

# Generate zero-shot forecast
forecast, quantiles = tfm.forecast(
    [timeseries],
    freq=[0],  # daily frequency
)

# Extract predictions
next_month_forecast = forecast[0]
print(f"Predicted sales for next 30 days: {next_month_forecast}")

# Uncertainty estimates: quantiles[0] has one row per forecast step;
# column 0 is the mean and columns 1-9 are the 0.1 ... 0.9 deciles.
lower_bound = quantiles[0][:, 1]  # 10th percentile path
upper_bound = quantiles[0][:, 9]  # 90th percentile path
```
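Turning the decile output into a usable prediction interval is just array slicing. The sketch below runs on a synthetic stand-in shaped like one series' quantile forecast - one row per forecast step, with (as we understand the 1.0 output layout) the mean in column 0 and the 0.1-0.9 deciles in columns 1-9:

```python
import numpy as np

# Synthetic stand-in for a (horizon, 10) quantile forecast:
# column 0 = mean, columns 1-9 = the 0.1 ... 0.9 deciles.
horizon = 30
mean = np.full(horizon, 1000.0)
deciles = mean[:, None] + 50.0 * (np.linspace(0.1, 0.9, 9) - 0.5)
q = np.hstack([mean[:, None], deciles])

lower = q[:, 1]                # 10th percentile path
upper = q[:, 9]                # 90th percentile path
width = (upper - lower).mean() # average width of the 80% interval
print(width)
```

With the real output, we would replace the synthetic q with quantiles[0] from tfm.forecast() and get the 80% interval the same way.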
Key Features
- Zero-Shot Forecasting - We feed the model a time-series it has never seen, and it generates accurate predictions immediately. No training, no fine-tuning, no hyperparameter search. Think of it like asking GPT to write about a topic it wasn't specifically trained on - it leverages universal language patterns. TimesFM leverages universal temporal patterns.
- Long Context Window - Recent checkpoints support up to 16k time points of context (the 1.0 release caps at 512). This means when we do have extensive history available, TimesFM can look far back to capture long-term patterns and seasonal cycles. For weekly data, 16k points is over 300 years of context if needed.
- Domain Agnostic - The same model handles retail demand, financial prices, weather measurements, web traffic, IoT sensors, and more. We don't need separate forecasters for each domain. The model learned patterns that generalize across use cases.
- Uncertainty Quantification - TimesFM provides prediction intervals alongside point forecasts. We get not just "sales will be 1000 units" but "sales will likely be between 800-1200 units with 80% confidence." This probabilistic output helps us make better decisions under uncertainty.
- Open Source - Released under Apache 2.0 license. We can inspect the code, understand the architecture, modify it for our needs, and deploy it in our infrastructure without licensing restrictions. The pretrained checkpoints are publicly available.
When to Use TimesFM vs. Alternatives
TimesFM excels when we need quick forecasts for new time-series with limited history. If we're launching a new product, entering a new market, or forecasting a metric we've never tracked before - TimesFM delivers accurate predictions from day one.
For univariate forecasting - meaning predicting one metric based on its own history - TimesFM is extremely competitive. It matches or exceeds specialized statistical models without the manual tuning overhead.
However, traditional methods still have their place. If we have extensive domain expertise and years of historical data, a well-tuned ARIMA or Prophet model might deliver slightly better accuracy for that specific use case. The benefit of TimesFM is we don't need that tuning process.
For multivariate forecasting - meaning using multiple related time-series or covariates to predict one target - specialized architectures like Temporal Fusion Transformers, which are built to ingest exogenous features, might perform better. TimesFM focuses on univariate series. If we need to predict retail sales using price, promotions, weather, and competitor data simultaneously, other tools designed for multivariate input would be more appropriate.
For real-time streaming predictions at extreme scale, lightweight statistical methods might be more efficient. TimesFM's 200M parameters require more compute than simple exponential smoothing. The tradeoff is accuracy vs. computational cost.
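To make that tradeoff concrete, here is roughly the entire "model" that simple exponential smoothing needs - a few lines and O(n) arithmetic per series, versus a 200M-parameter forward pass. A minimal sketch:

```python
def simple_exp_smoothing(series, alpha=0.3):
    """One-pass exponential smoothing; the final smoothed level
    serves as a flat forecast for every future step."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

history = [10.0, 12.0, 11.0, 13.0, 12.0]
print(round(simple_exp_smoothing(history), 4))  # 11.5828
```

At extreme throughput, this kind of baseline is hard to beat on cost; the question is whether TimesFM's accuracy gain on our data justifies the extra compute.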
My Take - Will I Use This?
In my view, TimesFM represents a genuine paradigm shift in how we approach time-series forecasting. The ability to get accurate predictions without training on our specific data changes the economics of forecasting projects dramatically.
I see this being perfect for our workflow in several scenarios. First, rapid prototyping and POCs - meaning proof-of-concepts. When we need to quickly assess if forecasting is viable for a use case, TimesFM gives us results in minutes instead of weeks. Second, forecasting long-tail metrics. Many organizations have hundreds or thousands of metrics they'd like to forecast but lack resources to build custom models for each. TimesFM makes this feasible. Third, cold start situations where we're launching something new and historical data is scarce.
The open-source release is significant. We can deploy this in our own infrastructure, fine-tune on our proprietary data if beneficial, and integrate it into our MLOps pipelines without vendor lock-in.
The limitation to watch: TimesFM is optimized for univariate series. If our forecasting problem requires complex feature engineering with multiple exogenous variables, we'll need to either preprocess the inputs creatively or use specialized multivariate models. Google Research mentions future work on multivariate capabilities, which would make this even more powerful.
Another consideration: model size. At 200M parameters, TimesFM requires more compute than classical statistical methods. For applications where we need millions of forecasts updated every second, the computational cost might be prohibitive. But for most business forecasting use cases where we generate predictions hourly or daily, the accuracy improvement justifies the compute.
Check out the repo: timesfm
Frequently Asked Questions
What is TimesFM?
TimesFM is a 200M parameter decoder-only transformer foundation model that performs zero-shot time-series forecasting without requiring training on your specific data.
Who created TimesFM?
TimesFM was created by Google Research. It represents their application of foundation model approaches to time-series forecasting, similar to how transformers revolutionized NLP.
When should we use TimesFM?
Use TimesFM when you need accurate forecasts for new time-series with limited historical data, when you want to avoid weeks of model tuning, or when you need to forecast many different metrics efficiently.
What are the alternatives to TimesFM?
Traditional statistical methods like ARIMA, Prophet, and exponential smoothing remain viable for well-understood domains with extensive data. Deep learning alternatives include N-BEATS for univariate forecasting and Temporal Fusion Transformers for multivariate problems. TimesFM's advantage is zero-shot capability without domain-specific training.
What are the limitations of TimesFM?
TimesFM is currently optimized for univariate time-series. Complex multivariate forecasting problems that require modeling interactions between many features may benefit from specialized architectures. The 200M parameter model also requires more compute than simple statistical methods.
