
Have you ever looked at a freshly plotted scatter plot and immediately thought, “Ah, this is clearly a logarithmic curve with some heteroskedastic noise,” without running a single line of modeling code? How do you do that? You don’t perform gradient descent in your head. You use your intuition!
As an experienced data scientist, you have seen thousands of datasets in your career. When confronted with new data, your natural neural network (a.k.a. brain) simply draws on this vast library of past mathematical shapes and immediately recognizes the pattern. But what if an artificial neural network could do exactly the same thing? What if it could predict your data without actually being trained on it?
Welcome to the mind-bending world of In-Context Learning (ICL) for tabular data, brought to R via the incredible new TabPFN package (on CRAN).
The Transformer: From Text to Tables
To understand ICL, we have to talk about Large Language Models like ChatGPT (see also Building Your Own Mini-ChatGPT with R: From Markov Chains to Transformers!). When you give a chatbot an unfinished sentence, it doesn’t retrain its weights to guess the next word. It uses a Transformer architecture equipped with an attention mechanism (see also Attention! What lies at the Core of ChatGPT? (Also as a Video!)). It reads the words you provided, understands the dependencies between them (the grammar and context), and instantly extrapolates what comes next.
The genius of TabPFN is taking this exact architecture and applying it to spreadsheets. Instead of a sequence of words, the Transformer reads a sequence of data rows. It treats your features (X) and your target (Y) like the grammar of a language. By comparing all the rows and columns simultaneously in its “context window,” it figures out the dependencies in the table just like a language model figures out dependencies in text.
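To build intuition for what "attending over rows" means, here is a deliberately toy sketch in base R (my own illustration, not TabPFN's actual architecture): each test row computes softmax-style similarity weights over all training rows, then takes the label with the most weight. Real TabPFN uses learned, multi-head attention over both rows and columns; this is just the one-line analogue.

```r
# Toy "attention over labelled rows": each test row attends to every
# training row, weighting labels by similarity (a Nadaraya-Watson-style
# analogue of what a Transformer's attention head can express).
attend_predict <- function(X_train, y_train, X_test, temp = 1) {
  X_train <- as.matrix(X_train)
  X_test  <- as.matrix(X_test)
  apply(X_test, 1, function(x) {
    d <- rowSums(sweep(X_train, 2, x)^2)       # squared distance to each train row
    w <- exp(-d / temp); w <- w / sum(w)       # softmax-style attention weights
    names(which.max(tapply(w, y_train, sum)))  # label receiving the most weight
  })
}

# Usage: on iris's well-separated classes this crude scheme already
# classifies most held-out rows correctly.
set.seed(1)
idx <- sample(nrow(iris), 100)
preds <- attend_predict(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4])
mean(preds == iris$Species[-idx])
```

Note that nothing here is "trained" either: all the work happens at prediction time, by comparing the query row against the labelled context, which is the essence of in-context learning.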
The result is what is known as a foundation model for tabular data, or a tabular foundation model for short.

This process is formally known as Few-Shot Learning. You aren’t giving the model an empty brain to train; you are “prompting” a pre-trained brain with a few dozen (or a few hundred) “shots” (rows) of your data to establish the pattern!
The Training Matrix: Learning the Shape of Maths
You might be wondering: If it isn’t training on my data, what exactly was it trained on?
This is where it gets incredibly cool. The researchers who built TabPFN didn’t train it on real-world datasets like housing prices or medical records. Instead, they wrote algorithms to generate millions of completely random, artificially created mathematical dependency structures.
They forced the network to practice on synthetic datasets containing every statistical quirk imaginable: linear trends, severe non-linearities, bizarre interaction effects, extreme missing data mechanisms, and sheer noise. Because it spent its entire training solving billions of abstract maths puzzles, the model learned the fundamental shape of causal mathematical dependencies. When it sees your real-world data, it’s just recognizing a pattern it has already solved synthetically a thousand times before.
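The flavour of this synthetic pretraining can be sketched in a few lines of R. This is a heavily simplified caricature of my own making, not the authors' actual prior (which draws on far richer structures, including structural causal models): random features are pushed through a randomly chosen nonlinearity and corrupted with noise, yielding an endless stream of fresh "maths puzzles" to practice on.

```r
# Simplified sketch of generating one synthetic training task:
# random features -> random weights -> random nonlinearity -> noise.
make_synthetic_task <- function(n = 200, p = 4) {
  X <- matrix(rnorm(n * p), n, p)                        # random features
  w <- rnorm(p)                                          # random linear weights
  shape <- sample(list(identity, sin, tanh,
                       function(z) z^2), 1)[[1]]         # random "shape of maths"
  y <- shape(X %*% w) + rnorm(n, sd = 0.1)               # nonlinearity + noise
  list(X = X, y = as.numeric(y))
}

# Each call yields a brand-new dataset the network has never seen.
task <- make_synthetic_task()
str(task)
```

Pretraining then amounts to sampling millions of such tasks and asking the Transformer to predict held-out rows of each one, so the weights end up encoding the patterns themselves rather than any single dataset.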
Let’s see it in action
Let’s use the venerable iris dataset. Because iris is small and the mathematical boundaries are very clear, it’s the perfect candidate for few-shot learning. Notice how the code looks exactly like traditional machine learning, but under the hood, no training is actually happening!
# Load the package
library(tabpfn)
# 1. Prepare the Data
set.seed(42)
train_indices <- sample(seq_len(nrow(iris)), size = 0.7 * nrow(iris))
iris_train <- iris[train_indices, ]
iris_test <- iris[-train_indices, ]
# 2. Fit the Model
cat("Generating embeddings...\n")
## Generating embeddings...
tab_fit <- tab_pfn(Species ~ ., data = iris_train)
# 3. Make Predictions
cat("Predicting...\n")
## Predicting...
predictions <- predict(tab_fit, new_data = iris_test)
# 4. Check the accuracy
accuracy <- sum(predictions$.pred_class == iris_test$Species) / nrow(iris_test)
cat("\nSuccess! Overall Accuracy:", round(accuracy * 100, 1), "%\n")
##
## Success! Overall Accuracy: 97.8 %
When you run this, you will see an accuracy of 97.8%. The model looked at the few examples in iris_train, instantly recognized the multidimensional shapes separating the species using its synthetic intuition, and accurately classified the new test data without a single epoch of traditional backpropagation.
Conclusion
TabPFN is a paradigm shift. For small to medium tabular datasets, we no longer need to spend hours tuning hyperparameters for Random Forests or XGBoost. We can simply hand the data to an experienced, mathematically omniscient Transformer and let In-Context Learning do the heavy lifting.
Give it a try on your own data, and tell us about your experience in the comments below!
