Yuval Avidani
Author
Key Finding
According to the paper "Analyzing Neural Network Information Flow Using Differential Geometry" by Shuhang Tan, Jayson Sia, Paul Bogdan, and Radoslav Ivanov, neural networks have an underlying geometric structure where negative curvature edges act as critical information bottlenecks while positive curvature edges represent redundancy. This has significant implications for how we approach model compression and interpretability in production systems.
What Does Neural Network Curvature Mean?
Neural network curvature is a mathematical measure borrowed from differential geometry that describes how information flows through network connections. The paper "Analyzing Neural Network Information Flow Using Differential Geometry" applies Ollivier-Ricci Curvature (ORC) - a discrete analogue of Ricci curvature defined on graphs - to identify which parts of our neural networks are structurally essential versus redundant.
Think of it like analyzing a transportation network. Roads through mountain passes (negative curvature) are critical bottlenecks - remove them and you isolate entire regions. Roads converging into a city center (positive curvature) show redundancy - many alternative paths exist. The researchers apply this same geometric reasoning to our neural networks.
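To make the metric concrete, here is a small toy computation of Ollivier-Ricci curvature (my own sketch, not code from the paper). It uses the common convention of placing a uniform probability measure on each node's neighbors, networkx for the graph, and the POT library for the Wasserstein distance:
import networkx as nx
import numpy as np
import ot  # POT: Python Optimal Transport ("pip install pot")

def ollivier_ricci_edge(G, x, y):
    """ORC of edge (x, y): kappa = 1 - W1(m_x, m_y) / d(x, y),
    with m_x, m_y uniform over the neighbors of x and y."""
    x_nbrs = list(G.neighbors(x))
    y_nbrs = list(G.neighbors(y))
    m_x = np.full(len(x_nbrs), 1.0 / len(x_nbrs))
    m_y = np.full(len(y_nbrs), 1.0 / len(y_nbrs))
    # Cost matrix: shortest-path distances between the two neighborhoods
    cost = np.array([[nx.shortest_path_length(G, u, v) for v in y_nbrs]
                     for u in x_nbrs], dtype=float)
    w1 = ot.emd2(m_x, m_y, cost)  # exact 1-Wasserstein distance
    return 1.0 - w1 / nx.shortest_path_length(G, x, y)

# Two triangles joined by a single bridge edge (2, 3)
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)])
print(ollivier_ricci_edge(G, 2, 3))  # bridge edge -> negative curvature
print(ollivier_ricci_edge(G, 0, 1))  # edge inside a triangle -> positive curvature
On this toy graph the bridge between the two triangles comes out strongly negative while an edge inside a triangle comes out positive - exactly the bottleneck-versus-redundancy split the paper exploits.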
The Problem We All Face
We spend countless hours trying to compress our models for deployment. Production systems demand smaller, faster models, but pruning is often a shot in the dark. We remove weights based on magnitude, gradient flows, or information-theoretic measures, but the results are inconsistent across architectures.
The core issue? We lack a principled mathematical framework for understanding which connections in our networks are truly critical versus redundant. Traditional pruning methods give us numbers and heuristics, but they don't explain the underlying structure of how information flows through our models. We're essentially doing surgery without understanding the anatomy.
What the Researchers Found
The researchers developed what they call "Neural Curvature" (NC) - a metric based on Ollivier-Ricci Curvature that treats our neural network as a weighted graph. Here's how it works in practical terms:
Graph Representation: The neural network becomes a weighted graph where edge weights correspond to activation patterns during forward passes. Each connection between neurons is an edge, and the weight reflects how strongly information flows along that path.
Curvature Measurement: Using differential geometry concepts, they measure the curvature of each edge. In mathematical terms, negative curvature (like a saddle point) means paths diverge - this edge is a structural bottleneck that's critical for connectivity. Positive curvature (like a sphere) means paths converge - this edge has redundant alternative pathways.
Critical Discovery: Edges with negative ORC values are essential. Remove them and performance degrades significantly. Edges with positive ORC values are redundant and can be pruned without accuracy loss. This isn't just correlation - the geometric structure predicts pruning success better than traditional weight-based methods.
Practical Implementation
Here's what applying this research might look like in practice:
# Example: analyzing network curvature for pruning.
# NOTE: "neural_curvature" is an illustrative placeholder module, not a published
# package; it stands in for an Ollivier-Ricci curvature implementation.
import torch
import numpy as np

from neural_curvature import OllivierRicciCurvature


class CurvatureBasedPruner:
    def __init__(self, model, dataloader):
        self.model = model
        self.dataloader = dataloader

    def compute_neural_curvature(self, layer):
        """
        Compute Ollivier-Ricci Curvature for the edges of one layer,
        based on a weighted graph built from activation patterns.
        """
        # Build a weighted graph from activations (see the sketch below)
        graph = self.build_activation_graph(layer)

        # Compute ORC for each edge; expected return: {edge: curvature}
        orc = OllivierRicciCurvature(graph)
        curvatures = orc.compute_ricci_curvature()
        return curvatures

    def identify_prunable_edges(self, curvatures, threshold=0.0):
        """
        Edges with positive curvature are redundant;
        edges with negative curvature are critical bottlenecks.
        """
        prunable = []
        critical = []
        for edge, curvature in curvatures.items():
            if curvature > threshold:
                prunable.append(edge)   # positive = redundant
            else:
                critical.append(edge)   # negative = bottleneck
        return prunable, critical


# Usage in a production pipeline; model, validation_loader and model.layer3
# are placeholders for your own network, data loader and layer of interest.
pruner = CurvatureBasedPruner(model, validation_loader)
curvatures = pruner.compute_neural_curvature(model.layer3)
prunable_edges, critical_edges = pruner.identify_prunable_edges(curvatures)
print(f"Safe to prune: {len(prunable_edges)} edges")
print(f"Critical bottlenecks: {len(critical_edges)} edges")
Another practical example showing how this differs from traditional magnitude-based pruning:
# Comparison: traditional magnitude-based vs topology-aware pruning.
# evaluate_model, compute_neural_curvature, create_pruning_mask and
# apply_curvature_mask are assumed helpers (sketches follow below);
# only prune.l1_unstructured is PyTorch's own API.
import copy

import torch.nn.utils.prune as prune


def compare_pruning_strategies(model, layer_name, val_loader, amount=0.3):
    """
    Compare magnitude-based and curvature-based pruning on two independent
    copies of the same model, so neither run contaminates the other.
    """
    # Traditional approach: prune by L1 weight magnitude
    magnitude_model = copy.deepcopy(model)
    prune.l1_unstructured(
        getattr(magnitude_model, layer_name), name='weight', amount=amount
    )
    magnitude_accuracy = evaluate_model(magnitude_model, val_loader)

    # Geometric approach: prune positive-curvature edges,
    # keep the negative-curvature bottlenecks
    topology_model = copy.deepcopy(model)
    layer = getattr(topology_model, layer_name)
    curvatures = compute_neural_curvature(layer)
    positive_curvature_mask = create_pruning_mask(
        curvatures, keep_negative=True
    )
    apply_curvature_mask(layer, positive_curvature_mask)
    topology_accuracy = evaluate_model(topology_model, val_loader)

    print(f"Magnitude-based accuracy: {magnitude_accuracy:.3f}")
    print(f"Topology-based accuracy: {topology_accuracy:.3f}")
    print(f"Improvement: {topology_accuracy - magnitude_accuracy:.3f}")
    return topology_accuracy > magnitude_accuracy
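The comparison leans on three helpers - evaluate_model, create_pruning_mask, and apply_curvature_mask - that neither PyTorch nor the paper supplies. A minimal sketch of what they could look like, assuming a classification model and the "in_<j>" / "out_<i>" edge naming from the graph sketch above:
import torch

def evaluate_model(model, loader):
    """Top-1 accuracy over a validation loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

def create_pruning_mask(curvatures, keep_negative=True):
    """Map each edge to 1.0 (keep) or 0.0 (prune), based on curvature sign."""
    if keep_negative:
        return {edge: 1.0 if kappa <= 0.0 else 0.0 for edge, kappa in curvatures.items()}
    return {edge: 1.0 if kappa > 0.0 else 0.0 for edge, kappa in curvatures.items()}

def apply_curvature_mask(layer, mask):
    """Zero out pruned connections; edge keys follow the "in_<j>" / "out_<i>"
    node naming used in build_activation_graph above."""
    with torch.no_grad():
        for (u, v), keep in mask.items():
            if keep:
                continue
            in_node, out_node = (u, v) if u.startswith("in_") else (v, u)
            j = int(in_node.split("_")[1])
            i = int(out_node.split("_")[1])
            layer.weight[i, j] = 0.0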
Key Results & Numbers
- Bottleneck Identification - Edges with negative curvature values consistently proved critical across MNIST, CIFAR-10, and CIFAR-100 datasets. Removing these edges caused significant performance degradation.
- Redundancy Detection - Edges with positive curvature values could be pruned without accuracy loss, identifying a larger subset of unimportant parameters compared to state-of-the-art baseline methods.
- Cross-Architecture Performance - The geometric approach outperformed traditional pruning methods across different network architectures, suggesting the topology-based framework generalizes better than magnitude or gradient-based heuristics.
- Interpretability Gain - Unlike black-box pruning scores, curvature values provide geometric interpretation: we can now explain why a connection is critical (it's a structural bottleneck) rather than just saying "the number is high."
How This Fits Our Toolkit
This research complements rather than replaces our existing compression approaches. Traditional magnitude-based pruning is fast and works reasonably well in many cases. Gradient-based methods like the lottery ticket hypothesis give us insight into trainability. Information-theoretic approaches help us understand data flow.
The geometric curvature approach adds a new dimension: structural understanding. We can now ask "is this connection topologically critical?" alongside "is this weight large?" For models where we need interpretability - medical AI, financial systems, safety-critical applications - knowing that we're preserving information bottlenecks gives us confidence that traditional magnitude pruning cannot provide.
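As a toy illustration of that combined question (my own sketch, not something the paper proposes), a keep/prune rule could require a connection to clear either the magnitude bar or the bottleneck test:
def keep_connection(weight, curvature, magnitude_threshold=1e-2):
    """Keep a connection if its weight is large or it lies on a negative-curvature bottleneck."""
    return abs(weight) >= magnitude_threshold or curvature < 0.0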
When would we use this? Consider compression pipelines where accuracy preservation is paramount, or when we need to explain our pruning decisions to stakeholders. The computational overhead of calculating curvatures is higher than simple magnitude ranking, but the payoff is principled pruning backed by differential geometry rather than heuristics.
My Take - Should We Pay Attention?
In my view, this is a significant step toward understanding our models rather than just optimizing them. The mathematical rigor of applying differential geometry to neural networks moves us beyond "try it and see" pruning strategies.
The practical value is clearest for production systems where we need both compression and confidence. Instead of iteratively pruning and retraining until accuracy drops, we can identify topologically redundant connections upfront. For teams working on edge deployment or model serving at scale, this could reduce the trial-and-error cycle significantly.
The limitation is computational cost. Calculating Ollivier-Ricci Curvature for very large networks (billions of parameters) may be prohibitive. But for moderate-sized models or critical layers in larger networks, the interpretability gain justifies the overhead.
Read the full paper: "Analyzing Neural Network Information Flow Using Differential Geometry"
Frequently Asked Questions
What does this paper find?
The paper finds that neural network connections have geometric structure measurable through curvature, where negative curvature edges are critical information bottlenecks and positive curvature edges are redundant pathways that can be safely pruned.
Who conducted this research?
The paper was authored by Shuhang Tan, Jayson Sia, Paul Bogdan, and Radoslav Ivanov, published on arXiv in January 2025. The research applies concepts from differential geometry and graph theory to deep learning.
Why does this matter for production systems?
This gives us a principled mathematical framework for model compression instead of heuristic-based pruning. We can identify which connections are structurally essential, leading to more reliable compression with less trial-and-error.
What should we do based on this research?
Consider incorporating topology-aware pruning into our compression pipelines, especially for models where we need interpretability alongside efficiency. Start with moderate-sized networks where curvature computation is feasible.
What are the limitations of this study?
The main limitation is computational cost - calculating Ollivier-Ricci Curvature for very large networks (billions of parameters) may be prohibitive. The approach is most practical for moderate-sized models or analyzing critical layers in larger architectures.
