Gemma4 31B - Layer Analysis Study

A 60-layer transformer analysis: which layers matter, which are redundant, and what can be safely pruned.

Total layers: 60
Parameters: 30.7B
Probe prompts: 300 (6 categories)

Architecture

Sliding-window attention: 50 layers
Full attention: 10 layers (every 6th: 5, 11, 17, 23, 29, 35, 41, 47, 53, 59)

Phase A: Block Influence & Activation Analysis

Measures how much each layer modifies the residual stream. A low BI score means the layer barely changes its input, making it a candidate for pruning.
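The study does not spell out its exact BI formula, but a common definition (used in ShortGPT-style analyses) is one minus the cosine similarity between a layer's input and output hidden states, averaged over tokens. A minimal sketch under that assumption:

```python
import numpy as np

def block_influence(h_in: np.ndarray, h_out: np.ndarray) -> float:
    """Block Influence as 1 - cosine similarity between a layer's input
    and output residual-stream states, averaged over token positions.

    h_in, h_out: arrays of shape (seq_len, hidden_dim).
    This is an illustrative definition, not the study's verbatim code.
    """
    num = np.sum(h_in * h_out, axis=-1)
    den = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1)
    cos = num / np.maximum(den, 1e-8)  # guard against zero vectors
    return float(np.mean(1.0 - cos))
```

A layer that passes its input through unchanged scores 0; a layer that rotates every state orthogonally scores 1.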

Block Influence (BI) Score per Layer

BI Score by Category (Heatmap)

Residual Stream Norm Evolution

Phase A Key Findings

Phase B: Logit Lens — Where Decisions Happen

Projects each layer's hidden state through the final norm and lm_head to see which token it would predict, revealing the layer at which the model "commits" to an answer.
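The projection can be sketched as below. The RMSNorm form matches Gemma-style models, but the function and variable names (`final_norm_weight`, `lm_head`) are illustrative assumptions, not the study's actual code:

```python
import numpy as np

def logit_lens_rank(hidden: np.ndarray,
                    final_norm_weight: np.ndarray,
                    lm_head: np.ndarray,
                    target_id: int) -> int:
    """Project an intermediate hidden state through the final norm and
    unembedding matrix, then return the rank of the target token
    (rank 0 = the model's top-1 prediction at this layer).

    hidden: (hidden_dim,), lm_head: (vocab_size, hidden_dim).
    """
    # RMSNorm, as used by Gemma-style architectures (assumed here)
    rms = np.sqrt(np.mean(hidden ** 2) + 1e-6)
    normed = hidden / rms * final_norm_weight
    logits = lm_head @ normed
    # rank = number of tokens scoring strictly higher than the target
    return int(np.sum(logits > logits[target_id]))
```

Running this for every layer and averaging the rank over prompts yields the per-layer curve plotted below on a log scale.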

Average Target Token Rank per Layer (log scale)

Decision Layer per Category

Average layer at which the ground-truth token first enters top-1

Phase B Key Findings

Phase C: Ablation Study — What Breaks When You Remove a Layer

Each layer is disabled individually (its block is skipped) and the resulting perplexity delta is measured. A negative delta means removing the layer actually improves the model.
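Given per-token negative log-likelihoods from a baseline run and from a run with one layer skipped (e.g. via a forward hook that returns the block's input unchanged), the percent delta reported in the chart can be computed as follows; the function name and interface are illustrative:

```python
import numpy as np

def ppl_delta_percent(baseline_nlls, ablated_nlls) -> float:
    """Percent change in perplexity when a single layer is skipped.

    baseline_nlls, ablated_nlls: per-token negative log-likelihoods
    (natural log) over the same evaluation text.
    Negative return value => the ablated model is *better*.
    """
    base_ppl = np.exp(np.mean(baseline_nlls))
    ablated_ppl = np.exp(np.mean(ablated_nlls))
    return 100.0 * (ablated_ppl - base_ppl) / base_ppl
```

Averaging NLL before exponentiating keeps the computation numerically stable for long evaluation texts.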

Single-Layer Ablation: Perplexity Delta (% change from baseline)

Block Ablation: Dropping Multiple Layers Together

Phase C Key Findings

Synthesis: Evidence-Based Drop Plan

Combined Safety Score per Layer

Combines the BI score (Phase A), cross-category variance, and ablation impact (Phase C) into a single per-layer safety score.
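One way to combine the three signals is to min-max normalize each across layers and take a weighted sum; the weights below are illustrative assumptions, since the study does not publish its exact formula:

```python
import numpy as np

def safety_scores(bi, cross_cat_var, ablation_pct) -> np.ndarray:
    """Toy combined safety score per layer: low BI, low cross-category
    variance, and low ablation damage all make a layer safer to drop.

    Each argument is an array over layers. Weights (0.5/0.2/0.3) are
    illustrative, not the study's actual coefficients.
    Returns scores in [0, 1]; higher = safer to prune.
    """
    def minmax(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    risk = (0.5 * minmax(bi)
            + 0.2 * minmax(cross_cat_var)
            + 0.3 * minmax(np.abs(ablation_pct)))
    return 1.0 - risk
```

Ablation impact enters via its absolute value so that a layer whose removal changes perplexity strongly in either direction is treated as load-bearing.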

All 60 Layers — Complete Profile

Layer | Type | BI Score | Delta (abs) | Logit Rank | Ablation % | Safety | Verdict

Final Recommendations