Why Reading Papers Matters
The ML field moves extremely fast: the techniques used in production systems today were typically described in papers only one to five years ago. To stay current, to understand why techniques work (not just that they do), and to build on existing work rather than reinventing it, you need to be able to read research papers.
The good news: you do not need to understand every paper perfectly. You need to be able to extract what is useful, evaluate claims critically, and know when to dig deeper.
The Three-Pass Method
Do not read a paper front-to-back in one sitting. Use three passes of increasing depth.
Pass 1: The 5-minute skim (title, abstract, headings, conclusion)
Goal: Decide if this paper is worth your time.
- Read the title and abstract completely
- Scan all section headings to understand the paper's structure
- Glance at figures and their captions
- Read the conclusion
After this pass, you should be able to answer: What problem does this solve? What is the main claim? Is this relevant to what I am working on?
Pass 2: The 30-minute read (skip the math)
Goal: Understand the paper at a high level without getting lost in details.
- Read the introduction carefully (usually the best-written section)
- Read the methodology at a high level - understand the approach without following every equation
- Study the figures and tables - the key results are often in 2-3 figures
- Read the experiments section - what was compared, on what data, with what metrics
After this pass, you should be able to explain the paper to someone else: what they proposed and why it works better than alternatives.
Pass 3: The full read (follow every equation)
Goal: Understand the paper deeply enough to implement it or build on it.
- Work through all equations step by step
- Identify any assumptions that were glossed over
- Read the appendix and supplementary materials
- Look up cited papers you do not know
- Consider: could you implement this? Where would you start?
Not every paper deserves a Pass 3. Most papers you will read to Pass 2 and stop.
Anatomy of a Research Paper
Most ML papers follow a standard structure:
| Section | What it contains | Reading priority |
|---|---|---|
| Abstract | One-paragraph (~150-word) summary of the contribution | Always read first |
| Introduction | Problem motivation, key contributions, paper overview | High |
| Related Work | How this differs from prior work | Medium (skim first pass) |
| Method / Model | Technical description of the approach | High (Pass 2+) |
| Experiments | Dataset, baselines, results, ablations | High |
| Conclusion | Summary and future work | Read in Pass 1 |
| Appendix | Proofs, extra experiments, implementation details | Pass 3 only |
The experiments section is where papers live or die. This is where you evaluate the actual evidence for the paper's claims.
Reading the Math
When you encounter an equation you cannot follow:
Step 1: Identify every symbol. Authors should define every symbol when they introduce it. Find the definitions. Sometimes they are in a notation table in the appendix.
Step 2: Check dimensions. If you are doing matrix operations, confirm the shapes make sense. If A is m × n and B is n × p, then AB is m × p. Dimension mismatches expose implementation bugs and help you understand what each operation does.
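A quick way to run the dimension check is to mirror the equation in NumPy and let shape errors surface immediately. This is a minimal sketch with made-up sizes, not code from any particular paper:

```python
import numpy as np

# Hypothetical shapes: an m x n matrix times an n x p matrix gives m x p.
m, n, p = 4, 3, 5
A = np.random.randn(m, n)
B = np.random.randn(n, p)

C = A @ B
assert C.shape == (m, p)  # the shape the equation predicts

# A deliberate mismatch fails loudly, which is exactly the point:
try:
    A @ A  # (4, 3) @ (4, 3): inner dimensions do not match
except ValueError:
    print("shape mismatch caught")
```

Writing out the shapes this way often reveals which axis an operation sums over, which is half of understanding the equation.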
Step 3: Try a simple example. Instantiate the equation with small concrete numbers. If the paper defines a loss function for a sequence model, try it with a sequence of length 2 and vocabulary size 3. Can you compute the loss by hand?
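The worked-example step can be made concrete. Here is a sketch using token-level cross-entropy, a common sequence-model loss; the probabilities and targets below are invented for illustration:

```python
import math

# Invented setup: sequence length 2, vocabulary size 3.
probs = [
    [0.7, 0.2, 0.1],  # model's distribution over the vocab at step 1
    [0.1, 0.6, 0.3],  # ...and at step 2
]
targets = [0, 1]  # true token index at each step

# Mean negative log-likelihood over the sequence: -(ln 0.7 + ln 0.6) / 2
loss = -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)
print(f"{loss:.4f}")  # small enough to verify against a hand calculation
```

If the number you compute by hand does not match, you have found either a bug in your understanding or an unstated detail in the paper. Both are worth knowing about.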
Step 4: Rewrite in your own notation. Sometimes changing the symbols makes it click. Replace abstract symbols with names that match your intuition.
Evaluating Empirical Claims Critically
Many papers overstate their results. Here is what to check:
Are the baselines strong and fair?
The paper should compare against the best existing methods, using the same computational budget, dataset splits, and evaluation protocol. Many papers cherry-pick weak baselines to make their method look better.
Are improvements meaningful or just noise?
If the baseline achieves 83.2% accuracy and the new method achieves 83.7%, is that real? Check:
- Are error bars or confidence intervals reported? A 0.5% improvement with standard deviation 1.2% is not significant.
- Was the same train/val/test split used? Different splits give different numbers.
- Was the comparison reported on the test set or a development set?
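One way to ground the noise question is to rerun both methods over several random seeds and compare the improvement against the run-to-run variation. A minimal sketch with invented per-seed accuracies echoing the 83.2% vs 83.7% example above:

```python
import statistics

# Hypothetical accuracies over 5 random seeds (invented numbers).
baseline = [82.1, 84.0, 83.5, 82.8, 83.6]
proposed = [83.1, 84.5, 83.0, 84.2, 83.7]

gap = statistics.mean(proposed) - statistics.mean(baseline)
noise = statistics.stdev(baseline)

# A crude rule of thumb: if the gap is smaller than the run-to-run
# standard deviation, treat the "improvement" as unproven.
print(f"gap={gap:.2f}, baseline stdev={noise:.2f}")
print("convincing" if gap > noise else "within noise")
```

A proper analysis would use a significance test across paired seeds, but even this crude comparison catches many overstated results.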
Ablation studies:
Good papers include ablations - experiments that systematically remove components of the proposed method to show which parts are necessary. If a paper does not include ablations, be more skeptical: you cannot tell which of the method's five novelties actually contributed to the improvement.
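To see what an ablation table tells you, compare each component's solo gain against the full method's gain. All numbers below are invented for illustration:

```python
# Invented ablation results: accuracy of a baseline, of the baseline plus
# each single component, and of the full three-component method.
baseline = 80.1
with_one_component = {
    "new_attention": 82.9,
    "data_aug": 81.5,
    "aux_loss": 80.3,
}
full_method = 83.4

# Gain from each component when added alone.
gains = {name: round(acc - baseline, 1) for name, acc in with_one_component.items()}
print(gains)
print(f"full-method gain: {full_method - baseline:.1f}")
# If one component accounts for nearly all of the gain, the paper's other
# "novelties" may be along for the ride.
```

In this invented example, one component contributes most of the improvement; a paper without such a table leaves you unable to make that judgment.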
Building Your Reading List
Start with seminal papers that introduced the major building blocks:
Foundations:
- "Gradient-based learning applied to document recognition" (LeCun 1998) - CNNs
- "Attention is all you need" (Vaswani 2017) - Transformers
- "ImageNet classification with deep convolutional neural networks" (Krizhevsky 2012) - AlexNet
Techniques:
- "Dropout: A simple way to prevent neural networks from overfitting" (Srivastava 2014)
- "Batch normalization" (Ioffe and Szegedy 2015)
- "Adam: A method for stochastic optimization" (Kingma and Ba 2014)
For current work, follow:
- arXiv (cs.LG, cs.CV, cs.CL sections)
- Papers With Code (links papers to code implementations)
- NeurIPS, ICML, ICLR conference proceedings