Bookmarks
A collection of interesting links on the web to visit later, sorted in chronological order and fetched from the Curius API. The daily sync happens at 12 am UTC.
Total Bookmarks: 197 | Showing: 197
Wittgenstein's Games by A. C. Grayling - YouTube
N/A
Using group theory to explore the space of positional encodings for attention
Attention is a computational primitive at the core of modern language models, allowing internal representations to reference and influence each other. It’s h...
The World Inside Neural Networks
How neural geometry will unlock understanding and control of AI
Another foray into active imagination - by ftlsid
I’ve written several times here and on Twitter about my practice of Jungian active imagination.
The Many Lives of Inkhaven 2.0 - by Sophie Kim
a collection of 40 life stories
[2604.27743] Why Self-Supervised Encoders Want to Be Normal
Abstract: Self-supervised learning has achieved remarkable empirical success in learning robust representations without explicit labels, most recently demonstrated within the framework of...
Learning the integral of a diffusion model – Sander Dieleman
A deep dive on flow maps.
[2510.01634] CAT: Curvature-Adaptive Transformers for Geometry-Aware Learning
Abstract: Transformers achieve strong performance across diverse domains but implicitly assume Euclidean geometry in their attention mechanisms, limiting their effectiveness on data with non-Euclidean...
Raj Lab basic Adobe Illustrator (CC) guide - Google Docs
Originally published 8/21/2019 by Connie Jiang; last updated 7/24/2021 (personal edit updated/saved separately). Raj Lab checklist for Illustrator final...
Problem solving is often a matter of cooking up an appropriate Markov chain
N/A
Steering Along Manifolds to Control Neural Networks
Concept geometry provides a blueprint for controlling the behavior of neural networks—if you know how to look. Intervening on a model's internal representations to steer behavior, i.e.,...
Sending Samples Without Bits-Back
I’ll describe a fun little problem in information theory, and a solution–a compression algorithm–based on rejection sampling. This problem is motivated by the coding interpretation of the variational...
A shallow dive into formal verification
Over the last couple of months, a new programming paradigm has been rapidly gaining traction within Ethereum's frontier research and development circles, and many other corners of computing: writing...
Meta-Roadmap - AGI - Google Docs
Thesis: Neural networks do not generalize out of distribution. Thus, a model needs to be able to (continuously) expand its training distribution such that a given problem becomes solvable by...
The Sacrifices We Choose to Make
It is 8 o'clock in the morning of 11 June, 1963. In the city of Saigon in South Vietnam more than 300 Buddhist monks and nuns have gathered inside the largest Pagoda in Saigon, the Xa Loi Pagoda....
[2511.16652] Evolution Strategies at the Hyperscale
Abstract: Evolution Strategies (ES) is a class of powerful black-box optimisation methods that are highly parallelisable and can handle non-differentiable and noisy objectives. However, naïve ES...
The Unreasonable Effectiveness of the Chaotic Tent Map in Engineering Applications
Chaos Theory and Applications | Volume: 4 Issue: 4
Cognitive glues are shared models of relative scarcities: the economics of collective intelligence
Michael Levin, Benjamin Lyons; Cognitive glues are shared models of relative scarcities: the economics of collective intelligence. Philos Trans A Math Phys Eng Sci 14 May 2026; 384 (2320): 20240528....
Topological constraints on self-organization in locally interacting systems
Francesco Sacco, Dalton Sakthivadivel, Michael Levin; Topological constraints on self-organization in locally interacting systems. Philos Trans A Math Phys Eng Sci 14 May 2026; 384 (2320): 20250011....
Why PhD students should read the history of science - Vaishnavh Nagarajan
I accidentally discovered a trick during my PhD: reading the history of science is a powerful way to nurture your emo...
Black Hole Tech?—Stephen Wolfram Writings
In celebration of the detection of gravitational waves, Stephen Wolfram looks forward and discusses what technology black holes could make possible.
New Strides Made on Deceptively Simple ‘Lonely Runner’ Problem | Quanta Magazine
A straightforward conjecture about runners moving around a track turns out to be equivalent to many complex mathematical questions. Three new proofs mark the first significant progress on the problem...
WTF Is Happening Inside a Transformer
An intuition-first guide to what transformers actually compute, explained through the lens of linear algebra. Q, K, V demystified.
Differentiable Logic CA: from Game of Life to Pattern Generation
Imagine trying to reverse-engineer the complex, often unexpected patterns and behaviors that emerge from simple rules. This challenge has inspired researchers and enthusiasts that work with cellular...
Screeps: How A Game About Programming Sold Its Players a Remote Access Trojan
Note: After this article went viral on X, Screeps fixed the exploit, though while continuing to deny that it ever posed a risk and claiming that it would be the player's fault if they fell victim to...
Graph Operations | Alex Meiburg / Timeroot
Quantum ⊕ Physics ⊗ Algorithms
Differentiating through optimization with the IFT
Differentiating through optimization
GRaM Competition @ ICLR 2026 | Competition track of the GRaM workshop
Competition track of the GRaM workshop
Recursive Field Theory: An Attempt at a Metaphysical Systems Theory - YouTube
This talk explores the foundations of AI and consciousness, challenging the traditional agent-environment paradigm. Bennett proposes "stack theory," a novel framework based on abstraction layers,...
A generalization of Mirzakhani's identity, and geometric recursion - YouTube
N/A
Tuning GPT-3 on a Single GPU
Cross-posted from Microsoft Research Blog
Jakob N. Foerster - How To Rebuttal ML Paper
N/A
Computers, Geometry and Einstein - Jason Lotay - YouTube
N/A
Behavior, Purpose and Teleology on JSTOR
N/A
Mathematicians say ant colony problem-solving is isomorphic to machine learning
The discovery suggests collective intelligence, whether biological or artificial, emerges from the same universal principle
How Attention Residuals Rewire Modern LLMs - YouTube
Explore how modifying traditional transformer connections can enhance deep network training. This approach replaces fixed skip connections with learned, data-dependent weights to address gradient...
Jacobi Fields in Machine Learning — Olga Zaghen
An intuitive introduction to Jacobi fields and their applications in machine learning on Riemannian manifolds.
The PhD Metagame: Your Paper Is an Ad - Maxwell Forbes
Insiders to our social practice of science understand that a research paper serves several purposes. In increasing...
The Ego Trip - by Hadas Weiss - The Hinternet
Continuing our occasional series of “Woman on Unlikely Pilgrimages” (see Daphné Tamage on the trail of John Fante, or the same en français), today we bring you Hadas Weiss on a very different sort of...
A PhD expectations guide - by Patrik Reizinger
The unknown unknowns of a PhD — and a checklist to make them known
Dual Approaches to Projective Geometric Algebra - Eric Lengyel
Projective geometric algebra, where we model simple objects and transformations in a space containing one extra dimension, is full of interesting dualities. Scalars are embedded in two ways, every...
Hacking Super Mario 64 using covering spaces (+ hyperbolic geometry)
Visualization of the universal cover of a surface of genus 2, which is a hyperbolic space.
Read Less, Steer More : ezyang's blog
I was coaching some juniors on effective use of AI coding agents (like, looking over their shoulder as they were prompting the LLMs) and one reoccurring theme was that the AI agents were demanding a...
ezyang's workflow
Previously in AI-assisted programming for spmd_types, I mentioned that I have been enjoying using Sapling (the version control system Meta uses internally) to manage parallel agents on worktrees. In...
PhD Year 1: Joy and Rejection | Hongyu Hè
A reflection on my 1st year of the PhD, where joy and rejection came together to reshape how I think about research, solitude, and life.
What Universal Human Experiences Are You Missing Without Realizing It?
Remember Galton’s experiments on visual imagination? Some people just don’t have it. And they never figured it out. They assumed no one had it, and when people talked about being able...
When Einstein Met Tagore: A Remarkable Meeting of Minds on the Edge of Science and Spirituality – The Marginalian
Collision and convergence in Truth and Beauty at the intersection of science and spirituality.
Bluebooking for happiness | Zhengdong
Zhengdong Wang’s personal website
The Moore's Law of Synthetic Gene Circuits
Why biology's circuits stopped scaling and what might change that
Chandrasekhar’s Voyage into Black Holes
Welcome to Flashcards Friday here at Math! Science! History! where every Friday, we take a little idea and make a big discovery out of it. I’m your host, Gabrielle Birchak, and today’s...
Thinking About a Bias for Better Semantic Coherence and Reuse - jchencxh
For a general intelligence, good generalisation requires good reuse of semantic components. Good reuse of semantics requires that there exists a consistent internal handle for the same semantic...
How not to do research - Rajan Agarwal
I don't usually share when things go wrong. Like most, my public work tends to be the stuff that worked, but I learned a lot from this project, and I want to share what I learned about the problem,...
Braitenberg vehicle - Wikipedia
A Braitenberg vehicle is a concept presented as a thought experiment by the Italian cyberneticist Valentino Braitenberg in his book Vehicles: Experiments in Synthetic Psychology. The book models the...
Radial Basis Function
Radial Basis Functions (RBF) play an essential role in Machine Learning, particularly in addressing non-linear problems. They are used to approximate complex functions, classify data, and solve...
REAP- model pruning
Cerebras is the go-to platform for fast and effortless AI training. Learn more at cerebras.ai.
The Hidden Mathematics in Stranger Things - YouTube
Ellie Sleightholm explores the mathematical concepts in Stranger Things 5. The video analyzes Dustin's circle calculations and delves into complex equations from general relativity. Learn about the...
Visualizing transformers and attention | Talk for TNG Big Tech Day '24 - YouTube
N/A
‘grokking (NN)’ directory · Gwern.net
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond The slingshot helps with learning Emergent properties with repeated...
Dario Amodei — The Adolescence of Technology
Confronting and Overcoming the Risks of Powerful AI
Eleven years in AI: What does it actually mean to be a researcher?
N/A
Life on Claude Nine - Igor Babuschkin
It started with automating his emails.
Five ways to be stupid - LeCun
Five Ways to Act Deluded, Stupid, Ineffective, or Evil, by Yann LeCun, 2025-04-28. [semi-humorous, geeky political satire ahead] Introduction: Cognitive Science has proposed various models of how humans...
Giving University Exams in the Age of Chatbots
Giving University Exams in the Age of Chatbots by Ploum - Lionel Dricot.
All 325+ Competing Consciousness Theories In One Video
N/A
An Unofficial Guide to Prepare for a Research Position Application
A candid look at what we look for when interviewing research candidates at Sakana AI. The core principle? Understanding over implementation.
Illicit Love Letters: Albert Camus and Maria Casares
For the past few weeks, I’ve fixated on a collection of primary source material that reads like a tidy work of epistolary fiction. It’s a big book, nearly 1,300 pages, transcribed from original...
‘He was in mystic delirium’: was this hermit mathematician a forgotten genius whose ideas could transform AI – or a lonely madman? | Mathematics | The Guardian
In isolation, Alexander Grothendieck seemed to have lost touch with reality, but some say his metaphysical theories could contain wonders
The Mythology Of Conscious AI
Why consciousness is more likely a property of life than of computation and why creating conscious, or even conscious-seeming AI, is a bad idea.
Where physics and biology meet - ScienceDirect
N/A
Matthew Walker's "Why We Sleep" Is Riddled with Scientific and Factual Errors - Alexey Guzey
See discussion of this essay on the forum, Hacker News, Marginal Revolution, Andrew Gelman’s blog 1, 2, 3, 4, /r/slatestarcodex, Twitter, listen to BBC interviewing me...
Post 51: Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice — Neel Nanda
I recommend giving advice by asking questions to walk someone through key steps in my argument — often I’m missing key info, which comes up quickly as an unexpected answer, while if...
What the humans like is responsiveness - by Sasha Chapin
What do the humans like? Apparently, they like this woman ordering food in a slightly flirtatious manner at a food truck. A total of 1.9 million souls have clicked “heart” on this brief clip. Okay,...
Discovery fiction
To help me understand a scientific result, I often find it helpful to write what I call discovery fiction. By this I mean: a plausible story of how I could have discovered that result – an arc of...
The Intelligence Curse
This series examines the incoming crisis of human irrelevance and provides a map towards a future where people remain the masters of their destiny.
DeepSeek's mHC: When Residual Connections Explode - Taylor Kolasinski
Taylor Kolasinski - Engineering at FlowMode. ML systems & research, reinforcement learning, robotics. Based in Brooklyn, NY.
[2008.03936] Intelligent Matrix Exponentiation
N/A
He Co-Invented the Transformer. Now: Continuous Thought Machines [Llion Jones / Luke Darlow] - YouTube
A Transformer inventor shifts focus, exploring novel recurrent models. This Machine Learning Street Talk episode delves into the Continuous Thought Machine, a biologically-inspired architecture....
Understanding Image Gradients
In the previous blogs, we discussed different smoothing filters. Before moving forward, let’s first discuss Image Gradients which will be useful in edge detection, robust feature and texture...
"Autoregressive Transformers vs Text Diffusion Models" / X
N/A
Don't fall into the anti-AI hype
N/A
First-order Derivative kernels for Edge Detection | TheAILearner
Remember that derivatives only exists for continuous functions but the image is a discrete 2D light intensity function. Thus in the last blog, we approximated the image gradients using finite...
Using AI, Mathematicians Find Hidden Glitches in Fluid Equations | Quanta Magazine
Nearly 200 years ago, the physicists Claude-Louis Navier and George Gabriel Stokes put the finishing touches on a set of equations that describe how fluids swirl. And for nearly 200 years, the...
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
N/A
John Carmack on Idea Generation
Last year at an internal talk at Facebook I was fortunate to see [John Carmack](https://en.wikipedia.org/wiki/John_Carmack) speak about his idea generation system. At first I was disappointed...
Category: The Essence of Composition | Bartosz Milewski's Programming Cafe
I was overwhelmed by the positive response to my previous post, the Preface to Category Theory for Programmers. At the same time, it scared the heck out of me because I realized what high...
Mixture of Experts Explained
With the release of Mixtral 8x7B (announcement, model card), a class of transformer has become the hottest topic in the open AI community: Mixture of Experts, or MoEs for short. In this blog post, we...
The Futurist Manifesto, by Filippo Tommaso Marinetti
We had stayed up all night, my friends and I, under hanging mosque lamps with domes of filigreed brass, domes starred like our spirits, shining like them with the prisoned radiance of electric...
returning to stanford | writing
Reflecting on my first quarter of school, after two gap years.
Ithaka | The Poetry Foundation
N/A
The Smallest Eigenvalues of a Graph Laplacian
Given a graph $ G = (V, E) $, its adjacency matrix $ A $ contains an entry at $ A_{ij} $ if vertices $ i $ and $ j $ have an edge between them. The degree matrix $ D $ contains the degree of each...
A Short Tutorial on Graph Laplacians, Laplacian Embedding, and Spectral Clustering
N/A
Metric Learning and Manifolds: Preserving the Intrinsic Geometry - YouTube
N/A
Differential geometry of ML
Machine learning has achieved remarkable advancements largely due to the success of gradient descent algorithms. To gain deeper mathematical insight into these algorithms, it is essential to adopt an...
Togelius: Math and me
For most of my adult life, I was too cowardly to write this text, never mind posting it. I was worried about what people would think, and th...
Fibre integrals and the Thom isomorphism
Today, I want to briefly go over the formal definition of a fibre integral, a very useful construction which is sometimes invoked a bit flippantly in some differential topology proofs. The idea is...
A Decade of Residuals: History & Effects on modern ML
Introduction: Gradient Highways. A decade ago training deep neural nets was quite the bottleneck in ML. Increasing depth,...
Learning world model learning from scratch
Learning world model learning from scratch.
AlmondGod/tinyworlds: A minimal implementation of DeepMind's Genie world model
A minimal implementation of DeepMind's Genie world model
ongoing survey on starcraft research
I have wanted to get into StarCraft research for a long time. Actually, right after I played StarCraft II for the first time. (I did not care about research when I tried the BroodWar at school =( )....
I miss thinking hard.
By “thinking hard,” I mean encountering a specific, difficult problem and spending multiple days just sitting with it to overcome it. a) All the time. b) Never. c) Somewhere in between. If your...
Advice for research projects
Every year we get contacted by students who wish to work on short-term machine learning research projects with us. By now, we have supervised a good number of them and we noticed that some of the...
Discrete Calculus | Ji-Ha's Blog
An introduction to Discrete Calculus, a theory for sums and differences of sequences as opposed to derivatives and integrals of functions in infinitesimal calculus.
The Splintered Mind: Is Signal Strength a Confound in Consciousness Research?
But Michel does make one claim that bugs me, and that claim is central to the article. And Hakwan Lau -- another otherwise terrific methodologist -- makes a similar claim in his 2022 book In...
Post 38: On Slack - Having room to be excited — Neel Nanda
On the importance of Slack - the freedom and spare capacity left on your life. How to guard and protect your Slack, notice the bottlenecks which bleed away your Slack, notice the drive to optimise...
writing RSS reader in 80 lines of bash
I consume most of my information through RSS. RSS is amazing. Up until yesterday, I used Newsboat to go through my feed, and had a ,a macro that appended a link from the item to a text file with...
How the Brain and AI Reuse Old Knowledge in New Situations - Kempner Institute
Humans and other animals are remarkably good at using old knowledge in new situations. This ability — known as generalization — allows us to recognize a friend in an unexpected […]
Why Diffusion Language Models Are the Future | Dimitri von Rütte
Lessons learned from working with discrete diffusion language models, and more or less speculative predictions about their future.
Spiritual practices strongly associated with reduced risk for hazardous alcohol and drug use
Spirituality—religious or otherwise—may be protective against substance misuse, according to a new Harvard Chan School study.
Exploring the Asymmetry of Life
Recent alumnus S. Furkan Ozturk, PhD ’24, talks about a life-changing summer camp experience, going to college during a coup attempt and ISIS bombings, and searching for the origins of life through...
Deriving the KL divergence loss in variational autoencoders
Let's derive some things related to variational auto-encoders (VAEs). Evidence Lower Bound (ELBO): First, we'll state some assumptions. We have a dataset of images, x. We'll assume that each image...
Taste for Makers
February 2002 "...Copernicus' aesthetic objections to [equants] provided one essential motive for his rejection of the Ptolemaic system...." - Thomas Kuhn, The Copernican Revolution "All...
On Stress · Gwern.net
Stoic meditation with reference to being homeless. Written to myself at a particularly low point; like many, I take comfort in considering how things could be worse.
P2P No. 20 — The Gladius of Comparison
Compare forward, but not backward.
The ReLU illusion of progress - by Patrik Reizinger
A message to first-year PhD students who feel like they have nothing to show.
The Surprisingly Powerful Influence of Drawing on Memory
Fermat's Library is a platform for illuminating academic papers.
[2405.07987] The Platonic Representation Hypothesis
Abstract: We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple...
Making a Tiny Mac From a Raspberry Pi Zero
Years ago I saw that John Leake built a 1/3 scale Macintosh. He made his before cheap 3d printers were everywhere. His was made from scratch from sheets of...
Shape Rotation 101: An Intro to Einsum and Jax Transformers
Acknowledgements First, I would like to acknowledge my friends and kind internet strangers who helped me with this post. This post heavily adapts from ...
Which Future?
This essay is the text for a talk on how to wisely navigate risks from transformative technology, especially artificial superintelligence (ASI). It was given at Astera on January 28, 2026. In 1954...
frontier model training methodologies | Alex Wa’s Blog
How do labs train a frontier, multi-billion parameter model? We look towards seven open-weight frontier models: Hugging Face’s SmolLM3, Prime Intellect’s Intellect 3, Nous Research’s Hermes 4,...
Why I Write - Niko McCarty
In the summer of 1946, shortly after the close of World War II, George Orwell published a short essay entitled “Why I Write.” He had already released Coming Up for Air, Keep the Aspidistra Flying (my...
NaturalProofs: Mathematical Theorem Proving in Natural Language
Abstract: Understanding and creating mathematics using natural mathematical language - the mixture of symbolic and natural language used by humans - is a challenging and important problem for driving...
Your Transformer is Secretly an EOT Solver | Elements of a Vector Space
How the search for a simple approximation revealed a beautiful, exact solution.
Are there any mathematical knots that exist in dimensions higher than 3?
There are no nontrivial knots that live in four- or higher-dimensional spaces, because if you have four dimensions to work in you can easily untie any knot. Of course a knot is just an embedding of a...
Conscious AI? Not Even Close! | Brain Inspired
The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the...
The Singularity will Occur on a Tuesday - Cam Pedersen
"Wait, the singularity is just humans freaking out?" "Always has been." Everyone in
The Project Gutenberg eBook #5740: Tractatus Logico-Philosophicus
N/A
The Shared State of Mathematics
Last month I wrote about failed attempt to understand Fermat’s Last Theorem. I confessed failure — or at least, incomplete success. I understood the skeleton of Wiles’ proof, but not its flesh. But...
What's Our Problem? — Wait But Why
A popular long-form, stick-figure-illustrated blog about almost everything.
Petri Dish Neural Cellular Automata
While neural cellular automata (NCA) have proven effective for modeling morphogenesis and self-organizing processes, they are typically governed by a fixed, non-adaptive update rule shared across all...
a journey through the american west - by Jasmine Li
Taking a long journey on low-speed rail has for a while been on my bucket list. So, I organized a trip with friends this Thanksgiving break on the California Zephyr, which runs from Chicago to San...
Deep-learning model predicts how fruit flies form, cell by cell | MIT News | Massachusetts Institute of Technology
N/A
Transformers and Self-Attention (DL 19) - YouTube
This Davidson College lecture explores Transformer neural networks, explaining their architecture and self-attention mechanisms. Professor Bryce details how these networks process word embeddings and...
Towards Geometric Deep Learning
Geometric Deep Learning is an umbrella term for approaches considering a broad class of ML problems from the perspectives of symmetry and invariance. It provides a common blueprint allowing to derive...
Consciousness and Philosophy of Mind
Understanding consciousness is one of the most profound intellectual challenges we face, and with the advent of increasingly compelling AI, it has assumed practical significance. I have been working...
The Story of Heads
[Figure panels: models trained on WMT EN-DE, WMT EN-FR, WMT EN-RU, OpenSubtitles EN-RU...]
Neural Networks, Manifolds, and Topology -- colah's blog
Recently, there’s been a great deal of excitement and interest in deep neural networks because they’ve achieved breakthrough results in areas such as computer vision.
How To Win
I really like games. I think that they provide a nice controlled environment for distilling what it means to become good at something, and as a result I think that everybody should play them. Most...
How the NanoGPT Speedrun WR dropped by 20% in 3 months — LessWrong
In early 2024 Andrej Karpathy stood up an llm.c repo to train GPT-2 (124M), which took an equivalent of 45 minutes on 8xH100 GPUs to reach 3.28 cross entropy loss. By Jan 2025, collaborators of...
How to Think About GPUs | How To Scale Your Model
We love TPUs at Google, but GPUs are great too. This chapter takes a deep dive into the world of NVIDIA GPUs – how each chip works, how they’re networked together, and what that means for LLMs,...
The Bahdanau Attention Mechanism - MachineLearningMastery.com
Conventional encoder-decoder architectures for machine translation encoded every source sentence into a fixed-length vector, regardless of its length, from which the decoder would then generate a...
KV Caching & Attention Optimization: From O(n²) to O(n) | by pdawg | Medium
We’ve seen how an LLM types out a thousand-word answer, word by word, as if it’s “thinking out loud”. It feels smooth, but behind the scenes, the process is painfully inefficient. At generation step...
Transformers are Graph Neural Networks
My engineering friends often ask me: deep learning on graphs sounds great, but are there any real applications? While Graph Neural Networks are used in recommendation systems at Pinterest, Alibaba...
Using topology for discrete problems | The Borsuk-Ulam theorem and stolen necklaces - YouTube
N/A
Energy-Based Models · Deep Learning
We will introduce a new framework for defining models. It provides a unifying umbrella that helps define supervised, unsupervised and self-supervised models. Energy-based models observe a set of...
On the Biology of a Large Language Model
We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology.
Galaxy brain resistance
One important property for a style of thinking and argumentation to have is what I call galaxy brain resistance: how difficult is it to abuse that style of thinking to argue for pretty much whatever...
On Seeing Through and Unseeing: The Hacker Mindset · Gwern.net
Defining the security/hacker mindset as extreme reductionism: ignoring the surface abstractions and limitations to treat a system as a source of parts to manipulate into a different system, with...
Towards a Geometric Theory of Deep Learning - Govind Menon - YouTube
This lecture explores the mathematical underpinnings of deep learning, focusing on the deep linear network (DLN) model. The speaker presents sharp results on training dynamics, uncovering unexpected...
Reinforcement Learning algorithms summarized
The basic task of reinforcement learning is this: given a state we’re in and probabilities of different actions we can take, how do we increase or decrease those probabilities so that we increase...
The Love Song of J. Alfred Prufrock by T. S. Eliot | Poetry Magazine
The yellow fog that rubs its back upon the window-panes, The yellow smoke that rubs its muzzle on the window-panes, Licked its tongue into the corners of the evening, Lingered upon the pools that...
thalassophilia - by hannah - a lot about nothing
Lately I’ve been restless as ever. I miss home and all its people and real seasons where all the leaves color the sky orange-red and dance swirling in the sweeping wind. I miss the vastness of the...
The Concept of the Ruliad—Stephen Wolfram Writings
I call it the ruliad. Think of it as the entangled limit of everything that is computationally possible: the result of following all possible computational rules in all possible ways. It’s yet...
Francesco Capuano
TLDR: A technical blog to revisit the fundamentals of what, in the crudest sense, makes Deep Learning work. A SGD-to-Muon tour, derived from first principles in math and then implemented from scratch...
Just quit - Arjun Raj
We spend a lot of time as scientists thinking about how to choose a project—and that is, of course, critically important to success. But… no matter how carefully you try to pick out the most...
PiTorch: ML on Baremetal Raspberry Pis | projects
How do you get from a $5 computer to a working language model? We strip away every layer of abstraction and build up from scratch, running and training models on a cluster of Pi Zeros. No...
Elementary Condensation - by Jan Hendrik Kirchner
Previously in this series: Elementary Infra-Bayesianism
Ilija Lichkovski on X: "Defining Continual Learning" / X
N/A
JEPA Wiki - a compiled resource of everything JEPA
Use this site to browse a curated collection of articles, timelines, and videos about Joint-Embedding Predictive Architecture. Just navigate the menus to read overviews, detailed concept pages, and...
The Truth of Fact, the Truth of Feeling by Ted Chiang
When my daughter Nicole was an infant, I read an essay suggesting that it might no longer be necessary to teach children how to read or write, because speech recognition and synthesis would soon...
A Powerful New ‘QR Code’ Untangles Math’s Knottiest Knots | Quanta Magazine
With a newly discovered mathematical tool, researchers are hoping to gain unprecedented insight into the structure of complex knots.
DeepSeek Sparse Attention from First Principles
FLOPs, dollars and a path to million-token context window
Tautology, Barbers, Impredicativity and Self-Adjudication
Tautology, Barbers, Impredicativity and Self-Adjudication, by Neil D. Lawrence
metaseq/OPT175B_Logbook.pdf at main · facebookresearch/metaseq
N/A
Deriving Muon
We recently proposed Muon: a new neural net optimizer. Muon has garnered attention for its excellent practical performance: it was used to set NanoGPT speed records leading to interest from the big...
How we collected 10,000 hours of neuro-language data in our basement - Conduit
Over the last 6 months, we collected ~10k hours of data across thousands of unique individuals. As far as we know, this is the largest neuro-language dataset in the world.
The Geometry of Surprise
Why curiosity methods collapse prediction error into one number, what that costs for continual learning, and how a settling substrate could preserve the shape of surprise.
[2407.08723] Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
Abstract:We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural...
Notes on Distributed Training | plugyawn's blog
This blog probes and develops the idea of distributed training of large models over heterogeneous devices. By distributed training, we will mean two things: ...
Epistemic Humility in the Age of AI
During my doctoral studies, I learned a lot of things about me that I did not know before. I gazed into the abyss, as one is wont to do, and the abyss winked back at me. I rejoiced at new insights...
Flow Matching in 5 Minutes | wh
In this post, I will try to build an intuitive understanding of Flow Matching, a framework used to train many state-of-the-art generative image models. In generative modelling, we start with 2...
Who By Very Slow Decay | Slate Star Codex
[Trigger warning: Death, pain, suffering, sadness] I. Some people, having completed the traditional forms of empty speculation – “What do you want to be when you grow up?”, “…
What we think is a decline in literacy is a design problem | Aeon Essays
The author is a university librarian at Charles Sturt University in New South Wales, Australia. He writes the Hybrid Horizons Substack. Edited by Sam Haselby. Everyone is...
Language, Curiosity, and Life
My new reflection: Language, Curiosity, and Life. It is my attempt to put into words what has mattered most to me: language, family, curiosity, music, work, illness, and gratitude. This is not a...
The Spacetime Geometry of Diffusion Models | Rafał Karczewski
A summary of our ICLR 2026 Oral paper on defining a geometric structure on the latent space of diffusion models using information geometry.
When I say "toy models", what do I mean? | Ziming Liu
N/A
Tricks Wiki: Use basic examples to calibrate exponents - T.T
Title: Use basic examples to calibrate exponents Motivation: In the more quantitative areas of mathematics, such as analysis and combinatorics, one has to frequently keep track of a large number of…
Grasping Graphormer: Assessing Transformer Performance for Graph Representation
A first-principles blog post to understand the Graphormer.
Straight lines on graphs - Joel Becker
People who do not watch AI developments closely are often suspicious of an intuition that I’ll call “straight lines on graphs,” which holds that progress in AI is (not only rapid but) remarkably...
