Scaling Exponents Across Parameterizations and Optimizers
K Everett, L Xiao, M Wortsman, AA Alemi, R Novak, PJ Liu, I Gur, J Sohl-Dickstein, LP Kaelbling, J Lee, J Pennington
ICML 2024
Understanding parameterizations and how to scale them.
Training LLMs over Neurally Compressed Text
B Lester, J Lee, AA Alemi, J Pennington, A Roberts, J Sohl-Dickstein, N Constant
Trying to train transformers on top of transformers with arithmetic compression.
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Squeezing more performance out of models by fine-tuning on filtered generated responses.
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
It's easy to get models to perform arithmetic incorrectly, if you just ask nicely.
Small-scale proxies for large-scale Transformer training instabilities
M Wortsman & PAGI
ICLR 2024
Studying problems of large scale models in the small.
Speed Limits for Deep Learning
I Seroussi, AA Alemi, M Helias, Z Ringel
Working out thermodynamic speed limits for learning.
Variational Prediction
AA Alemi, B Poole
Targeting the predictive distribution directly with a variational method.
Weighted Ensemble Self-Supervised Learning
Y Ruan, S Singh, WR Morningstar, AA Alemi, S Ioffe, I Fischer, JV Dillon
ICLR 2023
Ensembling the heads of SSL methods gives nice gains.
Trajectory ensembling for fine tuning - performance gains without modifying training
L Anderson-Conway, V Birodkar, S Singh, H Mobahi, AA Alemi
HITY Workshop NeurIPS 2022
Ensembling within a trajectory gives some simple gains.
Bayesian Imitation Learning for End-to-End Mobile Manipulation
Y Du, D Ho, AA Alemi, E Jang, M Khansari
ICML 2022
Using VIB to help robots open doors.
A Closer Look at the Adversarial Robustness of Information Bottleneck Models
I Korshunova, D Stutz, AA Alemi, O Wiles, S Gowal
ICML 2021 AML Workshop Poster
Looking more carefully, IB models aren't fully robust to adversarial examples.
Does Knowledge Distillation Really Work?
S Stanton, P Izmailov, P Kirichenko, AA Alemi, AG Wilson
Knowledge distillation doesn't seem to work as well as people assume it does.
VIB is Half Bayes
AA Alemi, WR Morningstar, B Poole, I Fischer, JV Dillon
AABI 2021 Oral
VIB can be rederived as a half-Bayesian, half-maximum-likelihood method.
PACᵐ-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime
WR Morningstar, AA Alemi, JV Dillon
Multisample bound that does better than Bayes at prediction for misspecified models.
Density of States Estimation for Out-of-Distribution Detection
WR Morningstar, C Ham, AG Gallagher, B Lakshminarayanan, AA Alemi, JV Dillon
AISTATS 2021 Oral
Simple density-of-states-inspired out-of-distribution detection.
The OpenKIM Processing Pipeline: A Cloud-Based Automatic Materials Property Computation Engine
DS Karls, M Bierbaum, AA Alemi, RS Elliott, JP Sethna, EB Tadmor
Journal of Chemical Physics
Database for Interatomic Potentials.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python
R Novak, L Xiao, J Hron, J Lee, AA Alemi, J Sohl-Dickstein, SS Schoenholz
Simple-to-use Python package for training infinitely wide neural networks.
Variational Predictive Information Bottleneck
AA Alemi
Most modern inference procedures can be rederived as a simple variational bound on a predictive information bottleneck objective.
Information in Infinite Ensembles of Infinitely-Wide Networks
R Shwartz-Ziv, AA Alemi
AABI 2019 - PMLR
While they seem complex, infinite ensembles of infinitely-wide networks are simple enough to enable tractable calculations of many information theoretic quantities.
CEB Improves Model Robustness
I Fischer, AA Alemi
A class-conditional version of VIB shows good robustness.
On Predictive Information in RNNs
Z Dong, D Oktay, B Poole, AA Alemi
Modern RNNs do not optimally capture predictive information in sequences.
Thermodynamic Computing
T Conte, E DeBenedictis, N Ganesh, T Hylton, JP Strachan, RS Williams, AA Alemi, L Altenberg, G Crooks, J Crutchfield, L del Rio, J Deutsch, M DeWeese, K Douglas, M Esposito, M Frank, R Fry, P Harsha, M Hill, C Kello, J Krichmar, S Kumar, SC Liu, S Lloyd, M Marsili, I Nemenman, A Nugent, N Packard, D Randall, P Sadowski, N Santhanam, R Shaw, A Stieg, E Stopnitzky, C Teuscher, C Watkins, D Wolpert, J Yang, Y Yufik
A position paper on the future of thermodynamic computing.
On Variational Bounds of Mutual Information
B Poole, S Ozair, A van den Oord, AA Alemi, G Tucker
Overview of recent advances in variationally bounding mutual information.
Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces
B Seybold, E Fertig, AA Alemi, I Fischer
Sometimes a worse decoder gives better representations.
Variational Autoencoders with Tensorflow Probability Layers
I Fischer, AA Alemi, JV Dillon, TFP Team
Tensorflow Blog
TFP makes VAEs easy.
On the Use of ArXiv as a Dataset
CB Clement, M Bierbaum, KP O'Keeffe, AA Alemi
ICLR workshop RLGM
More people should use the ArXiv as a dataset.
β-VAEs can retain label information even at high compression
E Fertig, A Arbabi, AA Alemi
NeurIPS BDL Workshop
Some rich decoder VAEs can magically focus on salient information.
Canonical Sectors and Evolution of Firms in the US Stock Markets
LX Hayden, R Chachra, AA Alemi, PH Ginsparg, JP Sethna
Quantitative Finance
Matrix factorization gives automatic and continuous sector assignments to stocks.
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
H Choi, E Jang, AA Alemi
Even though it shouldn't work, robust likelihoods can detect OOD data in practice.
TherML: Thermodynamics of Machine Learning
AA Alemi, I Fischer
ICML 2018 TFADGM Workshop
Modern variational latent variable modelling looks a lot like Thermodynamics.
Uncertainty in the Variational Information Bottleneck
AA Alemi, I Fischer, JV Dillon
UAI UDL Workshop
VIB builds robust classifiers which are aware of what they don't know.
Watch your step: Learning node embeddings via graph attention
S Abu-El-Haija, B Perozzi, R Al-Rfou, AA Alemi
Building better graph representations.
GILBO: one metric to measure them all
AA Alemi, I Fischer
A variational lower bound on the mutual information in GANs highlights some of their problems.
Fixing a Broken ELBO
AA Alemi, B Poole, I Fischer, JV Dillon, RA Saurous, K Murphy
Adopting a representational view of VAEs can help explain away some of their problems.
Tensorflow distributions
JV Dillon, I Langmore, D Tran, E Brevdo, S Vasudevan, D Moore, B Patton, AA Alemi, M Hoffman, RA Saurous
Paper accompanying library.
Light microscopy at maximal precision
M Bierbaum, BD Leahy, AA Alemi, I Cohen, JP Sethna
Phys Rev X
Better featuring of colloids.
Jeffrey's prior sampling of deep sigmoidal networks
LX Hayden, AA Alemi, PH Ginsparg, JP Sethna
Jeffrey's prior doesn't really work for neural networks.
Motion prediction under multimodality with conditional stochastic networks
K Fragkiadaki, J Huang, AA Alemi, S Vijayanarasimhan, S Ricco, R Sukthankar
Pedestrian motion is stochastic, which creates certain challenges.
Inception-v4, inception-resnet and the impact of residual connections on learning
C Szegedy, S Ioffe, V Vanhoucke, AA Alemi
Residual connections improve the inception family of classifiers.
Deep Variational Information Bottleneck
AA Alemi, I Fischer, JV Dillon, K Murphy
A modern formulation of the Information Bottleneck which is friendly towards neural networks.
Improved generator objectives for gans
B Poole, AA Alemi, J Sohl-Dickstein, A Angelova
NeurIPS Adversarial Workshop
You can target separate divergences for the generator and discriminator of a GAN.
Tree-Structured Variational Autoencoder
R Shin, AA Alemi, G Irving, O Vinyals
Attempting to learn tree-structured representations.
Improving inception and image classification in tensorflow
AA Alemi
Google Research Blog
Blogpost accompanying open source release of Inception Resnet V2.
DeepMath-deep sequence models for premise selection
G Irving, C Szegedy, AA Alemi, N Eén, F Chollet, J Urban
Using neural networks to improve automatic theorem proving.
SPARTA: Fast global planning of collision-avoiding robot trajectories
CJM Mathy, F Gonda, D Schmidt, N Derbinsky, AA Alemi, J Bento, FM Delle Fave, JS Yedidia
Using ADMM to do fast trajectory planning.
You can run, you can hide: The epidemiology and statistical mechanics of zombies
AA Alemi, M Bierbaum, CR Myers, JP Sethna
Phys Rev E
A fun pedagogical introduction to epidemiology and statistical mechanics.
Zombies Reading Segmented Graphene Articles On The Arxiv
AA Alemi
A collection of four of my graduate student projects.
Clustering via Content-Augmented Stochastic Blockmodels
JM Cashore, X Zhao, AA Alemi, Y Liu, PI Frazier
Better clustering through content conditioning.
Text segmentation based on semantic word embeddings
AA Alemi, P Ginsparg
Using word2vec vectors to do automatic text segmentation.
Mechanical properties of growing melanocytic nevi and the progression to melanoma
A Taloni, AA Alemi, E Ciusani, JP Sethna, S Zapperi, CAM La Porta
PloS One
Elastic models of skin cancer.
Ensuring reliability, reproducibility and transferability in atomistic simulations: The knowledgebase of interatomic models (https://openkim.org)
E Tadmor, R Elliott, D Karls, A Ludvik, J Sethna, M Bierbaum, AA Alemi, T Wennblom
Knowledgebase of Interatomic Models application programming interface as a standard for molecular simulations
R Elliott, E Tadmor, D Karls, A Ludvik, J Sethna, M Bierbaum, AA Alemi, T Wennblom
Building a website to collect interatomic potentials and score them.
Imaging atomic rearrangements in two-dimensional silica glass: watching silica's dance
PY Huang, S Kurasch, JS Alden, A Shekhawat, AA Alemi, PL McEuen, JP Sethna, U Kaiser, DA Muller
Applying elastic theory to the atomic scale.
Growth and form of melanoma cell colonies
MM Baraldi, AA Alemi, JP Sethna, S Caracciolo, CAM La Porta, S Zapperi
Simple models of skin cancer growth.
Near-field radiative heat transfer between macroscopic planar surfaces
RS Ottens, Volker Quetschke, Stacy Wise, AA Alemi, Ramsey Lundock, Guido Mueller, David H Reitze, David B Tanner, Bernard F Whiting
Phys Rev Lett
Exploration of quantum tunnelling as a mechanism for cooling the next generation LIGO detectors.
Laplace-Runge-Lenz Vector
AA Alemi
Undergraduate project on the history of the Runge Vector.
NEMS Coupling
AA Alemi
Undergraduate research project on synchronization in nano cantilevers.
Why Venus has no moon
AA Alemi, DJ Stevenson
AAS Oral
Undergraduate research investigating whether two collisions in opposite directions could explain Venus' lack of a moon and its slow rotation.