
Automatic differentiation
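A minimal sketch of forward-mode automatic differentiation using dual numbers. All names here (`Dual`, `derivative`) are illustrative, not from any library:

```python
# Forward-mode automatic differentiation via dual numbers.
import math

class Dual:
    """A number a + b*eps where eps**2 == 0; `dot` carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def sin(x):
    # Chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def derivative(f, x):
    """Evaluate f and df/dx at x in a single forward pass."""
    out = f(Dual(x, 1.0))
    return out.val, out.dot

# d/dx [x * sin(x)] = sin(x) + x*cos(x)
val, grad = derivative(lambda x: x * sin(x), 2.0)
```

Every arithmetic operation propagates both the value and its derivative, so the gradient comes out exact (up to floating point), unlike finite differences.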


Backpropagation


Neural networks


https://numpy.org/doc/stable/reference/generated/numpy.matmul.html

$$
\begin{aligned}
z_1 &= \bold W_1 \bold x + \bold b_1 \\
A_1 &= \sigma(z_1) \\
z_2 &= \bold W_2 A_1 + \bold b_2 \\
y &= \sigma(z_2) \\
L &= (\hat y - y)^2
\end{aligned}
$$
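The forward pass and the backpropagated gradients for this two-layer network can be written out directly in NumPy. The layer sizes (3 inputs, 4 hidden units) are arbitrary choices for illustration:

```python
# Forward and backward pass for the two-layer sigmoid network above.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters and input as column vectors, so `W @ x + b` matches the equations.
x = rng.normal(size=(3, 1))
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))
y_hat = np.array([[1.0]])            # target

# Forward pass
z1 = W1 @ x + b1
A1 = sigmoid(z1)
z2 = W2 @ A1 + b2
y = sigmoid(z2)
L = float((y_hat - y) ** 2)

# Backward pass (chain rule, using sigma'(z) = sigma(z) * (1 - sigma(z)))
dy = 2.0 * (y - y_hat)               # dL/dy
dz2 = dy * y * (1.0 - y)             # dL/dz2
dW2 = dz2 @ A1.T
db2 = dz2
dA1 = W2.T @ dz2
dz1 = dA1 * A1 * (1.0 - A1)
dW1 = dz1 @ x.T
db1 = dz1
```

Each backward line is the transpose-multiply mirror of the corresponding forward line, which is why reverse-mode autodiff costs roughly one extra forward pass.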

PyTorch
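The same two-layer network in PyTorch: `autograd` computes the gradients that the manual backward pass derives by hand. Sizes are again illustrative:

```python
# Two-layer sigmoid network; autograd fills in .grad on each leaf tensor.
import torch

torch.manual_seed(0)
x = torch.randn(3, 1)
W1 = torch.randn(4, 3, requires_grad=True)
b1 = torch.zeros(4, 1, requires_grad=True)
W2 = torch.randn(1, 4, requires_grad=True)
b2 = torch.zeros(1, 1, requires_grad=True)
y_hat = torch.ones(1, 1)

A1 = torch.sigmoid(W1 @ x + b1)
y = torch.sigmoid(W2 @ A1 + b2)
L = (y_hat - y).pow(2).sum()
L.backward()    # populates W1.grad, b1.grad, W2.grad, b2.grad
```

`backward()` walks the recorded computation graph in reverse, applying exactly the chain-rule steps written out above.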


Resources


Top


Other


Research papers


The reading list at https://punkx.org/jackdoe/30.html, supposedly given to John Carmack by Ilya Sutskever, plus other papers:

References


Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S. & Farajtabar, M. (2025). The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf

Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A.F., Ippolito, D., Choquette-Choo, C.A., Wallace, E., Tramèr, F. & Lee, K. (2023). Scalable Extraction of Training Data from (Production) Language Models. http://arxiv.org/abs/2311.17035

Schmidt, R.M. (2019). Recurrent Neural Networks (RNNs): A gentle Introduction and Overview. http://arxiv.org/abs/1912.05911

Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., Driessche, G.v.d., Lespiau, J., Damoc, B., Clark, A., Casas, D.d.L., Guy, A., Menick, J., Ring, R., Hennigan, T., Huang, S., Maggiore, L., Jones, C., Cassirer, A., Brock, A., Paganini, M., Irving, G., Vinyals, O., Osindero, S., Simonyan, K., Rae, J.W., Elsen, E. & Sifre, L. (2021). Improving language models by retrieving from trillions of tokens. http://arxiv.org/abs/2112.04426

Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., Presser, S. & Leahy, C. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. http://arxiv.org/abs/2101.00027

Zhai, S., Talbott, W., Srivastava, N., Huang, C., Goh, H., Zhang, R. & Susskind, J. (2021). An Attention Free Transformer. http://arxiv.org/abs/2105.14103

Chukewad, Y.M., James, J., Singh, A. & Fuller, S. (2020). RoboFly: An insect-sized robot with simplified fabrication that is capable of flight, ground, and water surface locomotion. http://arxiv.org/abs/2001.02320

Perez, E., Strub, F., de Vries, H., Dumoulin, V. & Courville, A. (2017). FiLM: Visual Reasoning with a General Conditioning Layer. http://arxiv.org/abs/1709.07871

Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Dabis, J., Finn, C., Gopalakrishnan, K., Hausman, K., Herzog, A., Hsu, J., Ibarz, J., Ichter, B., Irpan, A., Jackson, T., Jesmonth, S., Joshi, N.J., Julian, R., Kalashnikov, D., Kuang, Y., Leal, I., Lee, K., Levine, S., Lu, Y., Malla, U., Manjunath, D., Mordatch, I., Nachum, O., Parada, C., Peralta, J., Perez, E., Pertsch, K., Quiambao, J., Rao, K., Ryoo, M., Salazar, G., Sanketi, P., Sayed, K., Singh, J., Sontakke, S., Stone, A., Tan, C., Tran, H., Vanhoucke, V., Vega, S., Vuong, Q., Xia, F., Xiao, T., Xu, P., Xu, S., Yu, T. & Zitkovich, B. (2022). RT-1: Robotics Transformer for Real-World Control at Scale. http://arxiv.org/abs/2212.06817

Kendall, A., Hawke, J., Janz, D., Mazur, P., Reda, D., Allen, J., Lam, V., Bewley, A. & Shah, A. (2018). Learning to Drive in a Day. http://arxiv.org/abs/1807.00412

Kirsch, L. & Schmidhuber, J. (2022). Meta Learning Backpropagation And Improving It. http://arxiv.org/abs/2012.14905

Smith, L., Kostrikov, I. & Levine, S. (2022). A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning. http://arxiv.org/abs/2208.07860

Hafner, D., Pasukonis, J., Ba, J. & Lillicrap, T. (2023). Mastering Diverse Domains through World Models. http://arxiv.org/abs/2301.04104

Gozalo-Brizuela, R. & Garrido-Merchan, E.C. (2023). ChatGPT is not all you need. A State of the Art Review of large Generative AI models. http://arxiv.org/abs/2301.04655

Millidge, B., Tschantz, A. & Buckley, C.L. (2020). Predictive Coding Approximates Backprop along Arbitrary Computation Graphs. http://arxiv.org/abs/2006.04182

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J. & Lowe, R. (2022). Training language models to follow instructions with human feedback. http://arxiv.org/abs/2203.02155

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. & Polosukhin, I. (2017). Attention Is All You Need. http://arxiv.org/abs/1706.03762

Hafner, D., Lillicrap, T., Ba, J. & Norouzi, M. (2020). Dream to Control: Learning Behaviors by Latent Imagination. http://arxiv.org/abs/1912.01603

Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations. http://arxiv.org/abs/2212.13345

Solomitckii, D., Li, Q.C., Balercia, T., da Silva, C.R.C.M., Talwar, S., Andreev, S. & Koucheryavy, Y. (2016). Characterizing the Impact of Diffuse Scattering in Urban Millimeter-Wave Deployments. #

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. http://arxiv.org/abs/2106.09685

Geng, X., Gudibande, A., Liu, H., Wallace, E., Abbeel, P., Levine, S. & Song, D. (2023). Koala: A Dialogue Model for Academic Research. http://bair.berkeley.edu/blog/2023/04/03/koala/

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J. & Houlsby, N. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. http://arxiv.org/abs/2010.11929

Birhane, A. & McGann, M. (2024). Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency. http://arxiv.org/abs/2407.08790

Muennighoff, N., Yang, Z., Shi, W., Li, X.L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Candès, E. & Hashimoto, T. (2025). s1: Simple test-time scaling. http://arxiv.org/abs/2501.19393

DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., Zhang, X., Yu, X., Wu, Y., Wu, Z.F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., Xue, B., Wang, B., Wu, B., Feng, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., Dai, D., Chen, D., Ji, D., Li, E., Lin, F., Dai, F., Luo, F., Hao, G., Chen, G., Li, G., Zhang, H., Bao, H., Xu, H., Wang, H., Ding, H., Xin, H., Gao, H., Qu, H., Li, H., Guo, J., Li, J., Wang, J., Chen, J., Yuan, J., Qiu, J., Li, J., Cai, J.L., Ni, J., Liang, J., Chen, J., Dong, K., Hu, K., Gao, K., Guan, K., Huang, K., Yu, K., Wang, L., Zhang, L., Zhao, L., Wang, L., Zhang, L., Xu, L., Xia, L., Zhang, M., Zhang, M., Tang, M., Li, M., Wang, M., Li, M., Tian, N., Huang, P., Zhang, P., Wang, Q., Chen, Q., Du, Q., Ge, R., Zhang, R., Pan, R., Wang, R., Chen, R.J., Jin, R.L., Chen, R., Lu, S., Zhou, S., Chen, S., Ye, S., Wang, S., Yu, S., Zhou, S., Pan, S., Li, S.S., Zhou, S., Wu, S., Ye, S., Yun, T., Pei, T., Sun, T., Wang, T., Zeng, W., Zhao, W., Liu, W., Liang, W., Gao, W., Yu, W., Zhang, W., Xiao, W.L., An, W., Liu, X., Wang, X., Chen, X., Nie, X., Cheng, X., Liu, X., Xie, X., Liu, X., Yang, X., Li, X., Su, X., Lin, X., Li, X.Q., Jin, X., Shen, X., Chen, X., Sun, X., Wang, X., Song, X., Zhou, X., Wang, X., Shan, X., Li, Y.K., Wang, Y.Q., Wei, Y.X., Zhang, Y., Xu, Y., Li, Y., Zhao, Y., Sun, Y., Wang, Y., Yu, Y., Zhang, Y., Shi, Y., Xiong, Y., He, Y., Piao, Y., Wang, Y., Tan, Y., Ma, Y., Liu, Y., Guo, Y., Ou, Y., Wang, Y., Gong, Y., Zou, Y., He, Y., Xiong, Y., Luo, Y., You, Y., Liu, Y., Zhou, Y., Zhu, Y.X., Xu, Y., Huang, Y., Li, Y., Zheng, Y., Zhu, Y., Ma, Y., Tang, Y., Zha, Y., Yan, Y., Ren, Z.Z., Ren, Z., Sha, Z., Fu, Z., Xu, Z., Xie, Z., Zhang, Z., Hao, Z., Ma, Z., Yan, Z., Wu, Z., Gu, Z., Zhu, Z., Liu, Z., Li, Z., Xie, Z., Song, Z., Pan, Z., Huang, Z., Xu, Z., Zhang, Z. & Zhang, Z. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. http://arxiv.org/abs/2501.12948

Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., Ma, X., Efrat, A., Yu, P., Yu, L., Zhang, S., Ghosh, G., Lewis, M., Zettlemoyer, L. & Levy, O. (2023). LIMA: Less Is More for Alignment. http://arxiv.org/abs/2305.11206

Cho, K. (2015). Natural Language Understanding with Distributed Representation. http://arxiv.org/abs/1511.07916

Goldberg, Y. (2015). A Primer on Neural Network Models for Natural Language Processing. http://arxiv.org/abs/1510.00726

Fein-Ashley, J. (2025). The FFT Strikes Back: An Efficient Alternative to Self-Attention. http://arxiv.org/abs/2502.18394

Gandhi, K., Chakravarthy, A., Singh, A., Lile, N. & Goodman, N.D. (2025). Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs. http://arxiv.org/abs/2503.01307

Wilson, A.G. (2025). Deep Learning is Not So Mysterious or Different. http://arxiv.org/abs/2503.02113

Zhu, J., Chen, X., He, K., LeCun, Y. & Liu, Z. (2025). Transformers without Normalization. http://arxiv.org/abs/2503.10622

Singh, S., Nan, Y., Wang, A., D'Souza, D., Kapoor, S., Üstün, A., Koyejo, S., Deng, Y., Longpre, S., Smith, N., Ermis, B., Fadaee, M. & Hooker, S. (2025). The Leaderboard Illusion. http://arxiv.org/abs/2504.20879

Darlow, L., Regan, C., Risi, S., Seely, J. & Jones, L. (2025). Continuous Thought Machines. http://arxiv.org/abs/2505.05522

Zhao, A., Wu, Y., Yue, Y., Wu, T., Xu, Q., Yue, Y., Lin, M., Wang, S., Wu, Q., Zheng, Z. & Huang, G. (2025). Absolute Zero: Reinforced Self-play Reasoning with Zero Data. http://arxiv.org/abs/2505.03335

Jaghouar, S., Mattern, J., Ong, J.M., Straube, J., Basra, M., Pazdera, A., Ferrante, M.D., Thaman, K., Gabriel, F., Obeid, F., Erdem, K., Keiblinger, M. & Hagemann, J. (2025). INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning. #

Laban, P., Hayashi, H., Zhou, Y. & Neville, J. (2025). LLMs Get Lost In Multi-Turn Conversation. http://arxiv.org/abs/2505.06120

Jha, R., Zhang, C., Shmatikov, V. & Morris, J.X. (2025). Harnessing the Universal Geometry of Embeddings. http://arxiv.org/abs/2505.12540