20) Mixture of Experts Balancing Techniques Auxiliary Loss Load Balancing Capacity Factor (5 views, 15 days ago)
15) All about Sinusoidal Positional Encodings What’s with the weird sin-cos formula (1 view, 15 days ago)
14) Integer and Binary Positional Encodings Journey towards Rotary Positional Encodings (RoPE) (3 views, 16 days ago)
12) Multi-Head Latent Attention From Scratch One of the major DeepSeek innovation (6 views, 16 days ago)
11) Understand Grouped Query Attention (GQA) The final frontier before latent attention (7 views, 16 days ago)
10) Multi-Query Attention Explained Dealing with KV Cache Memory Issues Part 1 (4 views, 16 days ago)