# Hello, world
This post exists to exercise the rendering pipeline. Replace it with real content before launch.
## Code

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Scale by 1/sqrt(d_k) so the dot products don't grow with head dimension.
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale
    if mask is not None:
        # Masked positions get -inf so they receive zero attention after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```
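As a quick sanity check of the function above, here is a usage sketch with small random tensors and a causal mask (the block repeats the definition so it runs on its own; it assumes PyTorch is installed):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Same definition as above, copied so this snippet is self-contained.
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
q = torch.randn(1, 4, 8)  # (batch, seq_len, d_k)
k = torch.randn(1, 4, 8)
v = torch.randn(1, 4, 8)

# Causal mask: position i may only attend to positions <= i.
mask = torch.tril(torch.ones(4, 4))
out = scaled_dot_product_attention(q, k, v, mask=mask)
print(out.shape)  # torch.Size([1, 4, 8])
```

With the causal mask, the first position can attend only to itself, so the first output row equals the first row of `v` exactly.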
## Math

The softmax over a vector $x \in \mathbb{R}^n$ is:

$$
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
$$

Inline math also works: $\mathrm{softmax}(x)_i \in (0, 1)$.
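To make the formula concrete, a minimal plain-Python softmax (the max-subtraction is a standard numerical-stability trick; it doesn't change the result, since softmax is invariant to shifting all inputs by a constant):

```python
import math

def softmax(x):
    # Subtract the max before exponentiating to avoid overflow;
    # softmax(x) == softmax(x - c) for any constant c.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(sum(probs))  # sums to 1.0 (up to float rounding)
```

The outputs are positive, sum to one, and preserve the ordering of the inputs, which is exactly what the formula above guarantees.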
## MDX
MDX lets us drop components inline later — interactive demos, charts, tensor visualizers.