from labml.configs import BaseConfigs
This defines configurations for a transformer. The configurations are calculated using option functions. These are lazily loaded, so only the necessary options are computed.
class RWKVConfigs(BaseConfigs):
Number of attention heads
    n_heads: int = 8
Transformer embedding size
    d_model: int = 512
Number of layers
    n_layers: int = 6
Dropout probability
    dropout: float = 0.1
Number of tokens in the source vocabulary (for token embeddings)
    n_src_vocab: int
Number of tokens in the target vocabulary (to generate logits for prediction)
    n_tgt_vocab: int
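The lazy option-function idea can be sketched in plain Python, without the labml dependency (the class and attribute names below are illustrative, not part of the original configs): a derived value such as the per-head dimension is computed only when it is first accessed, from whatever values are set at that point.

```python
import functools


class LazyConfigs:
    """Illustrative stand-in for lazily computed config options:
    each derived value is evaluated only on first access, then cached."""
    d_model: int = 512
    n_heads: int = 8

    @functools.cached_property
    def d_head(self) -> int:
        # Evaluated only when d_head is first read; result is cached
        # on the instance, so later reads do no recomputation.
        return self.d_model // self.n_heads


configs = LazyConfigs()
# Nothing is computed until the attribute is accessed.
print(configs.d_head)  # 512 // 8 = 64
```

labml's `option` decorator generalizes this pattern: an option function declares which config it computes, and the framework resolves the dependency graph on demand.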