#machinelearning

Machine learning

Tokenization: the process of converting words (or parts of words) into numbers so that text can be used to train a machine learning model.
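A minimal sketch of the idea, assuming the simplest possible scheme: split on whitespace and give each distinct word an integer ID. The names `build_vocab`, `encode`, and `unk_id` are made up for this note, and real tokenizers usually work on sub-word pieces rather than whole words.

```python
def build_vocab(corpus):
    """Assign an integer ID to each distinct lowercase word, in order of first appearance."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, unk_id=-1):
    """Map each word to its ID; words not in the vocabulary get unk_id."""
    return [vocab.get(word, unk_id) for word in text.lower().split()]


corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
ids = encode("the dog sat", vocab)
print(vocab)
print(ids)
```

Any word that was not in the corpus falls back to `unk_id`, which is one reason practical tokenizers split text into smaller pieces.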

Normally we don't train our own tokenizer; we reuse the one that ships with a pretrained model.

ChatGPT uses byte pair encoding (BPE); OpenAI's tokenizer library, tiktoken, has its core written in Rust.
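To make byte pair encoding concrete, here is a toy sketch in Python. It is not ChatGPT's implementation: it assumes the textbook version of the algorithm (start from raw UTF-8 bytes, repeatedly merge the most frequent adjacent pair into a new ID), and the function names `train_bpe`, `merge`, and `decode` are made up for this note.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of IDs, or None if there are no pairs."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of pair with new_id, scanning left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to num_merges merges; new IDs start at 256, after the 256 byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Rebuild the text by expanding each merged ID back into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dict order is merge order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(ids)
print(decode(ids, merges))
```

After three merges the 11-byte input is represented by 5 IDs, and `decode` recovers the original string. Production tokenizers add details this sketch skips, such as pre-splitting the text and handling special tokens.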