Analyzing...
- Your file contains 8 prompt-completion pairs. In general, we recommend having at least a few hundred examples; we've found that performance tends to increase linearly with every doubling of the number of examples
- More than a third of your `completion` column/key is uppercase. Uppercase completions tend to perform worse than the mixture of cases encountered in normal language. We recommend lowercasing the data if that makes sense in your domain. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
- All prompts end with suffix `\n---`
- All prompts start with prefix `description: `. Fine-tuning doesn't require an instruction specifying the task or a few-shot example scenario. Most of the time you should add only the input data to the prompt, and the desired output to the completion
- Your data does not contain a common ending at the end of your completions. Having a common ending string appended to the end of the completion makes it clearer to the fine-tuned model where the completion should end. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples.
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
Based on the analysis, we will perform the following actions:
- [Recommended] Lowercase all your data in column/key `completion` [Y/n]: n
- [Recommended] Remove prefix `description: ` from all prompts [Y/n]: n
- [Recommended] Add a suffix ending ` END` to all completions [Y/n]: n
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: n
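Since all of the recommended changes were declined, the file keeps its original shape. For reference, a single record in `handbook.jsonl` would then look something like this (the description and completion text here are illustrative placeholders, not actual data; note the `description: ` prefix, the `\n---` suffix, and the uppercase completion the analyzer flagged):
{"prompt": "description: <input text>\n---", "completion": "AN UPPERCASE COMPLETION"}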
You can use your file for fine-tuning:
> openai api fine_tunes.create -t "handbook.jsonl"
After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string `\n---` for the model to start generating completions, rather than continuing with the prompt.
Once your model starts training, it'll take approximately 2.55 minutes to train a `curie` model, and less for `ada` and `babbage`. The queue will add approximately half an hour per job ahead of you.
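After training finishes, you can query the fine-tuned model from the same CLI. A minimal sketch using the `completions.create` subcommand, with a placeholder model name and illustrative prompt text:
openai api completions.create -m <FINE_TUNED_MODEL> -p "description: <input text>\n---"
Note that the prompt passed to `-p` ends with the `\n---` indicator string, as required.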
On the other hand, the following limitation applies:
- Responses must be 2048 tokens or fewer
Training
You can train a model with the following command. It depends on the amount of data, but training generally finishes within a few hours.
openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>
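For example, using the `handbook.jsonl` file prepared above with the `curie` base model mentioned in the analysis output:
openai api fine_tunes.create -t "handbook.jsonl" -m curie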