What Are LangChain Example Selectors? How to Use Them

What Are LangChain Example Selectors?

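LangChain's example selectors choose which examples to show in a prompt from a larger pool of candidates. For instance, if you have prepared 10 candidate examples that illustrate question-and-answer patterns, an example selector can pick 3 of them according to some criterion and include only those in the prompt. Typical selection criteria include length, relevance to the input, and complexity.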

How to Use Example Selectors

Creating an Example Selector

The base interface for example selectors is defined by the BaseExampleSelector class, as follows.

from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BaseExampleSelector(ABC):
    """Interface for selecting examples to include in prompts."""

    @abstractmethod
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""

    @abstractmethod
    def add_example(self, example: Dict[str, str]) -> Any:
        """Add new example to store."""

select_examples: selects the appropriate examples from the list of candidates, based on the input
add_example: adds a new example to the list of candidates

The following example defines a custom example selector class by inheriting from BaseExampleSelector.

Code 1

from langchain_core.example_selectors.base import BaseExampleSelector

# To use an example selector, first create a list of examples. These are usually pairs of example inputs and outputs.
examples = [
    {"input": "hi", "output": "ciao"},
    {"input": "bye", "output": "arrivaderci"},
    {"input": "soccer", "output": "calcio"},
]

class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples):
        self.examples = examples

    def add_example(self, example):
        self.examples.append(example)

    def select_examples(self, input_variables):
        # This assumes knowledge that part of the input will be an 'input' key
        new_word = input_variables["input"]
        new_word_length = len(new_word)

        # Initialize variables to store the best match and its length difference
        best_match = None
        smallest_diff = float("inf")

        # Iterate through each example
        for example in self.examples:
            # Calculate the length difference with the example's input
            current_diff = abs(len(example["input"]) - new_word_length)

            # Update the best match if the current one is closer in length
            if current_diff < smallest_diff:
                smallest_diff = current_diff
                best_match = example

        return [best_match]

example_selector = CustomExampleSelector(examples)
print(example_selector.select_examples({"input": "okay"}))

This code works as follows.

It imports the BaseExampleSelector class.
It defines the list of examples in examples.
It defines CustomExampleSelector, a custom class that inherits from BaseExampleSelector and selects an example based on word length.

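__init__: receives the list of examples as the examples argument and stores it in self.examples
add_example: appends a new example to self.examples
select_examples: decides which example to select based on word length
- Takes a dictionary as input_variables, assigns the value under its "input" key to new_word, and stores the length of new_word in new_word_length.
- Iterates over the examples in self.examples one at a time. For each example, it computes the absolute difference between the length of example["input"] and new_word_length and assigns it to current_diff. The example with the smallest current_diff is returned as a single-element list. If several examples tie, the first one encountered wins, because the comparison uses a strict <.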

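example_selector = CustomExampleSelector(examples) creates an instance of the CustomExampleSelector class.
example_selector.select_examples({"input": "okay"}) calls the select_examples method of the example selector. "okay" has 4 characters, and "bye" (3 characters) is the closest in length, so running it produces the following result.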

[{'input': 'bye', 'output': 'arrivaderci'}]

If you add an example with the add_example method and then call select_examples again, the result changes whenever the newly added example is the best match.

Code 2

example_selector.add_example({"input": "hand", "output": "mano"})
print(example_selector.select_examples({"input": "okay"}))

Running Code 2 produces the following result, which differs from the result of Code 1 because "hand" has the same length as "okay".

[{'input': 'hand', 'output': 'mano'}]

Using Example Selectors in a Prompt

To use the example selector you created in a prompt, run code like the following.

from langchain_core.prompts.few_shot import FewShotPromptTemplate
from langchain_core.prompts.prompt import PromptTemplate

example_prompt = PromptTemplate.from_template("Input: {input} -> Output: {output}")

prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix="Input: {input} -> Output:",
    prefix="Translate the following words from English to Italian:",
    input_variables=["input"],
)

print(prompt.format(input="word"))

In this code, the arguments of the FewShotPromptTemplate class serve the following purposes.

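example_selector: the example selector instance you created
example_prompt: the prompt template used to format each selected example
suffix: the text placed after the examples; it takes the input variable and sets up the format of the expected output
prefix: the text placed at the beginning of the prompt, before the examples
input_variables: the names of the input variables the prompt expects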
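
To make the roles of these arguments concrete, here is a minimal sketch in plain Python of how the pieces might be assembled into the final prompt text. It is not LangChain's implementation: assemble_few_shot_prompt is a hypothetical helper, and it assumes the pieces are joined with a blank line between them and that one example ("hand" / "mano") has already been selected.

```python
from typing import Dict, List


def assemble_few_shot_prompt(
    prefix: str,
    selected_examples: List[Dict[str, str]],
    example_template: str,
    suffix: str,
    separator: str = "\n\n",  # assumption: pieces are separated by a blank line
    **inputs: str,
) -> str:
    """Hypothetical helper: join prefix, formatted examples, and suffix."""
    # Format every selected example with the per-example template.
    formatted = [example_template.format(**ex) for ex in selected_examples]
    # Fill the placeholders in prefix and suffix with the caller's input values.
    pieces = [prefix.format(**inputs), *formatted, suffix.format(**inputs)]
    return separator.join(pieces)


prompt_text = assemble_few_shot_prompt(
    prefix="Translate the following words from English to Italian:",
    selected_examples=[{"input": "hand", "output": "mano"}],
    example_template="Input: {input} -> Output: {output}",
    suffix="Input: {input} -> Output:",
    input="word",
)
print(prompt_text)
```

The prefix comes first, each selected example is rendered through the example template, and the suffix closes the prompt with the new input and leaves the output for the model to fill in.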

LengthBasedExampleSelector

You can write your own example selector by inheriting from BaseExampleSelector, but the LangChain library also ships with ready-made ones.

This section introduces LengthBasedExampleSelector, one of the example selector classes that ships with LangChain. LengthBasedExampleSelector selects which examples to use based on length. The implementation looks like this.

from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The maximum length that the formatted examples should be.
    # Length is measured by the get_text_length function below.
    max_length=25,
    # The function used to get the length of a string, which is used
    # to determine which examples to include. It is commented out because
    # it is provided as a default value if none is specified.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

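LengthBasedExampleSelector includes as many examples as possible without letting the total length exceed max_length. In the code above, the instance is created with max_length=25. By default, length is measured as the number of words, that is, the number of pieces obtained by splitting the text on spaces and newlines. The selector therefore adds examples in order for as long as the input plus the formatted examples stay within 25 words.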

For example, when a short string such as "big" is passed as the input variable, many examples are included, as in Output 1.

print(dynamic_prompt.format(adjective="big"))

Output 1

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:

However, when a long string is passed as the input variable, fewer examples are included, as in Output 2.

long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

Output 2

Give the antonym of every input

Input: happy
Output: sad

Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:
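
The difference between the two outputs can be reproduced with a small word budget calculation. The following is a minimal sketch in plain Python, not LangChain's actual code: count_words mirrors the default get_text_length function quoted in the comments above, while select_by_length is a hypothetical helper that assumes examples are added in list order until the budget of max_length words is used up.

```python
import re
from typing import Dict, List


def count_words(text: str) -> int:
    """Count the pieces obtained by splitting on a newline or a space."""
    return len(re.split("\n| ", text))


def select_by_length(
    examples: List[Dict[str, str]],
    example_template: str,
    user_input: str,
    max_length: int,
) -> List[Dict[str, str]]:
    """Hypothetical helper: add examples in order while the word budget lasts."""
    # The input itself consumes part of the budget.
    remaining = max_length - count_words(user_input)
    selected = []
    for example in examples:
        remaining -= count_words(example_template.format(**example))
        if remaining < 0:
            break  # this example no longer fits, so stop here
        selected.append(example)
    return selected


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
template = "Input: {input}\nOutput: {output}"
long_string = (
    "big and huge and massive and large and gigantic and tall "
    "and much much much much much bigger than everything else"
)

short_selection = select_by_length(examples, template, "big", max_length=25)
long_selection = select_by_length(examples, template, long_string, max_length=25)
print(len(short_selection), len(long_selection))  # prints: 5 1
```

Each formatted example counts as 4 words. The input "big" uses 1 word, leaving room for all 5 examples (20 words). The long input uses 21 words, leaving room for exactly 1 example.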

SemanticSimilarityExampleSelector

SemanticSimilarityExampleSelector is an example selector class that selects examples based on their semantic similarity to the input. The implementation looks like this.

from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # The number of examples to produce.
    k=1,
)
similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

# Input is a feeling, so should select the happy/sad example
print(similar_prompt.format(adjective="worried"))

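In this code, the example most similar to the input "worried" is selected from examples. The result is shown below. Because "worried" is a word that describes a feeling, the selected example is {"input": "happy", "output": "sad"}, which also describes feelings.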

Give the antonym of every input

Input: happy
Output: sad

Input: worried
Output:
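
The idea behind similarity-based selection can be illustrated without an embedding model or a vector store. The sketch below is only an illustration under stated assumptions: the three-dimensional vectors in toy_vectors are hand-made stand-ins for real embeddings (a real run would obtain them from OpenAIEmbeddings), cosine similarity is assumed as the similarity measure, and select_most_similar is a hypothetical helper rather than a LangChain API.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-made vectors with the axes [emotion, size, weather].
# Real embeddings come from a model and have far more dimensions.
toy_vectors: Dict[str, List[float]] = {
    "happy": [0.9, 0.0, 0.1],
    "tall": [0.0, 1.0, 0.0],
    "energetic": [0.6, 0.2, 0.0],
    "sunny": [0.2, 0.0, 0.9],
    "windy": [0.0, 0.1, 0.9],
    "worried": [1.0, 0.0, 0.0],
}


def select_most_similar(
    examples: List[Dict[str, str]], query: str, k: int = 1
) -> List[Dict[str, str]]:
    """Hypothetical helper: return the k examples whose input is closest to the query."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(toy_vectors[ex["input"]], toy_vectors[query]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
print(select_most_similar(examples, "worried", k=1))
```

With these made-up vectors, "worried" points along the emotion axis, so "happy" ranks first. Raising k returns more examples in descending order of similarity.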

Summary

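Example selectors choose the most suitable examples from a pool of candidate examples, so the prompt carries only the examples that fit the current input.

Example selectors are used when building prompt templates: pass one to the example_selector argument of the FewShotPromptTemplate class.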
