PUSHPAK-221

🏁 jfbench - Benchmark Japanese Instruction Following Easily

🚀 Getting Started

JFBench is an easy-to-use benchmark suite for evaluating Japanese instruction-following performance. It lets you run tests on language models with simple scripts.

📥 Download & Install

To download JFBench, please visit the Releases Page.

Here you will find the latest version of the software. Click the link for the release you want, then download the file that suits your operating system.

After downloading, follow these steps to set it up on your machine:

Unzip the Downloaded File: Locate the downloaded file and unzip it to a preferred location on your computer.
Open Your Terminal or Command Prompt: Depending on your operating system, either open Terminal (Mac/Linux) or Command Prompt (Windows).
Navigate to the JFBench Directory: Use the cd command (change directory) to go to the folder where you unzipped JFBench. Example command:
```
cd path/to/jfbench
```

📦 Setup

Next, you’ll need to set up the necessary dependencies. JFBench uses uv for this purpose. To install the dependencies, run:

uv sync

You also need to set an API key for OpenRouter since some parts of JFBench rely on it. To do this, follow these instructions:

Get Your OpenRouter API Key: Sign up at OpenRouter if you don’t have an account and obtain your API key.
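Set Your API Key in Your Terminal: Run the following command, replacing your_openrouter_api_key with the key you received. The export syntax applies to Mac/Linux shells; in Windows Command Prompt, use set OPENROUTER_API_KEY=your_openrouter_api_key instead. The variable lasts only for the current terminal session: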

export OPENROUTER_API_KEY="your_openrouter_api_key"
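
Before running a benchmark, it can help to confirm that the setup steps took effect. The following is a minimal sketch and not part of JFBench: `preflight` is a hypothetical helper that reports whether the `uv` executable is on your PATH and whether `OPENROUTER_API_KEY` is set in the current environment.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of JFBench.
    """
    env = os.environ if env is None else env
    problems = []
    # uv must be discoverable on PATH for `uv sync` and `uv run` to work.
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    # Treat an unset or blank key as missing.
    if not env.get("OPENROUTER_API_KEY", "").strip():
        problems.append("OPENROUTER_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```

Run it in the same terminal session where you set the key, since an exported variable is not visible to other sessions.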

🛠️ Using JFBench

JFBench includes various scripts located in the src/jfbench directory. Let’s go through how to run a benchmark.

🎯 Benchmark Run: `src/jfbench/benchmark/eval.py`

You can evaluate a model using OpenRouter with the benchmarking script. Here’s how to run it:

Open your Terminal or Command Prompt.
Make sure you are still in the JFBench directory.
Execute the following command:

uv run python src/jfbench/benchmark/eval.py \
  --benchmark "ifbench" \
  --output-dir data/benchmark_results \
  --n-constraints "1,2,4,8" \
  --constraint-set "test" \
  --n-benchmark-data 200 \
  --model-specs-json '[{"provider": "openrouter", "model": "qwen/qwen3-30b-a3b-thinking-2507", "model_short": "Qwen3 30B A3B Thinking 2507"}]'
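
The `--model-specs-json` value is a JSON array embedded in a shell command, so quoting mistakes are easy to make. As a minimal sketch, the same command can be assembled in Python as an argument list, with `json.dumps` producing the JSON. The helper name `build_eval_command` is hypothetical and not part of JFBench; it uses only the flags and values shown in the example command.

```python
import json


def build_eval_command(model_specs, n_constraints=(1, 2, 4, 8),
                       output_dir="data/benchmark_results",
                       n_benchmark_data=200):
    """Build the argument list for the example command (hypothetical helper)."""
    return [
        "uv", "run", "python", "src/jfbench/benchmark/eval.py",
        "--benchmark", "ifbench",
        "--output-dir", output_dir,
        # The example passes the counts as one comma-separated string.
        "--n-constraints", ",".join(str(n) for n in n_constraints),
        "--constraint-set", "test",
        "--n-benchmark-data", str(n_benchmark_data),
        # ensure_ascii=False keeps any non-ASCII names readable.
        "--model-specs-json", json.dumps(model_specs, ensure_ascii=False),
    ]


specs = [{
    "provider": "openrouter",
    "model": "qwen/qwen3-30b-a3b-thinking-2507",
    "model_short": "Qwen3 30B A3B Thinking 2507",
}]
command = build_eval_command(specs)
```

To launch the run from the JFBench directory, the list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string bypasses the shell, so the JSON needs no extra quoting.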

🌐 Understanding the Options

--benchmark: Specifies which benchmark to run. The default is "ifbench"; other values can be selected if needed.
--output-dir: Sets the directory where results are saved. You can change data/benchmark_results to a directory of your choice.
--n-constraints: Sets the number of constraints, given as a comma-separated list. The example uses "1,2,4,8".
--constraint-set: Selects the group of constraints to use. "test" is the default.
--n-benchmark-data: Sets how many benchmark data items to use. In the example, 200 items are processed.
--model-specs-json: Describes the models to evaluate as a JSON array. In the example, each entry has a provider, a model, and a model_short field.

📈 Viewing Results

After running the benchmark, you can locate your results in the directory you specified with the --output-dir option. Open the files in your preferred text editor or viewer to examine the results.
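
The names and formats of the result files are not described here, so the following minimal sketch only lists whatever files exist under the output directory, together with their sizes. `list_result_files` is a hypothetical helper, and `data/benchmark_results` is the directory used in the example command.

```python
from pathlib import Path


def list_result_files(output_dir="data/benchmark_results"):
    """Return (relative path, size in bytes) for every file under output_dir."""
    root = Path(output_dir)
    # A missing directory usually means no benchmark has been run yet.
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted((p.relative_to(root).as_posix(), p.stat().st_size) for p in files)


if __name__ == "__main__":
    for name, size in list_result_files():
        print(f"{size:>10}  {name}")
```

An empty listing after a run suggests that a different `--output-dir` was used or that the run did not finish.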

🧑‍🤝‍🧑 Community Support

If you have questions or need help, consider reaching out to the JFBench community via the issue tracker on GitHub. Other users and maintainers can offer guidance and solutions.

🌟 Future Updates

JFBench will continue to evolve. Keep an eye on the Releases Page for new features and improvements as they become available.

Thank you for using JFBench! Enjoy exploring Japanese instruction-following benchmarks with ease.