Fine-tuned Model Evaluation Report¶

This report compares multiple models on:

  • Syntactic Validity (%): queries passing custom syntax validation
  • Execution Success (%): queries executing without errors
  • Output Accuracy (%): queries returning the correct answer
  • Average Latency (ms): average time to generate output
  • Token Usage: average input/output tokens per query
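As a rough illustration, the percentage and average metrics above can be aggregated from per-query results along these lines (a minimal sketch; the record fields and function name are assumptions for illustration, not the actual evaluation harness):

```python
from statistics import mean

def summarize(results):
    """Aggregate per-query evaluation records into the report's metrics.

    Each record is assumed to be a dict with boolean flags `valid`,
    `executed`, `correct`, plus numeric `latency_ms`, `input_tokens`,
    and `output_tokens` (hypothetical schema).
    """
    n = len(results)
    pct = lambda key: 100.0 * sum(r[key] for r in results) / n
    return {
        "syntactic_validity_pct": pct("valid"),
        "execution_success_pct": pct("executed"),
        "output_accuracy_pct": pct("correct"),
        "avg_latency_ms": mean(r["latency_ms"] for r in results),
        "avg_input_tokens": mean(r["input_tokens"] for r in results),
        "avg_output_tokens": mean(r["output_tokens"] for r in results),
    }
```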

Notes:

  • Latency for the Qwen models is constrained by the available GPU inference setup; there is room for optimization.
  • Display names are normalized for readability. Raw model IDs remain available in the underlying data.

The sections below are:

  1. Accuracy metrics across models
  2. Qwen3-4B: before vs after fine-tuning
  3. Overall output accuracy
  4. Latency across models
  5. Token usage per query
  6. Accuracy vs latency trade-off
  7. Model links

Highlights:

  • Fine-tuned SLMs delivered a 2× improvement in output accuracy over the larger, general-purpose Gemini models, while also being faster.
  • Qwen3-4B (ft-oncograph) achieved 100% syntactic validity and execution success.
  • Gemini 2.5 Flash showed slightly lower output accuracy than Gemini 2.0 Flash, despite higher latency.

1) Accuracy metrics across models¶

[Figure: accuracy metrics across models]

2) Qwen3-4B: before vs after fine-tuning¶

[Figure: Qwen3-4B before vs after fine-tuning]

3) Overall output accuracy¶

[Figure: overall output accuracy]

4) Latency across models¶

Note: Latency can be bottlenecked by non-optimized inference (e.g., free tiers), and may not represent best-case performance for a given model.

[Figure: latency across models]

5) Token usage per query¶

Gemini models require significantly more input tokens to cover all schema details, edge cases, few-shot examples, etc. This can lead to much higher costs.

Fine-tuned models have this knowledge internalized.
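To make the cost implication concrete: per-query cost scales linearly with token counts, so a prompt that must carry the full schema and few-shot examples costs proportionally more than a short prompt. A minimal sketch (the token counts and per-million-token rates below are placeholders for illustration, not measured values or actual provider pricing):

```python
def query_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Cost of one query given per-million-token rates (placeholder rates)."""
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m

# Hypothetical comparison: a prompt carrying schema + few-shot examples
# vs. a fine-tuned model's short prompt, at the same (made-up) rates.
long_prompt_cost = query_cost(8000, 100, in_rate_per_m=0.10, out_rate_per_m=0.40)
short_prompt_cost = query_cost(300, 100, in_rate_per_m=0.10, out_rate_per_m=0.40)
```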

[Figure: average input/output tokens per query]

6) Accuracy vs latency trade-off¶

The most desirable region is top-left (higher accuracy, lower latency).

[Figure: accuracy vs latency scatter]
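The trade-off can also be framed as Pareto dominance: a model is worth shortlisting only if no other model is both more accurate and faster. A minimal sketch of that filter (the model names and numbers in the test are placeholders, not this report's measurements):

```python
def pareto_front(models):
    """Keep models not dominated by any other model.

    `models` maps name -> (accuracy_pct, latency_ms). A model is
    dominated if another is at least as accurate AND at least as fast,
    and strictly better on one of the two.
    """
    front = {}
    for name, (acc, lat) in models.items():
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for other, (a, l) in models.items()
            if other != name
        )
        if not dominated:
            front[name] = (acc, lat)
    return front
```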

7) Model links¶

  Model Name                 Model ID                                    Link
  Qwen3-4B (ft-oncograph)    qwen3-4b-ft-oncograph                       Hugging Face
  Qwen3-4B (base)            Qwen3-4B-Instruct-2507-unsloth-bnb-4bit     Hugging Face
  Qwen3-1.7B (ft-oncograph)  qwen3-1.7b-ft-oncograph                     Hugging Face
  Qwen3-1.7B (base)          Qwen3-1.7B-Instruct-2507-unsloth-bnb-4bit   Hugging Face
  Gemini 2.0 Flash           gemini-2.0-flash                            Google Cloud
  Gemini 2.5 Flash           gemini-2.5-flash                            Google Cloud
  Gemini 2.5 Flash Lite      gemini-2.5-flash-lite                       Google Cloud