Fine-tuned Model Evaluation Report¶

This report compares multiple models on:

  • Syntactic Validity (%): queries passing custom syntax validation
  • Execution Success (%): queries executing without errors
  • Output Accuracy (%): queries returning the correct answer
  • Average Latency (ms): average time to generate output
  • Token Usage: average input/output tokens per query
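As a rough illustration, the percentage and average metrics above can be aggregated from per-query results along these lines (a minimal sketch; the record fields and function name are assumptions for illustration, not the actual evaluation harness):

```python
from statistics import mean

def summarize(results):
    """Aggregate per-query evaluation records into the report's metrics.

    Each record is assumed to be a dict with boolean flags `valid`,
    `executed`, `correct`, plus numeric `latency_ms`, `input_tokens`,
    and `output_tokens` (hypothetical schema).
    """
    n = len(results)
    pct = lambda key: 100.0 * sum(r[key] for r in results) / n
    return {
        "syntactic_validity_pct": pct("valid"),
        "execution_success_pct": pct("executed"),
        "output_accuracy_pct": pct("correct"),
        "avg_latency_ms": mean(r["latency_ms"] for r in results),
        "avg_input_tokens": mean(r["input_tokens"] for r in results),
        "avg_output_tokens": mean(r["output_tokens"] for r in results),
    }
```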

Notes:

  • Latency for the Qwen models is constrained by the available GPU inference setup; there is room for optimization.
  • Display names are normalized for readability. Raw model IDs remain available in the underlying data.

The sections below are:

  1. Accuracy metrics across models
  2. Qwen3-4B: before vs after fine-tuning
  3. Overall output accuracy
  4. Latency across models
  5. Token usage per query
  6. Accuracy vs latency trade-off
  7. Model links

Highlights:

  • Fine-tuned SLMs delivered a 2× improvement in output accuracy over the larger, general-purpose Gemini models, while also being faster.
  • Qwen3-4B (ft-oncograph) achieved 100% syntactic validity and execution success.
  • Gemini 2.5 Flash showed slightly lower output accuracy than Gemini 2.0 Flash, despite higher latency.

1) Accuracy metrics across models¶

[Figure: accuracy metrics across models]

2) Qwen3-4B: before vs after fine-tuning¶

[Figure: Qwen3-4B before vs after fine-tuning]

3) Overall output accuracy¶

[Figure: overall output accuracy]

4) Latency across models¶

Note: Latency can be bottlenecked by non-optimized inference (e.g., free tiers), and may not represent best-case performance for a given model.

[Figure: latency across models]

5) Token usage per query¶

Gemini models require significantly more input tokens to cover all schema details, edge cases, few-shot examples, etc. This can lead to much higher costs.

Fine-tuned models have this knowledge internalized.
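To make the cost implication concrete: per-query cost scales linearly with token counts, so a prompt that must carry the full schema and few-shot examples costs proportionally more than a short prompt. A minimal sketch (the token counts and per-million-token rates below are placeholders for illustration, not measured values or actual provider pricing):

```python
def query_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Cost of one query given per-million-token rates (placeholder rates)."""
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m

# Hypothetical comparison: a prompt carrying schema + few-shot examples
# vs. a fine-tuned model's short prompt, at the same (made-up) rates.
long_prompt_cost = query_cost(8000, 100, in_rate_per_m=0.10, out_rate_per_m=0.40)
short_prompt_cost = query_cost(300, 100, in_rate_per_m=0.10, out_rate_per_m=0.40)
```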

[Figure: average input/output tokens per query]

6) Accuracy vs latency trade-off¶

The most desirable region is top-left (higher accuracy, lower latency).

[Figure: accuracy vs latency scatter]
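The trade-off can also be framed as Pareto dominance: a model is worth shortlisting only if no other model is both more accurate and faster. A minimal sketch of that filter (the model names and numbers in the test are placeholders, not this report's measurements):

```python
def pareto_front(models):
    """Keep models not dominated by any other model.

    `models` maps name -> (accuracy_pct, latency_ms). A model is
    dominated if another is at least as accurate AND at least as fast,
    and strictly better on one of the two.
    """
    front = {}
    for name, (acc, lat) in models.items():
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for other, (a, l) in models.items()
            if other != name
        )
        if not dominated:
            front[name] = (acc, lat)
    return front
```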

7) Model links¶

  Model Name                 Model ID                                    Link
  Qwen3-4B (ft-oncograph)    qwen3-4b-ft-oncograph                       Hugging Face
  Qwen3-4B (base)            Qwen3-4B-Instruct-2507-unsloth-bnb-4bit     Hugging Face
  Qwen3-1.7B (ft-oncograph)  qwen3-1.7b-ft-oncograph                     Hugging Face
  Qwen3-1.7B (base)          Qwen3-1.7B-Instruct-2507-unsloth-bnb-4bit   Hugging Face
  Gemini 2.0 Flash           gemini-2.0-flash                            Google Cloud
  Gemini 2.5 Flash           gemini-2.5-flash                            Google Cloud
  Gemini 2.5 Flash Lite      gemini-2.5-flash-lite                       Google Cloud