Fine-tuned Model Evaluation Report
This report compares multiple models on:
- Syntactic Validity (%): queries passing custom syntax validation
- Execution Success (%): queries executing without errors
- Output Accuracy (%): queries returning the correct answer
- Average Latency (ms): average time to generate output
- Token Usage: average input/output tokens per query
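The aggregate metrics above can be derived from per-query records. Below is a minimal sketch, assuming a hypothetical list of result dicts with `valid`, `executed`, `correct`, `latency_ms`, and token-count fields; the field names and sample values are illustrative, not the report's actual data.

```python
# Sketch of computing the report's aggregate metrics from per-query
# results. The record schema here is an assumption for illustration.
def aggregate(results):
    n = len(results)
    return {
        "syntactic_validity_pct": 100 * sum(r["valid"] for r in results) / n,
        "execution_success_pct": 100 * sum(r["executed"] for r in results) / n,
        "output_accuracy_pct": 100 * sum(r["correct"] for r in results) / n,
        "avg_latency_ms": sum(r["latency_ms"] for r in results) / n,
        "avg_input_tokens": sum(r["input_tokens"] for r in results) / n,
        "avg_output_tokens": sum(r["output_tokens"] for r in results) / n,
    }

example = [
    {"valid": True, "executed": True, "correct": True,
     "latency_ms": 420.0, "input_tokens": 310, "output_tokens": 55},
    {"valid": True, "executed": False, "correct": False,
     "latency_ms": 380.0, "input_tokens": 295, "output_tokens": 48},
]
print(aggregate(example))  # 100% validity, 50% execution success, 400 ms avg
```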
Notes:
- Latency for the Qwen models is constrained by GPU inference; there is room for optimization.
- Display names are normalized for readability. Raw model IDs remain available in the underlying data.
The sections below are:
- Accuracy metrics across models
- Qwen3-4B: before vs after fine-tuning
- Overall output accuracy
- Latency across models
- Token usage per query
- Accuracy vs latency trade-off
- Model links
Highlights:
- Fine-tuned SLMs delivered a 2x improvement in output accuracy over the larger, general-purpose Gemini models, while also being faster.
- Qwen3-4B (ft-oncograph) achieved 100% syntactic validity and execution success.
- Gemini 2.5 Flash showed slightly lower output accuracy than Gemini 2.0 Flash, despite higher latency.
1) Accuracy metrics across models
2) Qwen3-4B: before vs after fine-tuning
3) Overall output accuracy
4) Latency across models
Note: Latency can be bottlenecked by non-optimized inference (e.g., free tiers), and may not represent best-case performance for a given model.
5) Token usage per query
Gemini models require significantly more input tokens to cover all schema details, edge cases, few-shot examples, and so on, which can lead to much higher costs.
Fine-tuned models have this knowledge internalized.
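To illustrate why the input-token overhead matters for cost, here is a hedged back-of-the-envelope sketch. The token counts and per-million-token prices below are hypothetical placeholders, not the measured values or actual prices from this report.

```python
# Hypothetical per-query cost comparison: a prompt carrying the full
# schema, edge cases, and few-shot examples vs. a fine-tuned model
# that has that context internalized. All numbers are illustrative.
def cost_per_query(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    # Prices are expressed per 1M tokens.
    return (input_tokens * price_in_per_m +
            output_tokens * price_out_per_m) / 1_000_000

general = cost_per_query(6000, 120, price_in_per_m=0.10, price_out_per_m=0.40)
finetuned = cost_per_query(400, 120, price_in_per_m=0.10, price_out_per_m=0.40)
print(f"general: ${general:.6f}, fine-tuned: ${finetuned:.6f}")
```

With identical output lengths, the cost gap is driven almost entirely by the prompt size.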
6) Accuracy vs latency trade-off
The most desirable region is the top-left (higher accuracy, lower latency).
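One way to read such a trade-off plot is to find the Pareto-optimal models: those not beaten on both axes at once. The sketch below is a minimal illustration; the model names and (accuracy, latency) values are made up, not the report's results.

```python
# A model is dominated if some other model is at least as accurate and
# at least as fast, and strictly better on one axis. The remaining
# models form the Pareto frontier of the accuracy vs latency plot.
def pareto_front(points):
    front = []
    for name, acc, lat in points:
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for n, a, l in points if n != name
        )
        if not dominated:
            front.append(name)
    return front

models = [
    ("model-a", 90.0, 800.0),   # accurate and fast
    ("model-b", 85.0, 1200.0),  # dominated by model-a
    ("model-c", 95.0, 2500.0),  # most accurate, but slowest
]
print(pareto_front(models))  # ['model-a', 'model-c']
```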
7) Model links
| Model Name | Model ID | Link |
|---|---|---|
| Qwen3-4B (ft-oncograph) | qwen3-4b-ft-oncograph | Hugging Face |
| Qwen3-4B (base) | Qwen3-4B-Instruct-2507-unsloth-bnb-4bit | Hugging Face |
| Qwen3-1.7B (ft-oncograph) | qwen3-1.7b-ft-oncograph | Hugging Face |
| Qwen3-1.7B (base) | Qwen3-1.7B-Instruct-2507-unsloth-bnb-4bit | Hugging Face |
| Gemini 2.0 Flash | gemini-2.0-flash | Google Cloud |
| Gemini 2.5 Flash | gemini-2.5-flash | Google Cloud |
| Gemini 2.5 Flash Lite | gemini-2.5-flash-lite | Google Cloud |