TAG Leaderboard
A benchmark for natural language queries over data
Rank | Method | Execution Accuracy (%) |
---|---|---|
10 |  | 65 |
What does the TAG leaderboard evaluate?
In this leaderboard, you'll find execution accuracy comparisons of table question answering approaches on TAG-Bench. TAG-Bench contains complex queries requiring world knowledge or semantic reasoning beyond the information explicitly available in the database. For example, a query over a movie table might ask which of the listed films is considered a cult classic, knowledge the database itself does not store.
How is accuracy measured?
Execution accuracy is measured as the percentage of outputs that exactly match our annotated ground-truth answers, which are hand-labeled by experts.
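As a concrete illustration, here is a minimal sketch of that computation in Python; it assumes answers are compared as normalized strings, which may differ in detail from the official scorer.

```python
# Minimal sketch of exact-match execution accuracy. Assumes answers are
# compared as whitespace-trimmed, case-insensitive strings; the official
# scorer's normalization may differ.
def execution_accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Percentage of predictions that exactly match the hand-labeled answers."""
    assert len(predictions) == len(ground_truth)
    matches = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truth)
    )
    return 100.0 * matches / len(ground_truth)
```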
Citation
@misc{biswal2024text2sqlenoughunifyingai,
  title={Text2SQL is Not Enough: Unifying AI and Databases with TAG},
  author={Asim Biswal and Liana Patel and Siddarth Jha and Amog Kamsetty and Shu Liu and Joseph E. Gonzalez and Carlos Guestrin and Matei Zaharia},
  year={2024},
  eprint={2408.14717},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/abs/2408.14717},
}
Ensure the following files are included in your submission:
- output.json: File containing the evaluation outputs generated by your model. Please refer to [] for format instructions.
- requirements.txt: A list of dependencies needed to run your model or script.
- README.md: A detailed description of your submission, including:
- Purpose and overview of the submission.
- Instructions to reproduce the results.
- Any additional notes for evaluators.
- Model/Keys: Upload your models or API keys to Hugging Face if they are not publicly accessible.
Note: Submissions missing any of these materials will not be processed.
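Before emailing, you may want to sanity-check your bundle against the checklist above. The sketch below is illustrative; it only verifies that the listed files exist and does not replicate the evaluators' checks.

```python
from pathlib import Path

# Hedged pre-submission check: verifies the files from the checklist above
# exist in your submission directory. Adjust the directory path as needed.
REQUIRED = ["output.json", "requirements.txt", "README.md"]

def check_submission(directory: str) -> bool:
    missing = [f for f in REQUIRED if not (Path(directory) / f).exists()]
    if missing:
        print(f"Missing required files: {', '.join(missing)}")
        return False
    print("All required files present.")
    return True
```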
- Submissions are accepted once a month to ensure sufficient evaluation bandwidth.
- Plan your submission timeline accordingly to avoid delays.
Follow these steps to upload your materials:
- Compress all submission files into a single .zip file, or provide a link to a public repository.
- Email the .zip file or repository link to tagbenchmark@gmail.com.
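If you'd rather script the packaging step, here is a minimal sketch; the submission/ directory name and layout are illustrative assumptions, not a required structure.

```python
import zipfile
from pathlib import Path

# Minimal packaging sketch: bundles everything under a local "submission/"
# directory (an illustrative name) into submission.zip for emailing.
def make_zip(src_dir: str = "submission", out_path: str = "submission.zip") -> None:
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(src.rglob("*")):
            if file.is_file():
                # Store paths relative to the submission root.
                zf.write(file, file.relative_to(src))

if __name__ == "__main__":
    make_zip()
```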
After uploading your materials:
- Provide accurate contact information for follow-ups.
- Double-check your materials for completeness to avoid processing delays.
Important: Your submission will be added to the evaluation queue. Depending on the queue size, evaluations may take up to a few weeks.