If you are building a custom AI, you can run it against these 164 problems to measure its "Pass@k" score: the probability that at least one of k generated code samples for a problem passes that problem's unit tests.
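As a rough sketch of how such a score can be computed: a commonly used estimator generates n samples per problem, counts the c samples that pass, and computes 1 - C(n-c, k) / C(n, k), then averages over all problems. The function names `pass_at_k` and `mean_pass_at_k` below are hypothetical and chosen only for illustration; this is not presented as the official evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of samples the score is defined over (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # If fewer than k samples failed, every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `pass_at_k(10, 3, 1)` is approximately 0.3: with 3 passing samples out of 10, a single randomly drawn sample passes about 30% of the time.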
This dataset is a benchmark created by OpenAI to test "code generation" capabilities. It consists of 164 Python programming tasks.
It can also serve as a set of clean, verified coding challenges for practice.

How to Access It
Each task includes a function signature (the name and parameters of the code to be written) and a docstring (a text description of what the code should do).
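To make the task format concrete, here is a minimal sketch of a prompt made of a signature and a docstring, a candidate completion, and a check against unit tests. The task, the completion, and the helper `passes` are all hypothetical and written for illustration; none of them is taken from the dataset itself.

```python
# Hypothetical task, written for illustration; not taken from the dataset.
PROMPT = '''def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
'''

# A candidate completion, as a model might generate it (function body only).
COMPLETION = "    return a + b\n"

# Hypothetical unit tests for the task above.
TESTS = '''
assert add(1, 2) == 3
assert add(-1, 1) == 0
'''


def passes(prompt: str, completion: str, tests: str) -> bool:
    """Return True if prompt + completion runs and satisfies the tests."""
    namespace: dict = {}
    try:
        # Caution: exec runs arbitrary code. Untrusted model output
        # should only be executed inside a sandbox.
        exec(prompt + completion + "\n" + tests, namespace)
    except Exception:
        return False
    return True


print(passes(PROMPT, COMPLETION, TESTS))
```

A completion that fails any assertion, or that raises an error while running, counts as not passing.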