Cannot generate TestDataset more than once #848
Comments
AFAIK, the …
+1 on what @njcolvin said. The "evolutionary process" (details here) that ragas uses for testset generation reduces the number of contexts actually available for generating questions. It also depends on the LLMs you use and on how many documents there are in … I can also spend some time with you on a call to debug further if you like 🙂
Hi @jjmachan, I appreciate the response. In the above example I believe I used Biden's State of the Union for … You might be right about the … Thank you for offering to debug this with me. I am interested and will follow up as soon as I can.
I have the same issue. I would appreciate an update.
Fixes #848

This pull request updates the `TestsetGenerator` and `ComplexEvolution` classes in `generator.py` and `evolutions.py`, respectively. The primary focus of these changes is to improve how new documents are handled during test dataset generation, so that data from previously used documents does not interfere with the generation process.

### Before
![Screenshot 2024-05-31 183725 before](https://github.com/explodinggradients/ragas/assets/77217074/dcc24e81-2e35-4e20-8638-2904d6779225)
Here, rows 2 and 4 are generated from the document used in the previous generation.

### After
![Screenshot 2024-05-31 184647 after](https://github.com/explodinggradients/ragas/assets/77217074/582afeeb-9ade-4d35-891c-af54af21c859)
Here, all generated rows are based on the newly provided document.
[x] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
Calling TestsetGenerator.generate_with_langchain_docs() more than once results in documents from the first call being used in subsequent calls.
Ragas version: 0.1.5
Python version: 3.11
Code to Reproduce
Expected behavior
I expect testset2 to contain a question about the .txt file in your-directory-2; however, it contains a question about the .txt file in your-directory.
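To make the reported symptom concrete, here is a minimal toy model in Python. It is not ragas code. It assumes, without checking this against the ragas source, that the generator keeps a document store on the instance and that this store accumulates documents across calls. `ToyTestsetGenerator`, `reset_store_per_call`, and `generate` are hypothetical names, and the file names are placeholders. When the store is shared, the second call can sample from the first call's documents. Clearing the store at the start of each call limits questions to the new documents, which is the intent of the fix described in the pull request.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class ToyTestsetGenerator:
    """Hypothetical stand-in for a test set generator (NOT ragas code).

    Models only the behaviour reported in issue #848: documents are kept in
    a store on the instance, and questions are sampled from that store.
    """

    # When False, documents accumulate across calls (the reported bug).
    # When True, the store is cleared at the start of every call.
    reset_store_per_call: bool = False
    store: list[str] = field(default_factory=list)

    def generate(self, documents: list[str], test_size: int, seed: int = 0) -> list[str]:
        if self.reset_store_per_call:
            # Forget every document added by earlier calls.
            self.store.clear()
        # New documents are appended to whatever the store already holds.
        self.store.extend(documents)
        rng = random.Random(seed)
        # Each toy "question" records the document it was sampled from.
        return [f"question about {rng.choice(self.store)}" for _ in range(test_size)]


# Two calls on the same instance, mirroring testset1 / testset2 in the issue.
buggy = ToyTestsetGenerator()
testset1 = buggy.generate(["your-directory/a.txt"], test_size=2)
testset2 = buggy.generate(["your-directory-2/b.txt"], test_size=5)
print(buggy.store)  # both files are now candidates for testset2

fixed = ToyTestsetGenerator(reset_store_per_call=True)
fixed.generate(["your-directory/a.txt"], test_size=2)
testset2_fixed = fixed.generate(["your-directory-2/b.txt"], test_size=5)
print(testset2_fixed)  # every question refers to your-directory-2/b.txt
```

Each toy question is tagged with its source document. As a result, the shared-store instance can return questions about `your-directory/a.txt` on the second call, while the resetting instance cannot.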
Additional context
None.