I have a self-created dataset with questions, ground truths (GT), contexts, and answers, and I started the evaluation with the RAGAS evaluate() method. The progress percentage climbs a few points and then stalls. I then get several identical errors, after which the evaluation resumes and freezes again after a few minutes.
Ragas version: 0.1.13
Python version: 3.10.13
Code to Reproduce
import pandas as pd
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization,
    context_entity_recall,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness
from dotenv import load_dotenv

load_dotenv()

df = pd.read_excel('testset_full-200-ES.xlsx')

# Create the 'data_samples' dictionary structure
data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if pd.notna(x) else []).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}
dataset = Dataset.from_dict(data_samples)

result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_utilization,
        context_entity_recall,
        answer_correctness,
        answer_similarity,
        harmfulness,
    ],
)

df = result.to_pandas()
print(result)

# save evaluation results to csv
df.to_csv('results_es.csv', index=False)
Error trace
Evaluating: 11%|█ | 1709/15453 [14:40<330:40:17, 86.61s/it]Exception raised in Job[13246]: TimeoutError()
Exception raised in Job[13243]: TimeoutError()
Exception raised in Job[5118]: TimeoutError()
Exception raised in Job[13245]: TimeoutError()
Exception raised in Job[5125]: TimeoutError()
Exception raised in Job[13241]: TimeoutError()
Exception raised in Job[5126]: TimeoutError()
Exception raised in Job[13227]: TimeoutError()
Exception raised in Job[5122]: TimeoutError()
Exception raised in Job[13244]: TimeoutError()
Exception raised in Job[13240]: TimeoutError()
Exception raised in Job[13236]: TimeoutError()
Exception raised in Job[5113]: TimeoutError()
Exception raised in Job[5124]: TimeoutError()
Exception raised in Job[14126]: TimeoutError()
Evaluating: 13%|█▎ | 2063/15453 [33:06<750:37:32, 201.81s/it]Exception raised in Job[8734]: TimeoutError()
Exception raised in Job[8731]: TimeoutError()
Exception raised in Job[4239]: TimeoutError()
Exception raised in Job[8727]: TimeoutError()
Exception raised in Job[1626]: TimeoutError()
Exception raised in Job[12353]: TimeoutError()
Exception raised in Job[4231]: TimeoutError()
Exception raised in Job[12354]: TimeoutError()
Exception raised in Job[4235]: TimeoutError()
Exception raised in Job[12358]: TimeoutError()
Exception raised in Job[4236]: TimeoutError()
Exception raised in Job[8732]: TimeoutError()
Exception raised in Job[1625]: TimeoutError()
Exception raised in Job[1630]: TimeoutError()
Exception raised in Job[1617]: TimeoutError()
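The progress bar total of 15453 is consistent with one job per (row, metric) pair: nine metrics are passed to evaluate(), and 15453 / 9 = 1717 rows. Assuming jobs are numbered row by row in the order the metrics are listed (an assumption about ragas internals that the log does not confirm), the job indices in the trace can be mapped back to rows and metrics to see which metrics time out. A minimal sketch of that bookkeeping, where `decode_job` is a hypothetical helper and not part of ragas:

```python
from collections import Counter

# Metric names in the order they are passed to evaluate() in the report.
METRICS = [
    "context_precision", "faithfulness", "answer_relevancy", "context_recall",
    "context_utilization", "context_entity_recall", "answer_correctness",
    "answer_similarity", "harmfulness",
]
TOTAL_JOBS = 15453  # total shown in the progress bar

# Number of dataset rows implied by the job total.
n_rows, remainder = divmod(TOTAL_JOBS, len(METRICS))


def decode_job(job_index):
    """Map a job index to (row, metric name).

    ASSUMPTION: jobs are submitted row by row, one per metric, in list order.
    """
    row, metric_idx = divmod(job_index, len(METRICS))
    return row, METRICS[metric_idx]


# Job indices from the first burst of TimeoutError lines in the trace.
failed_jobs = [13246, 13243, 5118, 13245, 5125, 13241, 5126, 13227,
               5122, 13244, 13240, 13236, 5113, 5124, 14126]

timeouts_per_metric = Counter(decode_job(j)[1] for j in failed_jobs)
for metric, count in timeouts_per_metric.most_common():
    print(f"{metric}: {count}")
```

If the assumption holds, a metric that dominates this count is a candidate to run separately or to drop while debugging.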
I found some similar issues that might help you resolve the TimeoutError when using the evaluate() function in the ragas library.
Timeout Configuration:
You need to set the timeout parameter in the RunConfig and ensure it is passed to the evaluation function. Here is an example of how to do it:
from ragas import evaluate, RunConfig
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization,
    context_entity_recall,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness
from datasets import Dataset
import pandas as pd

# Load your dataset
df = pd.read_excel('testset_full-200-ES.xlsx')
data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if pd.notna(x) else []).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}
dataset = Dataset.from_dict(data_samples)

# Create a RunConfig with a specified timeout
run_config = RunConfig(timeout=120)  # Set timeout to 120 seconds

# Use the evaluate function with the custom run_config
result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_utilization,
        context_entity_recall,
        answer_correctness,
        answer_similarity,
        harmfulness,
    ],
    run_config=run_config
)

df = result.to_pandas()
print(result)
df.to_csv('results_es.csv', index=False)
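To illustrate what a per-job timeout does in general: a job that runs longer than the limit is cancelled and a TimeoutError is raised, which matches the lines in the error trace. The sketch below uses only asyncio from the standard library. `slow_metric` and `run_job` are hypothetical stand-ins, not ragas functions, and recording NaN for a timed-out job is an assumption made for the example.

```python
import asyncio
import math


async def slow_metric(delay: float) -> float:
    """Stand-in for one LLM-backed metric call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return 1.0


async def run_job(job_id: int, delay: float, timeout: float) -> float:
    """Run one job under a time limit; return NaN if the limit is exceeded."""
    try:
        return await asyncio.wait_for(slow_metric(delay), timeout=timeout)
    except asyncio.TimeoutError:
        print(f"Exception raised in Job[{job_id}]: TimeoutError()")
        return math.nan


async def main() -> list:
    # The middle job is slower than the 0.2 s limit and will time out.
    delays = [0.01, 0.5, 0.01]
    jobs = [run_job(i, d, timeout=0.2) for i, d in enumerate(delays)]
    return await asyncio.gather(*jobs)


scores = asyncio.run(main())
print(scores)
```

A larger timeout gives slow calls more time to finish, at the cost of waiting longer on calls that never return.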
Known Issues and Limitations:
There are known issues with using certain configurations in the ragas library, such as the lack of support for asynchronous callbacks and specific configurations required for different types of models [1].
Debugging and Logging:
Enable debugging and logging options to diagnose the TimeoutError. Set the log_tenacity parameter in RunConfig to True and enable the RAGAS_DEBUG environment variable.
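A minimal sketch of those two settings, under stated assumptions: the value "true" is assumed to be what ragas accepts for RAGAS_DEBUG, the variable is set before ragas is imported in case it is read at import time, and RunConfig is imported from ragas.run_config. The import is guarded so the snippet also runs where ragas is not installed.

```python
import logging
import os

# ASSUMPTION: "true" is the value ragas treats as enabling debug mode.
# Set it before importing ragas in case the variable is read at import time.
os.environ["RAGAS_DEBUG"] = "true"

# Make sure log records are actually printed somewhere.
logging.basicConfig(level=logging.INFO)

# Same timeout as in the example above, plus retry logging.
run_config_kwargs = {"timeout": 120, "log_tenacity": True}

try:
    from ragas.run_config import RunConfig
except ImportError:
    # ragas is not installed in this environment; nothing to configure.
    run_config = None
else:
    run_config = RunConfig(**run_config_kwargs)

# Pass run_config=run_config to evaluate(), as in the timeout example.
```

With retry logging enabled, the output should show whether jobs are being retried before they time out, which helps separate slow responses from repeated failures.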