Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of Noise sensitivity metrics from RAGChecker #1190

Merged

Conversation

sahusiddharth
Copy link
Contributor

@sahusiddharth sahusiddharth commented Aug 12, 2024

Solves:

Input

from datasets import Dataset 
from ragas.metrics import noise_sensitivity_relevant, noise_sensitivity_irrelevant
from ragas import evaluate
data_sample = {
    "question": ["What is the Life Insurance Corporation of India (LIC) known for?"],
    "ground_truth": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, established in 1956 through the nationalization of the insurance industry. It is known for managing a large portfolio of investments."],
    "answer": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, known for its vast portfolio of investments. LIC contributs to the financial stability of the country."],
    "contexts": [["The Life Insurance Corporation of India (LIC) was established in 1956 following the nationalization of the insurance industry in India.",
        "LIC is the largest insurance company in India, with a vast network of policyholders and a huge investments.",
        "As the largest institutional investor in India, LIC manages a substantial funds, contributing to the financial stability of the country.",
        "The Indian economy is one of the fastest-growing major economies in the world, thanks to the secors like finance, technology, manufacturing etc"]]
}


dataset = Dataset.from_dict(data_sample)
metrics = [noise_sensitivity_relevant, noise_sensitivity_irrelevant]
score = evaluate(dataset,metrics=metrics)
score.to_pandas()

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 12, 2024
@shahules786 shahules786 self-requested a review August 13, 2024 04:04
Copy link
Member

@shahules786 shahules786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, great work. @sahusiddharth
Two more suggestions

  1. Can you fix the conflict?
  2. Can you also add a documentation associated with noise sensitivity?
    TBH understanding the use and internals of this is little overwhelming, users will find it hard to adopt it if they can't understand it.

src/ragas/metrics/_noise_sensitivity.py Outdated Show resolved Hide resolved
src/ragas/metrics/_noise_sensitivity.py Outdated Show resolved Hide resolved
@sahusiddharth
Copy link
Contributor Author

Could you please provide some suggestions on the types of documentation that might be required? Your input would be greatly appreciated.

@shahules786
Copy link
Member

Hey @sahusiddharth I did see this doc written by one of author of ragchecker, I have also emailed him asking for an intuitive explanation of noise sensitivity (let's wait for few hours and see if he replies). Otherwise, it would be nice to follow how the format in metrics as here https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html

@sahusiddharth
Copy link
Contributor Author

Hi @shahules786,

Got it. Please let me know when you hear back from the author. In the meantime, I’ll check the metrics format you mentioned.

Thanks!

Copy link
Member

@shahules786 shahules786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sahusiddharth The changes you did looks good. I also did some corrections on top of that. I think it's best to move forward and do the doc now - it the author reacts later we can get his feedback on it later. Can you prepare the docs to go with this?

It needs 3 sections:
A brief intuitive description of metrics
an example
how it's calculated using the example

@sahusiddharth
Copy link
Contributor Author

Hi @shahules786,

I wanted to clarify something regarding the noise sensitivity implementation. There are two types: one for when relevant context is retrieved and another for when irrelevant context is retrieved. The current implementation only addresses the relevant context.

Would you prefer that I add the handling for irrelevant context first, or should I complete the documentation for the basic implementation before proceeding with the additional functionality?

@shahules786
Copy link
Member

@sahusiddharth I did notice that, thought that using relevant might be more useful but now I think if the user has the ability to switch b/w both using an argument that would be better. Can you modify it to include that behavior? Then we can add documentation for both in same page.

@sahusiddharth
Copy link
Contributor Author

I’m happy to make the necessary modifications. I would appreciate some additional guidance on how best to return the results when asked for both. Since the output could be returned in multiple types of data, such as dictionaries, named tuples, or tuples, I’m considering the most appropriate format.

{'noise_sensitivity_relevant': 0.0, 'noise_sensitivity_irrelevant': 0.0}

I was thinking to return only the number when asked for a specific one.

Could you please advise on the preferred format for returning these results?

@shahules786
Copy link
Member

shahules786 commented Aug 14, 2024

@sahusiddharth Thanks for asking that. I think both should not be an option - it would be one of them 'relevant' or 'irrelevant'. By default, it should stay as 'relevant'.
In upcoming versions, we will introduce caching so avoid llm recalculating the same intermediate results as in this case if someone wants both they have two make two calls.

Make sure that when you write the doc give credit to Ragchecker by citing the work.

@sahusiddharth
Copy link
Contributor Author

@shahules786, I'm almost done with the documentation, to properly show the power of noise sensitivity, the example is getting long, Do you have a problem with that?

@shahules786
Copy link
Member

@sahusiddharth We can refine it later , but I also show some basic examples here

@sahusiddharth
Copy link
Contributor Author

@shahules786, Have gone through them, I didn't liked it that much the answers generated by llm is rarely using the information provided in the context and I didn't find it intuitive enough.

@shahules786
Copy link
Member

@sahusiddharth I agree, their dataset generation is naive.

@sahusiddharth
Copy link
Contributor Author

@shahules786, tried returning tuple when given we want noise sensitivity for both relevant and irrelevant the make-ci was giving me error.

ragas/src/ragas/metrics/_noise_sensitivity.py:208:15 - error: Method "_ascore" overrides class "Metric" in an incompatible manner
    Return type mismatch: base method returns type "Coroutine[Any, Any, float]", override returns type "Coroutine[Any, Any, Coroutine[Any, Any, float | Tuple[float, float]]]"
      "Coroutine[Any, Any, Coroutine[Any, Any, float | Tuple[float, float]]]" is incompatible with "Coroutine[Any, Any, float]"
        Type parameter "_ReturnT_co_nd@Coroutine" is covariant, but "Coroutine[Any, Any, float | Tuple[float, float]]" is not a subtype of "float"
          "Coroutine[Any, Any, float | Tuple[float, float]]" is incompatible with "float" (reportIncompatibleMethodOverride)
  /Users/nexus/Desktop/ankit/ragas/src/ragas/metrics/_noise_sensitivity.py:244:16 - error: Expression of type "Unknown | tuple[Unknown, Unknown]" is incompatible with return type "Coroutine[Any, Any, float | Tuple[float, float]]"
    Type "Unknown | tuple[Unknown, Unknown]" is incompatible with type "Coroutine[Any, Any, float | Tuple[float, float]]"
      "tuple[Unknown, Unknown]" is incompatible with "Coroutine[Any, Any, float | Tuple[float, float]]" (reportReturnType)

@sahusiddharth sahusiddharth changed the title Basic implementation of Noise sensitivity metrics from RAGChecker Implementation of Noise sensitivity metrics from RAGChecker Aug 14, 2024
Copy link
Member

@shahules786 shahules786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have made few changes to polish it up.

@jjmachan jjmachan linked an issue Aug 15, 2024 that may be closed by this pull request
@shahules786 shahules786 merged commit 8da231d into explodinggradients:main Aug 23, 2024
15 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R-293] Noise sensitivity metrics from RAGChecker
2 participants