[TA] Add encoding support and length and offset #14719

Merged: 3 commits, Sep 1, 2020
Changes from 1 commit
1 change: 1 addition & 0 deletions sdk/textanalytics/Azure.AI.TextAnalytics/CHANGELOG.md
@@ -6,6 +6,7 @@
- It defaults to the latest supported API version, which currently is `3.1-preview.2`.
- `ErrorCode` value returned from the service is now surfaced in `RequestFailedException`.
- Support added for Opinion Mining. This feature is available in the Text Analytics service v3.1-preview.1 and above.
+ - Added `Offset` and `Length` properties for `CategorizedEntity`, `SentenceSentiment`, and `LinkedEntityMatch`. The default encoding is UTF16 code units. For additional information see https://aka.ms/text-analytics-offsets
- `TextAnalyticsError` and `TextAnalyticsWarning` now are marked as immutable.

## 5.0.0 (2020-07-27)
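The changelog entry above says `Offset` and `Length` are expressed in UTF-16 code units. Because a .NET `string` is also indexed by UTF-16 code unit, the two values can be passed to `Substring` unchanged. The following is a minimal sketch under that assumption; `OffsetDemo`, `Slice`, and the hard-coded offset and length are hypothetical stand-ins for values that would come from `entity.Offset` and `entity.Length`, and no service call is made.

```csharp
using System;

public static class OffsetDemo
{
    // Returns the part of the document identified by a UTF-16 offset and length.
    // Hypothetical helper, not part of the SDK.
    public static string Slice(string document, int offset, int length)
    {
        // .NET strings are sequences of UTF-16 code units, so no conversion is needed.
        return document.Substring(offset, length);
    }

    public static void Main()
    {
        // The thumbs-up emoji occupies two UTF-16 code units (a surrogate pair),
        // so "Seattle" starts at offset 3 rather than 2.
        string document = "\U0001F44D Seattle is great";

        // Hypothetical values standing in for entity.Offset and entity.Length.
        int offset = 3;
        int length = 7;

        Console.WriteLine(Slice(document, offset, length));
    }
}
```

The emoji is there to show why the unit matters: counting by Unicode code point instead would give an offset of 2 and slice the wrong text.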
9 changes: 6 additions & 3 deletions sdk/textanalytics/Azure.AI.TextAnalytics/README.md
@@ -190,7 +190,8 @@ CategorizedEntityCollection entities = client.RecognizeEntities(document);
Console.WriteLine($"Recognized {entities.Count} entities:");
foreach (CategorizedEntity entity in entities)
{
- Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
+ Console.WriteLine($"Text: {entity.Text}, Offset (in UTF-16 code units): {entity.Offset}, Length (in UTF-16 code units): {entity.Length}");
+ Console.WriteLine($"Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
}
```
For samples on using the production recommended option `RecognizeEntitiesBatch` see [here][recognize_entities_sample].
@@ -211,7 +212,8 @@ foreach (LinkedEntity linkedEntity in linkedEntities)
Console.WriteLine($"Name: {linkedEntity.Name}, Language: {linkedEntity.Language}, Data Source: {linkedEntity.DataSource}, Url: {linkedEntity.Url.ToString()}, Entity Id in Data Source: {linkedEntity.DataSourceEntityId}");
foreach (LinkedEntityMatch match in linkedEntity.Matches)
{
- Console.WriteLine($" Match Text: {match.Text}, Confidence score: {match.ConfidenceScore}");
+ Console.WriteLine($" Match Text: {match.Text}, Offset (in UTF-16 code units): {match.Offset}, Length (in UTF-16 code units): {match.Length}");
+ Console.WriteLine($" Confidence score: {match.ConfidenceScore}");
}
}
```
@@ -241,7 +243,8 @@ CategorizedEntityCollection entities = await client.RecognizeEntitiesAsync(docum
Console.WriteLine($"Recognized {entities.Count} entities:");
foreach (CategorizedEntity entity in entities)
{
- Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
+ Console.WriteLine($"Text: {entity.Text}, Offset (in UTF-16 code units): {entity.Offset}, Length (in UTF-16 code units): {entity.Length}");
+ Console.WriteLine($"Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
}
```

@@ -23,7 +23,8 @@ CategorizedEntityCollection entities = client.RecognizeEntities(document);
Console.WriteLine($"Recognized {entities.Count} entities:");
foreach (CategorizedEntity entity in entities)
{
- Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
+ Console.WriteLine($"Text: {entity.Text}, Offset (in UTF-16 code units): {entity.Offset}, Length (in UTF-16 code units): {entity.Length}");
+ Console.WriteLine($"Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
}
```

@@ -26,7 +26,8 @@ foreach (LinkedEntity linkedEntity in linkedEntities)
Console.WriteLine($"Name: {linkedEntity.Name}, Language: {linkedEntity.Language}, Data Source: {linkedEntity.DataSource}, Url: {linkedEntity.Url.ToString()}, Entity Id in Data Source: {linkedEntity.DataSourceEntityId}");
foreach (LinkedEntityMatch match in linkedEntity.Matches)
{
- Console.WriteLine($" Match Text: {match.Text}, Confidence score: {match.ConfidenceScore}");
+ Console.WriteLine($" Match Text: {match.Text}, Offset (in UTF-16 code units): {match.Offset}, Length (in UTF-16 code units): {match.Length}");
+ Console.WriteLine($" Confidence score: {match.ConfidenceScore}");
}
}
```
16 changes: 15 additions & 1 deletion sdk/textanalytics/Azure.AI.TextAnalytics/src/AspectSentiment.cs
@@ -17,11 +17,13 @@ public readonly struct AspectSentiment
{
private const double _neutralValue = 0d;

- internal AspectSentiment(TextSentiment sentiment, string text, double positiveScore, double negativeScore)
+ internal AspectSentiment(TextSentiment sentiment, string text, double positiveScore, double negativeScore, int offset, int length)
{
Sentiment = sentiment;
Text = text;
ConfidenceScores = new SentimentConfidenceScores(positiveScore, _neutralValue, negativeScore);
+ Offset = offset;
+ Length = length;
}

internal AspectSentiment(SentenceAspect sentenceAspect)
@@ -31,6 +33,8 @@ internal AspectSentiment(SentenceAspect sentenceAspect)
Text = sentenceAspect.Text;
ConfidenceScores = new SentimentConfidenceScores(sentenceAspect.ConfidenceScores.Positive, _neutralValue, sentenceAspect.ConfidenceScores.Negative);
Sentiment = (TextSentiment)Enum.Parse(typeof(TextSentiment), sentenceAspect.Sentiment, ignoreCase: true);
+ Offset = sentenceAspect.Offset;
+ Length = sentenceAspect.Length;
}

/// <summary>
@@ -50,5 +54,15 @@ internal AspectSentiment(SentenceAspect sentenceAspect)
/// Higher values signify higher confidence.
/// </summary>
public SentimentConfidenceScores ConfidenceScores { get; }

+ /// <summary>
+ /// Gets the starting position (in UTF16 code units) for the aspect text.
+ /// </summary>
+ public int Offset { get; }
+
+ /// <summary>
+ /// Gets the length (in UTF16 code units) of the aspect text.
+ /// </summary>
+ public int Length { get; }
}
}
12 changes: 12 additions & 0 deletions sdk/textanalytics/Azure.AI.TextAnalytics/src/CategorizedEntity.cs
@@ -21,6 +21,8 @@ internal CategorizedEntity(Entity entity)
Text = entity.Text;
SubCategory = entity.Subcategory;
ConfidenceScore = entity.ConfidenceScore;
+ Offset = entity.Offset;
+ Length = entity.Length;
}

/// <summary>
@@ -50,5 +52,15 @@ internal CategorizedEntity(Entity entity)
/// text substring matches this inferred entity.
/// </summary>
public double ConfidenceScore { get; }

+ /// <summary>
+ /// Gets the starting position (in UTF16 code units) for the matching text in the input document.
+ /// </summary>
+ public int Offset { get; }
+
+ /// <summary>
+ /// Gets the length (in UTF16 code units) of the matching text in the input document.
+ /// </summary>
+ public int Length { get; }
}
}
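The doc comments above define `Offset` in UTF-16 code units. A consumer that indexes text by Unicode code point instead would need to convert the value first. The sketch below shows one way to do that; `CodePointOffsets` and `ToCodePointOffset` are hypothetical names introduced for illustration and are not part of the SDK.

```csharp
using System;

public static class CodePointOffsets
{
    // Counts the Unicode code points that precede the given UTF-16 offset.
    // Hypothetical helper, not part of the SDK.
    public static int ToCodePointOffset(string document, int utf16Offset)
    {
        if (document == null) throw new ArgumentNullException(nameof(document));
        if (utf16Offset < 0 || utf16Offset > document.Length)
            throw new ArgumentOutOfRangeException(nameof(utf16Offset));

        int codePoints = 0;
        for (int i = 0; i < utf16Offset; i++)
        {
            // A well-formed surrogate pair is a single code point: skip its low half.
            if (char.IsHighSurrogate(document[i])
                && i + 1 < utf16Offset
                && char.IsLowSurrogate(document[i + 1]))
            {
                i++;
            }
            codePoints++;
        }
        return codePoints;
    }

    public static void Main()
    {
        // The emoji takes two UTF-16 code units but is one code point.
        string document = "\U0001F44D Seattle is great";

        // Hypothetical UTF-16 offset standing in for entity.Offset.
        Console.WriteLine(ToCodePointOffset(document, 3));
    }
}
```

For text without surrogate pairs the two offsets are identical, so the conversion only changes the result when the document contains characters outside the Basic Multilingual Plane.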
15 changes: 10 additions & 5 deletions sdk/textanalytics/Azure.AI.TextAnalytics/src/LinkedEntityMatch.cs
@@ -34,9 +34,14 @@ internal LinkedEntityMatch(double confidenceScore, string text, int offset, int
/// </summary>
public double ConfidenceScore { get; }

- /// <summary> Start position for the entity match text. </summary>
- private int Offset { get; }
- /// <summary> Length for the entity match text. </summary>
- private int Length { get; }
- }
+ /// <summary>
+ /// Gets the starting position (in UTF16 code units) for the matching text in the document.
+ /// </summary>
+ public int Offset { get; }
+
+ /// <summary>
+ /// Gets the length (in UTF16 code units) of the matching text in the document.
+ /// </summary>
+ public int Length { get; }
+ }
}
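With `Offset` and `Length` now public on `LinkedEntityMatch`, a caller could mark every match inside the original document. The sketch below is one possible approach, assuming the spans do not overlap; `MatchHighlighter`, `Highlight`, and the hard-coded spans are hypothetical stand-ins for values read from `match.Offset` and `match.Length`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class MatchHighlighter
{
    // Wraps each (offset, length) span of the document in square brackets.
    // Hypothetical helper, not part of the SDK; assumes non-overlapping spans.
    public static string Highlight(string document, IEnumerable<(int Offset, int Length)> spans)
    {
        var builder = new StringBuilder(document);

        // Work from the last span to the first so that inserting brackets
        // never shifts the offsets of spans still to be processed.
        foreach (var span in spans.OrderByDescending(s => s.Offset))
        {
            builder.Insert(span.Offset + span.Length, "]");
            builder.Insert(span.Offset, "[");
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string document = "Bill Gates founded Microsoft.";

        // Hypothetical spans standing in for match.Offset and match.Length.
        var spans = new List<(int Offset, int Length)> { (0, 10), (19, 9) };

        Console.WriteLine(Highlight(document, spans));
    }
}
```

Processing the spans in descending offset order keeps the service-provided offsets valid throughout, which avoids tracking how many characters earlier insertions added.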
16 changes: 15 additions & 1 deletion sdk/textanalytics/Azure.AI.TextAnalytics/src/OpinionSentiment.cs
@@ -15,12 +15,14 @@ public readonly struct OpinionSentiment
{
private const double _neutralValue = 0d;

- internal OpinionSentiment(TextSentiment sentiment, double positiveScore, double negativeScore, string text, bool isNegated)
+ internal OpinionSentiment(TextSentiment sentiment, double positiveScore, double negativeScore, string text, bool isNegated, int offset, int length)
{
Sentiment = sentiment;
ConfidenceScores = new SentimentConfidenceScores(positiveScore, _neutralValue, negativeScore);
Text = text;
IsNegated = isNegated;
+ Offset = offset;
+ Length = length;
}

internal OpinionSentiment(SentenceOpinion opinion)
@@ -31,6 +33,8 @@ internal OpinionSentiment(SentenceOpinion opinion)
ConfidenceScores = new SentimentConfidenceScores(opinion.ConfidenceScores.Positive, _neutralValue, opinion.ConfidenceScores.Negative);
Sentiment = (TextSentiment)Enum.Parse(typeof(TextSentiment), opinion.Sentiment, ignoreCase: true);
IsNegated = opinion.IsNegated;
+ Offset = opinion.Offset;
+ Length = opinion.Length;
}

/// <summary>
@@ -57,5 +61,15 @@ internal OpinionSentiment(SentenceOpinion opinion)
/// "The food is not good", the opinion "good" is negated.
/// </summary>
public bool IsNegated { get; }

+ /// <summary>
+ /// Gets the starting position (in UTF16 code units) for the opinion text.
+ /// </summary>
+ public int Offset { get; }
+
+ /// <summary>
+ /// Gets the length (in UTF16 code units) of the opinion text.
+ /// </summary>
+ public int Length { get; }
}
}
@@ -16,11 +16,13 @@ namespace Azure.AI.TextAnalytics
/// </summary>
public readonly struct SentenceSentiment
{
- internal SentenceSentiment(TextSentiment sentiment, string text, double positiveScore, double neutralScore, double negativeScore, IReadOnlyList<MinedOpinion> minedOpinions)
+ internal SentenceSentiment(TextSentiment sentiment, string text, double positiveScore, double neutralScore, double negativeScore, int offset, int length, IReadOnlyList<MinedOpinion> minedOpinions)
{
Sentiment = sentiment;
Text = text;
ConfidenceScores = new SentimentConfidenceScores(positiveScore, neutralScore, negativeScore);
+ Offset = offset;
+ Length = length;
MinedOpinions = new List<MinedOpinion>(minedOpinions);
}

@@ -33,6 +35,8 @@ internal SentenceSentiment(SentenceSentimentInternal sentenceSentiment, IReadOnl
ConfidenceScores = sentenceSentiment.ConfidenceScores;
Sentiment = (TextSentiment)Enum.Parse(typeof(TextSentiment), sentenceSentiment.Sentiment, ignoreCase: true);
MinedOpinions = ConvertToMinedOpinions(sentenceSentiment, allSentences);
+ Offset = sentenceSentiment.Offset;
+ Length = sentenceSentiment.Length;
}

/// <summary>
@@ -57,6 +61,16 @@ internal SentenceSentiment(SentenceSentimentInternal sentenceSentiment, IReadOnl
/// </summary>
public IReadOnlyCollection<MinedOpinion> MinedOpinions { get; }

+ /// <summary>
+ /// Gets the starting position (in UTF16 code units) for the matching text in the sentence.
+ /// </summary>
+ public int Offset { get; }
+
+ /// <summary>
+ /// Gets the length (in UTF16 code units) of the matching text in the sentence.
+ /// </summary>
+ public int Length { get; }

private static IReadOnlyCollection<MinedOpinion> ConvertToMinedOpinions(SentenceSentimentInternal sentence, IReadOnlyList<SentenceSentimentInternal> allSentences)
{
var minedOpinions = new List<MinedOpinion>();