GitHub Occurrences Badge #4068

Open
cloewen8 opened this issue Sep 24, 2019 · 9 comments
Labels
service-badge Accepted and actionable changes, features, and bugs

Comments

@cloewen8

📋 Description

A badge for GitHub that counts the occurrences of a sequence in a file.

For example, given /github/occurrences/badges/shields/README.md/badge, the badge would show 22 (the sequence "badge" occurs in README.md 22 times).

Ideally, the sequence would be a regular expression (supporting escape sequences, word boundaries, and character classes).

🔗 Data

The data required for this can be retrieved from https://raw.githubusercontent.com/:user/:repo/:branch/:path. Authentication is only required for private repositories.
Unfortunately, I don't know of any official documentation for this endpoint; I only know that it is the destination when clicking "Raw" on a file on GitHub.

The file contents would then need additional processing (counting the matches), and that processing would need to be limited.
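The issue leaves the meaning of "limited" processing open. The sketch below is one possible reading, built on stated assumptions. It counts non-overlapping occurrences of a literal sequence (no regex) and refuses files above a size cap. The helper name `countOccurrences` and the default cap of 1,048,576 characters are illustrative choices. Shields does not provide either.

```javascript
// Hypothetical helper: count non-overlapping occurrences of a literal string.
// Plain substring search has no catastrophic-backtracking failure mode,
// unlike a user-supplied regular expression.
function countOccurrences(text, needle, { maxLength = 1024 * 1024 } = {}) {
  if (needle.length === 0) {
    throw new Error('search sequence must not be empty')
  }
  if (text.length > maxLength) {
    // Assumed policy: refuse oversized files rather than truncating them.
    throw new Error(`file too large (more than ${maxLength} characters)`)
  }
  let count = 0
  let index = text.indexOf(needle)
  while (index !== -1) {
    count += 1
    // Resume after the whole match so occurrences never overlap.
    index = text.indexOf(needle, index + needle.length)
  }
  return count
}

console.log(countOccurrences('badge badge, badges', 'badge')) // 3
```

The count includes "badges" because this is a plain substring search. Word boundaries are exactly what the regex variant described above would add.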

🎤 Motivation

I personally want to use it to count the number of facts in a text file (each fact is on its own line).
A badge for counting lines would work, but being able to count anything opens the door to many more uses:

  • How many times is "as you know" mentioned in a story?
  • How many times, if at all, is goto used?
  • How many code blocks are present?
  • How many references to a shutdown API exist?

cloewen8 added the service-badge label (Accepted and actionable changes, features, and bugs) on Sep 24, 2019
@paulmelnikow
Member

Hi, thanks for your request. We have something similar that uses the GitHub Search API to find files within a repo that match a specific pattern; however, it can't count occurrences within a file.

I like the idea of a dynamic text badge, and I can see supporting line counts or string matching. However, for a lot of things you'd want a regex, and I'm not sure we should run arbitrary regexes, since they can be crafted to consume a large amount of compute resources.

Can you share a link to the file? Sometimes seeing the specific case really solidifies why a feature should exist. It might also surface a creative way to use what is already there!

@cloewen8
Author

I absolutely agree that arbitrary regexes (or any user-submitted code) should not be blindly trusted! To bound the computation, a timeout can be used; I know regex101.com uses this strategy.
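In Node.js, the built-in `vm` module accepts a `timeout` option (in milliseconds) that can abort a long-running regex match. That is one way to apply the same idea on a server. The sketch below assumes the pattern and flags come straight from the badge URL, and `countRegexMatches` is a hypothetical name. The script text is fixed and the user's pattern is passed in only as data. That matters because `vm` is not a sandbox for running untrusted code.

```javascript
const vm = require('vm')

// Hypothetical helper: count matches of a user-supplied regex, aborting if
// matching runs longer than `timeoutMs`. The script source below is fixed;
// the user's pattern and flags are passed in as data, never evaluated as code.
function countRegexMatches(text, pattern, { flags = '', timeoutMs = 100 } = {}) {
  // The global flag makes String.prototype.match return every match,
  // not just the first one.
  const allFlags = flags.includes('g') ? flags : `${flags}g`
  const context = { text, pattern, allFlags, result: 0 }
  // Throws on an invalid pattern or when matching exceeds the timeout.
  vm.runInNewContext(
    'result = (text.match(new RegExp(pattern, allFlags)) || []).length',
    context,
    { timeout: timeoutMs }
  )
  return context.result
}

console.log(countRegexMatches('Badge badge badges', '\\bbadge\\b', { flags: 'i' })) // 2
```

A timeout caps the cost of a pathological pattern but does not make it free. Each request could still burn up to `timeoutMs` of CPU.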

Here is where I want to use it: https://github.com/cloewen8/dolphin-fact/blob/master/README.md
Currently, I'm using /github/size, which isn't very helpful but is better than nothing. I want to use the badge as a kind of progress counter, to get people interested as the project grows.

@paulmelnikow
Member

Huh, since it's a list, what would you think about using YAML instead? That way you could use the Dynamic YAML badge and a JSONPath expression.

All you'd have to do is prefix each line with a `-`. Then, in your app, you could either strip off the leading `-` or use a proper YAML parser (we use js-yaml, which is great).
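The sketch below illustrates the plain-text-to-YAML conversion without a YAML library. Each line is quoted with `JSON.stringify`, because a bare fact containing `: ` or ` #` could otherwise be read as a mapping or a comment. JSON string syntax should be accepted inside YAML double-quoted scalars. Both function names are hypothetical. `yamlToFacts` only understands the output of `factsToYaml`, so a real app reading arbitrary YAML would use a parser such as js-yaml instead.

```javascript
// Hypothetical helpers: turn a one-fact-per-line text file into a YAML
// sequence, and read the facts back by stripping the "- " prefix.
function factsToYaml(text) {
  return text
    .split(/\r?\n/)
    .filter(line => line.trim() !== '') // skip blank lines
    .map(line => `- ${JSON.stringify(line)}`) // quote so ": " and "#" stay literal
    .join('\n')
}

function yamlToFacts(yaml) {
  return yaml
    .split('\n')
    .filter(line => line.startsWith('- '))
    .map(line => JSON.parse(line.slice(2))) // undo the JSON-style quoting
}

const yaml = factsToYaml('First fact.\nSecond fact: with a colon.\n')
console.log(yaml)
// - "First fact."
// - "Second fact: with a colon."
console.log(yamlToFacts(yaml).length) // 2
```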

@cloewen8
Author

YAML is definitely an option for my use case, though I wouldn't consider it better than a simple text or CSV file. Would creating this or a similar badge be an option? Depending on what is required, I would be willing to create it myself.

@paulmelnikow
Member

I'd be 👍 on adding a badge to count lines either in an arbitrary URL or in a file on GitHub. Would you be interested in working on that?
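Most of the work in a line-count badge is deciding what counts as a line. The hypothetical `countLines` helper below sketches one possible definition. It splits on `\n` or `\r\n` and does not count the empty string after a trailing newline. It can also skip blank lines, which helps with a one-fact-per-line file that uses spacing.

```javascript
// Hypothetical helper for a line-count badge.
function countLines(text, { skipBlank = false } = {}) {
  if (text === '') return 0
  const lines = text.split(/\r?\n/)
  // A trailing newline ends the last line; it does not start a new one.
  if (lines[lines.length - 1] === '') lines.pop()
  return skipBlank ? lines.filter(line => line.trim() !== '').length : lines.length
}

console.log(countLines('a\nb\n\nc\n')) // 4
console.log(countLines('a\nb\n\nc\n', { skipBlank: true })) // 3
```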

The GitHub version is a little more complicated because, to support auth, we use the Contents API.

This is the helper function that fetches file contents from GitHub repos. It parses JSON but could be refactored to obtain the contents as text.

```javascript
async function fetchJsonFromRepo(
  serviceInstance,
  { schema, user, repo, branch = 'master', filename }
) {
  const errorMessages = errorMessagesFor(
    `repo not found, branch not found, or ${filename} missing`
  )
  if (serviceInstance.staticAuthConfigured) {
    // With a token configured, use the Contents API, which returns the
    // file base64-encoded inside a JSON response.
    const url = `/repos/${user}/${repo}/contents/${filename}`
    const options = { qs: { ref: branch } }
    const { content } = await serviceInstance._requestJson({
      schema: contentSchema,
      url,
      options,
      errorMessages,
    })
    let decoded
    try {
      decoded = Buffer.from(content, 'base64').toString('utf-8')
    } catch (e) {
      throw new InvalidResponse({ prettyMessage: 'undecodable content' })
    }
    const json = serviceInstance._parseJson(decoded)
    return serviceInstance.constructor._validate(json, schema)
  } else {
    // Without auth, fetch the raw file directly (works for public repos).
    const url = `https://raw.githubusercontent.com/${user}/${repo}/${branch}/${filename}`
    return serviceInstance._requestJson({
      schema,
      url,
      errorMessages,
    })
  }
}
```
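Most of the refactor to text is stopping after the base64 decode instead of parsing JSON. Below is a self-contained sketch of that step. It assumes the response shape the helper already relies on: a `content` string, plus the `encoding` field the Contents API reports. `decodeContentsPayload` is a hypothetical name, and the request and auth plumbing are left out.

```javascript
// Hypothetical helper: turn a GitHub Contents API file payload into text.
// The API returns file content base64-encoded, wrapped with line breaks.
function decodeContentsPayload({ content, encoding }) {
  if (encoding !== 'base64') {
    throw new Error(`unexpected encoding: ${encoding}`)
  }
  // Node's base64 decoder skips whitespace, including those line breaks.
  return Buffer.from(content, 'base64').toString('utf-8')
}

// Simulate a payload the way the API would wrap a small text file.
const payload = {
  encoding: 'base64',
  content: Buffer.from('fact one\nfact two\n').toString('base64'),
}
console.log(decodeContentsPayload(payload)) // prints "fact one" and "fact two"
```

The decoded text could then go straight to a line or occurrence counter instead of `_parseJson`.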

Here's the existing GitHub badge for the package.json version, which is the closest badge we have to GitHub file line count.

```javascript
class GithubPackageJsonVersion extends ConditionalGithubAuthV3Service {
  static get category() {
    return 'version'
  }

  static get route() {
    return {
      base: 'github/package-json/v',
      pattern: ':user/:repo/:branch*',
    }
  }

  static get examples() {
    return [
      {
        title: 'GitHub package.json version',
        pattern: ':user/:repo',
        namedParams: { user: 'IcedFrisby', repo: 'IcedFrisby' },
        staticPreview: this.render({ version: '2.0.0-alpha.2' }),
        documentation,
        keywords,
      },
      {
        title: 'GitHub package.json version (branch)',
        pattern: ':user/:repo/:branch',
        namedParams: {
          user: 'IcedFrisby',
          repo: 'IcedFrisby',
          branch: 'master',
        },
        staticPreview: this.render({ version: '2.0.0-alpha.2' }),
        documentation,
        keywords,
      },
    ]
  }

  static render({ version, branch }) {
    return renderVersionBadge({
      version,
      tag: branch,
      defaultLabel: 'version',
    })
  }

  async handle({ user, repo, branch }) {
    const { version } = await fetchJsonFromRepo(this, {
      schema: versionSchema,
      user,
      repo,
      branch,
      filename: 'package.json',
    })
    return this.constructor.render({ version, branch })
  }
}
```

The "any URL" version (which could be used with a raw.githubusercontent.com URL) would be simpler. The osslifecycle badge could be adapted pretty readily for this:

```javascript
'use strict'

const { BaseService, InvalidResponse } = require('..')

const documentation = `
<p>
OSS Lifecycle is an initiative started by Netflix to classify open-source projects into lifecycles
and clearly identify which projects are active and which ones are retired. To enable this badge,
simply create an OSSMETADATA tagging file at the root of your GitHub repository containing a
single line similar to the following: <code>osslifecycle=active</code>. Other suggested values are
<code>osslifecycle=maintenance</code> and <code>osslifecycle=archived</code>. A working example
can be viewed on the <a href="https://github.com/Netflix/osstracker">OSS Tracker repository</a>.
</p>
`

module.exports = class OssTracker extends BaseService {
  static get category() {
    return 'other'
  }

  static get route() {
    return {
      base: 'osslifecycle',
      pattern: ':user/:repo/:branch*',
    }
  }

  static get examples() {
    return [
      {
        title: 'OSS Lifecycle',
        pattern: ':user/:repo',
        namedParams: { user: 'Teevity', repo: 'ice' },
        staticPreview: this.render({ status: 'active' }),
        keywords: ['Netflix'],
        documentation,
      },
      {
        title: 'OSS Lifecycle (branch)',
        pattern: ':user/:repo/:branch',
        namedParams: {
          user: 'Netflix',
          repo: 'osstracker',
          branch: 'documentation',
        },
        staticPreview: this.render({ status: 'active' }),
        keywords: ['Netflix'],
        documentation,
      },
    ]
  }

  static get defaultBadgeData() {
    return { label: 'oss lifecycle' }
  }

  /**
   * Return color for active, maintenance and archived statuses, which were the three
   * example keywords used in Netflix's open-source meetup.
   * See https://slideshare.net/aspyker/netflix-open-source-meetup-season-4-episode-1
   * Other keywords are possible, but will appear in grey.
   *
   * @param {object} attrs Refer to individual attrs
   * @param {string} attrs.status Specifies the current maintenance status
   * @returns {string} color
   */
  static getColor({ status }) {
    if (status === 'active') {
      return 'brightgreen'
    } else if (status === 'maintenance') {
      return 'yellow'
    } else if (status === 'archived') {
      return 'red'
    }
    return 'lightgrey'
  }

  static render({ status }) {
    const color = this.getColor({ status })
    return {
      message: status,
      color,
    }
  }

  async fetch({ user, repo, branch }) {
    return this._request({
      url: `https://raw.githubusercontent.com/${user}/${repo}/${branch}/OSSMETADATA`,
    })
  }

  async handle({ user, repo, branch }) {
    const { buffer } = await this.fetch({
      user,
      repo,
      branch: branch || 'master',
    })
    try {
      const status = buffer.match(/osslifecycle=([a-z]+)/im)[1]
      return this.constructor.render({ status })
    } catch (e) {
      throw new InvalidResponse({
        prettyMessage: 'metadata in unexpected format',
      })
    }
  }
}
```
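Below is a rough, standalone sketch of how the OssTracker pattern might be adapted for an "any URL" line count. The class name, the `url` parameter and the injected `fetchText` function are all hypothetical. They stand in for Shields' `BaseService`, routing and `_request` machinery. The sketch therefore runs on its own but is not a drop-in service.

```javascript
// Hypothetical stand-in for a Shields service, following OssTracker's
// shape (defaultBadgeData / render / handle). Fetching is injected so the
// sketch runs without the Shields codebase.
class UrlLineCount {
  constructor(fetchText) {
    this.fetchText = fetchText // async (url) => file body as a string
  }

  static get defaultBadgeData() {
    return { label: 'lines' }
  }

  static render({ count }) {
    return { message: String(count), color: 'blue' }
  }

  async handle({ url }) {
    const text = await this.fetchText(url)
    const lines = text.split(/\r?\n/)
    // Don't count the empty string after a trailing newline.
    if (lines[lines.length - 1] === '') lines.pop()
    return this.constructor.render({ count: lines.length })
  }
}

// Usage with a fake fetcher standing in for an HTTP request.
const service = new UrlLineCount(async () => 'fact one\nfact two\nfact three\n')
service
  .handle({ url: 'https://example.com/facts.txt' })
  .then(badge => console.log(badge)) // { message: '3', color: 'blue' }
```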

And here's our tutorial: https://github.com/badges/shields/blob/master/doc/TUTORIAL.md

PyvesB changed the title from "GitHub Occurances Badge" to "GitHub Occurrences Badge" on May 9, 2021
@calebcartwright
Member

I confess I'm still not sure I'm following after reading through this a couple of times, but I'm curious whether this case would be better suited to our Endpoint Badge.

@cloewen8
Author

I feel like this is general-purpose enough to be its own badge. The Endpoint Badge is certainly an option, but it would require hosting the endpoint, which may be too much setup for users.

> Would you be interested in working on that?
Assuming this is still a wanted feature, I could definitely implement it now.

@calebcartwright
Member

> but would require hosting the endpoint, which may be too much setup for users.

This is a common and understandable intuition, but one that I tend to think is an incorrect assumption. With services like RunKit (linked in our Endpoint docs), there are zero hosting concerns and zero costs; users can quite literally just chuck a bit of code up there and be off and running.

While there's certainly a case to be made that your goal is something others might be interested in (though it's worth mentioning that we've not had any other requests, nor has our community been upvoting or requesting this particular ask), I'm not convinced any implementation would actually be sufficiently general-purpose. I also have some reservations about processing arbitrary files on our prod servers as the mechanism for achieving the goal.

I'm not inherently opposed to having this as a native badge, so I'd be happy to have my skepticism and concerns proven wrong if you're feeling sufficiently motivated to submit a PR! However, I do think the Endpoint is both the easiest and fastest approach, and is also something we could reference in places like Awesome Badges to highlight the pattern in case any future users are interested in something similar.
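For comparison, the Endpoint approach only needs some hosted code that returns the Endpoint badge's JSON: `schemaVersion`, `label`, `message`, and optionally `color`. The sketch below builds that response for an occurrence count. `buildEndpointResponse` is hypothetical, and fetching the file and hosting the function (on RunKit or elsewhere) are left out.

```javascript
// Hypothetical helper: build the JSON body an Endpoint badge URL expects,
// from the text of a file and a literal sequence to count.
function buildEndpointResponse(text, sequence) {
  if (sequence === '') {
    throw new Error('sequence must not be empty')
  }
  // Splitting on a literal string yields (occurrences + 1) pieces,
  // counting non-overlapping matches.
  const count = text.split(sequence).length - 1
  return {
    schemaVersion: 1,
    label: sequence,
    message: String(count),
    color: 'blue',
  }
}

console.log(JSON.stringify(buildEndpointResponse('as you know, as you know', 'as you know')))
// {"schemaVersion":1,"label":"as you know","message":"2","color":"blue"}
```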

@cloewen8
Author

I no longer need this; I've forgotten why I needed it to begin with. My motivation for implementing it was its simplicity and the initial responses. If it would be more trouble than it's worth, I'd be happy to use the Endpoint badge with RunKit instead when needed (I can't believe I ever missed this, great service).

Projects
None yet
Development

No branches or pull requests

3 participants