Cansavvy's attempts at getting fig.alt's to sync between Google slides and Rmd #116

Merged
merged 27 commits into from
Aug 14, 2023

Conversation

cansavvy
Contributor

@cansavvy cansavvy commented Jul 10, 2023

Purpose/implementation Section

What changes are being implemented in this Pull Request?

This is related to jhudsl/OTTR_Template#487

What was your approach?

I was trying to make it download the slide notes from Google Slides and then put them in the fig.alt option for the relevant ottrpal::get_image_from_slide() call.

This would be good because right now people have to copy and paste to update fig.alts, which is an easy way to introduce mistakes: the slide notes can quickly get out of sync with what is in the course.
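A minimal sketch of the end result this is aiming for, assuming the slide notes have already been retrieved somehow. The helper name set_fig_alt() is hypothetical (it is not part of ottrpal), and it only handles the brace part of a single knitr chunk header. As a simplification, double quotes in the notes are swapped for single quotes so the option stays a valid R string.

```r
# Hypothetical helper (not part of ottrpal): set or replace the fig.alt
# option in the brace part of one knitr chunk header, e.g. "{r, echo = FALSE}".
set_fig_alt <- function(header, alt_text) {
  # Simplification: swap double quotes for single quotes so the option
  # remains a valid double-quoted R string
  alt_text <- gsub("\"", "'", alt_text, fixed = TRUE)
  new_option <- paste0("fig.alt = \"", alt_text, "\"")

  # Look for an existing fig.alt = "..." option
  m <- regexpr("fig\\.alt[[:space:]]*=[[:space:]]*\"[^\"]*\"", header)
  if (m[[1]] != -1) {
    # Replace the existing option in place
    regmatches(header, m) <- new_option
    header
  } else {
    # No fig.alt yet: append it just before the closing brace
    paste0(sub("\\}[[:space:]]*$", "", header), ", ", new_option, "}")
  }
}

# Made-up headers and alt text, for illustration only
with_alt <- set_fig_alt("{r, fig.alt = \"old text\", echo = FALSE}", "A bar chart")
without_alt <- set_fig_alt("{r, echo = FALSE}", "A bar chart")
print(with_alt)
print(without_alt)
```

With something like this in place, the slide notes would be the single source of truth and the Rmd would be regenerated from them rather than edited by hand.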

The problem I've encountered is that there is no easy URL to retrieve slide notes from. You appear to need Google authorization to download the PowerPoint file and get the slide notes that way (which is what ari does).

I have been looking for a simpler workaround so we don't have to supply Google auth, but there may not be one. ¯_(ツ)_/¯

What GitHub issue does your pull request address?

jhudsl/OTTR_Template#487

Tell potential reviewers what kind of feedback you are soliciting.

I'm posting this so @howardbaek can see what I was starting to work on and he might find a better solution.

The dream would be to download one slide's notes at a time using its slide ID, e.g.:

If this is the slide link:
https://docs.google.com/presentation/d/1ME0NbcIBmnHJRhX3JJyCwJuuomkl_BjJp6lD5oD5WnU/edit#slide=id.gd422c5de97_0_5 then somehow we could alter this URL to get the slide notes. But so far, no such luck.
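For reference, a small sketch of pulling the two identifiers out of a link like the one above. The function name parse_slide_url() is hypothetical, and the regular expressions assume the ".../presentation/d/<id>/..." and "#slide=id.<id>" URL shapes seen in that link; it does not fetch any notes.

```r
# Hypothetical helper: split a Google Slides link into its presentation ID
# and slide (page) ID. Assumes the URL shapes shown in the link above.
parse_slide_url <- function(url) {
  # Everything between "/presentation/d/" and the next "/"
  presentation_id <- sub("^.*/presentation/d/([^/]+).*$", "\\1", url)

  # The slide ID follows "slide=id." when the link points at one slide
  slide_id <- if (grepl("slide=id\\.", url)) {
    sub("^.*slide=id\\.([^&#?]+).*$", "\\1", url)
  } else {
    NA_character_
  }

  list(presentation_id = presentation_id, slide_id = slide_id)
}

ids <- parse_slide_url(
  "https://docs.google.com/presentation/d/1ME0NbcIBmnHJRhX3JJyCwJuuomkl_BjJp6lD5oD5WnU/edit#slide=id.gd422c5de97_0_5"
)
print(ids$presentation_id)
print(ids$slide_id)
```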

@cansavvy cansavvy changed the title from "Cansavvy's attempts at gettingn fig.alt's to sync between Google slides and Rmd" to "Cansavvy's attempts at getting fig.alt's to sync between Google slides and Rmd" Jul 10, 2023
#' sure it does not fail on large files
#' @return Downloaded file (in temporary directory)
#' @export
get_gs_pptx <- function(id) {

Ideally we would just download one slide's powerpoint (which would contain its slide notes as well). But I haven't found how to do this.

@cansavvy cansavvy requested a review from howardbaek July 10, 2023 18:14
@howardbaek
Contributor

howardbaek commented Jul 13, 2023

Page ID

ariExtra:::get_page_ids() extracts "page IDs of slides in a Google Slides presentation". One note about this function: it somehow extracts a nonexistent page ID, g1013cbb9c28_0_45, which is later filtered out with check_png_urls().

Speaker Notes

# Download Google Slides as PPTX
pptx_file <- ariExtra::download_gs_file(id = "https://docs.google.com/presentation/d/1Vjvq7PYuWsTkGi2EkXpnk0KtQYhbPSidBhMFQcqyb8I/edit?usp=sharing", out_type = "pptx")

# Extract speaker notes
speaker_notes <- ariExtra::pptx_notes(pptx_file)
# Get rid of filenames in name
names(speaker_notes) <- NULL

@cansavvy

Background Research

ariExtra:::get_page_ids() extracts "page IDs of slides in a Google Slides presentation".

Oh, it's awesome that that exists! Great!

@howardbaek

howardbaek commented Jul 19, 2023

ariExtra:::get_page_ids() doesn't work perfectly. Sometimes it misses some IDs (the last slide's ID).

We could use the rgoogleslides package (need authorization) to talk to the API and get the objectIds (ids of all the slides):

library(rgoogleslides)

client_id <- "YOUR_CLIENT_ID"
client_secret <- "YOUR_CLIENT_SECRET"

# Authorize R package to access Google Slides API
authorize(client_id = client_id, client_secret = client_secret)

url <- "https://slides.googleapis.com/v1/presentations/YOUR_PRESENTATION_ID_HERE?fields=slides.objectId"
# Get auth token
token <- get_token()
config <- httr::config(token=token)

# Get object Id
result <- httr::GET(url, config = config, httr::accept_json())
result_content <- httr::content(result, "text")
result_list <- jsonlite::fromJSON(result_content)

# Character vector of objectIds
result_list$slides$objectId

@howardbaek

howardbaek commented Jul 21, 2023

Great article on using OAuth 2.0 in R: https://blog.r-hub.io/2021/01/25/oauth-2.0/

Note to myself:

So far, I've figured out how to:

  • Read in the JSON file that contains the client ID and client secret
  • Generate an OAuth 2.0 token
  • Use this token to send a GET request to the Google Slides API and retrieve the page IDs
  • Encrypt my OAuth 2.0 token by following the Managing tokens securely section

Next, I need to write a function that lets other users generate a token using their own credentials (e.g. gs4_auth()).

@howardbaek howardbaek self-assigned this Jul 31, 2023
@howardbaek

@cansavvy Made some changes today:

  • authorize(): Uses a Google Cloud Client ID and Client Secret to generate an OAuth 2.0 Access Token, then stores this token in an environment for later use.
  • extract_object_id(): Sends an HTTP GET request for the IDs of every slide in a Google Slides presentation. This uses the token generated by authorize().
  • get_object_id_notes(): Retrieves the Speaker Notes and their corresponding Object (Slide) IDs from a Google Slides presentation. It is a wrapper around extract_object_id() and get_gs_pptx() + pptx_notes().
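To illustrate the pairing step that get_object_id_notes() is described as doing, here is a rough sketch with stand-in data. It does not call the API or the real functions: pair_ids_with_notes() and notes_for_slide() are hypothetical names, the IDs and notes are made up, and it assumes the IDs and notes arrive in the same slide order.

```r
# Hypothetical sketch: combine slide IDs and speaker notes into one table.
# Assumes both vectors are in the same slide order.
pair_ids_with_notes <- function(slide_ids, notes) {
  if (length(slide_ids) != length(notes)) {
    stop("Got ", length(slide_ids), " slide IDs but ", length(notes), " notes.")
  }
  # unname() drops any file names attached to the notes vector
  data.frame(id = slide_ids, notes = unname(notes), stringsAsFactors = FALSE)
}

# Hypothetical lookup: the notes for one slide ID, or NA if the ID is absent
notes_for_slide <- function(notes_df, slide_id) {
  hit <- notes_df$notes[notes_df$id == slide_id]
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

# Made-up stand-ins for the real API and PPTX output
slide_ids <- c("p", "g_example_0_1", "g_example_0_2")
notes <- c(a = "Title slide", b = "A bar chart of counts", c = "Summary table")

notes_df <- pair_ids_with_notes(slide_ids, notes)
print(notes_df)
print(notes_for_slide(notes_df, "g_example_0_1"))
```

Failing loudly on a length mismatch seems worth it here, since a silently shifted pairing would attach the wrong alt text to every later figure.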

@howardbaek

howardbaek commented Aug 1, 2023

Checks are failing because ariExtra is not on CRAN. I used the Remotes field in DESCRIPTION to depend on jhudsl/ariExtra: a618e6a#diff-9cc358405149db607ff830a16f0b4b21f7366e3c99ec00d52800acebe21b231cR47.

According to this SO post, Remotes is not an official DESCRIPTION field, and dependencies should be publicly available for submission to CRAN. Jenny Bryan also comments on this: r-lib/devtools#1717 (comment).

So, it seems like we need to put ariExtra on CRAN or just copy-paste ariExtra::get_slide_id into ottrpal. Obviously, the latter is much easier to do.

@cansavvy

cansavvy commented Aug 1, 2023

Yeah that seems like a good solution. Alternatively you could have ari as a dependency since you transferred things into that package from ariExtra right?

But yeah, making ottrpal's own version of get_slide_id makes sense to me (since it's a small function).

@cansavvy

cansavvy commented Aug 1, 2023

Would you be able to write out code that illustrates how you test this in the context of the end use case? Basically can you give me a reprex for me to test this?

@howardbaek

To test this:

  1. Get a Google Cloud Client ID and Client Secret by following the steps outlined here: https://www.hairizuan.com/rgoogleslides-using-your-own-account-client-id-and-secret/. Save the Client ID in R as client_id and the Client Secret as client_secret.
  2. Run authorize(client_id = client_id, client_secret = client_secret). This will take you to a browser page that looks like:

Screenshot 2023-08-01 at 11 12 50 AM

Grant all the Google Drive and Google Slides permissions, and you should see this message: Authentication complete. Please close this page and return to R.

  3. Close the page and return to R, where the console should show Authentication complete.
  4. You have now generated an OAuth 2.0 Access Token and stored it in an environment for later use.
  5. Use the stored token to talk to the Google Slides API: extract_object_id("https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"). This should output a character vector of the IDs of all 19 slides.
  6. To get the speaker notes and their corresponding IDs, run get_object_id_notes("https://docs.google.com/presentation/d/1H5aF_ROKVxE-HFHhoOy9vU2Y-y2M_PiV0q-JBL17Gss/edit?usp=sharing"). This should output a data frame:

Screenshot 2023-08-01 at 11 18 36 AM

@howardbaek

Good point. But, the ari branch that contains this function, https://github.com/jhudsl/ari/tree/ariExtra-immigration, isn't on CRAN yet, so we encounter the same problem.

@cansavvy

cansavvy commented Aug 1, 2023

  1. Get a Google Cloud Client ID and Client Secret following steps outlined here:

This is a great place to start from! But we should think about how we want this to be implemented on the user side.

Setting up a Google Client ID is a lot for each user to do just to get the notes.

We should probably have a Google Client ID that is encrypted here and a default account that we can use. Perhaps we could make a dummy Google account as one more layer of safety. I can potentially work on this if you like, and then we can pair program on it together.

@howardbaek

howardbaek commented Aug 1, 2023

I think I can do this fairly easily.

  1. Create a dummy Google email account ("ottrpal@gmail.com")
  2. Generate Google Client ID and Client Secret from GCP
  3. Set these as default arguments to authorize()

Is this what you were thinking? Is this a safe method?

@cansavvy

cansavvy commented Aug 1, 2023

Yes, that's part 1. But we'll still want to keep those credentials safe via some encryption steps and find a way (if possible) to just provide an OAuth token from that account by default. That last part is easy through GitHub secrets, but we'd have to think about the setup if people want to use the function locally. In the latter case, we'd probably want them to provide their own credentials.

@howardbaek

howardbaek commented Aug 1, 2023

What you are saying is we want to use the Google Client ID and Client Secret to generate an OAuth 2.0 Token, encrypt this token somehow, and store it in the GitHub secrets of the ottrpal repo?

@cansavvy

cansavvy commented Aug 1, 2023

Yeah, programmatic access through secrets. Here's an example of that: https://github.com/datatrail-jhu/rgoogleclassroom/blob/fbf7f2a5479d25546ea51533c769ebeaae8cbbb6/R/auth.R#L116

And then the secrets can be GitHub secrets
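A rough sketch of how the "default from secrets, own credentials locally" split could look on the R side. This is not the code in auth.R: the function name resolve_credentials() and the environment variable names OTTRPAL_CLIENT_ID and OTTRPAL_CLIENT_SECRET are assumptions for illustration. It relies on the fact that a GitHub Actions workflow can expose secrets to a step as environment variables.

```r
# Hypothetical sketch: prefer credentials passed by the user, otherwise fall
# back to environment variables (e.g. set from GitHub secrets in a workflow).
# The variable names below are assumptions, not ottrpal's actual names.
resolve_credentials <- function(client_id = NULL, client_secret = NULL) {
  if (is.null(client_id)) {
    client_id <- Sys.getenv("OTTRPAL_CLIENT_ID", unset = "")
  }
  if (is.null(client_secret)) {
    client_secret <- Sys.getenv("OTTRPAL_CLIENT_SECRET", unset = "")
  }
  if (!nzchar(client_id) || !nzchar(client_secret)) {
    stop(
      "No credentials found: pass client_id and client_secret, ",
      "or set OTTRPAL_CLIENT_ID and OTTRPAL_CLIENT_SECRET."
    )
  }
  list(client_id = client_id, client_secret = client_secret)
}

# Local use: the user supplies their own (placeholder) credentials
local_creds <- resolve_credentials("my-client-id", "my-client-secret")

# CI use: values come from the environment (placeholders set here for the demo)
Sys.setenv(OTTRPAL_CLIENT_ID = "ci-client-id", OTTRPAL_CLIENT_SECRET = "ci-client-secret")
ci_creds <- resolve_credentials()
Sys.unsetenv(c("OTTRPAL_CLIENT_ID", "OTTRPAL_CLIENT_SECRET"))
```

Keeping the lookup in one place means the secret values never need to appear as default arguments in the package source.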

@howardbaek howardbaek marked this pull request as ready for review August 14, 2023 19:07
@howardbaek howardbaek left a comment

We pair-coded the functions in auth.R so that we can talk to the Google Slides API using encrypted credentials. extract_object_id() lets us talk to the API and extract the individual IDs of all the slides in a Google Slides deck.

@howardbaek howardbaek merged commit 43a050d into main Aug 14, 2023
4 checks passed
@howardbaek howardbaek deleted the cansavvy/fig.alt-2 branch August 14, 2023 19:11