Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(gatsby-source-drupal): Use list of UUIDs generated by Drupal to fetch content individually #32131

Closed
wants to merge 3 commits into from

Conversation

Auspicus
Copy link
Contributor

@Auspicus Auspicus commented Jun 27, 2021

Description

As per discussions with @KyleAMathews we've been considering using the JSON:API individual content page eg. /jsonapi/node/article/1234 because its cacheability is much better than the listing endpoint. This means that for requests which can be cached the response time is increased on average 20x. This PR requires you have the latest patch from this issue: https://www.drupal.org/project/gatsby/issues/3220713 installed which enables the /gatsby/content-list endpoint.

There's still a few questions surrounding this:

  • benchmark cold cache, warm cache, etc.
  • figure out if / how we want to support current filters
  • enable this fetching strategy using a flag eg. fetchIndividualEntities: true in gatsby-config
  • what this means for fastbuilds

Doing some testing with my own Drupal site I saw:

  • 95% cache-hit-ratio (most misses were due to 403 access denied from unpublished pages OR content-types which require authentication to fetch)
  • requests per second for warm cache: ~1000+
  • requests per second for cold cache: ~50

This cache-hit-ratio is not going to fluctuate as much as it does with the current source plugin because node saves won't purge all of these cache entries only the relevant pages to the particular node that was saved.

@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jun 27, 2021
const isTranslatable = languageConfig.translatableEntities.some(
entityType => entityType === type
)
const listResponse = await worker([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be using the requestQueue here and other places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I figured out how to do that yesterday. Feels weird pushing a return value from a queue.push to a queue.push but here we go :)

headers,
}
])
const listResponseBody = JSON.parse(listResponse.body)
Copy link
Contributor Author

@Auspicus Auspicus Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For whatever reason, passing responseType: 'json' to got and letting it do the JSON.parse results in some very weird artifacts in some environments. It worked 100% of the time on my local builds but doing a build in CI failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's almost certainly that somehow your CI version isn't getting the right version of got. Probably because you're running this in a local plugin which means that it's whatever version of got is resolving in your site which because Gatsby's dependency tree unfortunately contains older versions of Got, isn't going to be predictable atm. Probably best for testing to publish the module to a temp npm package so that it can resolve its own dependencies correctly.

requestQueue.push([
urlJoin(baseUrl, language, apiBase, `/${entityType}/${entityBundle}/${entityUuid}`),
{
headers: shouldUseAuth ? headers : undefined
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dumps all the other headers if you don't want auth on the entity, need a better way to do this.

} = pluginOptions
const { createNode, setPluginStatus, touchNode } = actions

// Update the concurrency limit from the plugin options
requestQueue.concurrency = concurrentAPIRequests

if (typeof basicAuth.username === 'string' && typeof basicAuth.password === 'string') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to work around: sindresorhus/got#1169

// based on relationships. The expanded data is exposed as `included`
// in the JSON API response.
// See https://www.drupal.org/docs/8/modules/jsonapi/includes
if (d.body.included) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're losing "included"

@LekoArts LekoArts changed the title WIP: feature(gatsby-source-drupal): Use list of UUIDs generated by Drupal to fetch content individually feature(gatsby-source-drupal): Use list of UUIDs generated by Drupal to fetch content individually Jun 28, 2021
@LekoArts LekoArts added topic: source-drupal Related to Gatsby's integration with Drupal and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Jun 28, 2021
@LekoArts LekoArts marked this pull request as draft June 28, 2021 08:53
@KyleAMathews
Copy link
Contributor

(most misses were due to 403 access denied from unpublished pages OR content-types which require authentication to fetch)

Another idea here — we could just auto-retry with authentication on 403 failures. Maybe learn automatically which content-types need auth too and store that in the plugin cache for subsequent fetching.

@Auspicus
Copy link
Contributor Author

Auspicus commented Jun 28, 2021

(most misses were due to 403 access denied from unpublished pages OR content-types which require authentication to fetch)

Another idea here — we could just auto-retry with authentication on 403 failures. Maybe learn automatically which content-types need auth too and store that in the plugin cache for subsequent fetching.

Problem is: 403 != content type requires auth. Sometimes a 403 just means that the content isn't published. There might be a difference between these 2 cases (403 vs 401) but not 100% sure.

@KyleAMathews
Copy link
Contributor

ah right — yeah 401 is what we'd retry with auth

@KyleAMathews
Copy link
Contributor

The gatsby module should probably just mark unpublished content as such so the source plugin could know to skip it if it's doing a build (or fetch it with auth if it's doing a preview build).

@Auspicus
Copy link
Contributor Author

Auspicus commented Jul 1, 2021

The gatsby module should probably just mark unpublished content as such so the source plugin could know to skip it if it's doing a build (or fetch it with auth if it's doing a preview build).

That's probably a good way to handle it. Some demarcation (maybe keyed differently or with a status: 0 value) noting that the content is unpublished and should be sourced accordingly.

@LekoArts LekoArts added the topic: source-plugins Relates to the Gatsby source plugins (e.g. -filesystem) label Apr 11, 2022
@Auspicus Auspicus closed this Nov 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: source-drupal Related to Gatsby's integration with Drupal topic: source-plugins Relates to the Gatsby source plugins (e.g. -filesystem)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants