@holtvogt holtvogt commented Sep 18, 2025

This PR introduces two new audits to monitor and analyze content fragment 404s on AEM:

What's New

  • CDN Content Fragment 404 Audit (cdn-content-fragment-404): Monitors CDN logs hourly to identify content fragment requests that return 404 errors, using Athena queries to aggregate and export the data to S3 for further analysis and reporting.
  • Content Fragment 404 Audit (content-fragment-404): Runs daily on the broken content fragment paths discovered in CDN logs and suggests repair actions through a multi-step workflow that applies strategies such as republishing, locale fallbacks, and similar-path matching.

Use Case

Health monitoring for Content Fragments on AEM Sites: Automatically detect broken content fragment requests across AEM Sites by monitoring CDN traffic patterns, identifying 404 errors, and providing actionable repair suggestions to maintain content availability and user experience.

Related

This commit introduces the AemAuthorClient class, which facilitates communication with the AEM Author API. It includes methods for checking content availability, fetching content, and retrieving child paths from a given parent path. The class also handles authentication and URL creation for API requests, enhancing the overall content path management capabilities.
This commit introduces the LevenshteinDistance class, which provides a static method to calculate the edit distance between two strings. The implementation includes error handling for null inputs and utilizes a dynamic programming approach to compute the distance efficiently.
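For illustration, a dynamic-programming edit distance of the kind this commit message describes could look like the sketch below. It is not the code from the PR: the method name `calculate`, the error message, and the two-row memory optimization are assumptions made for the example.

```javascript
// Hypothetical sketch of an edit-distance helper; names are assumptions,
// not the PR's actual implementation.
class LevenshteinDistance {
  static calculate(source, target) {
    // Reject null/undefined inputs explicitly instead of failing later.
    if (source == null || target == null) {
      throw new Error('Both strings must be non-null');
    }
    if (source === target) return 0;
    if (source.length === 0) return target.length;
    if (target.length === 0) return source.length;

    // previous[j] = distance between the processed prefix of source
    // and the first j characters of target.
    let previous = Array.from({ length: target.length + 1 }, (_, j) => j);
    for (let i = 1; i <= source.length; i += 1) {
      const current = [i];
      for (let j = 1; j <= target.length; j += 1) {
        const cost = source[i - 1] === target[j - 1] ? 0 : 1;
        current[j] = Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost, // substitution (or match)
        );
      }
      previous = current;
    }
    return previous[target.length];
  }
}
```

Keeping only two rows reduces memory from O(n·m) to O(m) while the running time stays O(n·m), which is usually sufficient for comparing content paths.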
This commit introduces the PathUtils class, which provides utility methods for managing content paths. The class includes methods to remove locale segments from paths and to retrieve the parent path of a given content path, enhancing the overall path management functionality.
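As a rough sketch of the two helpers mentioned here, assuming locale segments look like `en`, `en-us`, or `en_US` (the pattern, the method names, and the `null` return for top-level paths are all assumptions, not taken from the PR):

```javascript
// Hypothetical locale pattern: two letters, optionally followed by a
// two-letter region (e.g. "en", "en-us", "de_DE"). This is an assumption.
const LOCALE_SEGMENT = /^[a-z]{2}([-_][a-zA-Z]{2})?$/;

class PathUtils {
  // Drops every path segment that looks like a locale.
  static removeLocaleFromPath(path) {
    if (typeof path !== 'string' || path.length === 0) return path;
    return path
      .split('/')
      .filter((segment) => !LOCALE_SEGMENT.test(segment))
      .join('/');
  }

  // Returns the parent of a path, or null when there is no parent.
  static getParentPath(path) {
    if (typeof path !== 'string') return null;
    const trimmed = path.replace(/\/+$/, ''); // ignore trailing slashes
    const lastSlash = trimmed.lastIndexOf('/');
    if (lastSlash <= 0) return null; // top-level or relative path
    return trimmed.slice(0, lastSlash);
  }
}
```

Note that a pattern this loose would also strip any non-locale two-letter folder name, so a real implementation may need a stricter list of known locales.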
This commit introduces the PathIndex and PathNode classes, which implement a structure for managing and indexing content paths. The PathIndex class provides methods for inserting, finding, deleting, and retrieving paths, while the PathNode class represents individual nodes in the path tree.
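One plausible shape for such a structure is a trie keyed by path segment. The sketch below is an assumption about the design, not the PR's code; method names such as `contains` and `getPaths` are hypothetical.

```javascript
// Hypothetical trie-based path index; structure and names are assumptions.
class PathNode {
  constructor() {
    this.children = new Map(); // segment -> PathNode
    this.isEnd = false; // true when a full path ends at this node
    this.path = null; // the original path string, if isEnd
  }
}

class PathIndex {
  constructor() {
    this.root = new PathNode();
  }

  static segments(path) {
    return path.split('/').filter(Boolean);
  }

  insert(path) {
    let node = this.root;
    for (const segment of PathIndex.segments(path)) {
      if (!node.children.has(segment)) node.children.set(segment, new PathNode());
      node = node.children.get(segment);
    }
    node.isEnd = true;
    node.path = path;
  }

  // Walks the trie; returns the node for the path or null if absent.
  findNode(path) {
    let node = this.root;
    for (const segment of PathIndex.segments(path)) {
      node = node.children.get(segment);
      if (!node) return null;
    }
    return node;
  }

  contains(path) {
    const node = this.findNode(path);
    return node !== null && node.isEnd;
  }

  // Unmarks the path; empty branches are left in place for simplicity.
  delete(path) {
    const node = this.findNode(path);
    if (!node || !node.isEnd) return false;
    node.isEnd = false;
    node.path = null;
    return true;
  }

  // Returns all indexed paths at or below the given prefix, sorted.
  getPaths(prefix = '/') {
    const start = this.findNode(prefix);
    if (!start) return [];
    const result = [];
    const stack = [start];
    while (stack.length > 0) {
      const node = stack.pop();
      if (node.isEnd) result.push(node.path);
      stack.push(...node.children.values());
    }
    return result.sort();
  }
}
```

A segment-keyed trie makes "all paths under this parent" lookups proportional to the size of the subtree rather than the whole index, which fits the sibling and similar-path lookups the analysis rules appear to need.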
This commit introduces the AnalysisStrategy class, which implements a strategy for analyzing broken content paths using various rules. The class includes methods for cleaning paths, analyzing broken paths, and processing suggestions based on content status. It integrates existing rules such as PublishRule, LocaleFallbackRule, and SimilarPathRule to provide comprehensive analysis and recommendations for content management.
This commit introduces the AthenaCollector class, which extends BaseCollector and provides functionality to fetch broken content paths from an Athena database. It includes methods for ensuring the database and table exist, as well as querying for broken paths. Additionally, a BaseCollector class is created as a base for future collectors, and a CollectorFactory class is added to instantiate collectors based on the specified type, currently supporting Athena.
This commit introduces three new SQL files: create-database.sql for creating a database if it doesn't exist, create-table.sql for defining an external table with partitioning and storage properties, and daily-query.sql for selecting URLs based on specified date and tenant criteria.
This commit refactors the broken content path handling by introducing three new functions: fetchBrokenContentPaths, analyzeBrokenContentPaths, and provideSuggestions. It replaces the previous runner function with a step-based approach, utilizing a CollectorFactory for fetching broken paths and using the analysis strategy.
This update introduces pagination handling in the AemAuthorClient class, allowing for the fetching of content in multiple pages. A maximum page limit and a delay between requests have been implemented to manage rate limiting. Additionally, a utility function for creating URLs with pagination parameters has been added, along with error handling improvements during content fetching.
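The paging loop described above might be structured roughly as follows. This is a minimal sketch under stated assumptions: the `fetchPage` callback, the cursor-based response shape (`items`, `cursor`), and the default limits are hypothetical and not taken from the AEM Author API or from this PR.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical paging helper. `fetchPage(cursor)` is an injected callback
// that is assumed to resolve to { items: [...], cursor: string | null }.
async function fetchAllPages(fetchPage, { maxPages = 10, delayMs = 100 } = {}) {
  const items = [];
  let cursor = null;
  for (let page = 0; page < maxPages; page += 1) {
    const response = await fetchPage(cursor);
    items.push(...(response.items ?? []));
    cursor = response.cursor ?? null;
    if (!cursor) break; // no further pages
    // Pause between requests to stay under rate limits.
    if (page < maxPages - 1) await sleep(delayMs);
  }
  return items;
}
```

Injecting `fetchPage` keeps the loop independent of the HTTP client, so the page cap and delay can be exercised without network access.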
This update modifies the logic for determining content availability by checking if exactly one item exists in the response.
holtvogt and others added 22 commits October 28, 2025 17:18
- Added optional getRank parameter to syncSuggestions for extracting rank from new data items.
- Updated existing suggestions' rank if getRank is provided and new data item is found.
- Added unit test to verify rank update functionality when getRank is used.
- Renamed `enrichBrokenContentFragmentLinkSuggestions` to `createContentFragmentLinkSuggestions` for clarity.
- Updated function parameters to accept `auditUrl` and `auditData` directly.
- Enhanced suggestion creation by enriching suggestions with request metadata (request count and user agents).
- Added a new function `enrichContentFragmentLinkSuggestions` to handle enrichment and messaging to Mystique.
- Updated tests to reflect changes in function names and logic, ensuring proper handling of suggestions and opportunities.
@holtvogt changed the title from "feat: add audit for broken content fragment requests" to "feat: add audit for content fragment 404s" on Nov 10, 2025
@holtvogt holtvogt linked an issue Nov 10, 2025 that may be closed by this pull request
4 tasks
@holtvogt holtvogt removed a link to an issue Nov 10, 2025
4 tasks
3 participants