feat: Structured data audit to validate issues from scrape data #885

Open: 17 commits to be merged into main

herzog31 (Member)

This is a major refactor of the structured data audit to support the following features:

  • Instead of just relying on Google Search Console (GSC) to retrieve structured data issues, the audit now analyses the structured data extracted by WAE as part of the scraping process.
  • If a customer is onboarded to GSC, issues raised by GSC are also considered unless the validator already covers them. Customers that are not onboarded to GSC will not bloc
  • Added a prefix to all logs to make it easier to monitor issues and missing features. This is particularly important for the further development of the validator.
  • The audit now supports structured data in Microdata and RDFa formats, in addition to JSON-LD.
  • Refactored audit code and unit tests.
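
To illustrate how validator findings and GSC findings could be combined, here is a minimal sketch. It is not the code in this PR. The `StructuredDataIssue` shape, the `mergeIssues` and `issueKey` helpers, the `LOG_PREFIX` value, and the rule that a GSC issue counts as "already covered" when the validator reported the same page and root type are all assumptions made for illustration.

```typescript
// Hypothetical issue shape; all field names are assumptions for illustration.
interface StructuredDataIssue {
  pageUrl: string;
  rootType: string; // e.g. "Product" or "BreadcrumbList"
  message: string;
  source: "validator" | "gsc";
}

// Hypothetical log prefix; the PR does not state the actual value.
const LOG_PREFIX = "[structured-data-audit]";

// Assumed notion of "already covered": same page and same root entity type.
function issueKey(issue: StructuredDataIssue): string {
  return `${issue.pageUrl}|${issue.rootType}`;
}

// Keeps every validator issue and appends only the GSC issues
// whose key was not already reported by the validator.
function mergeIssues(
  validatorIssues: StructuredDataIssue[],
  gscIssues: StructuredDataIssue[],
  log: (message: string) => void
): StructuredDataIssue[] {
  const covered: Record<string, boolean> = {};
  for (const issue of validatorIssues) {
    covered[issueKey(issue)] = true;
  }

  const merged = validatorIssues.slice();
  let skipped = 0;
  for (const issue of gscIssues) {
    if (covered[issueKey(issue)]) {
      skipped += 1;
    } else {
      merged.push(issue);
    }
  }

  log(`${LOG_PREFIX} merged ${merged.length} issues, skipped ${skipped} GSC duplicates`);
  return merged;
}
```

In this sketch, passing an empty array for `gscIssues` returns the validator issues unchanged, so the merge step itself does not require any GSC data.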

Related Issues

  • SITES-28711 Remove dependency to Google Search Console
  • SITES-28714 Support additional structured data formats

Needs to be merged together with https://github.com/adobe/spacecat-content-scraper/pull/333.

This PR will trigger no release when merged.

@herzog31 herzog31 requested a review from a team May 21, 2025 12:02
@solaris007 solaris007 added the enhancement New feature or request label May 26, 2025