
An adaptive ML model for identifying ad-serving domains for use in Pi-Hole.


jkerrigan/photon_sphere


Photon Sphere


Photon Sphere takes a machine learning approach to identifying DNS requests for pernicious domains (analytics, trackers, ad serving). It is intended for use alongside Pi-hole (https://pi-hole.net/) and is meant to be deployable on a Raspberry Pi. The model uses the unsupervised text tokenizer YouTokenToMe to split domains into tokens, which feed a lightweight embedding model. Ideally, elements shared among known pernicious domains (e.g. words such as 'ads' or 'tracker' in the domain name) can be used to identify domains that would otherwise require manual review or an exceptionally complicated regex.

The model is composed of a siamese embedding layer with a distance metric learning network. It is trained with a triplet loss, which maximizes the embedding distance between dissimilar domains (e.g. login.microsoft.com and analytics.microsoft.com) while minimizing the distance between similar ones (e.g. login.github.com and github.com).
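To make the training objective concrete, here is a minimal sketch of a standard triplet loss in plain Python. It is not taken from the Photon Sphere code: the function names, the Euclidean distance, the margin of 1.0, and the 2-D embedding values are all assumptions chosen for illustration, and the tracker domain in the comments is hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings (real embeddings come from the trained model).
anchor = [0.0, 0.0]    # e.g. github.com
positive = [0.0, 1.0]  # e.g. login.github.com (similar, should be close)
negative = [3.0, 4.0]  # e.g. a hypothetical tracker domain (should be far)

# Well-separated triplet: the loss vanishes.
loss_separated = triplet_loss(anchor, positive, negative)

# Swapping the roles simulates a badly placed embedding: the loss is positive.
loss_violated = triplet_loss(anchor, negative, positive)
```

During training, a positive loss yields gradients that pull the anchor and positive embeddings together and push the negative away, which matches the behavior described above.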

Notes

  • The YouTokenToMe (YTTM) vocabulary size is 300 by default; larger vocabularies tend to cause overfitting
  • The model can be run in real time or on the archived Pi-hole SQL DNS query logs
  • The online learning component is still in development
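As an illustration of the archived-log mode, the sketch below pulls distinct domains from an SQLite query log using Python's standard sqlite3 module. The table name `queries` and the columns `timestamp` and `domain` are assumptions about the log layout, which may differ between Pi-hole versions; the helper `recent_domains` is hypothetical and not part of Photon Sphere. An in-memory database stands in for the real log file so the example is self-contained.

```python
import sqlite3

def recent_domains(conn, limit=100):
    """Return distinct domains, most recently queried first.

    Assumes a `queries` table with `timestamp` and `domain` columns.
    """
    rows = conn.execute(
        "SELECT domain FROM queries "
        "GROUP BY domain "
        "ORDER BY MAX(timestamp) DESC "
        "LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]

# In-memory stand-in for the archived query log (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queries ("
    "id INTEGER PRIMARY KEY, timestamp INTEGER, domain TEXT)"
)
conn.executemany(
    "INSERT INTO queries (timestamp, domain) VALUES (?, ?)",
    [
        (1, "github.com"),
        (2, "analytics.microsoft.com"),
        (3, "github.com"),
    ],
)

domains = recent_domains(conn)
```

The resulting list of domain strings is the kind of input that would then be tokenized and scored by the model.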

Requirements
