Photon Sphere

Photon Sphere aims to provide a machine learning approach to identifying DNS requests for domains considered pernicious (analytics, trackers, ad serving), for use alongside Pi-hole (https://pi-hole.net/), while remaining deployable on a Raspberry Pi. The model uses the unsupervised text tokenizer YouTokenToMe to split domains into tokens for a lightweight embedding model. Ideally, elements common to known pernicious domains (e.g. domain names containing words such as 'ads' or 'tracker') can be used to identify domains that would traditionally require parsing by hand or an exceptionally complicated regex.
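To illustrate why subword tokenization can surface fragments such as 'ads', here is a minimal pure-Python sketch of byte-pair-style merging. It is a toy stand-in written for this README, not YouTokenToMe itself and not the project's code; the function names (learn_merges, apply_merge, tokenize) and the example domains are hypothetical.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_merges(domains, num_merges):
    """Learn up to `num_merges` merges from a list of domain strings."""
    # Start with every domain split into single characters.
    corpus = [list(d) for d in domains]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # Most frequent pair wins; sorting first makes ties deterministic.
        best = max(sorted(pairs), key=pairs.get)
        merges.append(best)
        corpus = [apply_merge(symbols, best) for symbols in corpus]
    return merges


def tokenize(domain, merges):
    """Apply the learned merges, in order, to a new domain string."""
    symbols = list(domain)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical training domains, for illustration only.
example_domains = ["ads.example.com", "ads.example.net", "tracker.example.org"]
example_merges = learn_merges(example_domains, num_merges=10)
print(tokenize("ads.example.io", example_merges))
```

Fragments that recur across many domains get merged into single tokens early, which is the property the embedding model relies on. The number of merges plays a role loosely comparable to the tokenizer's vocab size.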

The model is composed of a siamese embedding layer with a distance metric learning network. It is trained with a triplet loss to maximize the distance between dissimilar domains (e.g. login.microsoft.com vs. analytics.microsoft.com) while minimizing the distance between similar domains (e.g. login.github.com vs. github.com).
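As a rough illustration of the triplet objective, the sketch below computes one common form of the loss, max(0, d(anchor, positive) - d(anchor, negative) + margin), on plain Python lists. The Euclidean distance, the margin value, and the embedding vectors are assumptions chosen for the example; the project's actual distance network and hyperparameters may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triple.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, for illustration only.
login_github = [0.0, 0.0]      # anchor
github = [0.1, 0.0]            # positive: should sit close to the anchor
analytics_host = [3.0, 4.0]    # negative: should sit far from the anchor

print(triplet_loss(login_github, github, analytics_host))
```

During training, the loss is averaged over many such triples, so only triples that violate the margin contribute a gradient.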

Notes

The YouTokenToMe (YTTM) vocab size is 300 by default (a vocab that is too large results in overfitting)
The model can be run in real time or on the archived Pi-hole SQL DNS query logs
The online learning aspect is still in development
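For the archived-log mode, a minimal sketch of pulling distinct domains out of a Pi-hole query database might look like the following. It uses the standard-library sqlite3 module rather than sqlalchemy to stay dependency-free, and it assumes a `queries` table with `timestamp` and `domain` columns, as in Pi-hole's long-term FTL database (commonly found at /etc/pihole/pihole-FTL.db). The helper name fetch_domains is hypothetical, and the demo runs against an in-memory stand-in rather than a real log.

```python
import sqlite3


def fetch_domains(conn, since_ts=0, limit=1000):
    """Return distinct queried domains seen at or after `since_ts` (Unix time).

    Assumes a `queries` table with `timestamp` and `domain` columns.
    """
    cur = conn.execute(
        "SELECT DISTINCT domain FROM queries "
        "WHERE timestamp >= ? ORDER BY domain LIMIT ?",
        (since_ts, limit),
    )
    return [row[0] for row in cur]


# In-memory stand-in for the Pi-hole database, for illustration only.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE queries (id INTEGER PRIMARY KEY, timestamp INTEGER, domain TEXT)"
)
demo_conn.executemany(
    "INSERT INTO queries (timestamp, domain) VALUES (?, ?)",
    [(100, "github.com"), (200, "ads.example.com"), (300, "github.com")],
)
print(fetch_domains(demo_conn))
demo_conn.close()
```

Against a real installation, the connection would instead point at the database file, ideally opened read-only or on a copy so the running resolver is not disturbed.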

Requirements

tensorflow
numpy
sqlalchemy
youtokentome

jkerrigan/photon_sphere
