Jidouteki ("自動的", "automatic") is a website-independent extractor for manga sources and data.
It uses Python configs to describe website structures and provides many convenience methods for writing parsers quickly and accurately.
Given a google-drive.py config:

    import jidouteki
    from jidouteki import Config  # assumes Config is exported at package level

    @jidouteki.register
    class GDrive(Config):
        @jidouteki.meta
        def _meta(self):
            return jidouteki.Metadata(
                base='https://drive.google.com/',
                key='google-drive',
                display_name='Google drive'
            )

        @jidouteki.match
        def _match(self):
            # URL patterns handled by this config; folderId is a named capture group.
            return (
                r"https://drive\.google\.com/drive/folders/(?P<folderId>.*?)(?:[/?].*|)$",
            )

        @jidouteki.images
        def _images(self, folderId):
            # Fetch the folder page (path is relative to base) and select its file entries.
            d = self.fetch(f"/drive/folders/{folderId}")
            d = d.css("c-wiz > div[data-id]")

            # Each data-id attribute is turned into a direct image URL.
            images = []
            for el in d:
                data_id = el["data-id"]
                images.append(f"https://lh3.googleusercontent.com/d/{data_id}")
            return images

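The pattern returned by `_match` relies on a named group, `folderId`, whose name matches the parameter of `_images`. The sketch below shows what that pattern captures, using only Python's standard `re` module rather than Jidouteki itself. How Jidouteki applies the pattern internally is not covered here, and `extract_folder_id` and the sample URLs are hypothetical names chosen for illustration.

```python
import re

# Pattern copied verbatim from the _match method of the google-drive.py config.
FOLDER_PATTERN = re.compile(
    r"https://drive\.google\.com/drive/folders/(?P<folderId>.*?)(?:[/?].*|)$"
)


def extract_folder_id(url):
    """Return the folderId captured from a Drive folder URL, or None if it does not match."""
    m = FOLDER_PATTERN.match(url)
    return m.group("folderId") if m else None


# Hypothetical URLs, used only for illustration.
print(extract_folder_id("https://drive.google.com/drive/folders/abc123?usp=sharing"))
print(extract_folder_id("https://example.com/not-a-drive-url"))
```

The lazy `.*?` stops at the first `/` or `?` after the folder ID, so trailing path segments and query strings are not included in the captured value.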
The following code:

    from jidouteki import Jidouteki

    jdtk = Jidouteki(
        proxy="https://your-cors-proxy/"
    )

    gdrive = jdtk.load_config("google-drive.py")
    images = gdrive.images("<folderId>")  # replace <folderId> with the ID of your Drive folder
    print(images)

will print the URLs of all images contained in the Google Drive folder identified by folderId.
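To make the shape of that result concrete, here is a standalone approximation of what the `_images` method does: select every `div` with a `data-id` attribute that is a direct child of `c-wiz`, then build one image URL per ID. It is a sketch using only the standard library's `html.parser`, not Jidouteki's own `fetch` or `css` helpers. The HTML fragment, the IDs in it, and the names `DataIdCollector` and `image_urls` are made up for illustration, and real Google Drive markup may differ.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DataIdCollector(HTMLParser):
    """Collect data-id values of <div> elements that are direct children of <c-wiz>."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.data_ids = []  # collected data-id values, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        if tag == "div" and parent == "c-wiz" and "data-id" in attrs:
            self.data_ids.append(attrs["data-id"])
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


# Hypothetical fragment standing in for a fetched folder page.
SAMPLE_HTML = """
<c-wiz>
  <div data-id="FILE_ID_1"><span>page-01.png</span></div>
  <div data-id="FILE_ID_2"><span>page-02.png</span></div>
  <section><div data-id="NESTED">not a direct child</div></section>
</c-wiz>
"""


def image_urls(html):
    """Build one image URL per collected data-id, as the config's _images method does."""
    collector = DataIdCollector()
    collector.feed(html)
    return [f"https://lh3.googleusercontent.com/d/{i}" for i in collector.data_ids]


print(image_urls(SAMPLE_HTML))
```

The nested `div` is skipped because the `>` combinator in `c-wiz > div[data-id]` only matches direct children.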
The config files are publicly hosted at ridamoe/configs. Contributions are welcome!
- Document the API