diff --git a/CHANGELOG.md b/CHANGELOG.md
index e352e03..f341cc6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,20 @@
# Changelog
+## [0.3.1] - 2024-11-07
+
+### Breaking Changes
+- Loading images from a local 'path' has been removed for security reasons. Please specify images by passing a 'url' instead (see the migration sketch below).
+
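+A minimal migration sketch (the image URL below points at the repository's own sample image, as used in the demo notebook):
+
+```python
+from podcastfy.client import generate_podcast
+
+# Before (<= 0.3.0, no longer supported): local file paths
+# generate_podcast(image_paths=["./data/images/Senecio.jpeg"])
+
+# After (0.3.1): publicly reachable URLs
+generate_podcast(
+    image_paths=[
+        "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/Senecio.jpeg"
+    ]
+)
+```
+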
+### Added
+- Add podcast generation from a topic of interest, e.g. "Latest News in U.S. Politics" (see the usage sketch after this list)
+- Integrate with 100+ LLM models (OpenAI, Anthropic, Google, etc.) for transcript generation
+- Integrate with Google's Multispeaker TTS model for high-quality audio generation
+- Deploy [REST API](https://github.com/souzatharsis/podcastfy/blob/main/usage/api.md) with FastAPI
+- Support for raw text as input
+- Add PRIVACY_POLICY.md
+- Start TESTIMONIALS.md
+- Add a list of apps built with Podcastfy to README.md
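+
+A short usage sketch of topic-based generation, mirroring the demo notebook:
+
+```python
+from podcastfy.client import generate_podcast
+
+# Generates a podcast grounded in recent web content about the topic
+audio_file = generate_podcast(topic="Latest news about OpenAI", tts_model="gemini")
+```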
+
## [0.2.3] - 2024-10-15
### Added
diff --git a/README.md b/README.md
index 2ce4505..17f4f3c 100644
--- a/README.md
+++ b/README.md
@@ -121,6 +121,12 @@ Podcastfy offers a range of customization options to tailor your AI-generated po
- Choose to run [Local LLMs](usage/local_llm.md) (156+ HuggingFace models)
- Set [System Settings](usage/config_custom.md) (e.g. output directory settings)
+## Built with Podcastfy 🛠️
+
+- [OpenNotebook](https://www.open-notebook.ai)
+- [Podcastfy-UI](https://github.com/giulioco/podcastfy-ui)
+- [Podcastfy-Gradio App](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo)
+
## License
This software is licensed under [Apache 2.0](LICENSE). [Here](usage/license-guide.md) are a few instructions if you would like to use podcastfy in your software.
diff --git a/TESTIMONIALS.md b/TESTIMONIALS.md
new file mode 100644
index 0000000..d113b49
--- /dev/null
+++ b/TESTIMONIALS.md
@@ -0,0 +1,2 @@
+- "Love that you casually built an open source version of the most popular product Google built in the last decade"
+- "I think it's awesome that you were inspired/recognize how hard it is to beat NotebookLM's quality, but you did an *incredible* job with this! It sounds incredible, and it's open-source! Thank you for being amazing!"
\ No newline at end of file
diff --git a/data/audio/podcast_e9ef119af37c45b6abd8326cb382b3b0.mp3 b/data/audio/podcast_e9ef119af37c45b6abd8326cb382b3b0.mp3
new file mode 100644
index 0000000..4c3e9da
Binary files /dev/null and b/data/audio/podcast_e9ef119af37c45b6abd8326cb382b3b0.mp3 differ
diff --git a/data/transcripts/transcript_c8b400052bbe48fa99b10c93ad8c3576.txt b/data/transcripts/transcript_c8b400052bbe48fa99b10c93ad8c3576.txt
new file mode 100644
index 0000000..d58dfc0
--- /dev/null
+++ b/data/transcripts/transcript_c8b400052bbe48fa99b10c93ad8c3576.txt
@@ -0,0 +1,17 @@
+ "Welcome to PODCASTFY - Your Personal Generative AI Podcast! Hot off the digital press, we're diving into OpenAI's latest power move: snatching up Chat.com! Can you believe it?"
+ "Seriously?! Chat.com? That's like owning prime real estate in the internet world. It's gotta be worth a fortune!"
+ "Well, rumors are swirling around the $15 million mark, maybe even more! Think about it, it went for that much just last year to HubSpot's CTO, Dharmesh Shah, and he just sold it to OpenAI! Apparently even got some OpenAI shares in the deal. Pretty sweet, huh?"
+ "Wow, OpenAI shares as part of the deal? That's insightful! But why Chat.com? Don't they already have ChatGPT?"
+ "Exactly! It's all about accessibility, baby! Making ChatGPT even easier to find. Right now, it's just a redirect, but who knows what the future holds? Maybe a whole new platform built around it!"
+ "Ooh, interesting. So, it's less about a new product, more about grabbing that sweet, sweet keyword: 'chat'."
+ "Precisely! It's like buying the best billboard on the digital highway. Everyone searching for 'chat' might just stumble upon OpenAI's goldmine."
+ "Smart move. But grabbing Chat.com isn't the only thing they've been up to, is it?"
+ "Oh no, not even close! They're on a roll! ChatGPT search, their own built-in search engine—taking on Google, no less! And Canvas?! A brand-new way to use ChatGPT for writing and coding? Game changer!"
+ "Hold on, Canvas? I haven't heard about that one. Fill me in!"
+ "Think of it as a more interactive space within ChatGPT. Perfect for crafting documents, collaborative coding, you name it! It's like they're building a whole ecosystem around ChatGPT. Plus, they just dropped OpenAI o1, whatever *that* is! "
+ "They're certainly not resting on their laurels! A for-profit transition in California? Hiring the former Pebble CEO, Gabor Cselle, for a 'secret project'? What's next, world domination? "
+ "Haha, right? And let's not forget SimpleQA! OpenAI is pushing the boundaries of AI research left and right! I'm slightly concerned about these developments though, don't you think they are going a little too fast?"
+ "I see your point. It *is* a lot, and fast. While innovation is exciting, responsible development is crucial. We need to make sure these advancements benefit humanity, not the other way around."
+ "Absolutely. But hey, with all this happening, the AI landscape is definitely anything but boring! It'll be interesting to see how these moves play out, especially against giants like Google."
+ "Couldn't agree more! OpenAI is certainly one to watch. This is just the beginning, folks. Buckle up!"
+ "And that’s a wrap for today’s episode on OpenAI's strategic moves in the AI arena! Until next time, stay tuned to PODCASTFY!"
\ No newline at end of file
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 73f88f8..7060154 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -9,7 +9,7 @@
project = 'podcastfy'
copyright = '2024, Tharsis T. P. Souza'
author = 'Tharsis T. P. Souza'
-release = 'v0.2.10'
+release = 'v0.3.1'
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
diff --git a/docs/source/podcastfy_demo.ipynb b/docs/source/podcastfy_demo.ipynb
deleted file mode 100644
index 3e78ff5..0000000
--- a/docs/source/podcastfy_demo.ipynb
+++ /dev/null
@@ -1,823 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Podcastfy \n",
- "Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Table of Contents\n",
- "\n",
- "- [Setup](#setup)\n",
- "- [Getting Started](#getting-started)\n",
- "- [Generate a podcast from text](#generate-a-podcast-from-text)\n",
- " - [Single URL](#single-url)\n",
- " - [Multiple URLs](#multiple-urls)\n",
- " - [Generate transcript only](#generate-transcript-only)\n",
- " - [Generate audio from transcript](#generate-audio-from-transcript)\n",
- " - [Generate audio from PDF](#generate-audio-from-pdf)\n",
- "- [Generate podcast from images](#generate-podcast-from-images)\n",
- "- [Conversation Customization](#customization)\n",
- "- [Multilingual Support](#multilingual-support)\n",
- " - [French (fr)](#french-(fr))\n",
- " - [Portuguese (pt-br)](#portuguese-(pt-br))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Setup\n",
- "\n",
- "Firstly, please make sure you have installed the podcastfy module, its dependencies and associated API keys. [See Setup](https://github.com/souzatharsis/podcastfy/tree/main?tab=readme-ov-file#quickstart-)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Getting Started"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/home/tobias/src/podcastfy-pypi/podcastfy/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
- " from .autonotebook import tqdm as notebook_tqdm\n"
- ]
- }
- ],
- "source": [
- "# Import necessary modules\n",
- "from podcastfy.client import generate_podcast"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This is just a custom function we will use to embed audio in this Python notebook."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "%pip install ipython\n",
- "from IPython.display import Audio, display\n",
- "\n",
- "def embed_audio(audio_file):\n",
- "\t\"\"\"\n",
- "\tEmbeds an audio file in the notebook, making it playable.\n",
- "\n",
- "\tArgs:\n",
- "\t\taudio_file (str): Path to the audio file.\n",
- "\t\"\"\"\n",
- "\ttry:\n",
- "\t\tdisplay(Audio(audio_file))\n",
- "\t\tprint(f\"Audio player embedded for: {audio_file}\")\n",
- "\texcept Exception as e:\n",
- "\t\tprint(f\"Error embedding audio: {str(e)}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Generate a podcast from text"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Single URL"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This code demonstrates the process of generating a podcast from a single URL, in this case wikipedia's page on \"Podcast\":\n",
- "1. Extract content from the URL\n",
- "2. Generate a Q&A transcript from the extracted content\n",
- "3. Convert the transcript to speech Text-to-Speech model\n",
- "4. Save the generated audio file to data/audio"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-05 09:50:03,308 - podcastfy.client - INFO - Processing 1 links\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[(\"Welcome to Podcastfy - Your Personal GenAI Podcast! Uh, what's up everyone? You know, it's funny how we use this technology every day, but have you ever stopped to think about the history of podcasts?\", \"I know what you mean. It's like, they're just there, you know? We hit play and boom - instant entertainment or information. But, where did this all start?\"), ('Well, get this: the word \"podcast\" is actually a mashup of \"iPod\" and \"broadcast\"! Ben Hammersley, a journalist, first used it back in 2004.', \"Wow, 2004? That's way earlier than I would've guessed! But, weren't MP3 players around before that?\"), ('Totally! In fact, there was a company, i2Go, that offered a service kinda like podcasting back in 2000. It let people download news to their MP3 players. They were onto something, but it fizzled out quickly.', 'So, if that was happening in 2000, what really made podcasts take off later?'), ('It was a perfect storm of tech advancements. Apple launched iTunes with podcast support in 2005, which made listening SO much easier. That, plus cheaper recording tech and the rise of smartphones - it all just exploded from there.', 'That makes a lot of sense. It\\'s interesting, though, that you mentioned Apple was such a driving force. Weren\\'t there legal battles over the whole \"pod\" terminology?'), ('Oh yeah, big time. Apple got pretty aggressive going after companies using \"pod\" in their names, even sending out cease and desist letters. They claimed people associated \"pod\" so strongly with the iPod that it fell under their trademark. I mean, they even tried to trademark \"podcast\" itself!', 'Wow, really? Seems like a bit of a stretch, but I guess they wanted to protect their brand. So, aside from straight-up talk shows, what other types of podcasts have become popular?'), (\"Oh man, there's like, a whole universe of podcasts now! You've got fiction podcasts that are basically like audio dramas, complete with actors, sound effects, the works. There's also the enhanced podcasts that combine audio with slideshows - super cool for educational stuff. And then, you can't forget the video podcasts! It's wild how much it's evolved from those early days.\", 'Yeah, it really is amazing. And it seems like podcasting is only getting bigger. I mean, look at how many podcasts and episodes there are now!'), (\"For sure! And it's not just about listening anymore. Live shows are becoming huge, too! It's like a whole new way for creators to connect with audiences. Who knows what the future holds for podcasting, but I'm along for the ride!\", 'Me too! Until next time on Podcastfy - Your Personal GenAI Podcast.')]\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-05 09:51:06,711 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model\n"
- ]
- }
- ],
- "source": [
- "audio_file = generate_podcast(urls=[\"https://en.wikipedia.org/wiki/Podcast\"])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/podcast_e1525fed48054896af5645c203138dca.mp3\n"
- ]
- }
- ],
- "source": [
- "# Embed the audio file generated from transcript\n",
- "embed_audio(audio_file)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "It works but it does not sound that exceptionally great! The default backend utilizes OpenAI's TTS model for speech generation. In the next example, we will utilize ElevenLabs model, which in my experience improves results dramatically."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Multiple URLs"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Here, we take one step further and generate a podcast from multiple sources.\n",
- "1. Podcastify's own github readme file\n",
- "3. A youtube video about Google's NotebookLM going viral"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-05 10:06:04,940 - podcastfy.client - INFO - Processing 2 links\n",
- "2024-10-05 10:07:25,914 - podcastfy.client - INFO - Podcast generated successfully using elevenlabs TTS model\n"
- ]
- }
- ],
- "source": [
- "# Define multiple URLs to process\n",
- "urls = [\n",
- "\t\"https://github.com/souzatharsis/podcastfy/blob/main/README.md\",\n",
- "\t\"https://www.youtube.com/watch?v=jx2imp33glc\"\n",
- "]\n",
- "\n",
- "# Generate podcast from multiple URLs\n",
- "audio_file_multi = generate_podcast(\n",
- "\turls=urls,\n",
- " tts_model=\"elevenlabs\"\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Podcast generated and saved as: ./data/audio/podcast_829a531a20334c949f76e077b846cc7f.mp3\n"
- ]
- },
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/podcast_829a531a20334c949f76e077b846cc7f.mp3\n"
- ]
- }
- ],
- "source": [
- "print(f\"Podcast generated and saved as: {audio_file_multi}\")\n",
- "\n",
- "# Embed the generated audio file\n",
- "embed_audio(audio_file_multi)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This AI-generated transcript is interesting for a couple of reasons:\n",
- "\n",
- "- Realism: The transcript demonstrates the ability of AI to generate realistic, conversational dialogue. It includes elements like filler words (\"uh\", \"umm\"), casual language, and back-and-forth banter that mimic human conversation patterns.\n",
- "\n",
- "- Irony: There's an ironic element in that the transcript presents AI-generated characters expressing concern about the implications of AI-generated content on their own (fictional) careers as podcasters.\n",
- "\n",
- "- Ethical and legal concerns: The characters discuss potential implications of this technology, including copyright issues, voice replication without consent, and the impact on human content creators. This reflects real-world debates surrounding AI-generated content.\n",
- "\n",
- "- Meta-commentary: The podcast is a an AI-generated content discussion about AI-generated content, specifically AI-created podcasts. This creates an intriguing layer of self-reference, as an AI-generated conversation is discussing the capabilities of AI to generate conversations."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "However, this particular transcript did not pickup on my Podcastify's content solely focusing on the youtube video. This may happen as the AI-Podcast hosts may pick a particular concept from one of the provided sources and develop a conversation around that. There is room for improvement in guiding the AI-Podcasts hosts to strike a good balance of content coverage among the provided input sources."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Generate transcript only\n",
- "\n",
- "There is also the option to generate the transcript only from input urls. This would allow users to edit/process transcripts before further downstream audio generation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 18,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-05 10:15:06,561 - podcastfy.client - INFO - Processing 1 links\n",
- "2024-10-05 10:15:29,500 - podcastfy.client - INFO - Transcript generated successfully\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Transcript generated and saved as: ./data/transcripts/transcript_f6ab3ee241444e999ed4d1142564b9fe.txt\n",
- "First 20 characters of the transcript: \"Welcome t\n"
- ]
- }
- ],
- "source": [
- "# Generate transcript only\n",
- "transcript_file = generate_podcast(\n",
- "\turls=[\"https://github.com/souzatharsis/podcastfy/blob/main/README.md\"],\n",
- "\ttranscript_only=True\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 19,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Transcript generated and saved as: ./data/transcripts/transcript_f6ab3ee241444e999ed4d1142564b9fe.txt\n",
- "First 100 characters of the transcript: \"Welcome to Podcastfy - YOUR Personal GenAI Podcast! You know, the other day I was struggl\n"
- ]
- }
- ],
- "source": [
- "\n",
- "print(f\"Transcript generated and saved as: {transcript_file}\")\n",
- "# Read and print the first 20 characters from the transcript file\n",
- "with open(transcript_file, 'r') as file:\n",
- "\ttranscript_content = file.read(100)\n",
- "\tprint(f\"First 100 characters of the transcript: {transcript_content}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Generate audio from transcript"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Users can also generate audio from a given transcript. Here, we generate a podcast from the previsouly generate transcript on wikipedia's Artificial Intelligence page. This allows users to re-use previsouly generated transcripts or provide their own custom produced transcript for podcast generation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-05 10:28:37,745 - podcastfy.client - INFO - Using transcript file: ./data/transcripts/transcript_f6ab3ee241444e999ed4d1142564b9fe.txt\n",
- "2024-10-05 10:30:17,300 - podcastfy.client - INFO - Podcast generated successfully using elevenlabs TTS model\n"
- ]
- }
- ],
- "source": [
- "# Generate podcast from existing transcript file\n",
- "audio_file_from_transcript = generate_podcast(\n",
- "\ttranscript_file=transcript_file,\n",
- " tts_model=\"elevenlabs\"\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 24,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/podcast_c06620d918d4419884f9c7558a4a2cf1.mp3\n"
- ]
- }
- ],
- "source": [
- "# Embed the audio file generated from transcript\n",
- "embed_audio(audio_file_from_transcript)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Generate audio from PDF\n",
- "One or many pdfs can be processed in the same way as urls by simply passing a corresponding file path."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "audio_file_from_pdf = generate_podcast(urls=[\"./data/pdf/s41598-024-58826-w.pdf\"])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This is a Scientific Reports about climate change in France. We have it pre-generated into our data directory. Let's listen to the podcast:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/Agro_paper.mp3\n"
- ]
- }
- ],
- "source": [
- "file_path = \"./data/audio/Agro_paper.mp3\"\n",
- "# Embed the audio file generated from transcript\n",
- "embed_audio(file_path)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Generate podcast from images\n",
- "\n",
- "Images can be provided as input to generate a podcast. This can be useful when users want to generate a podcast from images such as works of art, physical spaces, historical events, etc. One or many images can be provided as input. The following example generates a podcast from two images: Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[('\"Welcome to PODCASTFY - Your Personal Generative AI Podcast. Today, we\\'re diving into the vibrant world of abstract art! Buckle up!\"', '\"I\\'m all ears! Abstract art can be so captivating, but also a bit puzzling sometimes. What kind of abstract pieces are we looking at today?\"'), ('\"Well, imagine a canvas bathed in warm, almost fiery, orange hues. On this canvas, we see a circular face, divided into sections like a carefully pieced-together puzzle. The eyes are striking - bright red dots that seem to stare right at you. It\\'s geometric, yet full of emotion. That\\'s our first piece.\"', '\"Wow, I can practically feel the energy radiating from that description! It sounds like the artist used simple shapes and colors to create something incredibly powerful. What about the second piece?\"'), ('\"Ah, the second one is a whole other story! Imagine the same vibrant orange, but this time, it\\'s like a wild dance of brushstrokes, a whirlwind of texture. The figure here is more abstract, with jagged lines, bold shapes, and a single blue eye peering out from the chaos. It\\'s dynamic, almost chaotic, but undeniably captivating.\"', '\"It\\'s fascinating how both pieces use a similar color palette but evoke completely different feelings. The first one sounds almost serene in its geometric precision, while the second one sounds like it\\'s bursting with raw energy. It really shows the range of abstract art, doesn\\'t it?\"'), ('\"Absolutely! And that\\'s the beauty of it, isn\\'t it? Abstract art invites us to interpret, to feel, to connect with the emotions the artist is conveying through color, shape, and form. It\\'s a conversation between the artist and the viewer, with no right or wrong answers.\"', '\"I totally agree! It\\'s like a visual puzzle that each person gets to solve in their own way. No wonder abstract art continues to fascinate and inspire people all over the world.\"'), ('\"That\\'s all the time we have for today. Thanks for tuning in to PODCASTFY. Until next time, keep exploring the fascinating world of art!\"', 'Bye Bye!')]\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-12 18:33:15,834 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Podcast generated from images: ./data/audio/podcast_9e4e617c7ab546ada4f103521a330468.mp3\n"
- ]
- }
- ],
- "source": [
- "# Generate podcast from input images\n",
- "image_paths = [\n",
- "\t\"./data/images/Senecio.jpeg\",\n",
- "\t\"./data/images/connection.jpg\"\n",
- "]\n",
- "\n",
- "audio_file_from_images = generate_podcast(image_paths=image_paths)\n",
- "\n",
- "print(\"Podcast generated from images:\", audio_file_from_images)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Here is the generated podcast, which we have pre-saved in the data directory."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ../../data/audio/abstract_art.mp3\n"
- ]
- }
- ],
- "source": [
- "# Embed the audio file generated from images\n",
- "embed_audio(\"data/audio/abstract_art.mp3\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Customization\n",
- "\n",
- "Podcastfy offers a range of customization options to tailor your AI-generated podcasts. Whether you're creating educational content, storytelling experiences, or anything in between, these configuration options allow you to fine-tune your podcast's tone, length, and format.\n",
- "See [Conversation Configuration](usage/conversation_custom.md) for more details.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-10 02:19:01,046 - podcastfy.client - INFO - Processing 2 links\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[('\"Welcome to Tech Crossroads - Where Innovation Meets Scrutiny! Today, we\\'re diving deep into the fascinating world of artificial intelligence, or AI as it\\'s more commonly known. Uh, it\\'s a field that\\'s been making waves for decades, and now, it\\'s really starting to impact our everyday lives in ways we never imagined.\"', '\"I agree, AI is everywhere these days. From the smartphones in our pockets to the algorithms that curate our news feeds, it\\'s becoming increasingly difficult to escape its influence. But while AI offers incredible potential, I can\\'t help but feel a sense of unease about its rapid development. It\\'s like we\\'re opening Pandora\\'s Box, and we\\'re not entirely sure what we\\'ll find inside.\"'), ('\"I see your point. AI does raise some serious ethical concerns, and it\\'s crucial that we address them proactively. But let\\'s not forget the incredible benefits AI brings to the table. Think about the advancements in healthcare, where AI is helping doctors diagnose diseases earlier and more accurately. Or in transportation, where self-driving cars have the potential to reduce accidents and save lives.\"', '\"Those are valid points, but I\\'m still wary of the potential downsides. One of my biggest concerns is the issue of algorithmic bias. We\\'ve already seen instances where AI systems have perpetuated existing societal biases, leading to discrimination against certain groups. For example, facial recognition algorithms have been shown to be less accurate for people with darker skin tones, which could have serious implications for law enforcement and security.\"'), ('\"Interesting. You\\'re right, algorithmic bias is a significant problem, and it\\'s something that needs to be addressed head-on. The good news is that researchers are actively working on developing techniques to mitigate bias in AI systems. For instance, they\\'re exploring ways to ensure that training data is more representative of diverse populations and that algorithms are designed to be more fair and equitable.\"', '\"I\\'m glad to hear that, but I\\'m also concerned about the lack of transparency in many AI systems. Often, even the developers themselves don\\'t fully understand how these complex algorithms work. This makes it difficult to identify and correct biases, and it raises questions about accountability when things go wrong.\"'), ('\"Got it. Transparency is indeed crucial, and there\\'s a growing movement towards developing explainable AI, where the decision-making processes of AI systems are more understandable to humans. This will not only help us identify and address biases but also build trust in AI technology.\"', '\"Another concern I have is the potential for AI to exacerbate existing inequalities. As AI becomes more sophisticated, it could automate a wide range of jobs, potentially leading to mass unemployment and widening the gap between the rich and the poor.\"'), ('\"I understand your concern about technological unemployment. It\\'s a valid point, and it\\'s something that policymakers need to consider seriously. However, history has shown that technological advancements often create new jobs, even as they displace old ones. The key is to ensure that workers have the skills and training they need to adapt to the changing job market.\"', '\"That\\'s true, but this time feels different. AI has the potential to automate not just manual labor but also cognitive tasks that were once thought to be the exclusive domain of humans. 
This could have a profound impact on the job market, and we need to be prepared for the challenges it presents.\"'), ('\"You raise a valid point. The nature of work is undoubtedly changing, and we need to adapt our education and training systems to prepare people for the jobs of the future. This includes fostering skills such as critical thinking, creativity, and problem-solving, which are less likely to be automated.\"', '\"Beyond the economic implications, I\\'m also concerned about the potential for AI to be used for malicious purposes. Imagine AI-powered surveillance systems that track our every move or autonomous weapons that can kill without human intervention. These are terrifying possibilities that we need to guard against.\"'), ('\"I agree, the potential for AI to be weaponized is a serious concern. That\\'s why it\\'s crucial that we develop international regulations and ethical guidelines for the development and use of AI, especially in sensitive areas like military applications.\"', '\"I\\'m glad to hear that, but I\\'m not sure if regulations alone will be enough. We also need to foster a culture of responsible AI development, where ethics are considered from the very beginning of the design process.\"'), ('\"Absolutely. We need to ensure that AI is developed and used in a way that benefits humanity as a whole, not just a select few. This requires a multi-faceted approach, involving researchers, policymakers, industry leaders, and the public.\"', '\"One final thought: as AI becomes more powerful, it raises fundamental questions about what it means to be human. If machines can think, learn, and even create, what does that say about our own unique abilities and our place in the world?\"'), ('\"That\\'s a profound question, and one that philosophers have been grappling with for centuries. As AI continues to evolve, it will undoubtedly challenge our understanding of ourselves and our relationship with technology. It\\'s a journey that will require careful consideration, open dialogue, and a commitment to shaping a future where AI serves humanity, not the other way around.\"', 'Tchau!')]\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-10-10 02:21:30,016 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Tech Debate Podcast generated: ('./data/transcripts/transcript_7e84bb13b26f4ab78dda30d04d461838.txt', './data/audio/podcast_c8f53545fefd44569dbebd4fa739e2b9.mp3')\n"
- ]
- }
- ],
- "source": [
- "# Example: In-depth Tech Debate Podcast\n",
- "\n",
- "# Define a custom conversation config for a tech debate podcast\n",
- "tech_debate_config = {\n",
- " \"word_count\": 4000, # Longer content for in-depth discussions\n",
- " \"conversation_style\": [\"analytical\", \"argumentative\"],\n",
- " \"roles_person1\": \"tech optimist\",\n",
- " \"roles_person2\": \"tech skeptic\",\n",
- " \"dialogue_structure\": [\"Topic Introduction\", \"Pro Arguments\", \"Con Arguments\", \"Rebuttal\", \"Audience Questions\", \"Conclusion\"],\n",
- " \"podcast_name\": \"Tech Crossroads\",\n",
- " \"podcast_tagline\": \"Where Innovation Meets Scrutiny\",\n",
- " \"output_language\": \"English\",\n",
- " \"engagement_techniques\": [\"statistics\", \"case studies\", \"ethical dilemmas\"],\n",
- " \"creativity\": 0.3 # Lower creativity for more factual content\n",
- "}\n",
- "\n",
- "# Generate a tech debate podcast about artificial intelligence\n",
- "tech_debate_podcast = generate_podcast(\n",
- " urls=[\"https://en.wikipedia.org/wiki/Artificial_intelligence\", \n",
- " \"https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence\"],\n",
- " conversation_config=tech_debate_config,\n",
- " tts_model=\"openai\" # Using OpenAI for clear, neutral voices\n",
- ")\n",
- "\n",
- "print(\"Tech Debate Podcast generated:\", tech_debate_podcast)\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/podcast_c8f53545fefd44569dbebd4fa739e2b9.mp3\n"
- ]
- }
- ],
- "source": [
- "file_path = \"./data/audio/podcast_c8f53545fefd44569dbebd4fa739e2b9.mp3\"\n",
- "# Embed the audio file generated from transcript\n",
- "embed_audio(file_path)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Multilingual Support\n",
- "\n",
- "Description of how to generate non-English content TBD. See [Notes of Caution](https://github.com/souzatharsis/podcastfy/blob/main/usage/conversation_custom.md#notes-of-caution) before starting to customize to avoid unexpected results. For now, here are a couple of audio examples:"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### French (fr)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Generates a podcast from about [AgroClim website](https://agroclim.inrae.fr/) - French Government's service unit that aims to study the climate and its impacts on agroecosystems."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/podcast_FR_AGRO.mp3\n"
- ]
- }
- ],
- "source": [
- "embed_audio(\"./data/audio/podcast_FR_AGRO.mp3\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Portuguese (pt-br)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Generates a podcast in Brazilian Portuguese from a news article on the most recent voting polls on [Sao Paulo's 2024 Elections](https://noticias.uol.com.br/eleicoes/2024/10/03/nova-pesquisa-datafolha-quem-subiu-e-quem-caiu-na-disputa-de-sp-03-10.htm)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Audio player embedded for: ./data/audio/podcast_thatupiso_BR.mp3\n"
- ]
- }
- ],
- "source": [
- "embed_audio(\"./data/audio/podcast_thatupiso_BR.mp3\")"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": ".venv",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.10"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/podcastfy.ipynb b/podcastfy.ipynb
index b007e97..bd534b4 100644
--- a/podcastfy.ipynb
+++ b/podcastfy.ipynb
@@ -16,13 +16,14 @@
"\n",
"- Setup\n",
"- Getting Started\n",
- "- Generate a podcast from text\n",
+ "- Generate a podcast from text content\n",
" - Single URL\n",
" - Multiple URLs\n",
" - Generate transcript only\n",
" - Generate audio from transcript\n",
- " - Generate audio from PDF\n",
- " - Generate podcast from raw text\n",
+ " - Processing PDFs\n",
+ " - Raw text as input\n",
+ " - Podcast from topic\n",
"- Generate podcast from images\n",
"- Conversation Customization\n",
"- Multilingual Support\n",
@@ -49,18 +50,9 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/home/tobias/src/podcastfy-pypi/podcastfy/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
- " from .autonotebook import tqdm as notebook_tqdm\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# Import necessary modules\n",
"from podcastfy.client import generate_podcast"
@@ -75,34 +67,9 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Requirement already satisfied: ipython in ./.venv/lib/python3.11/site-packages (8.28.0)\n",
- "Requirement already satisfied: decorator in ./.venv/lib/python3.11/site-packages (from ipython) (5.1.1)\n",
- "Requirement already satisfied: jedi>=0.16 in ./.venv/lib/python3.11/site-packages (from ipython) (0.19.1)\n",
- "Requirement already satisfied: matplotlib-inline in ./.venv/lib/python3.11/site-packages (from ipython) (0.1.7)\n",
- "Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.41 in ./.venv/lib/python3.11/site-packages (from ipython) (3.0.48)\n",
- "Requirement already satisfied: pygments>=2.4.0 in ./.venv/lib/python3.11/site-packages (from ipython) (2.18.0)\n",
- "Requirement already satisfied: stack-data in ./.venv/lib/python3.11/site-packages (from ipython) (0.6.3)\n",
- "Requirement already satisfied: traitlets>=5.13.0 in ./.venv/lib/python3.11/site-packages (from ipython) (5.14.3)\n",
- "Requirement already satisfied: typing-extensions>=4.6 in ./.venv/lib/python3.11/site-packages (from ipython) (4.12.2)\n",
- "Requirement already satisfied: pexpect>4.3 in ./.venv/lib/python3.11/site-packages (from ipython) (4.9.0)\n",
- "Requirement already satisfied: parso<0.9.0,>=0.8.3 in ./.venv/lib/python3.11/site-packages (from jedi>=0.16->ipython) (0.8.4)\n",
- "Requirement already satisfied: ptyprocess>=0.5 in ./.venv/lib/python3.11/site-packages (from pexpect>4.3->ipython) (0.7.0)\n",
- "Requirement already satisfied: wcwidth in ./.venv/lib/python3.11/site-packages (from prompt-toolkit<3.1.0,>=3.0.41->ipython) (0.2.13)\n",
- "Requirement already satisfied: executing>=1.2.0 in ./.venv/lib/python3.11/site-packages (from stack-data->ipython) (2.1.0)\n",
- "Requirement already satisfied: asttokens>=2.1.0 in ./.venv/lib/python3.11/site-packages (from stack-data->ipython) (2.4.1)\n",
- "Requirement already satisfied: pure-eval in ./.venv/lib/python3.11/site-packages (from stack-data->ipython) (0.2.3)\n",
- "Requirement already satisfied: six>=1.12.0 in ./.venv/lib/python3.11/site-packages (from asttokens>=2.1.0->stack-data->ipython) (1.16.0)\n",
- "Note: you may need to restart the kernel to use updated packages.\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"%pip install ipython\n",
"from IPython.display import Audio, display\n",
@@ -523,43 +490,103 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Generate podcast from images\n",
+ "## Generate podcast from topic\n",
"\n",
- "Images can be provided as input to generate a podcast. This can be useful when users want to generate a podcast from images such as works of art, physical spaces, historical events, etc. One or many images can be provided as input. The following example generates a podcast from two images: Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu.\n"
+ "Users can also generate a podcast from a specific topic of interest, e.g. \"Latest News in U.S. Politics\" or \"Modern art in the 1920s\". Podcastfy will generate a podcast based on *grounded* real-time information about the most recent content published on the web about the topic."
]
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 3,
"metadata": {},
"outputs": [
{
- "name": "stdout",
+ "name": "stderr",
"output_type": "stream",
"text": [
- "[('\"Welcome to PODCASTFY - Your Personal Generative AI Podcast. Today, we\\'re diving into the vibrant world of abstract art! Buckle up!\"', '\"I\\'m all ears! Abstract art can be so captivating, but also a bit puzzling sometimes. What kind of abstract pieces are we looking at today?\"'), ('\"Well, imagine a canvas bathed in warm, almost fiery, orange hues. On this canvas, we see a circular face, divided into sections like a carefully pieced-together puzzle. The eyes are striking - bright red dots that seem to stare right at you. It\\'s geometric, yet full of emotion. That\\'s our first piece.\"', '\"Wow, I can practically feel the energy radiating from that description! It sounds like the artist used simple shapes and colors to create something incredibly powerful. What about the second piece?\"'), ('\"Ah, the second one is a whole other story! Imagine the same vibrant orange, but this time, it\\'s like a wild dance of brushstrokes, a whirlwind of texture. The figure here is more abstract, with jagged lines, bold shapes, and a single blue eye peering out from the chaos. It\\'s dynamic, almost chaotic, but undeniably captivating.\"', '\"It\\'s fascinating how both pieces use a similar color palette but evoke completely different feelings. The first one sounds almost serene in its geometric precision, while the second one sounds like it\\'s bursting with raw energy. It really shows the range of abstract art, doesn\\'t it?\"'), ('\"Absolutely! And that\\'s the beauty of it, isn\\'t it? Abstract art invites us to interpret, to feel, to connect with the emotions the artist is conveying through color, shape, and form. It\\'s a conversation between the artist and the viewer, with no right or wrong answers.\"', '\"I totally agree! It\\'s like a visual puzzle that each person gets to solve in their own way. No wonder abstract art continues to fascinate and inspire people all over the world.\"'), ('\"That\\'s all the time we have for today. Thanks for tuning in to PODCASTFY. Until next time, keep exploring the fascinating world of art!\"', 'Bye Bye!')]\n"
+ "2024-11-07 13:52:07,107 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model\n"
]
+ }
+ ],
+ "source": [
+ "audio_file_from_topic = generate_podcast(topic=\"Latest news about OpenAI\", tts_model=\"gemini\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
},
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Audio player embedded for: ./data/audio/podcast_e9ef119af37c45b6abd8326cb382b3b0.mp3\n"
+ ]
+ }
+ ],
+ "source": [
+ "embed_audio(audio_file_from_topic)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The generate conversation captures the rapid pace of OpenAI's developments as of today (11/07/2024) including their $15M acquisition of Chat.com and the launch of new products like Canvas and SimpleQA."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Generate podcast from images\n",
+ "\n",
+ "Images can be provided as input to generate a podcast. This can be useful when users want to generate a podcast from images such as works of art, physical spaces, historical events, etc. One or many images can be provided as input. The following example generates a podcast from two images: Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
- "2024-10-12 18:33:15,834 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model\n"
+ "2024-11-07 14:26:53,017 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
- "Podcast generated from images: ./data/audio/podcast_9e4e617c7ab546ada4f103521a330468.mp3\n"
+ "Podcast generated from images: ./data/audio/podcast_9032999e59384861b48c3bbc915118f4.mp3\n"
]
}
],
"source": [
"# Generate podcast from input images\n",
"image_paths = [\n",
- "\t\"./data/images/Senecio.jpeg\",\n",
- "\t\"./data/images/connection.jpg\"\n",
+ " \"https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/Senecio.jpeg\",\n",
+ " \"https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/connection.jpg\",\n",
"]\n",
"\n",
"audio_file_from_images = generate_podcast(image_paths=image_paths)\n",
diff --git a/podcastfy/__init__.py b/podcastfy/__init__.py
index 9de7a5d..ddde97c 100644
--- a/podcastfy/__init__.py
+++ b/podcastfy/__init__.py
@@ -1,2 +1,2 @@
# This file can be left empty for now
-__version__ = "0.3.0" # or whatever version you're on
+__version__ = "0.3.1" # or whatever version you're on
diff --git a/podcastfy/client.py b/podcastfy/client.py
index e00a34d..3c955c0 100644
--- a/podcastfy/client.py
+++ b/podcastfy/client.py
@@ -28,17 +28,18 @@
def process_content(
- urls=None,
- transcript_file=None,
- tts_model="edge",
- generate_audio=True,
- config=None,
+ urls: Optional[List[str]] = None,
+ transcript_file: Optional[str] = None,
+ tts_model: Optional[str] = None,
+ generate_audio: bool = True,
+ config: Optional[Dict[str, Any]] = None,
conversation_config: Optional[Dict[str, Any]] = None,
image_paths: Optional[List[str]] = None,
is_local: bool = False,
text: Optional[str] = None,
model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
+ topic: Optional[str] = None,
):
"""
Process URLs, a transcript file, image paths, raw text, or a topic to generate a podcast or transcript.
@@ -68,16 +69,21 @@ def process_content(
)
combined_content = ""
+ if urls or topic:
+ content_extractor = ContentExtractor()
if urls:
logger.info(f"Processing {len(urls)} links")
- content_extractor = ContentExtractor()
contents = [content_extractor.extract_content(link) for link in urls]
combined_content += "\n\n".join(contents)
if text:
combined_content += f"\n\n{text}"
+ if topic:
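+ # Topic content is produced by a Gemini model grounded via Google Search (see ContentExtractor.generate_topic_content).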
+ topic_content = content_extractor.generate_topic_content(topic)
+ combined_content += f"\n\n{topic_content}"
+
# Generate Q&A content using output directory from conversation config
random_filename = f"transcript_{uuid.uuid4().hex}.txt"
transcript_filepath = os.path.join(
@@ -162,6 +168,9 @@ def main(
api_key_label: str = typer.Option(
None, "--api-key-label", "-k", help="Environment variable name for LLMAPI key"
),
+ topic: str = typer.Option(
+ None, "--topic", "-tp", help="Topic to generate podcast about"
+ ),
):
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, image files, raw text, or a topic.
@@ -194,15 +203,16 @@ def main(
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
+ topic=topic,
)
else:
urls_list = urls or []
if file:
urls_list.extend([line.strip() for line in file if line.strip()])
- if not urls_list and not image_paths and not text:
+ if not urls_list and not image_paths and not text and not topic:
raise typer.BadParameter(
- "No input provided. Use --url to specify URLs, --file to specify a file containing URLs, --transcript for a transcript file, --image for image files, or --text for raw text input."
+ "No input provided. Use --url, --file, --transcript, --image, --text, or --topic."
)
final_output = process_content(
@@ -216,6 +226,7 @@ def main(
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
+ topic=topic,
)
if transcript_only:
@@ -247,6 +258,7 @@ def generate_podcast(
text: Optional[str] = None,
llm_model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
+ topic: Optional[str] = None,
) -> Optional[str]:
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, image files, raw text, or a topic.
@@ -264,6 +276,7 @@ def generate_podcast(
text (Optional[str]): Raw text input to be processed.
llm_model_name (Optional[str]): LLM model name for content generation.
api_key_label (Optional[str]): Environment variable name for LLM API key.
+ topic (Optional[str]): Topic to generate podcast about.
Returns:
Optional[str]: Path to the final podcast audio file, or None if only generating a transcript.
@@ -310,6 +323,7 @@ def generate_podcast(
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
+ topic=topic,
)
else:
urls_list = urls or []
@@ -317,9 +331,10 @@ def generate_podcast(
with open(url_file, "r") as file:
urls_list.extend([line.strip() for line in file if line.strip()])
- if not urls_list and not image_paths and not text:
+ if not urls_list and not image_paths and not text and not topic:
raise ValueError(
- "No input provided. Please provide either 'urls', 'url_file', 'transcript_file', 'image_paths', or 'text'."
+ "No input provided. Please provide either 'urls', 'url_file', "
+ "'transcript_file', 'image_paths', 'text', or 'topic'."
)
return process_content(
@@ -333,6 +348,7 @@ def generate_podcast(
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
+ topic=topic,
)
except Exception as e:
diff --git a/podcastfy/content_generator.py b/podcastfy/content_generator.py
index 23e7d4c..760bb6e 100644
--- a/podcastfy/content_generator.py
+++ b/podcastfy/content_generator.py
@@ -50,16 +50,20 @@ def __init__(
if is_local:
self.llm = Llamafile()
- elif "gemini" in self.model_name.lower(): #keeping original gemini as a special case while we build confidence on LiteLLM
+ elif (
+ "gemini" in self.model_name.lower()
+ ): # keeping original gemini as a special case while we build confidence on LiteLLM
self.llm = ChatGoogleGenerativeAI(
model=model_name,
temperature=temperature,
max_output_tokens=max_output_tokens,
)
- else: # user should set api_key_label from input
- self.llm = ChatLiteLLM(model=self.model_name,
- temperature=temperature,
- api_key=os.environ[api_key_label])
+ else: # user should set api_key_label from input
+ self.llm = ChatLiteLLM(
+ model=self.model_name,
+ temperature=temperature,
+ api_key=os.environ[api_key_label],
+ )
class ContentGenerator:
@@ -114,7 +118,7 @@ def __compose_prompt(self, num_images: int):
for i in range(num_images):
key = f"image_path_{i}"
image_content = {
- "image_url": {"path": f"{{{key}}}", "detail": "high"},
+ "image_url": {"url": f"{{{key}}}", "detail": "high"},
"type": "image_url",
}
image_path_keys.append(key)
@@ -224,7 +228,7 @@ def generate_qa_content(
output_filepath: Optional[str] = None,
is_local: bool = False,
model_name: str = None,
- api_key_label: str = "OPENAI_API_KEY"
+ api_key_label: str = "OPENAI_API_KEY",
) -> str:
"""
Generate Q&A content based on input texts.
@@ -248,7 +252,7 @@ def generate_qa_content(
)
if is_local:
model_name = "User provided local model"
-
+
llmbackend = LLMBackend(
is_local=is_local,
temperature=self.config_conversation.get("creativity", 0),
@@ -256,7 +260,7 @@ def generate_qa_content(
"max_output_tokens", 8192
),
model_name=model_name,
- api_key_label=api_key_label
+ api_key_label=api_key_label,
)
num_images = 0 if is_local else len(image_file_paths)
@@ -287,48 +291,44 @@ def generate_qa_content(
logger.error(f"Error generating content: {str(e)}")
raise
-
- def __clean_tss_markup(self, input_text: str, additional_tags: List[str] = ["Person1", "Person2"]) -> str:
+ def __clean_tss_markup(
+ self, input_text: str, additional_tags: List[str] = ["Person1", "Person2"]
+ ) -> str:
"""
Remove unsupported TSS markup tags from the input text while preserving supported SSML tags.
Args:
input_text (str): The input text containing TSS markup tags.
- additional_tags (List[str]): Optional list of additional tags to preserve. Defaults to ["Person1", "Person2"].
+ additional_tags (List[str]): Optional list of additional tags to preserve. Defaults to ["Person1", "Person2"].
- Returns:
- str: Cleaned text with unsupported TSS markup tags removed.
- """
+ Returns:
+ str: Cleaned text with unsupported TSS markup tags removed.
+ """
# List of SSML tags supported by both OpenAI and ElevenLabs
- supported_tags = [
- "speak", "lang", "p", "phoneme",
- "s", "sub"
- ]
+ supported_tags = ["speak", "lang", "p", "phoneme", "s", "sub"]
# Append additional tags to the supported tags list
supported_tags.extend(additional_tags)
# Create a pattern that matches any tag not in the supported list
- pattern = r'</?(?!(?:' + '|'.join(supported_tags) + r')\b)[^>]+>'
+ pattern = r"</?(?!(?:" + "|".join(supported_tags) + r")\b)[^>]+>"
# Remove unsupported tags
- cleaned_text = re.sub(pattern, '', input_text)
+ cleaned_text = re.sub(pattern, "", input_text)
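+ # e.g. "<Person1><emphasis>Hello</emphasis></Person1>" becomes "<Person1>Hello</Person1>"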
# Remove any leftover empty lines
- cleaned_text = re.sub(r'\n\s*\n', '\n', cleaned_text)
+ cleaned_text = re.sub(r"\n\s*\n", "\n", cleaned_text)
# Ensure closing tags for additional tags are preserved
for tag in additional_tags:
cleaned_text = re.sub(
f'<{tag}>(.*?)(?=<(?:{"|".join(additional_tags)})>|$)',
- f'<{tag}>\\1</{tag}>',
+ f"<{tag}>\\1</{tag}>",
cleaned_text,
- flags=re.DOTALL
+ flags=re.DOTALL,
)
- return cleaned_text.replace('(scratchpad)', '').strip()
-
-
+ return cleaned_text.replace("(scratchpad)", "").strip()
def main(seed: int = 42, is_local: bool = False) -> None:
@@ -375,4 +375,4 @@ def main(seed: int = 42, is_local: bool = False) -> None:
if __name__ == "__main__":
- main()
\ No newline at end of file
+ main()
diff --git a/podcastfy/content_parser/content_extractor.py b/podcastfy/content_parser/content_extractor.py
index cf8eb8a..06966ac 100644
--- a/podcastfy/content_parser/content_extractor.py
+++ b/podcastfy/content_parser/content_extractor.py
@@ -74,6 +74,29 @@ def extract_content(self, source: str) -> str:
except Exception as e:
logger.error(f"Error extracting content from {source}: {str(e)}")
raise
+
+ def generate_topic_content(self, topic: str) -> str:
+ """
+ Generate content based on a given topic using a generative model.
+
+ Args:
+ topic (str): The topic to generate content for.
+
+ Returns:
+ str: Generated content based on the topic.
+ """
+ try:
+ import google.generativeai as genai
+
+ model = genai.GenerativeModel("models/gemini-1.5-pro-002")
+ topic_prompt = f"Be detailed. Search for {topic}"
+ response = model.generate_content(contents=topic_prompt, tools="google_search_retrieval")
+
+ return response.candidates[0].content.parts[0].text
+ except Exception as e:
+ logger.error(f"Error generating content for topic '{topic}': {str(e)}")
+ raise
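+ # Usage sketch (assumes a Gemini API key is configured for google.generativeai):
+ #   extractor = ContentExtractor()
+ #   grounded_text = extractor.generate_topic_content("Latest news about OpenAI")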
+
def main(seed: int = 42) -> None:
"""
diff --git a/podcastfy/text_to_speech.py b/podcastfy/text_to_speech.py
index 2e9045b..1e1737f 100644
--- a/podcastfy/text_to_speech.py
+++ b/podcastfy/text_to_speech.py
@@ -45,7 +45,9 @@ def __init__(
api_key = getattr(self.config, f"{model.upper()}_API_KEY", None)
# Initialize provider using factory
- self.provider = TTSProviderFactory.create(provider_name=model, api_key=api_key, model=model)
+ self.provider = TTSProviderFactory.create(
+ provider_name=model, api_key=api_key, model=model
+ )
# Setup directories and config
self._setup_directories()
@@ -80,44 +82,51 @@ def convert_to_speech(self, text: str, output_file: str) -> None:
Args:
text (str): Input text to convert to speech.
output_file (str): Path to save the output audio file.
-
+
Raises:
ValueError: If the input text is not properly formatted
"""
# Validate transcript format
- #self._validate_transcript_format(text)
-
+ # self._validate_transcript_format(text)
+
cleaned_text = text
-
try:
- if self.provider.model.lower() == "gemini": # refactor this ugly if statement. We should have multispeaker and single speaker classes
- #provider_config = self._get_provider_config()
- #voice = provider_config.get("default_voices", {}).get("question")
- #voice2 = provider_config.get("default_voices", {}).get("answer")
- #model = provider_config.get("model")
- audio_data = self.provider.generate_audio(cleaned_text,
- voice="S",
- model="en-US-Studio-MultiSpeaker",
- voice2="R",
- ending_message=self.ending_message)
+ if (
+ self.provider.model.lower() == "gemini"
+ ): # refactor this ugly if statement. We should have multispeaker and single speaker classes
+ # provider_config = self._get_provider_config()
+ # voice = provider_config.get("default_voices", {}).get("question")
+ # voice2 = provider_config.get("default_voices", {}).get("answer")
+ # model = provider_config.get("model")
+ audio_data = self.provider.generate_audio(
+ cleaned_text,
+ voice="S",
+ model="en-US-Studio-MultiSpeaker",
+ voice2="R",
+ ending_message=self.ending_message,
+ )
with open(output_file, "wb") as f:
f.write(audio_data)
logger.info(f"Audio saved to {output_file}")
else:
with tempfile.TemporaryDirectory(dir=self.temp_audio_dir) as temp_dir:
- audio_segments = self._generate_audio_segments(cleaned_text, temp_dir)
+ audio_segments = self._generate_audio_segments(
+ cleaned_text, temp_dir
+ )
self._merge_audio_files(audio_segments, output_file)
logger.info(f"Audio saved to {output_file}")
-
+
except Exception as e:
logger.error(f"Error converting text to speech: {str(e)}")
raise
def _generate_audio_segments(self, text: str, temp_dir: str) -> List[str]:
"""Generate audio segments for each Q&A pair."""
- qa_pairs = self.provider.split_qa(text, self.ending_message, self.provider.get_supported_tags())
+ qa_pairs = self.provider.split_qa(
+ text, self.ending_message, self.provider.get_supported_tags()
+ )
audio_files = []
provider_config = self._get_provider_config()
@@ -181,8 +190,6 @@ def get_sort_key(file_path: str) -> Tuple[int, int]:
logger.error(f"Error merging audio files: {str(e)}")
raise
-
-
def _setup_directories(self) -> None:
"""Setup required directories for audio processing."""
self.output_directories = self.tts_config.get("output_directories", {})
@@ -200,13 +207,13 @@ def _setup_directories(self) -> None:
def _validate_transcript_format(self, text: str) -> None:
"""
Validate that the input text follows the correct transcript format.
-
+
Args:
text (str): Input text to validate
-
+
Raises:
ValueError: If the text is not properly formatted
-
+
The text should:
1. Have alternating Person1 and Person2 tags
2. Each opening tag should have a closing tag
@@ -216,32 +223,36 @@ def _validate_transcript_format(self, text: str) -> None:
# Check for empty text
if not text.strip():
raise ValueError("Input text is empty")
-
+
# Check for matching opening and closing tags
person1_open = text.count("<Person1>")
person1_close = text.count("</Person1>")
person2_open = text.count("<Person2>")
person2_close = text.count("</Person2>")
-
+
if person1_open != person1_close:
- raise ValueError(f"Mismatched Person1 tags: {person1_open} opening tags and {person1_close} closing tags")
+ raise ValueError(
+ f"Mismatched Person1 tags: {person1_open} opening tags and {person1_close} closing tags"
+ )
if person2_open != person2_close:
- raise ValueError(f"Mismatched Person2 tags: {person2_open} opening tags and {person2_close} closing tags")
-
+ raise ValueError(
+ f"Mismatched Person2 tags: {person2_open} opening tags and {person2_close} closing tags"
+ )
+
# Check for alternating pattern using regex
pattern = r".*?\s*.*?"
matches = re.findall(pattern, text, re.DOTALL)
-
+
# Calculate expected number of pairs
expected_pairs = min(person1_open, person2_open)
-
+
if len(matches) != expected_pairs:
raise ValueError(
"Tags are not properly alternating between Person1 and Person2. "
"Each Person1 section should be followed by a Person2 section."
)
-
- # Check for malformed tags (unclosed or improperly nested)
+
+ # Check for malformed tags (unclosed or improperly nested)
stack = []
for match in re.finditer(r"<(/?)Person([12])>", text):
tag = match.group(0)
@@ -251,12 +262,12 @@ def _validate_transcript_format(self, text: str) -> None:
stack.pop()
else:
stack.append(tag[1:-1])
-
+
if stack:
raise ValueError(f"Unclosed tags: {', '.join(stack)}")
-
+
logger.debug("Transcript format validation passed")
-
+
except ValueError as e:
logger.error(f"Transcript format validation failed: {str(e)}")
raise
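
The validation above pairs opening and closing tags with a stack; a standalone sketch of that balance check, on a hypothetical sample input:

import re

def unclosed_person_tags(text: str) -> list:
    """Mirror the stack walk in _validate_transcript_format (sketch only)."""
    stack = []
    for match in re.finditer(r"<(/?)Person([12])>", text):
        tag = match.group(0)
        if tag.startswith("</"):
            # A closing tag must match the most recent opening tag.
            if stack and stack[-1] == tag[2:-1]:
                stack.pop()
        else:
            stack.append(tag[1:-1])
    return stack

print(unclosed_person_tags("<Person1>Hi</Person1><Person2>Hello"))  # ['Person2']
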
diff --git a/pyproject.toml b/pyproject.toml
index d618950..fdb2566 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "podcastfy"
-version = "0.3.0"
+version = "0.3.1"
description = "An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
authors = ["Tharsis T. P. Souza"]
license = "Apache-2.0"
diff --git a/tests/test_audio.py b/tests/test_audio.py
index fa57058..5e6597e 100644
--- a/tests/test_audio.py
+++ b/tests/test_audio.py
@@ -45,6 +45,7 @@ def test_text_to_speech_edge(self):
# Clean up
os.remove(output_file)
+
@pytest.mark.skip(reason="Testing edge only on Github Action as it's free")
def test_text_to_speech_google(self):
tts = TextToSpeech(model="google")
diff --git a/tests/test_client.py b/tests/test_client.py
index 2467afc..40323a5 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -19,8 +19,8 @@
MOCK_FILE_CONTENT = "\n".join(MOCK_URLS)
MOCK_TRANSCRIPT = "<Person1>Joe Biden and the US Politics</Person1><Person2>Joe Biden is the current president of the United States of America</Person2>"
MOCK_IMAGE_PATHS = [
- "tests/data/images/Senecio.jpeg",
- "tests/data/images/connection.jpg",
+ "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/Senecio.jpeg",
+ "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/connection.jpg",
]
MOCK_CONVERSATION_CONFIG = """
word_count: 300
@@ -108,7 +108,9 @@ def test_generate_podcast_from_file(mock_files, sample_config):
assert "Podcast generated successfully using edge TTS model" in result.stdout
assert os.path.exists(result.stdout.split(": ")[-1].strip())
assert result.stdout.split(": ")[-1].strip().endswith(".mp3")
- assert os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024 # Check if larger than 1KB
+ assert (
+ os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024
+ ) # Check if larger than 1KB
def test_generate_podcast_from_transcript(mock_files, sample_config):
@@ -119,7 +121,9 @@ def test_generate_podcast_from_transcript(mock_files, sample_config):
assert "Podcast generated successfully using edge TTS model" in result.stdout
assert os.path.exists(result.stdout.split(": ")[-1].strip())
assert result.stdout.split(": ")[-1].strip().endswith(".mp3")
- assert os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024 # Check if larger than 1KB
+ assert (
+ os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024
+ ) # Check if larger than 1KB
def test_generate_transcript_only(sample_config):
@@ -148,6 +152,7 @@ def test_generate_transcript_only(sample_config):
for tag in re.findall(r"<Person[12]>.*?</Person[12]>", content)
)
+
@pytest.mark.skip(reason="Not supported yet")
def test_generate_podcast_from_urls_and_file(mock_files, sample_config):
result = runner.invoke(
@@ -165,7 +170,9 @@ def test_generate_podcast_from_urls_and_file(mock_files, sample_config):
assert "Podcast generated successfully using edge TTS model" in result.stdout
assert os.path.exists(result.stdout.split(": ")[-1].strip())
assert result.stdout.split(": ")[-1].strip().endswith(".mp3")
- assert os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024 # Check if larger than 1KB
+ assert (
+ os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024
+ ) # Check if larger than 1KB
def test_generate_podcast_from_image(sample_config):
@@ -174,7 +181,9 @@ def test_generate_podcast_from_image(sample_config):
assert "Podcast generated successfully using edge TTS model" in result.stdout
assert os.path.exists(result.stdout.split(": ")[-1].strip())
assert result.stdout.split(": ")[-1].strip().endswith(".mp3")
- assert os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024 # Check if larger than 1KB
+ assert (
+ os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024
+ ) # Check if larger than 1KB
@pytest.mark.skip(reason="To be further tested")
@@ -215,7 +224,9 @@ def test_generate_podcast_from_urls_and_images(sample_config):
assert "Podcast generated successfully using edge TTS model" in result.stdout
assert os.path.exists(result.stdout.split(": ")[-1].strip())
assert result.stdout.split(": ")[-1].strip().endswith(".mp3")
- assert os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024 # Check if larger than 1KB
+ assert (
+ os.path.getsize(result.stdout.split(": ")[-1].strip()) > 1024
+ ) # Check if larger than 1KB
@pytest.mark.skip(reason="Requires local LLM running")
@@ -266,45 +277,53 @@ def test_generate_podcast_with_custom_llm():
result = runner.invoke(
app,
[
- "--url", MOCK_URLS[0],
- "--tts-model", "edge",
- "--llm-model-name", "gemini-1.5-pro-latest",
- "--api-key-label", "GEMINI_API_KEY"
- ]
+ "--url",
+ MOCK_URLS[0],
+ "--tts-model",
+ "edge",
+ "--llm-model-name",
+ "gemini-1.5-pro-latest",
+ "--api-key-label",
+ "GEMINI_API_KEY",
+ ],
)
-
+
assert result.exit_code == 0
assert "Podcast generated successfully using edge TTS model" in result.stdout
-
+
# Extract and verify the audio file
audio_path = result.stdout.split(": ")[-1].strip()
assert os.path.exists(audio_path)
assert audio_path.endswith(".mp3")
assert os.path.getsize(audio_path) > 1024 # Check if larger than 1KB
-
+
# Clean up
os.remove(audio_path)
+
def test_generate_transcript_only_with_custom_llm():
"""Test generating only a transcript with a custom LLM model using CLI."""
result = runner.invoke(
app,
[
- "--url", MOCK_URLS[0],
+ "--url",
+ MOCK_URLS[0],
"--transcript-only",
- "--llm-model-name", "gemini-1.5-pro-latest",
- "--api-key-label", "GEMINI_API_KEY"
- ]
+ "--llm-model-name",
+ "gemini-1.5-pro-latest",
+ "--api-key-label",
+ "GEMINI_API_KEY",
+ ],
)
-
+
assert result.exit_code == 0
assert "Transcript generated successfully" in result.stdout
-
+
# Extract and verify the transcript file
transcript_path = result.stdout.split(": ")[-1].strip()
assert os.path.exists(transcript_path)
assert transcript_path.endswith(".txt")
-
+
# Verify transcript content
with open(transcript_path, "r") as f:
content = f.read()
@@ -314,14 +333,36 @@ def test_generate_transcript_only_with_custom_llm():
assert "" in content
assert len(content.split("")) > 1 # At least one question
assert len(content.split("")) > 1 # At least one answer
-
+
# Verify content is substantial
min_length = 500 # Minimum expected length in characters
- assert len(content) > min_length, \
- f"Content length ({len(content)}) is less than minimum expected ({min_length})"
-
+ assert (
+ len(content) > min_length
+ ), f"Content length ({len(content)}) is less than minimum expected ({min_length})"
+
# Clean up
os.remove(transcript_path)
+
+@pytest.mark.skip(reason="Too expensive to be auto tested on Github Actions")
+def test_generate_podcast_from_topic():
+ """Test generating a podcast from a topic using CLI."""
+ result = runner.invoke(
+ app, ["--topic", "Artificial Intelligence Ethics", "--tts-model", "edge"]
+ )
+
+ assert result.exit_code == 0
+ assert "Podcast generated successfully using edge TTS model" in result.stdout
+
+ # Extract and verify the audio file
+ audio_path = result.stdout.split(": ")[-1].strip()
+ assert os.path.exists(audio_path)
+ assert audio_path.endswith(".mp3")
+ assert os.path.getsize(audio_path) > 1024 # Check if larger than 1KB
+
+ # Clean up
+ os.remove(audio_path)
+
+
if __name__ == "__main__":
pytest.main()
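
The skipped topic test above drives the CLI end to end; outside pytest, an equivalent call through Typer's test runner would look like this (a sketch assuming the app object is importable from podcastfy.client, as these tests do):

from typer.testing import CliRunner
from podcastfy.client import app

runner = CliRunner()
result = runner.invoke(
    app, ["--topic", "Artificial Intelligence Ethics", "--tts-model", "edge"]
)
print(result.stdout)  # reports the path of the generated .mp3
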
diff --git a/tests/test_content_parser.py b/tests/test_content_parser.py
index 6583a16..39b08a1 100644
--- a/tests/test_content_parser.py
+++ b/tests/test_content_parser.py
@@ -82,6 +82,27 @@ def test_pdf_extractor(self):
extracted_content[:500].strip(), expected_content[:500].strip()
)
+ @pytest.mark.skip(reason="Too expensive to be auto tested on Github Actions")
+ def test_generate_topic_content(self):
+ """Test generating content for a specific topic."""
+ extractor = ContentExtractor()
+ topic = "Latest news about OpenAI"
+
+ # Generate content for the topic
+ content = extractor.generate_topic_content(topic)
+
+ # Verify the content
+ self.assertIsNotNone(content)
+ self.assertIsInstance(content, str)
+ self.assertGreater(len(content), 100) # Content should be substantial
+
+ # Check if content is relevant to the topic
+ lower_content = content.lower()
+ self.assertTrue(
+ any(term in lower_content for term in ["openai"]),
+ "Generated content should be relevant to the topic",
+ )
+
if __name__ == "__main__":
unittest.main()
diff --git a/tests/test_genai_podcast.py b/tests/test_genai_podcast.py
index 5b0f4f0..ec740f7 100644
--- a/tests/test_genai_podcast.py
+++ b/tests/test_genai_podcast.py
@@ -7,6 +7,13 @@
from podcastfy.utils.config import Config
from podcastfy.utils.config_conversation import ConversationConfig
from podcastfy.content_parser.pdf_extractor import PDFExtractor
+from podcastfy.content_parser.content_extractor import ContentExtractor
+
+
+MOCK_IMAGE_PATHS = [
+ "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/Senecio.jpeg",
+ "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/connection.jpg",
+]
# TODO: Should be a fixture
@@ -61,10 +68,7 @@ def test_custom_conversation_config(self):
def test_generate_qa_content_from_images(self):
"""Test generating Q&A content from two input images."""
- image_paths = [
- "tests/data/images/Senecio.jpeg",
- "tests/data/images/connection.jpg",
- ]
+ image_paths = MOCK_IMAGE_PATHS
content_generator = ContentGenerator(self.api_key)
@@ -120,22 +124,49 @@ def test_generate_qa_content_from_raw_text(self):
def test_generate_qa_content_with_custom_model(self):
"""Test generating Q&A content with a custom model and API key."""
content_generator = ContentGenerator(
- self.api_key,
- conversation_config=sample_conversation_config()
+ self.api_key, conversation_config=sample_conversation_config()
)
input_text = "United States of America"
-
+
# Test with OpenAI model
result = content_generator.generate_qa_content(
input_text,
model_name="gemini-1.5-pro-latest",
- api_key_label="GEMINI_API_KEY"
+ api_key_label="GEMINI_API_KEY",
)
-
+
self.assertIsNotNone(result)
self.assertNotEqual(result, "")
self.assertIsInstance(result, str)
+ @pytest.mark.skip(reason="Too expensive to be auto tested on Github Actions")
+ def test_generate_qa_content_from_topic(self):
+ """Test generating Q&A content from a specific topic."""
+ topic = "Latest news about OpenAI"
+ content_generator = ContentGenerator(self.api_key)
+ extractor = ContentExtractor()
+ topic = "Latest news about OpenAI"
+
+ # Generate content for the topic
+ content = extractor.generate_topic_content(topic)
+
+ result = content_generator.generate_qa_content(input_texts=content)
+
+ self.assertIsNotNone(result)
+ self.assertNotEqual(result, "")
+ self.assertIsInstance(result, str)
+
+ # Verify Q&A format
+ self.assertIn("<Person1>", result)
+ self.assertIn("<Person2>", result)
+
+ # Verify content relevance
+ lower_result = result.lower()
+ self.assertTrue(
+ any(term in lower_result for term in ["openai"]),
+ "Generated content should be relevant to the topic",
+ )
+
if __name__ == "__main__":
unittest.main()
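
A condensed sketch of the two-step topic pipeline the new test follows, grounded content extraction feeding transcript generation; it assumes ContentGenerator lives in podcastfy.content_generator, as the test usage suggests, and that the relevant API keys are set:

import os

from podcastfy.content_generator import ContentGenerator
from podcastfy.content_parser.content_extractor import ContentExtractor

extractor = ContentExtractor()
content = extractor.generate_topic_content("Latest news about OpenAI")

generator = ContentGenerator(os.environ["GEMINI_API_KEY"])
transcript = generator.generate_qa_content(input_texts=content)  # <Person1>/<Person2> dialogue
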
diff --git a/tests/test_generate_podcast.py b/tests/test_generate_podcast.py
index d74ce7f..d76a22a 100644
--- a/tests/test_generate_podcast.py
+++ b/tests/test_generate_podcast.py
@@ -7,6 +7,11 @@
TEST_URL = "https://en.wikipedia.org/wiki/Friends"
+MOCK_IMAGE_PATHS = [
+ "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/Senecio.jpeg",
+ "https://raw.githubusercontent.com/souzatharsis/podcastfy/refs/heads/main/data/images/connection.jpg",
+]
+
@pytest.fixture
def sample_config():
@@ -57,7 +62,6 @@ def sample_conversation_config():
return conversation_config
-
@pytest.fixture(autouse=True)
def setup_test_directories(sample_conversation_config):
"""Create test directories if they don't exist."""
@@ -72,12 +76,11 @@ def setup_test_directories(sample_conversation_config):
if temp_dir:
os.makedirs(temp_dir, exist_ok=True)
+
@pytest.mark.skip(reason="Testing edge only on Github Action as it's free")
def test_generate_podcast_from_urls_11labs(default_conversation_config):
"""Test generating a podcast from a list of URLs."""
- urls = [
- TEST_URL
- ]
+ urls = [TEST_URL]
audio_file = generate_podcast(urls=urls, tts_model="elevenlabs")
print(f"Audio file generated using ElevenLabs model: {audio_file}")
@@ -89,6 +92,7 @@ def test_generate_podcast_from_urls_11labs(default_conversation_config):
"text_to_speech", {}
).get("output_directories", {}).get("audio")
+
@pytest.mark.skip(reason="Testing edge only on Github Action as it's free")
def test_generate_podcast_from_urls_openai(default_conversation_config):
"""Test generating a podcast from a list of URLs."""
@@ -129,9 +133,7 @@ def test_generate_podcast_from_urls_gemini(default_conversation_config):
def test_generate_podcast_from_urls_edge(default_conversation_config):
"""Test generating a podcast from a list of URLs."""
- urls = [
- TEST_URL
- ]
+ urls = [TEST_URL]
audio_file = generate_podcast(urls=urls, tts_model="edge")
print(f"Audio file generated using Edge model: {audio_file}")
@@ -143,11 +145,10 @@ def test_generate_podcast_from_urls_edge(default_conversation_config):
"text_to_speech", {}
).get("output_directories", {}).get("audio")
+
def test_generate_transcript_only(default_conversation_config):
"""Test generating only a transcript without audio."""
- urls = [
- TEST_URL
- ]
+ urls = [TEST_URL]
result = generate_podcast(urls=urls, transcript_only=True)
print(f"Transcript file generated: {result}")
@@ -159,6 +160,7 @@ def test_generate_transcript_only(default_conversation_config):
"text_to_speech", {}
).get("output_directories", {}).get("transcripts")
+
def test_generate_podcast_from_transcript_file(sample_conversation_config):
"""Test generating a podcast from an existing transcript file."""
# First, generate a transcript
@@ -188,6 +190,7 @@ def test_generate_podcast_from_transcript_file(sample_conversation_config):
"text_to_speech", {}
).get("output_directories", {}).get("audio")
+
def test_generate_podcast_with_custom_config(sample_config, sample_conversation_config):
"""Test generating a podcast with a custom conversation config."""
urls = ["https://en.wikipedia.org/wiki/Artificial_intelligence"]
@@ -208,6 +211,7 @@ def test_generate_podcast_with_custom_config(sample_config, sample_conversation_
== sample_conversation_config["text_to_speech"]["output_directories"]["audio"]
)
+
def test_generate_from_local_pdf(sample_config):
"""Test generating a podcast from a local PDF file."""
pdf_file = "tests/data/pdf/file.pdf"
@@ -219,14 +223,16 @@ def test_generate_from_local_pdf(sample_config):
assert audio_file.endswith(".mp3")
assert os.path.getsize(audio_file) > 1024 # Check if larger than 1KB
+
def test_generate_podcast_no_urls_or_transcript():
"""Test that an error is raised when no URLs or transcript file is provided."""
with pytest.raises(ValueError):
generate_podcast()
+
def test_generate_podcast_from_images(sample_config, default_conversation_config):
"""Test generating a podcast from two input images."""
- image_paths = ["tests/data/images/Senecio.jpeg", "tests/data/images/connection.jpg"]
+ image_paths = MOCK_IMAGE_PATHS
audio_file = generate_podcast(
image_paths=image_paths, tts_model="edge", config=sample_config
@@ -250,6 +256,7 @@ def test_generate_podcast_from_images(sample_config, default_conversation_config
]
assert len(transcript_files) > 0
+
def test_generate_podcast_from_raw_text(sample_config, default_conversation_config):
"""Test generating a podcast from raw input text."""
raw_text = "The wonderful world of LLMs."
@@ -264,6 +271,7 @@ def test_generate_podcast_from_raw_text(sample_config, default_conversation_conf
"text_to_speech", {}
).get("output_directories", {}).get("audio")
+
def test_generate_transcript_with_user_instructions(
sample_config, default_conversation_config
):
@@ -318,18 +326,19 @@ def test_generate_transcript_with_user_instructions(
conversation_config["podcast_tagline"].lower() in content.lower()
), f"Expected to find podcast tagline '{conversation_config['podcast_tagline']}' in transcript"
+
def test_generate_podcast_with_custom_llm(sample_config, default_conversation_config):
"""Test generating a podcast with a custom LLM model."""
urls = ["https://en.wikipedia.org/wiki/Artificial_intelligence"]
-
+
audio_file = generate_podcast(
urls=urls,
tts_model="edge",
config=sample_config,
llm_model_name="gemini-1.5-pro-latest",
- api_key_label="GEMINI_API_KEY"
+ api_key_label="GEMINI_API_KEY",
)
-
+
assert audio_file is not None
assert os.path.exists(audio_file)
assert audio_file.endswith(".mp3")
@@ -338,39 +347,45 @@ def test_generate_podcast_with_custom_llm(sample_config, default_conversation_co
"text_to_speech", {}
).get("output_directories", {}).get("audio")
-def test_generate_transcript_only_with_custom_llm(sample_config, default_conversation_config):
+
+def test_generate_transcript_only_with_custom_llm(
+ sample_config, default_conversation_config
+):
"""Test generating only a transcript with a custom LLM model."""
urls = ["https://en.wikipedia.org/wiki/Artificial_intelligence"]
-
+
# Generate transcript with custom LLM settings
result = generate_podcast(
urls=urls,
transcript_only=True,
config=sample_config,
llm_model_name="gemini-1.5-pro-latest",
- api_key_label="GEMINI_API_KEY"
+ api_key_label="GEMINI_API_KEY",
)
-
+
assert result is not None
assert os.path.exists(result)
assert result.endswith(".txt")
assert os.path.dirname(result) == default_conversation_config.get(
"text_to_speech", {}
).get("output_directories", {}).get("transcripts")
-
+
# Read and verify the content
with open(result, "r") as f:
content = f.read()
-
+
# Verify the content follows the Person1/Person2 format
assert "" in content
assert "" in content
assert len(content.split("")) > 1 # At least one question
assert len(content.split("")) > 1 # At least one answer
-
+
# Verify the content is substantial
min_length = 500 # Minimum expected length in characters
- assert len(content) > min_length, f"Content length ({len(content)}) is less than minimum expected ({min_length})"
+ assert (
+ len(content) > min_length
+ ), f"Content length ({len(content)}) is less than minimum expected ({min_length})"
+
if __name__ == "__main__":
pytest.main()
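
Finally, a condensed sketch of the Python API these tests exercise, using the same keyword arguments as above (assumes generate_podcast is importable from podcastfy.client):

from podcastfy.client import generate_podcast

audio_file = generate_podcast(
    urls=["https://en.wikipedia.org/wiki/Friends"],
    tts_model="edge",
)
print(audio_file)  # path to the generated .mp3
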