Commit 752f190

souzatharsis committed
#165 Fixed audio generation in Windows OS issue: Normalize path separators for cross-platform compatibility
1 parent 94a8224 commit 752f190

6 files changed: +11 -7 lines changed

CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Changelog
 
-## [0.3.1] - 2024-11-07
+## [0.3.3] - 2024-11-08
 
 ### Breaking Changes
 - Loading images from 'path' has been removed for security reasons. Please specify images by passing an 'url'.
@@ -15,6 +15,9 @@
 - Start TESTIMONIALS.md
 - Add apps using Podcastfy to README.md
 
+### Fixed
+- #165 Fixed audio generation in Windows OS issue: Normalize path separators for cross-platform compatibility
+
 ## [0.2.3] - 2024-10-15
 
 ### Added

README.md

Lines changed: 3 additions & 2 deletions
@@ -72,9 +72,12 @@ This sample collection is also [available at audio.com](https://audio.com/thatup
 ## Updates 🚀
 
 ### v0.3.0+ release
+- Generate podcasts from input topic using real-time internet search
 - Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
 - Integrate with Google's Multispeaker TTS model for high-quality audio generation
 
+See [CHANGELOG](CHANGELOG.md) for more details.
+
 ## Quickstart 💻
 
 ### Prerequisites
@@ -108,8 +111,6 @@ python -m podcastfy.client --url <url1> --url <url2>
 
 - [CLI](usage/cli.md)
 
-- [Docker Image](usage/docker.md)
-
 - [How to](usage/how-to.md)
 
 Experience Podcastfy with our [HuggingFace](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo) 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)

podcastfy/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 # This file can be left empty for now
-__version__ = "0.3.1" # or whatever version you're on
+__version__ = "0.3.3" # or whatever version you're on

podcastfy/text_to_speech.py

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ def _generate_audio_segments(self, text: str, temp_dir: str) -> List[str]:
 for speaker_type, content in [("question", question), ("answer", answer)]:
     temp_file = os.path.join(
         temp_dir, f"{idx}_{speaker_type}.{self.audio_format}"
-    )
+    ).replace('\\', '/') # Normalize path separators for cross-platform compatibility
     voice = provider_config.get("default_voices", {}).get(speaker_type)
     model = provider_config.get("model")

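The effect of the one-line change in `text_to_speech.py` can be sketched in isolation. This is a minimal illustration, not the project's code: `normalized_segment_path` is a hypothetical helper, the directory and file names are made up, and `ntpath` (the Windows flavour of `os.path`) is used only so the Windows behaviour can be reproduced on any platform.

```python
import ntpath  # Windows implementation of os.path, importable on every platform
from pathlib import PureWindowsPath


def normalized_segment_path(temp_dir: str, idx: int, speaker_type: str, audio_format: str) -> str:
    """Hypothetical helper mirroring the changed line: join, then force forward slashes."""
    joined = ntpath.join(temp_dir, f"{idx}_{speaker_type}.{audio_format}")
    return joined.replace("\\", "/")


# On Windows, os.path.join inserts a backslash between the components.
windows_style = ntpath.join("C:\\tmp\\podcast", "0_question.mp3")
print(windows_style)  # C:\tmp\podcast\0_question.mp3

# After normalization every separator is a forward slash.
print(normalized_segment_path("C:\\tmp\\podcast", 0, "question", "mp3"))  # C:/tmp/podcast/0_question.mp3

# pathlib offers an equivalent conversion without manual string replacement.
print(PureWindowsPath(windows_style).as_posix())  # C:/tmp/podcast/0_question.mp3
```

Forward slashes are accepted by the Windows file APIs that Python uses, so the normalized string should still open the same file. Whether that alone resolves every downstream consumer of `temp_file` is not stated in the commit.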
pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "podcastfy"
-version = "0.3.1"
+version = "0.3.3"
 description = "An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
 authors = ["Tharsis T. P. Souza"]
 license = "Apache-2.0"

usage/conversation_custom.md

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ creativity: 0.7
 - The `word_count` is a target, and the AI may generate more or less than the specified word count. Low word counts are more likely to generate high-level discussions, while high word counts are more likely to generate detailed discussions.
 - The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information:
   - Bottom-line: non-English transcripts are good enough but non-English audio is work-in-progress.
-  - Transcripts are generated using Google's Gemini 1.5 Pro, which supports 100+ languages by default.
+  - Transcripts are generated using Google's Gemini 1.5 Pro by default, which supports 100+ languages. Other user-defined models may or may not support non-English languages.
   - Audio is generated using `openai` (default), `elevenlabs`, `gemini`,or `edge` TTS models.
     - The `gemini`(Google) TTS model is English only.
     - The `openai` TTS model supports multiple languages automatically, however non-English voices still present sub-par quality in my experience.
