36 changes: 36 additions & 0 deletions datasets/ivrit-ai-audio-v2.yaml
@@ -0,0 +1,36 @@
Name: ivrit-ai Hebrew Audio v2 Dataset
Description: >
The ivrit-ai audio-v2 dataset is a curated collection of Hebrew speech recordings and metadata designed to advance speech recognition and AI research using high-quality, crowd-sourced and/or institutional audio. Contact ivrit.ai for information about its composition and source domains.
Documentation: https://huggingface.co/datasets/ivrit-ai/audio-v2
Contact: [email protected]
ManagedBy: ivrit.ai
UpdateFrequency: Updated several times per year
Tags:
- natural language processing
- automatic speech recognition
- speech processing
License: >
ivrit.ai license (modified CC-BY, permitting use for training AI models only and prohibiting deepfake generation; see https://www.ivrit.ai/en/license-faqs/ for full terms)
Citation: >
If you use this dataset, cite:
Marmor, Yanir and Lifshitz, Yair and Snapir, Yoad and Misgav, Kinneret (2025). Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing. Proc. Interspeech 2025, pp. 723–727.
"[ivrit-ai Hebrew Audio v2 Dataset] was accessed on [DATE] at registry.opendata.aws/ivrit-ai-audio-v2"
Resources:
- Description: "Hebrew speech audio and aligned metadata in plain text and other formats. Data is available via Hugging Face Datasets. Contact ivrit.ai for bulk/alternative access methods."
ARN: ""
Region: ""
Type: "External Resource"
Explore:
- "https://huggingface.co/datasets/ivrit-ai/audio-v2"
DataAtWork:
Tutorials:
- Title: "Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing"
URL: https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf
AuthorName: Marmor, Yanir et al.
Tools & Applications: []
Publications:
- Title: "Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing"
URL: https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf
AuthorName: Marmor, Yanir; Lifshitz, Yair; Snapir, Yoad; Misgav, Kinneret
ADXCategories:
- Language
- Speech
38 changes: 38 additions & 0 deletions datasets/ivrit-ai-crowdtranscribe.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
Name: ivrit-ai Crowd-Transcribe Hebrew Speech Dataset
Description: >
The ivrit-ai Crowd-Transcribe v5 dataset is a comprehensive Hebrew speech dataset contributed and vetted by a crowd of volunteers, designed to support the development of open-source Hebrew ASR systems and other language technologies. It is released under the ivrit.ai license, which permits use only for training AI models and prohibits deepfake creation. The dataset enables robust Hebrew speech-to-text and downstream research.
Documentation: https://huggingface.co/datasets/ivrit-ai/crowd-transcribe-v5
Contact: [email protected]
ManagedBy: ivrit.ai
UpdateFrequency: Updated several times per year
Tags:
- natural language processing
- automatic speech recognition
- speech processing
License: >
ivrit.ai license (modified CC-BY, permitting use for training AI models only and prohibiting deepfake generation; see https://www.ivrit.ai/en/license-faqs/ for full terms)
Citation: >
If you use this dataset, cite:
Marmor, Yanir and Lifshitz, Yair and Snapir, Yoad and Misgav, Kinneret (2025). Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing. Proc. Interspeech 2025, pp. 723–727.
"[ivrit-ai Crowd-Transcribe Hebrew Speech Dataset] was accessed on [DATE] at registry.opendata.aws/ivrit-ai-crowdtranscribe"
Resources:
- Description: "Hebrew crowd-sourced transcribed speech audio and aligned metadata in plain text and other formats. Data is available via Hugging Face Datasets. Contact ivrit.ai for bulk/alternative access methods."
ARN: ""
Region: ""
Type: "External Resource"
Explore:
- "https://huggingface.co/datasets/ivrit-ai/crowd-transcribe-v5"
DataAtWork:
Tutorials:
- Title: "Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing"
URL: https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf
AuthorName: Marmor, Yanir et al.
Tools & Applications: []
Publications:
- Title: "Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing"
URL: https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf
AuthorName: Marmor, Yanir; Lifshitz, Yair; Snapir, Yoad; Misgav, Kinneret

ADXCategories:
- Language
- Speech
37 changes: 37 additions & 0 deletions datasets/ivrit-ai-knesset-plenums.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
Name: ivrit-ai Knesset Plenum Transcriptions Dataset
Description: >
The ivrit-ai Knesset Plenum Transcriptions dataset comprises aligned Hebrew speech and transcriptions from Israeli Knesset parliamentary plenary sessions. The dataset supports research on parliamentary speech, political discourse, and automatic speech recognition.
Documentation: https://huggingface.co/datasets/ivrit-ai/knesset-plenums
Contact: [email protected]
ManagedBy: ivrit.ai
UpdateFrequency: Updated several times per year
Tags:
- natural language processing
- automatic speech recognition
- speech processing
License: >
ivrit.ai license (modified CC-BY, permitting use for training AI models only and prohibiting deepfake generation; see https://www.ivrit.ai/en/license-faqs/ for full terms)
Citation: >
If you use this dataset, cite:
Marmor, Yanir and Lifshitz, Yair and Snapir, Yoad and Misgav, Kinneret (2025). Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing. Proc. Interspeech 2025, pp. 723–727.
"[ivrit-ai Knesset Plenum Transcriptions Dataset] was accessed on [DATE] at registry.opendata.aws/ivrit-ai-knesset-plenums"
Resources:
- Description: "Hebrew Knesset plenum audio and transcriptions, with aligned metadata. Data is available via Hugging Face Datasets. Contact ivrit.ai for bulk/alternative access methods."
ARN: ""
Region: ""
Type: "External Resource"
Explore:
- "https://huggingface.co/datasets/ivrit-ai/knesset-plenums"
DataAtWork:
Tutorials:
- Title: "Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing"
URL: https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf
AuthorName: Marmor, Yanir et al.
Tools & Applications: []
Publications:
- Title: "Building an Accurate Open-Source Hebrew ASR System through Crowdsourcing"
URL: https://www.isca-archive.org/interspeech_2025/marmor25_interspeech.pdf
AuthorName: Marmor, Yanir; Lifshitz, Yair; Snapir, Yoad; Misgav, Kinneret
ADXCategories:
- Language
- Speech
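
Because the three entries share most of their fields, copy-paste slips between them are easy to miss in review. The following is a minimal sketch of a consistency check, not part of this PR or of the registry's own tooling. The function name `check_entry` is hypothetical, and the rules it applies (citation slug equals the file name, `Documentation` sits on its own line, each Hugging Face `Explore` link equals the `Documentation` URL) are assumptions drawn from the layout of the entries above. It uses plain regular expressions rather than a YAML parser so that it runs with the standard library only.

```python
import os
import re


def check_entry(filename, text):
    """Return a list of human-readable problems found in one registry entry.

    `filename` is the path of the YAML file, `text` is its raw content.
    The checks are heuristics based on the field layout seen in this PR.
    """
    problems = []
    # e.g. "datasets/ivrit-ai-audio-v2.yaml" -> "ivrit-ai-audio-v2"
    slug = os.path.splitext(os.path.basename(filename))[0]

    # Assumption: the citation should point at this entry's own registry page.
    cited = re.search(r"registry\.opendata\.aws/([\w-]+)", text)
    if cited is None:
        problems.append("no registry URL found in Citation")
    elif cited.group(1) != slug:
        problems.append(
            "Citation points at '%s', expected '%s'" % (cited.group(1), slug)
        )

    # Documentation must start a line; if it is fused onto the end of the
    # Description text, it is not a separate key.
    doc = re.search(r"^\s*Documentation:\s*(\S+)", text, flags=re.MULTILINE)
    if doc is None:
        problems.append("Documentation key missing or fused into another line")

    # Assumption: every Hugging Face Explore link should be the same dataset
    # page as the Documentation URL.
    links = re.findall(
        r'^\s*-\s*"(https://huggingface\.co/datasets/[^"]+)"',
        text,
        flags=re.MULTILINE,
    )
    if doc is not None:
        for link in links:
            if link != doc.group(1):
                problems.append(
                    "Explore link '%s' differs from Documentation '%s'"
                    % (link, doc.group(1))
                )
    return problems
```

To use it, read each file under `datasets/` and call `check_entry(path, content)`; an empty list means none of the three heuristics fired. The checks are deliberately narrow and would need adjusting for entries whose `Explore` links legitimately point elsewhere.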