Audio Search Engine

Voice-powered search with NLP and speech recognition

AI/ML

# features

What It Does

Core capabilities of the Audio Search Engine.

Speech Recognition

Speech recognition models convert voice to text, with an emphasis on accuracy in noisy environments.

Semantic Search

Goes beyond keywords to understand the context and intent behind every voice query.

Scalable Indexing

Fast and memory-efficient indexing architecture for searching through millions of audio files.

Audio Fingerprinting

Identify audio clips through unique acoustic signatures with minimal data requirements.
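
One common way to build acoustic signatures of this kind is to pair nearby spectrogram peaks and hash each pair. The sketch below illustrates only that pairing step. It assumes the peaks have already been extracted as (time_index, frequency_bin) tuples; `pair_peaks`, its `fan_out` and `max_dt` parameters, and the hash format are hypothetical and are not taken from the project's `src.analyzer` module.

```python
import hashlib


def pair_peaks(peaks, fan_out=3, max_dt=50):
    """Yield (hash, offset) pairs from a list of (time, freq) peaks.

    Hypothetical helper for illustration; not the project's analyzer.
    """
    peaks = sorted(peaks)  # order by time, then frequency
    for i, (t1, f1) in enumerate(peaks):
        # Pair each anchor peak with the next few peaks that follow it.
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            dt = t2 - t1
            if 0 <= dt <= max_dt:
                # The key uses only frequencies and the time gap, so the
                # same sound yields the same hash wherever it starts.
                key = f"{f1}|{f2}|{dt}".encode("utf-8")
                yield hashlib.sha1(key).hexdigest()[:20], t1


# Made-up peaks, purely for demonstration.
peaks = [(0, 40), (3, 55), (7, 40), (12, 90)]
fingerprints = list(pair_peaks(peaks, fan_out=2))
print(len(fingerprints))
```

Because each hash depends on the two frequencies and the gap between them, not on absolute time, a short clip taken from the middle of a track can still produce hashes that match the full recording.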

# source

Project Source Code

Browse the core Python modules that power the audio search engine.

Files

analyze.py (Python)

import os

from termcolor import colored

import src.analyzer as analyzer
from src.db import SQLiteDatabase
from src.filereader import FileReader

# Folder scanned for audio; despite its name, only .wav files are processed.
MUSICS_FOLDER_PATH = "mp3"

if __name__ == '__main__':
    db = SQLiteDatabase()

    for filename in os.listdir(MUSICS_FOLDER_PATH):
        # Skip hidden files and non-WAV files
        if not filename.endswith(".wav") or filename.startswith('.'):
            continue

        try:
            file_path = os.path.join(MUSICS_FOLDER_PATH, filename)
            reader = FileReader(file_path)
            audio = reader.parse_audio()
        except Exception as e:
            print(colored(f"Error processing {filename}: {e}", "red"))
            continue

        # Reuse the existing song row if this exact file was seen before
        song = db.get_song_by_filehash(audio['file_hash'])

        if not song:
            song_id = db.add_song(filename, audio['file_hash'])
        else:
            song_id = song['id']

        print(colored(f"Analyzing music: {filename}", "green"))

        # Do not fingerprint a song that already has stored hashes
        hash_count = db.get_song_hashes_count(song_id)
        if hash_count > 0:
            msg = f'Warning: This song already exists ({hash_count} hashes), skipping'
            print(colored(msg, 'yellow'))
            continue

        hashes = set()

        # Fingerprint each channel and merge the results, dropping duplicates
        for channeln, channel in enumerate(audio['channels']):
            channel_hashes = analyzer.fingerprint(channel, Fs=audio['Fs'])
            channel_hashes = set(channel_hashes)
            msg = f'Channel {channeln} saved {len(channel_hashes)} hashes'
            print(colored(msg, attrs=['dark']))
            hashes |= channel_hashes

        # Each fingerprint is a (hash, offset) pair; attach the song id for storage
        values = [(song_id, fp_hash, offset) for fp_hash, offset in hashes]
        db.store_fingerprints(values)

    print(colored('Done', "green"))
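
The script above relies on a `SQLiteDatabase` wrapper whose source is not shown here. As a rough illustration of the kind of storage it implies, the sketch below uses Python's built-in `sqlite3` module with an assumed two-table schema. The table names, column names, and index are hypothetical, and the `add_song` and `store_fingerprints` functions are simplified stand-ins that only mirror the method names used in the script, not the project's actual `src.db` implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
conn.executescript(
    """
    CREATE TABLE songs (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        name     TEXT NOT NULL,
        filehash TEXT NOT NULL UNIQUE
    );
    CREATE TABLE fingerprints (
        song_fk     INTEGER NOT NULL REFERENCES songs(id),
        hash        TEXT NOT NULL,
        time_offset INTEGER NOT NULL
    );
    -- Lookups during matching are by hash, so index that column.
    CREATE INDEX idx_fingerprints_hash ON fingerprints(hash);
    """
)


def add_song(name, filehash):
    """Insert a song row and return its new id (assumed schema)."""
    cur = conn.execute(
        "INSERT INTO songs (name, filehash) VALUES (?, ?)", (name, filehash)
    )
    conn.commit()
    return cur.lastrowid


def store_fingerprints(values):
    """Bulk-insert (song_id, hash, offset) tuples in one call."""
    conn.executemany(
        "INSERT INTO fingerprints (song_fk, hash, time_offset) VALUES (?, ?, ?)",
        values,
    )
    conn.commit()


# Made-up values, purely for demonstration.
song_id = add_song("example.wav", "abc123")
store_fingerprints([(song_id, "h1", 0), (song_id, "h2", 5)])
count = conn.execute(
    "SELECT COUNT(*) FROM fingerprints WHERE song_fk = ?", (song_id,)
).fetchone()[0]
print(count)
```

Inserting all fingerprints for a song with a single `executemany` call keeps the number of round trips small, which matters when one track produces thousands of hashes.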

# repository

Get the Code

Clone the repository and start searching from your terminal.

Audio-Search-Engine

A Python-based audio fingerprinting and recognition system. It analyzes WAV files, generates acoustic fingerprints, stores them in SQLite, and matches recorded audio against the database in real time.
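
Matching a recording against stored fingerprints is often done by voting on the time-offset difference between matching hashes: a true match produces many hits that agree on the same difference. The sketch below shows that idea with plain dictionaries. `best_match`, its input shapes, and the sample data are hypothetical and do not reflect the project's actual matching code.

```python
from collections import Counter


def best_match(sample, database):
    """Return (song_id, votes) for the best-aligned song, or None.

    sample:   iterable of (hash, offset) pairs from the recording
    database: dict mapping hash -> list of (song_id, offset)
    """
    votes = Counter()
    for h, sample_offset in sample:
        for song_id, db_offset in database.get(h, []):
            # Hits from the same song at a consistent alignment share a key.
            votes[(song_id, db_offset - sample_offset)] += 1
    if not votes:
        return None
    (song_id, _delta), count = votes.most_common(1)[0]
    return song_id, count


# Made-up data: song 1 matches all three hashes at the same alignment.
database = {
    "a": [(1, 10), (2, 3)],
    "b": [(1, 14)],
    "c": [(1, 20), (2, 40)],
}
sample = [("a", 0), ("b", 4), ("c", 10)]
result = best_match(sample, database)
print(result)
```

Counting votes per (song, offset difference) pair, instead of per song alone, filters out songs that merely share a few hashes at unrelated positions.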

python, audio-fingerprinting, sqlite, signal-processing, music-recognition

Python

Quick Start

$ git clone https://github.com/prathapselvakumar/Audio-Search-Engine.git

$ cd Audio-Search-Engine

$ pip install -r requirements.txt

$ python analyze.py

$ python listen.py -s 10