COVID-19 Search Engine

Course Assignment, COL764 (Information Retrieval), Fall'23

An inverted-index-based search engine in C++ for COVID-19-related news and research

This course assignment aimed to build an understanding of the core of a search engine. We built an inverted-index-based search engine for retrieving COVID-19-related academic papers in response to scientific information needs. This involved understanding how index structures work and how queries and corpora are processed. We optimized the search engine for speed and memory usage using compression techniques and efficient search algorithms. We also implemented and trained Byte-Pair Encoding (BPE) for pre-processing and tokenizing documents and queries. For re-ranking, we implemented pseudo-relevance feedback using both language modeling and query expansion based on local Word2Vec word embeddings. Finally, we incorporated rank fusion algorithms such as BayesFuse and Condorcet voting to combine the outputs of multiple ranking systems.
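To make the index-compression idea concrete, here is a minimal C++ sketch, not the project's actual code. It assumes postings are stored as gaps between sorted document IDs and compressed with variable-byte (VByte) encoding, which is one common choice; the page does not say which compression scheme was used. It also assumes plain whitespace tokenization as a stand-in for the BPE tokenizer. The names `InvertedIndex`, `vbyteEncode`, `vbyteDecode` and `intersect` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Variable-byte encode one integer: 7 payload bits per byte, low-order
// group first; the high bit marks the final byte of each number.
void vbyteEncode(uint32_t n, std::vector<uint8_t>& out) {
    while (n >= 128) {
        out.push_back(static_cast<uint8_t>(n & 0x7F));
        n >>= 7;
    }
    out.push_back(static_cast<uint8_t>(n | 0x80));
}

// Decode a whole VByte stream back into the integers it holds.
std::vector<uint32_t> vbyteDecode(const std::vector<uint8_t>& bytes) {
    std::vector<uint32_t> nums;
    uint32_t n = 0;
    int shift = 0;
    for (uint8_t b : bytes) {
        n |= static_cast<uint32_t>(b & 0x7F) << shift;
        if (b & 0x80) {  // terminator byte: this number is complete
            nums.push_back(n);
            n = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    assert(shift == 0 && "stream ended in the middle of a number");
    return nums;
}

// Hypothetical minimal index: term -> VByte-compressed doc-ID gaps.
class InvertedIndex {
public:
    // Documents must be added in strictly increasing docId order.
    void addDocument(uint32_t docId, const std::string& text) {
        std::istringstream tokens(text);  // whitespace tokenizer (stand-in for BPE)
        std::string term;
        while (tokens >> term) {
            Postings& p = index_[term];
            if (p.any && p.last == docId) continue;  // already posted for this doc
            assert(!p.any || docId > p.last);
            vbyteEncode(p.any ? docId - p.last : docId, p.bytes);
            p.any = true;
            p.last = docId;
        }
    }

    // Decompress a posting list and undo the gap encoding (prefix sums).
    std::vector<uint32_t> postings(const std::string& term) const {
        auto it = index_.find(term);
        if (it == index_.end()) return {};
        std::vector<uint32_t> ids = vbyteDecode(it->second.bytes);
        for (std::size_t i = 1; i < ids.size(); ++i) ids[i] += ids[i - 1];
        return ids;
    }

private:
    struct Postings {
        std::vector<uint8_t> bytes;  // compressed gaps
        bool any = false;            // has at least one posting
        uint32_t last = 0;           // last docId appended
    };
    std::map<std::string, Postings> index_;
};

// Conjunctive (AND) query: two-pointer merge of sorted posting lists.
std::vector<uint32_t> intersect(const std::vector<uint32_t>& a,
                                const std::vector<uint32_t>& b) {
    std::vector<uint32_t> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { out.push_back(a[i]); ++i; ++j; }
    }
    return out;
}
```

For example, after indexing documents 3, 10 and 200, `intersect(idx.postings("covid"), idx.postings("vaccine"))` returns the IDs of documents containing both terms. Gap encoding keeps the stored numbers small, so most gaps in dense posting lists fit in a single byte.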

Chinmay Mittal

My research interests include Artificial Intelligence, particularly applications of Deep Learning in Natural Language Processing.