Abstract

The rush of data released by the James Webb Space Telescope has affected astronomers and hobbyists alike. It has produced a large volume of data to be classified, raising the question of whether this classification can be made through text inference alone. This study uses natural language processing (NLP) techniques, including regular-expression search, tagging, and the Jaccard similarity index, to identify celestial objects and their places of observation in submissions posted from June to December 2022 to the most populated astronomy communities on Reddit, namely r/Astronomy and r/astrophotography. The results indicate that, with enough data and search query terms, it is possible to identify common objects from the submissions alone, paving the way for alternative classification of astronomical data.
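As a rough illustration of two of the techniques named above, the following minimal Python sketch tags a submission title with object names by regular-expression search and compares two titles with the Jaccard similarity index, |A ∩ B| / |A ∪ B|. The search terms, function names, and example titles are hypothetical and chosen only for illustration; they are not the study's actual query list or data.

```python
import re

# Hypothetical search terms, for illustration only.
OBJECT_PATTERNS = {
    "Moon": re.compile(r"\bmoon\b", re.IGNORECASE),
    "Jupiter": re.compile(r"\bjupiter\b", re.IGNORECASE),
    "Andromeda Galaxy": re.compile(r"\b(andromeda|m\s?31)\b", re.IGNORECASE),
}


def tag_objects(title):
    """Return the set of object names whose pattern occurs in the title."""
    return {name for name, pattern in OBJECT_PATTERNS.items() if pattern.search(title)}


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention assumed here for two empty sets
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    first = "Jupiter and the Moon over Lisbon"
    second = "The Moon and Jupiter last night"
    print(tag_objects(first))
    print(jaccard(tokens(first), tokens(second)))
```

In this sketch the Jaccard index is computed on word sets, so word order is ignored; a real pipeline would likely also need stop-word removal and a much larger term list.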