Text Mining on arXiv.org to Summarize the Latest Research
The arXiv Explorer is a text-mining app hosted on Streamlit Cloud. It pulls the most recent papers submitted to arXiv, an open-access repository of academic literature, and summarizes the activity using a series of Natural Language Processing (NLP) tools. We also use a HuggingFace LLM to read a scientific PDF and summarize its contents for the layperson.
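As a rough illustration of the "pull the most recent papers" step, here is a minimal sketch that queries the public arXiv API and parses the Atom feed it returns. This is not necessarily how the app does it: the function names (`build_query_url`, `parse_feed`, `fetch_recent`), the default category `cs.CL`, and the choice of the standard library over a client package are all assumptions made for the example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# The arXiv API returns an Atom feed, so entries live in the Atom namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(category="cs.CL", max_results=10):
    """Build a query for the newest submissions in one category (hypothetical helper)."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _clean(text):
    """Collapse the line breaks and repeated spaces that appear in feed text."""
    return " ".join((text or "").split())


def parse_feed(xml_text):
    """Turn an Atom feed string into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        papers.append({
            "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM_NS)),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM_NS)),
            "summary": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM_NS)),
            "published": _clean(entry.findtext("atom:published", default="", namespaces=ATOM_NS)),
        })
    return papers


def fetch_recent(category="cs.CL", max_results=10):
    """Download and parse the most recent submissions (requires network access)."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"))
```

Calling `fetch_recent("cs.CL", 5)` would return up to five dictionaries whose `summary` fields (the abstracts) can feed the NLP steps. Keeping `parse_feed` separate from the download makes the parsing easy to check against a saved feed without touching the network.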
Tools include Python with Streamlit, topic modeling with Latent Dirichlet Allocation (LDA), and HuggingFace API calls to Large Language Models (LLMs).
Thank you to arXiv for use of its open access interoperability. This app was not reviewed or approved by, nor does it necessarily express or reflect the policies or opinions of, arXiv.