Portfolio: Coding and Other Projects

Mining the arXiv with NLP and LLMs (2024)

Forecasting Parking Meter Transactions in San Diego (2023)

Modeling Earthquake Predictions (2023)

Tableau Dashboard: Police Misconduct in New York City (2023)

Data Pipelines for Flight Information (2023)

Follow the Money - Executive Pay Resources (2022)

Data Science for Investing (2021)

Climate Change Analysis (2021)

Pulsar Classification (2021)
Written Reports
-
May 2024
Master’s Degree thesis, written collaboratively with Trevor Sauerbrey
We apply NLP and time series techniques to academic abstracts mined from arXiv.org preprints to identify and forecast research trends.
Abstract: This article demonstrates time series methods and natural language processing (NLP) techniques applied to an arXiv-based open-source corpus of academic abstracts to forecast token frequency in academic research over time. Current efforts to predict research trends either weigh tokens improperly based on article length, apply inaccessible mathematics to forecasting, or assign weight to authorship and citations rather than vocabulary usage in academic writing. A novel “integral” similarity technique to cluster time series variables for vector autoregression (VAR) models is introduced, and unique evaluation metrics are employed to show forecasting errors follow a Cauchy distribution. Final results for modeling the corpus dataset do not significantly outperform naïve baseline models, and several opportunities for improvement are identified.
This was written to satisfy the requirements of a Master’s-level thesis.
-
December 2022
Collaborative report with Kevin W.S. Baum.
Using Summary Compensation Table data sourced from sec-api.io and financial data from stockrow.com, we attempt to classify executive roles as “CEO” or “CFO” on the basis of (1) company financials and (2) executive pay, both over a 10-year window.
This was written to satisfy the requirements of a Master’s-level course in applied data mining.
-
October 2022
This report outlines some of the ethical and practical considerations of executive compensation and proposes a data-driven knowledge base to raise awareness of executive pay practices.
This was written to satisfy the requirements of a Master’s-level course in the ethical foundations of data science.
-
Early 2020 (pre-pandemic)
Collaboration with Dr. David Ott (Sr. Staff Engineer) and Manish Gaur (Sr. Director, R&D Security).
This is an unpublished paper describing the need for large enterprises to begin planning for Post-Quantum Cryptography (PQC) security algorithm migration. The need is driven by advances in quantum computing, specifically the looming security threat posed by Shor’s algorithm for factoring integers, and by the ongoing NIST competition for PQC security standards.
-
December 2013
This is a directed study report on numerical methods, specifically finding numerical solutions for nonlinear wave equations by discretizing the equation parameters and applying difference methods as an approximation for continuous derivatives. The report also includes an introduction to elliptic equations, linear and nonlinear wave equations, iterations and convergence, and techniques for approximating initial conditions with difference methods.
This report was written to satisfy the senior thesis requirements of a Bachelor of Science degree in Applied Mathematics.
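As a rough illustration of the difference-method idea described above (not code from the report itself), the sketch below advances u_tt = c^2 u_xx + forcing(u) with central differences in space and time. The function name `solve_wave`, the `forcing` hook, the grid sizes, and the sine initial condition are all assumptions chosen for the example. The first time step is built from a Taylor expansion that assumes zero initial velocity, which is one simple way to approximate an initial condition with differences.

```python
import math


def solve_wave(nx=51, nt=100, c=1.0, length=1.0, cfl=0.5, forcing=lambda u: 0.0):
    """Leapfrog scheme for u_tt = c^2 u_xx + forcing(u) with u = 0 at both ends.

    Hypothetical helper for illustration only. Returns the grid, the solution
    at the final time, and that final time.
    """
    dx = length / (nx - 1)
    dt = cfl * dx / c                  # cfl <= 1 keeps the linear scheme stable
    r2 = (c * dt / dx) ** 2
    x = [j * dx for j in range(nx)]

    # Assumed initial displacement u(x, 0) = sin(pi x / length); u_t(x, 0) = 0.
    prev = [math.sin(math.pi * xj / length) for xj in x]
    prev[0] = prev[-1] = 0.0

    # First step: second-order Taylor expansion using the zero initial velocity.
    curr = [0.0] * nx
    for j in range(1, nx - 1):
        lap = prev[j + 1] - 2.0 * prev[j] + prev[j - 1]
        curr[j] = prev[j] + 0.5 * (r2 * lap + dt * dt * forcing(prev[j]))

    # Remaining steps: central difference in time (leapfrog).
    for _ in range(nt - 1):
        nxt = [0.0] * nx
        for j in range(1, nx - 1):
            lap = curr[j + 1] - 2.0 * curr[j] + curr[j - 1]
            nxt[j] = 2.0 * curr[j] - prev[j] + r2 * lap + dt * dt * forcing(curr[j])
        prev, curr = curr, nxt

    return x, curr, nt * dt


# Linear case, compared against the exact standing wave sin(pi x) cos(pi t).
x, u, t = solve_wave()
exact = [math.sin(math.pi * xj) * math.cos(math.pi * t) for xj in x]
print("max error (linear):", max(abs(a - b) for a, b in zip(u, exact)))

# A nonlinear variant: a cubic restoring term, chosen purely as an example.
_, u_nl, _ = solve_wave(forcing=lambda v: -v ** 3)
```

With the default `forcing` the equation is linear, so the result can be checked against a known solution; passing a nonlinear `forcing` reuses the same stencil, although stability and convergence then need their own analysis.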
-
December 2013
Final report for a senior-level Thermal Physics course. We characterized polymers with different concentrations of graphene using Thermogravimetric Analysis (TGA) and Differential Scanning Calorimetry (DSC).
Abstract: We took samples of pure PLGA, along with PLGA mixed with graphene nanoparticles and (NH2) and characterized the copolymer according to various thermal properties, including glass transition temperatures, changes in heat capacity, and degrees of crystallinity. We achieved this using the thermal techniques of differential scanning calorimetry and thermogravimetric analysis.
Written in collaboration with Alba Rubi Banegas.
-
Summer 2013
This is a quantum computing algorithm for finding square roots of a positive-definite matrix, built on the architecture of the Harrow/Hassidim/Lloyd (HHL) algorithm for solving linear equations with a quantum computer. The original algorithm relies on eigendecomposition, decomposing a matrix into the form QAQ* and performing inversion operations on the diagonal eigenvalue matrix A in quantum superposition. We extend this idea to square roots rather than inversion, including analysis of error and runtime.
This report was written to satisfy the senior thesis requirements of a Bachelor of Science degree in Applied Mathematics.
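The eigendecomposition idea can be illustrated classically (this is not the quantum algorithm, and it is not code from the report). The sketch below is restricted to a real symmetric positive-definite 2x2 matrix so that the decomposition has a closed form. It applies a scalar function to each eigenvalue and rotates back. The helper name `matrix_function_2x2` and the sample matrix are assumptions made for the example.

```python
import math


def matrix_function_2x2(a, b, d, f):
    """Apply f to the eigenvalues of the symmetric matrix [[a, b], [b, d]].

    Hypothetical helper for illustration only. Writes the matrix as
    Q diag(lam1, lam2) Q^T with Q a rotation, then returns
    Q diag(f(lam1), f(lam2)) Q^T.
    """
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam1, lam2 = mean + radius, mean - radius
    if lam2 <= 0.0:
        raise ValueError("matrix is not positive definite")

    # Angle of the eigenvector belonging to the larger eigenvalue lam1.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    co, si = math.cos(theta), math.sin(theta)

    f1, f2 = f(lam1), f(lam2)
    off_diag = co * si * (f1 - f2)
    return [[co * co * f1 + si * si * f2, off_diag],
            [off_diag, si * si * f1 + co * co * f2]]


# Inversion (the operation HHL performs) and square root use the same
# decomposition; only the function applied to the eigenvalues changes.
inverse = matrix_function_2x2(2.0, 1.0, 2.0, lambda lam: 1.0 / lam)
root = matrix_function_2x2(2.0, 1.0, 2.0, math.sqrt)
print(inverse)
print(root)
```

Multiplying `root` by itself should recover the sample matrix up to rounding, which gives a quick sanity check.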