I am a Master's student in Biostatistics at the University of Washington and hold a Bachelor's degree in Data Science with a minor in Mathematics from New York University. Previously, I worked as a research assistant at NYU under Professor Manpreet Katari, where I developed a Python-based statistical model for differential gene expression analysis using a negative binomial generalized linear model. Currently, I am exploring contrastive learning methods for learning low-dimensional representations of static lineage barcoded single-cell RNA-seq data, supervised by Professor Kevin Lin at the University of Washington.

I am broadly interested in Biostatistics and Data Science, and I enjoy work that brings together domain science, statistical modeling, and programmatic data analysis.

For more information about me, please check out my resume or CV, or reach out via email or LinkedIn.

Shizhao Joshua Yang

Education


MS in Biostatistics, 2025

Department of Biostatistics

University of Washington School of Public Health

Seattle, WA

BS in Data Science (Genomics concentration, Minor: Mathematics), 2023

New York University

New York, NY

Projects

Deep-learning embedding for static lineage barcoded single-cell RNA-seq data
  • Designed and implemented a contrastive learning algorithm to learn low-dimensional embeddings of static lineage barcoded single-cell RNA-seq data, facilitating the identification of lineage-specific gene expression patterns.
  • Developed an evaluation metric for comparing different embeddings of single-cell data generated by the model.
Python-based RNA-seq Analysis Algorithm using Negative Binomial GLM
  • Built a Python-based statistical model for differential gene expression analysis, based on a GLM with a negative binomial distribution.
  • Used backtracking line search for dispersion estimation and IRLS (iteratively reweighted least squares) for coefficient estimation, and applied the Wald test to the estimated log fold changes.
  • Tested the algorithm on several datasets and compared the results with the actual values and with the estimates generated by DESeq2.
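As a rough illustration of the IRLS and Wald test steps listed above, here is a minimal sketch rather than the project's actual code. It assumes a log link, an NB2 variance of mu + alpha * mu^2, and a dispersion alpha that is already known and held fixed (the backtracking line search for dispersion is not shown). The function names `fit_nb_glm_irls` and `wald_p_values` and the toy counts are hypothetical.

```python
import math
import numpy as np


def fit_nb_glm_irls(X, y, alpha, n_iter=50, tol=1e-8):
    """Fit a log-link negative binomial GLM by IRLS, with dispersion alpha held fixed."""
    # Rough starting point: least squares on the log of the shifted counts.
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # IRLS weights for the log link: (dmu/deta)^2 / Var(y) = mu / (1 + alpha * mu).
        w = mu / (1.0 + alpha * mu)
        # Working response: linear predictor plus the scaled residual.
        z = eta + (y - mu) / mu
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Standard errors from the inverse of X^T W X at the final estimate.
    mu = np.exp(X @ beta)
    w = mu / (1.0 + alpha * mu)
    cov = np.linalg.inv((X.T * w) @ X)
    return beta, np.sqrt(np.diag(cov))


def wald_p_values(beta, se):
    """Two-sided Wald p-values from a standard normal reference distribution."""
    z = beta / se
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


# Toy example (made-up counts): one gene, two conditions with four samples each.
X = np.column_stack([np.ones(8), np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)])
y = np.array([3, 5, 4, 8, 12, 10, 9, 15], dtype=float)
beta_hat, se_hat = fit_nb_glm_irls(X, y, alpha=0.1)
p_values = wald_p_values(beta_hat, se_hat)
```

In this sketch the second coefficient is a log fold change on the natural-log scale; dividing it by log(2) would give the log2 scale that DESeq2 reports.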
Refined SIR Model with Vaccination and its Application in 2022 NYC Influenza A Activity Prediction
  • Analyzed a modified SIR model that accounts for vaccination using analytical ODE methods, including fixed points and their stability, phase-plane analysis, and herd immunity.
  • Simulated NYC influenza data from the past six years using the SIRV model and estimated the transmission rate, removal rate, and reproduction number (R0) using a quasi-Newton method in Python.
  • Compared the estimated coefficients across past seasons and predicted the infection peak in NYC this year under different vaccination rates.
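To make the simulation step above concrete, the following is a minimal sketch of one common SIRV formulation, not necessarily the exact model used in the project. It assumes population fractions, a transmission rate `beta`, a removal rate `gamma`, and a constant per-capita vaccination rate `nu` that moves susceptibles directly into a vaccinated compartment. All function names and parameter values are hypothetical, and the quasi-Newton parameter fitting is not shown.

```python
import numpy as np


def sirv_rhs(state, beta, gamma, nu):
    """Right-hand side of an assumed SIRV system, in population fractions."""
    s, i, r, v = state
    new_infections = beta * s * i
    return np.array([
        -new_infections - nu * s,      # susceptibles lost to infection and vaccination
        new_infections - gamma * i,    # infected
        gamma * i,                     # removed
        nu * s,                        # vaccinated
    ])


def simulate_sirv(state0, beta, gamma, nu, t_end, dt=0.1):
    """Integrate the SIRV system with the classical fourth-order Runge-Kutta method."""
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, 4))
    traj[0] = state0
    for k in range(n_steps):
        y = traj[k]
        k1 = sirv_rhs(y, beta, gamma, nu)
        k2 = sirv_rhs(y + 0.5 * dt * k1, beta, gamma, nu)
        k3 = sirv_rhs(y + 0.5 * dt * k2, beta, gamma, nu)
        k4 = sirv_rhs(y + dt * k3, beta, gamma, nu)
        traj[k + 1] = y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj


# Illustrative parameter values only; they are not estimates from real data.
state0 = np.array([0.99, 0.01, 0.0, 0.0])
no_vax = simulate_sirv(state0, beta=0.5, gamma=0.2, nu=0.0, t_end=100.0)
with_vax = simulate_sirv(state0, beta=0.5, gamma=0.2, nu=0.02, t_end=100.0)
peak_no_vax = no_vax[:, 1].max()
peak_with_vax = with_vax[:, 1].max()
```

Comparing `peak_no_vax` with `peak_with_vax` mirrors the idea of predicting the infection peak under different vaccination rates. In this formulation the basic reproduction number without vaccination is `beta / gamma`.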