Alex Lee

CV

I am a Clinical Data Scientist and Machine Learning Engineer with a background in mathematics.

I have extensive experience leading data science projects in business, government and academia; developing, validating and deploying machine learning models; and collaborating with a broad range of technical and clinical specialists.

Work history

  • 2025 - present: Machine Learning Engineer, Beyond Blue
    • LLM prompt engineering to extract clinical information from large datasets of text
    • Implementation in Databricks of machine learning models for de-identification, information extraction and voice-to-text conversion
  • 2021 - 2025: Data Scientist and Researcher, Victorian Comprehensive Cancer Centre Data Connect
    • Development and implementation of cancer risk prediction algorithms for patients with non-specific symptoms
    • Development of machine learning models for clinical information extraction and predictive modelling
    • Working with a range of clinicians, data scientists, data analysts and data engineers to develop linked data infrastructure to support cancer research
    • Supervision of PhD students
  • 2018 - 2021: Data Scientist, Victorian Centre for Data Insights
    • Led projects in areas such as disease surveillance, geolocation of COVID-19 cases and data visualisation of transport patterns
  • 2016 - 2018: Signal Processing and Machine Learning Scientist, DST Group
    • Researched and developed machine learning algorithms for information extraction from large sets of noisy time-series data
  • 2016 - 2018: Data Analyst and Programmer, GapMaps
    • Developed a system for automated collection of geospatial data to inform business network planning decisions

Skill Set

Data science:

  • Python: pandas, Polars, NumPy, scikit-learn, Keras, PyTorch, spaCy, Requests, Jupyter Notebook
  • R: Arrow, Tidyverse
  • SQL: PostgreSQL, DuckDB
  • Algorithms: Clustering (K-means, Spectral Clustering), Random Forests, LDA, Word Embeddings, Linear and Logistic Regression, Neural Networks, Record Linkage
  • Visualisation:
    • Python: Seaborn, Matplotlib, Plotly Express
    • Tableau
    • R: ggplot

Data engineering:

  • Microsoft Azure
  • Databricks Certified Machine Learning Associate

Software development:

  • GitHub, CI/CD, Python package development
  • Developed the whereabouts package for scalable geocoding and record linkage in Python. Jointly awarded the Venables Award for Open Source Software, 2024

Education

  • PhD, University of Melbourne
  • Bachelor of Science (First Class Honours), Major in Mathematical Physics
  • Bachelor of Arts, Major in Spanish and Latin American Studies

Some recent publications

2025

  • Primary care patients presenting with unexpected weight loss in Australian general practices: replication of a diagnostic accuracy study. Lee et al., BMJ Open. 2025 Jul 28;15(7):e104690
  • Rural variations in primary care prostate cancer diagnosis and survival: A cohort study using linked Australian Primary Care Electronic Medical Record data, Olivia Wawryk, Ian M Collins, Alex Lee, Mark Buzza, Simonne Neil, Jessica Freeman, Jon Emery, Meena Rafiq
  • A methodology for the development and validation of phenotyping algorithms suitable for use in clinical decision support in primary care, Alex Lee, Sophie Chima, Luncas de Mendonça, Philip Ly, Meena Rafiq, Brent Venning, Brian D Nicholson, Jon Emery, Javeria Martinez Gutierrez, 2025, Under review

2024

  • Patient Preferences for Investigation of Cancer Symptoms in Australian General Practice: A Discrete Choice Experiment, Brent Venning, Alison Pearce, Richard De Abreu Lourenco, Rebekah Hall, Rebecca Bergin, Alex Lee, Keith Donohoe, Jon Emery, British Journal of General Practice, Feb 2024

2023

  • Data Resource Profile: Victorian Comprehensive Cancer Centre Data Connect, Lee A, McCarthy D, Bergin R, et al., International Journal of Epidemiology, Volume 52, Issue 6, December 2023, Pages e292–e300
  • Factors affecting patient decisions to undergo testing for cancer symptoms: an exploratory qualitative study in Australian general practice, Brent Venning, Rebecca Bergin, Alison Pearce, Alex Lee, Jon D Emery, BJGP Open, Mar 2023.

Some recent presentations

  • 2025: A new open source library in Python for fast, accurate geocoding, Government Advances in Statistical Programming, Washington DC, United States (online)
  • 2024: Illuminating the cancer continuum of care through large-scale primary care data linkage, Health Services Research Conference, Brisbane, Australia
  • 2024: Replication of a diagnostic accuracy study for primary care patients with unintended weight loss, Cancer in Primary Care Network Conference, Melbourne, Australia
  • 2024: Research capabilities of linked data, VCCC Data Connect Showcase, Melbourne, Australia
  • 2023: Fast, accurate, open-source geocoding in Python, PyConAU, Adelaide, Australia
  • 2023: VCCC Data Connect launch, Melbourne, Australia
  • 2022: Comparing primary care patients with unintended weight loss between Australia and the UK, Oxford University, United Kingdom
  • 2022: Machine learning for detection of upper GI cancers, Making Digital Health Real seminar, University of Melbourne

Teaching

  • Developed and taught modules for the course Transforming Healthcare with Data and Analytics at the Centre for Digital Transformation of Health, University of Melbourne

Supervision

  • Currently co-supervising two PhD students
  • Co-supervised one Masters student
  • Mentored a PhD student in clinical data science