Welcome to my portfolio website! Feel free to explore my project portfolio, education, and current goals. Reach out to me via the provided links and email address if you would enjoy having me as part of your team, if you enjoy discussing entrepreneurial or scientific ideas, or if you would enjoy having a new networking friend. You can also challenge me to a game of chess on chess.com (user id: tjamesbu)!

Peaks of Los Illinizas Volcanoes visible from Laguna de Quilotoa - Photo by Thomas J. James

tjames.data@gmail.com

Education

Master of Applied Data Science - University of Michigan August 2021 to April 2023

Awards & Achievements: School of Information Leadership Award, 3.7/4.0 GPA, Learning Analytics Summer Institute Scholarship, 2nd Prize in School of Information 2023 Capstone Project Exposition

Bachelor of Science in Biology - Benedictine University August 2002 to May 2009

Awards & Achievements: Musical Talent Scholarship, Schmitt Math & Physics Scholarship, Cuneo Foundation Scholarship

Favorite Quotes

Make the Best Better

Be Prepared

"Because of its tiny wings and heavy body, aerodynamically the bumblebee shouldn't be able to fly. But the bumblebee doesn't know that, so it flies anyways" - Mary Kay Ash

"If you can't explain it simply, you don't understand it well enough." - Albert Einstein


"If you don't like the road you're walking, start paving another one." - Dolly Parton

Project Portfolio

Projects that have had a significant impact during my journey gaining experience in data science, machine learning, and artificial intelligence

Real-World Work Experience

Projects developed based on client requests & information provided by stakeholders

Role: Lead Machine Learning Engineer

Project: Predictive Modeling and Forecasting of Malaria-Spread in Liberia

Employer: Omdena

● Create a Liberia malaria-forecasting application to aid in the optimal distribution of malaria vaccinations

● Develop ensemble and time-series analysis models to predict malaria cases & death-counts grouped by county

● Conduct multivariate data analysis to identify the factors behind 3,548 malaria deaths per year

● Provide motivation when needed to fellow project leaders and members

● Present the Liberian community with a malaria-forecasting application to improve malaria survival rates

Role: Machine Learning Engineer

Project: AI-Powered Warning System for Extreme Weather Conditions in Tanzania

Employer: Omdena

● Develop an extreme weather forecasting app to assist the Tanzania Meteorological Agency

● Determine which feature-variables can provide the most accurate predictions using EDA & feature-engineering

● Preprocess satellite imagery into numerical vector embeddings using segmentation, masking, banding, and a convolutional neural network

● Synthesize CNN-LSTM & ConvLSTM models for comparison of model-performance strengths & weaknesses

● Lead semiweekly meetings & provide feedback on fellow project members' progress

● Provide Tanzanian citizens with a Streamlit application capable of predicting extreme weather events 20% faster than currently available applications

Role: Machine Learning Engineer

Project: Developing Personalized AI Travel Advisors for Paris Olympics 2024

Employer: Omdena

● Design & develop a personalized AI-Travel-Advisor for Paris tourists attending the 2024 Olympic Games

● Guide team through proper methods of data-security while performing data-collection

● Perform prompt engineering on multiple LLMs, including Llama3, Gemma, and Mistral

● Retrieve external CSV & PDF file data using Retrieval Augmented Generation (RAG)

● Develop multiple multi-AI agent models which use RAG, APIs, & LLMs to personalize the user-experience

● Exchange ideas with project teammates to promote collaboration & teamwork

● Deliver the Paris 2024 Olympics AI Travel Advisor application prior to the Olympics' official start

Role: Junior Machine Learning Engineer

Project: Mapping Seagrass Meadows with Satellite Imagery & Computer Vision

Employer: Omdena

● Predict where the steepest decline of Posidonia oceanica seagrass meadows will occur next in the Italian and Greek regions of the Mediterranean Sea

● Collect & preprocess satellite imagery using Google Earth Engine, masking, segmentation, banding, histogram analysis, data augmentation, and image-resolution adjustments necessary while ensuring data-variation

● Experiment with & analyze multiple computer-vision deep-learning models such as UNet, UNet+, UNet++, and DeepLabV2

● Fine-tune & apply the UNet++ convolutional neural network capable of pixel-level classification between satellite-imagery ground-truth data and variable-data

● Forecast locations of species' disappearance-rates to help support Mediterranean-seagrass restoration efforts

Academic Work Experience - University of Michigan

Projects developed while completing the University of Michigan's Master of Applied Data Science degree

Role: Data Scientist

Project: Recommender System Engineering - Developing Online-Course Recommendations

Institution: University of Michigan School of Information

● Develop content-based recommender system to suggest online courses according to occupation title

● Perform data gathering and preprocessing using Python, ETL, SQL (PostgreSQL), Natural Language Processing, TensorFlow, and AWS

● Conduct comparisons amongst word-vectorizers and word-embeddings: GloVe, BERT, RoBERTa, and TF-IDF

● Reach cosine similarity levels of up to 93% for recommended courses

Project Pitch: "Thousands of people are interested in professional development courses online, but do not know where to start. Online-course recommendations can provide directional guidance to professionals eager for continuous growth."
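The content-based matching described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the project's actual pipeline: it ranks course descriptions against an occupation title using TF-IDF vectors and cosine similarity. The course catalog, the `recommend` helper, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical course catalog; titles and descriptions are made up for illustration.
COURSES = {
    "Intro to SQL": "query relational databases with sql and postgresql",
    "Deep Learning Basics": "train neural networks with tensorflow for data science",
    "Nursing Fundamentals": "patient care and clinical practice for nurses",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector (a dict) for one text; terms outside the corpus are ignored."""
    counts = Counter(term for term in tokenize(text) if term in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(occupation, courses=COURSES, top_n=2):
    """Return the top_n (title, score) pairs most similar to the occupation text."""
    idf = build_idf(list(courses.values()))
    query = tfidf_vector(occupation, idf)
    scored = [
        (title, cosine_similarity(query, tfidf_vector(description, idf)))
        for title, description in courses.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Calling `recommend("patient care specialist")` ranks the nursing course first because it is the only description sharing terms with the query. Swapping the TF-IDF vectors for learned embeddings such as GloVe or BERT would change only `tfidf_vector`; the cosine ranking step stays the same.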

Role: Data Scientist

Project: NBA Data Predictions & Analytics Using Machine Learning & Natural Language Processing

Institution: University of Michigan School of Information

● Collaborate with teammates to visualize the potential positive and negative influences of Twitter conversations about the Golden State Warriors and Boston Celtics

● Detect influences of Twitter conversations using regex, sentiment-analysis, semantic-analysis, sarcasm-detection, Word2Vec models, latent semantic indexing, latent Dirichlet allocation, and topic-coherence models on 8,195 columns of text data

● Gather and present scalable insights & analytics through data visualizations, observing the increase in conversations, and dominance of criticisms, during the 2022 playoffs

Role: Data Scientist

Project: Predicting Rankings of Marathon Runners

Institution: University of Michigan School of Information

● Predict rankings of marathon runners using raw datasets, Python programming, exploratory data analysis, feature engineering, data preprocessing, data analytics, and machine learning algorithms

● Establish optimal hyper-parameter levels and feature variables for machine-learning rank forecasting to produce predictive model results at 74% accuracy

● Forecast marathon rankings while reducing the inaccuracy rate from 35% down to 26%

Role: Voluntary Data Scientist for an Architecture Project

Project: Synthesizing Office Floor-Plans Using Generative Adversarial Networks

Institution: University of Michigan School of Information

● Assist an architecture student's project by synthesizing new office floor-plan images

● Reconstruct project plans after 45 days to accommodate team adjustments

● Develop a pix2pix GAN model using an undisclosed library of office floor-plan images, AWS, and experimentation with different analytical and statistical approaches

● Halt project during fine-tuning stages to join a new group and new project

Role: Data Scientist

Project: Predictive Model to Identify Students Who Are At-Risk of Failing a Course

Institution: University of Michigan School of Information

● Extract and recode the "final result" grade data

● Perform data wrangling and feature engineering to ensure all sub-set and engineered files are combined into one flat file

● Apply a weight to the scores to achieve a more accurate data representation

● Constrain the prediction model to only record up to day 60 of the course

● Apply Logistic Regression and Random Forest Classifier models to visualize predicted probabilities and identify a probability cut-off for identification of highest-risk students needing extra attention

Role: Data Scientist

Project: Environmental Effects On Housing Prices (Milestone I)

Institution: University of Michigan School of Information

● Develop data analysis that compares Zillow house prices with Landsat satellite imagery and air pollution data

● Prepare data for analysis using ETL, data cleaning, data preprocessing, EDA, and data visualization

● Perform exploratory data analysis to determine which feature variables and feature engineering are needed to develop optimal visualizations

● Observe and analyze results statistically through visualizations such as heat maps, histograms, and scatterplots

● Correlate satellite imagery of existing vegetation levels with air pollution and house prices

● Discover 17.5% variance in housing prices

● Learn technical strategies to utilize and pitfalls to avoid while conducting data analysis and supervised learning

Text Prediction Application - Data Science Specialization Capstone Project

● Develop a word-based statistical model

● Send the text data through a Markov Chain Model

● Add smoothing to develop a less-biased prediction model

● Predict the next word, or group of words, in a sentence
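The steps above can be illustrated with a small first-order Markov chain over words. This is a hedged sketch rather than the capstone's actual code: the `BigramModel` class, the toy corpus in the usage note, and the choice of add-k (Laplace) smoothing are assumptions, since the original does not say which smoothing method was used.

```python
from collections import Counter, defaultdict


class BigramModel:
    """First-order Markov chain over words with add-k smoothing (illustrative only)."""

    def __init__(self, k=1.0):
        self.k = k                                # smoothing constant (assumed add-one by default)
        self.transitions = defaultdict(Counter)   # word -> counts of the words that follow it
        self.vocab = set()

    def fit(self, sentences):
        """Count word-to-next-word transitions in an iterable of sentences."""
        for sentence in sentences:
            words = sentence.lower().split()
            self.vocab.update(words)
            for current, following in zip(words, words[1:]):
                self.transitions[current][following] += 1
        return self

    def probability(self, current, following):
        """Smoothed estimate of P(following | current)."""
        if not self.vocab:
            raise ValueError("fit the model before asking for probabilities")
        counts = self.transitions.get(current, Counter())
        total = sum(counts.values())
        return (counts[following] + self.k) / (total + self.k * len(self.vocab))

    def predict(self, current, top_n=1):
        """Most likely next words; ties are broken alphabetically."""
        ranked = sorted(self.vocab, key=lambda w: (-self.probability(current, w), w))
        return ranked[:top_n]
```

For example, after `BigramModel().fit(["the cat sat on the mat", "the cat ate the fish"])`, calling `predict("the")` returns `["cat"]`. Smoothing gives every vocabulary word a non-zero probability, so a word never seen after "the" can still be suggested, just with lower rank.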

Building a Dashboard - Data Analyst Specialization Capstone Project

● Collect data using APIs and web scraping

● Conduct data cleaning, data normalizing, data wrangling, and exploratory data analysis

● Visualize distribution of the data using histograms, box plots, scatter plots, bubble plots, pie charts, stacked charts, line charts, and bar charts

● Assemble a dashboard with Cognos Dashboard Embedded

● Present the results

Advanced Computer Vision with TensorFlow

● Basic Transfer Learning with Cats & Dogs Data

● Transfer learning to train CIFAR-10 dataset on ResNet50

● Image Classification and Object Localization

● Use transfer-learning on a pre-trained model to predict the bounding boxes in the Caltech Birds - 2010 dataset

● Object Detection using TensorFlow

● Zombie Detection - Use object detection API and retrain RetinaNet to spot zombies using only 5 training images

● Fine-tuning of a RetinaNet architecture

● CNNs & U-Net for Image Segmentation

● Mask R-CNN Image Segmentation

● Image Segmentation of Handwritten Digits

● Class Activation Maps

● Generating Gradient-Weighted Class Activation Maps

● Saliency Maps

Building, Evaluating, and Testing Pre-trained Models - AI Engineering Capstone Project

● Leverage pre-trained models to build image-classifiers

● Utilize a linear classifier with PyTorch to determine the potential maximum accuracy using validation data for 5 epochs

● Build an image classifier using the VGG16 pre-trained model

● Evaluate and compare the image classifier's performance to another image classifier model built using the ResNet50 pre-trained model

Building Deep Learning Models with TensorFlow

● Initiate Eager Execution when needed

● Perform Linear Regression and Logistic Regression using TensorFlow

● Classify hand-written digits using a Multi-layer Perceptron network and Convolutional Neural Network

● Execute a Recurrent Neural Network for language-modeling using the Long Short-Term Memory unit model

● Detect the most important data features using a Restricted Boltzmann Machine model

● Utilize Autoencoders, the building blocks of Deep Belief Networks, to perform Feature Extraction and Dimensionality Reduction

● Utilize Autoencoders to extract the emotion that a person in a photograph is feeling

● Scale processing speeds between CPU and GPU

Scalable Machine Learning on Big Data using Apache Spark

● Apply basic functional and parallel programming

● Analyze a real-world dataset and apply machine learning using Apache Spark

Deep Neural Networks with PyTorch

● Use Dropout method for Classification

● Test Sigmoid, Tanh, and ReLU activation functions on an image set with two hidden layers

● Create model, optimizer, and total loss (cost) function using PyTorch

● Train the model via Mini Batch Gradient Descent

● Plot the model with and without Dropout applied

● Classify an image dataset using Convolutional Neural Networks (CNN)

● Create a dataset class

● Create a CNN using CNN-batch or CNN as a constructor

● Create objects for criterion and optimizer

● Adjust optimizer to provide a Standard Gradient Descent, a 0.1 learning-rate, and Cross-Entropy Loss

Machine Learning with Python

● Create a model, train the model, and test the model using:
Simple Linear Regression
Non-linear Regression
Multi-linear Regression
Polynomial Regression
Decision Trees
K-Nearest Neighbors
Logistic Regression (Customer Churn)
Support Vector Machines
Agglomerative Hierarchical Clustering
K-Means Clustering (Customer Segmentation)
Density Based Clustering (DBSCAN)

● Identify the best classification algorithm

● Develop a Recommender System with a Collaborative Filter that can recommend movies

● Develop a Recommender System with a Content-based Filter that can recommend movies

The Zombies are Coming!!!

Introduction to Deep Learning & Neural Networks with Keras

● Building, training, and testing a Neural Network to create Classification Models using Keras

● Increase the number of hidden layers

● Compute the Mean Squared Error

● Convolutional Neural Networks with Keras

● Regression models with Keras

● Forward Propagation using Artificial Neural Networks

Geographic Information Systems

● Identify a current challenge in existence for a specific region that leaders can observe and potentially address

● The Seattle region is identified as a location with potential rain-water drainage issues

● Flow direction, Flow accumulation, and Flow length are mapped using ArcGIS software

● Improved observations of Seattle areas with the most critical rain-water drainage issues are identified and potentially addressed
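To show what flow direction and flow accumulation mean computationally, here is a small pure-Python sketch. It uses the D8 rule, one common flow-direction method in which each cell drains to its steepest downhill neighbor; whether the ArcGIS analysis above used D8 is not stated, so treat that as an assumption. The elevation grid and function names are made up for illustration, and flat areas and pits are not handled.

```python
import math

# Hypothetical 4x4 elevation grid (arbitrary units), made up for illustration.
ELEVATION = [
    [9, 8, 7, 6],
    [8, 7, 6, 5],
    [7, 6, 5, 4],
    [6, 5, 4, 3],
]

# Offsets to the eight surrounding cells (the "8" in D8).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_direction(elevation):
    """Map each cell to its steepest downhill neighbor, or None if it has none."""
    rows, cols = len(elevation), len(elevation[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    drop = elevation[r][c] - elevation[nr][nc]
                    slope = drop / math.hypot(dr, dc)  # diagonal steps are longer
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            direction[(r, c)] = best
    return direction


def flow_accumulation(elevation):
    """Count, for each cell, how many upstream cells drain through it."""
    direction = flow_direction(elevation)
    accumulation = {cell: 0 for cell in direction}
    # Visit cells from highest to lowest so each upstream total is final
    # before it is passed to the downstream cell.
    order = sorted(direction, key=lambda cell: elevation[cell[0]][cell[1]], reverse=True)
    for cell in order:
        target = direction[cell]
        if target is not None:
            accumulation[target] += accumulation[cell] + 1
    return accumulation
```

On this tilted grid every cell eventually drains to the bottom-right corner, so that cell ends up with the highest accumulation. Cells with high accumulation values are where water concentrates, which is why accumulation maps help locate likely drainage problems.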

Academic Work Experience - Coursera

Projects developed while completing Coursera online courses & specializations

Completed Online Course Specializations

Online Course Specializations are taken through Coursera. Specializations typically are three to ten courses in length, with courses lasting anywhere from three to eight weeks.

Contact Me

Feel free to reach out to me if you have any questions, are interested in networking & connecting, enjoy discussing new ideas, or would like to collaborate on any projects.

tjames.data@gmail.com