I was finally able to finish this nanodegree (thanks to some unfortunate changes in holiday plans). It was definitely the most time-consuming nanodegree I’ve done, even more so than the AI for Trading one. I’ve got a sort of love/hate relationship with Udacity, as their projects are often a bit confusing and even buggy, which makes them both frustrating and a decent simulation of the real-life situation where there is no pre-processed dataset or boilerplate template code (as in some ML courses). In this nanodegree I had to do four separate projects:
In the first project I designed an algorithm for detecting pneumonia from over 100k X-ray images and metadata from 30k patients, stored in DICOM files. I used transfer learning by applying a CNN architecture (VGG16) that had been pre-trained on the ImageNet dataset, freezing the convolutional layers and training only the final few fully connected layers. In addition, I had to write a full FDA Validation Plan for the medical algorithm, including its intended use, algorithm design & function, as well as details on the training and performance.
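To illustrate the transfer-learning setup, here is a minimal Keras sketch, not the exact project code: the dense head, layer sizes and input shape are my own assumptions.

```python
import tensorflow as tf

# Load VGG16 pre-trained on ImageNet, without its original classification head
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze all convolutional layers

# Attach a small fully connected head for the binary pneumonia / no-pneumonia output
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
```

Only the head’s weights get updated during training, which is what makes this feasible on a modest dataset compared to training VGG16 from scratch.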
In the second project I dived into 3D medical imaging by quantifying hippocampal volume for Alzheimer’s progression using a U-Net architecture implemented in PyTorch. A lot of time went into just making sense of the DICOM/NIfTI formats and exploring scans in 3D Slicer to figure out the voxel spacings and how the axial/coronal/sagittal planes are aligned. The project was especially tricky as it included simulating a whole hospital infrastructure with an Orthanc PACS server that received the DICOM files from a simulated scanner and routed them to the AI server, which ran the U-Net inference and returned the results back to the PACS.
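The volume quantification itself boils down to counting segmented voxels and scaling by the voxel spacing from the image header. A minimal sketch with nibabel (the file name and the assumption that any non-zero label marks hippocampus are hypothetical):

```python
import numpy as np
import nibabel as nib

# Load the NIfTI segmentation produced by the U-Net (hypothetical file name)
seg_img = nib.load("hippocampus_segmentation.nii.gz")
seg = seg_img.get_fdata()

# Physical size of one voxel in mm^3, taken from the header's voxel spacing
dx, dy, dz = seg_img.header.get_zooms()[:3]
voxel_volume_mm3 = dx * dy * dz

# Count voxels labeled as hippocampus (assuming label > 0 marks the structure)
hippocampus_voxels = np.count_nonzero(seg > 0)
volume_mm3 = hippocampus_voxels * voxel_volume_mm3
print(f"Estimated hippocampal volume: {volume_mm3:.1f} mm^3")
```

This is also why getting the voxel spacing right matters so much: the same voxel count can correspond to very different physical volumes across scanners.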
In the third project I built a predictive regression model for patient selection in diabetes drug testing (estimating hospitalization time). I’m a PyTorch person, but this project required me to dig quite deep into the TensorFlow ecosystem and use many of its pieces, such as TF Data Validation, DenseFeatures, the Sequential API and the Feature Column API. I even estimated the uncertainty of the model using TensorFlow Probability, facilitating risk prioritization and triaging of predictions. Furthermore, I used the Aequitas framework to assess biases based on gender and ethnicity.
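To give a flavour of how those pieces fit together, here is a minimal sketch (the feature names and layer sizes are my own placeholders, not the project’s actual columns) of feature columns feeding a Keras model whose output is a Normal distribution via TensorFlow Probability, so the predicted scale can be used for triaging:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Hypothetical feature columns for a few numeric/categorical patient fields
feature_columns = [
    tf.feature_column.numeric_column("num_lab_procedures"),
    tf.feature_column.numeric_column("num_medications"),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "gender", ["Male", "Female"])),
]

model = tf.keras.Sequential([
    # DenseFeatures turns a dict of raw feature tensors into one dense input vector
    tf.keras.layers.DenseFeatures(feature_columns),
    tf.keras.layers.Dense(64, activation="relu"),
    # Two outputs parameterize a Normal: predicted mean and (softplus'd) scale
    tf.keras.layers.Dense(2),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])

# Train by minimizing negative log-likelihood so the scale reflects uncertainty
negloglik = lambda y, dist: -dist.log_prob(y)
model.compile(optimizer="adam", loss=negloglik)
```

Predictions with a wide scale can then be flagged for manual review instead of being trusted blindly.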
In the final project I got to work on a familiar topic, creating a heart-rate estimator from a PPG signal in the presence of movement artifacts recorded with an accelerometer (a classic case with wearable wrist devices).
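A common way to attack this, sketched below in plain NumPy/SciPy (the sampling rate, window band limits and guard width are assumptions, and this is my own simplification rather than the project’s exact method), is to compare the frequency spectra of the PPG and accelerometer windows and pick the dominant PPG frequency that does not coincide with a motion peak:

```python
import numpy as np
from scipy.signal import periodogram

FS = 125.0  # assumed sampling rate in Hz

def estimate_heart_rate(ppg, acc_mag, fs=FS, low=0.67, high=4.0, guard=0.1):
    """Estimate heart rate (BPM) from one window of PPG and accelerometer magnitude."""
    # Power spectra of the PPG and of the accelerometer magnitude
    f, ppg_psd = periodogram(ppg, fs=fs)
    _, acc_psd = periodogram(acc_mag, fs=fs)

    # Keep only physiologically plausible heart-rate frequencies (~40-240 BPM)
    band = (f >= low) & (f <= high)
    f, ppg_psd, acc_psd = f[band], ppg_psd[band], acc_psd[band]

    # Frequency of the strongest motion component
    motion_freq = f[np.argmax(acc_psd)]

    # Suppress PPG bins close to the motion peak before picking the heart-rate peak
    masked = ppg_psd.copy()
    masked[np.abs(f - motion_freq) < guard] = 0.0

    hr_freq = f[np.argmax(masked)]
    return hr_freq * 60.0  # convert Hz to beats per minute
```

The intuition is that arm swing shows up as a sharp peak in both spectra, so any PPG peak sitting on top of it is more likely motion than pulse.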
While I learned a ton, I also felt that most of the modeling was again done in the usual Jupyter-notebook fashion. I’m really curious about applying the latest MLOps techniques and technologies to this domain to see what the benefits could be for research collaboration. To also learn by teaching, I’ve been thinking of starting a blog series where in each article I’d take a separate hyped MLOps stack and see whether it could streamline the analysis of, for example, tons of MRI images with messy metadata.