Token-Aware Vision Transformers for Efficient Inference
MD. Faysal Islam Fahad, A. Chen, J. Smith
We present a learnable token-pruning module that adapts the computation graph of a Vision Transformer at inference.
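The general idea behind inference-time token pruning can be sketched as follows. This assumes a hypothetical linear scorer (`score_tokens`, with illustrative `weight` and `bias`) and a fixed `keep_ratio`. Neither comes from the paper, and the actual module may score and drop tokens differently. The CLS token is always kept, and only the highest-scoring patch tokens are passed to later blocks. The cost of those blocks therefore falls with the number of kept tokens, quadratically for self-attention. Training a learnable scorer through a hard top-k usually needs a differentiable relaxation, such as the Gumbel-softmax masks used in DynamicViT. This sketch leaves that part out.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def score_tokens(tokens: Sequence[Vector], weight: Vector, bias: float = 0.0) -> List[float]:
    """Hypothetical learnable scorer: score_i = <weight, token_i> + bias.

    In a trained model `weight` and `bias` would be learned; here they are plain inputs.
    """
    return [sum(w * x for w, x in zip(weight, tok)) + bias for tok in tokens]


def prune_tokens(
    tokens: Sequence[Vector], scores: Sequence[float], keep_ratio: float
) -> Tuple[List[Vector], List[int]]:
    """Keep the CLS token (index 0) plus the top-scoring patch tokens.

    Returns the kept tokens and their original indices, in original order.
    """
    if not tokens:
        raise ValueError("need at least the CLS token")
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_patches = len(tokens) - 1
    # Number of patch tokens to keep (Python's round() rounds half to even).
    k = max(1, round(keep_ratio * num_patches))
    # Rank patch tokens (indices 1..N) by score, highest first; CLS is never ranked.
    ranked = sorted(range(1, len(tokens)), key=lambda i: scores[i], reverse=True)
    # Re-sort the survivors so the kept tokens stay in their original order.
    kept_idx = [0] + sorted(ranked[:k])
    return [list(tokens[i]) for i in kept_idx], kept_idx


# Example: one CLS token followed by four 2-d patch tokens.
tokens = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [2.0, 2.0], [0.5, 0.0]]
scores = score_tokens(tokens, weight=[1.0, 1.0])
kept_tokens, kept_idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept_idx)  # [0, 2, 3]
```

The function returns `kept_idx` alongside the pruned tokens, so predictions or attention maps can be mapped back to image patches. Keeping the survivors in their original order is a design choice that leaves positional structure intact for any later step that relies on it.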
I research vision transformers, multimodal learning, and large-scale model training. I turn cutting-edge papers into shipped, production-grade AI.
I bridge cutting-edge AI research with engineering rigor — building deep learning systems that actually ship.
I am a Deep Learning Engineer and AI researcher working at the intersection of computer vision, multimodal learning, and large-scale model training. My work focuses on pushing transformer architectures into resource-constrained, real-world settings, from medical imaging to autonomous perception. I enjoy turning recent papers into production systems and shipping models that survive contact with real data.
Trained ViT-B/16 from scratch on a custom 5M-image dataset
Published in workshop tracks at top-tier vision conferences (CVPR / ICCV)
Led on-device CV inference on NVIDIA Jetson Orin (sub-30 ms latency)
Open-source contributor with 1.5k+ GitHub stars across repos
Active investigations across vision transformers, self-supervised learning, and efficient inference.
A selection of deep learning case studies — from architecture design to deployment-grade inference pipelines.
A vision transformer that detects pathology in chest X-rays at radiologist-level accuracy.
Frameworks, models, and ops I use daily, refined through years of training, debugging, and shipping.
Papers, workshops and preprints — the slowest-moving but most rewarding part of the work.
MD. Faysal Islam Fahad, R. Patel
Cross-modality masked autoencoder pretraining for medical image classification.
MD. Faysal Islam Fahad
A study of distillation strategies for compact object detectors targeting embedded hardware.
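Classic logit distillation (Hinton et al., 2015) is one common ingredient behind the distillation entry and can be sketched briefly. The student matches the teacher's temperature-softened class distribution through KL divergence. The loss is scaled by T squared so that soft-target gradients keep a comparable magnitude across temperatures. This is a generic classification-style loss and an assumption here, not the method from the study. Detector distillation typically also involves box-regression and feature-level terms. The function names and the default `temperature=4.0` are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracts the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 4.0,
) -> float:
    """T^2 * KL(p_teacher || p_student) on temperature-softened distributions."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must have the same number of classes")
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Terms with p_teacher == 0 contribute nothing to the KL divergence.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Scale by T^2 so soft-target gradients stay comparable as T changes.
    return temperature**2 * kl


teacher = [1.0, 2.0, 3.0]
print(distillation_loss(teacher, teacher))                # 0.0 (identical distributions)
print(distillation_loss([3.0, 2.0, 1.0], teacher) > 0.0)  # True
```

In training, this soft-target term is usually mixed with the ordinary hard-label loss, for example `alpha * kd + (1 - alpha) * ce`, where `alpha` is a tuned weight.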
Roles, programs and research positions that shaped how I think about AI.
Leading research on efficient multimodal models and edge deployment.
Shipped real-time perception models for warehouse robotics.
Master's degree with a thesis on self-supervised learning for medical imaging.
Selected highlights — the moments that pushed the work forward.
CVPR Workshop on Efficient DL
Awarded best paper for our work on token-aware Vision Transformers.
NVIDIA
Won first place for an autonomous drone perception stack.
Kaggle
Reached Master tier through image classification competitions.
Essays on deep learning, research engineering, and the messy reality of training models.
Whether it's a research collaboration, a hard CV problem, or a deep learning hire — I'd love to hear about it.