trtutils
Lightweight and generic TensorRT engines in Python, with a fast YOLO implementation
JIT-compiled wrappers and utilities for OpenCV, enabling fast and highly threaded workloads.
Profiles processor utilization and power draw through context managers, and provides other utilities for NVIDIA Jetson.
Wrapper around the depthai API for easily defining pipelines; also packages a custom model compiler built on PyTorch.
Remotely run Python scripts on multiple devices concurrently.
Justin Davis and Mehmet E. Belviranli
Published in Design, Automation and Test in Europe (DATE), 2024
Improving energy efficiency on heterogeneous compute systems by exploiting non-monotonic relationships among accuracy, energy, and latency across model and hardware architecture pairs.
Published:
Delivered a guest lecture on interrupts in operating systems, covering key concepts and practical examples, followed by a Q&A session with students.
Published:
Delivered a guest lecture on parallel programming models, focusing on CUDA and demonstrating kernel creation. Additionally, covered general-purpose parallel programming frameworks such as OpenCL and SYCL with brief examples. Discussed domain-specific languages like Halide and hardware-specific acceleration models such as TensorRT, highlighting their applications and distinctions in high-performance computing.
Published:
Delivered a guest lecture on non-CPU hardware models and non-von Neumann architectures, covering Flynn’s taxonomy: the Single Instruction Single Data (SISD), Single Instruction Multiple Data (SIMD), Multiple Instruction Single Data (MISD), and Multiple Instruction Multiple Data (MIMD) models. Also discussed very-long-instruction-word (VLIW) processors and field-programmable gate arrays (FPGAs), highlighting their architectures, applications, and distinctions from traditional computing paradigms.
Published:
Presented a conference talk on optimizing object detection (OD) deep neural networks (DNNs) for edge devices, focusing on the role of context awareness in improving energy efficiency. The talk explored the inefficiencies of a one-size-fits-all approach in continuous mobile OD tasks and introduced SHIFT, a framework that dynamically selects among multiple OD models based on contextual information and computational constraints. It also highlighted how SHIFT leverages multi-accelerator execution to optimize energy efficiency while meeting latency requirements, achieving up to 7.5× energy savings and 2.8× latency reduction compared to state-of-the-art GPU-based single-model OD approaches.