Research
The EDGE (DistributED computinG infrastructurE) lab evaluates the design, performance, and
security aspects of complex distributed computing infrastructures. Following the teacher-scholar
model of West Chester University, I encourage undergraduate students to reach out and talk to me
about participating in my research projects.
Current Research Themes
Machine Learning
Starting in 2022, the EDGE lab has developed an interest in areas related to AI/ML. The following projects are currently active.
- Understanding how modern LLM frameworks (ChatGPT/Gemini) can help solve
complex programming problems.
- Tran, N. D., May, J. J., Ho, N., & Ngo, L. B. (2023). Exploring ChatGPT's Ability to Solve Programming Problems with Complex Context. Journal of Computing Sciences in Colleges, 39(3), 195-209.
- Ho, N., Le, T. C., Huynh, N. T., & Ngo, L. B. (2023, December). Causal Associations between Temporal Events. In 2023 IEEE International Conference on Big Data (BigData) (pp. 1135-1142). IEEE.
- Avina, V. D., Amiruzzaman, M., Amiruzzaman, S., Ngo, L. B., & Dewan, M. A. A. (2023). An AI-Based Framework for Translating American Sign Language to English and Vice Versa. Information, 14(10), 569.
Complex Data Analytics
This theme aligns with the materials from Big Data Engineering (CSC467). We identify large/complex data sets and attempt to develop interesting analytical questions using various data engineering, machine learning, and natural language processing techniques.
- Development of a different measure of movie success that is based on understanding the actual creative content of the movies, expressed via script analysis. This work requires a comprehensive acquisition of relevant data, including manual downloads (box office results), automated mining (using libraries to mine movie scripts), and large-scale data processing (mining online user reviews from Reddit's movie subreddits). Next, data mining techniques are applied to consolidate these data sources using strings of predefined movie titles as the keys. Statistical and textual analyses using natural language processing are then applied to the acquired data to identify related patterns among the movies' metrics of success.
- Cassel, H., Pham, M. N. A., Nguyen, T. H., & Ngo, L. B. (2024, October). Understanding Success Metrics of Movies using Natural Language Processing Script Analysis. Poster Presentation at the CCSC Eastern's 40th Annual Regional Conference.
- Understanding how to extract critical information from security logs.
- Clark, T., Codd, K., & Ngo, L. B. (2021, April). Studying Break-in Attempts Across Multiple Servers Using Apache Spark and Security Logs. Spring Conference of the Pennsylvania Computer and Information Science Education. Best undergraduate paper.
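As a rough illustration of the security-log theme above, the sketch below pulls failed SSH login attempts out of log lines and flags source addresses that probe more than one server. It is plain Python rather than Apache Spark, and the OpenSSH-style line format, the server names, and the function names are assumptions made for this example, not details of the published study.

```python
import re
from collections import Counter

# Assumed OpenSSH-style message; the log formats used in the study are not given here.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def count_failed_logins(logs_by_server):
    """Count failed logins per source IP and record which servers each IP targeted.

    logs_by_server maps a server name to an iterable of raw log lines.
    """
    attempts = Counter()
    targets = {}
    for server, lines in logs_by_server.items():
        for line in lines:
            match = FAILED_LOGIN.search(line)
            if match is None:
                continue  # not a failed-password line
            ip = match.group("ip")
            attempts[ip] += 1
            targets.setdefault(ip, set()).add(server)
    return attempts, targets


def multi_server_sources(targets, min_servers=2):
    """Return source IPs seen on at least min_servers distinct servers."""
    return sorted(ip for ip, servers in targets.items() if len(servers) >= min_servers)


# Made-up sample lines using documentation-reserved IP ranges.
sample_logs = {
    "web1": [
        "Apr  3 10:15:01 web1 sshd[1023]: Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2",
        "Apr  3 10:15:09 web1 sshd[1023]: Failed password for root from 203.0.113.7 port 52150 ssh2",
        "Apr  3 10:16:00 web1 sshd[1100]: Accepted password for alice from 198.51.100.4 port 40022 ssh2",
    ],
    "db1": [
        "Apr  3 10:17:30 db1 sshd[2210]: Failed password for root from 203.0.113.7 port 41000 ssh2",
        "Apr  3 10:18:02 db1 sshd[2215]: Failed password for bob from 192.0.2.55 port 41001 ssh2",
    ],
}

attempts, targets = count_failed_logins(sample_logs)
print(attempts.most_common())
print(multi_server_sources(targets))
```

The same per-line extraction followed by a group-and-count step maps naturally onto a distributed engine when the logs no longer fit on one machine.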
Cloud-based Large Scale Infrastructures
This research theme focuses on the deployment and performance evaluation of large scale computing infrastructure within a cloud environment using containers.
- The container-based development of large-scale infrastructures leads to useful studies of how they can be applied to undergraduate education, especially at institutions with limited resources.
- Reppert, A., Montecinos-Velazquez, B., Kahl, H., Reid, R., Rivas, D., Spampinato, D., Zhong, H. & Ngo, L. B. (2023). A Kubernetes Framework for Learning Cloud Native Development. Journal of Computing Sciences in Colleges, 38(8), 99-108.
- Ngo, L. B., & Bui, H. (2023). Sustainable and Scalable Setup for Teaching Big Data Computing. Journal of Computational Science, 14(1).
- Ngo, L. B. (2022). Experience in Teaching Cloud Computing with a Project-Based Approach. Journal of Computing Sciences in Colleges, 38(3), 107-119.
- Development of a CloudLab-based Docker profile that contains all relevant components of a high performance computing infrastructure, including login nodes, a scheduler, compute nodes, and parallel file systems. The goal is to create a lightweight cloud-based HPC blueprint that can be deployed either on small platforms (or single PCs) for educational purposes or on large-scale platforms in conjunction with Docker Swarm/Kubernetes for production activities.
- Ngo, L. B., & Kilgannon, J. (2020). Virtual cluster for HPC education. Journal of Computing Sciences in Colleges, 36(3), 20-30.
Past Research
Large Scale Computing Infrastructure for Connected Vehicle Technology (2016-2022)
This project area focused on studying the application of large scale computing infrastructures in connected vehicle (CV) technologies. This includes high performance computing, frameworks to support complex workflows, and message delivery systems for CV applications.
- Development of a workflow pipeline that combines traffic simulation and robotics-based collision simulation running on HPC clusters in order to generate synthetic traffic collision data.
- Franchi, M., Kahn, R., Chowdhury, M., Khan, S., Kennedy, K., Ngo, L. B., & Apon, A. (2022). Webots.HPC: a parallel simulation pipeline for autonomous vehicles. In Practice and Experience in Advanced Research Computing (pp. 1-4).
- Franchi, M., Kahn, R., Ngo, L. B., Khan, S., Chowdhury, M., Kennedy, K., & Apon, A. (2022). A Parallel Autonomous Vehicle Simulation Pipeline on High-Performance Computing (No. TRBAM-22-03729).
- Halabi, W. H., Smith, D. N., Hill, J. C., Anderson, J. W., Kennedy, K. E., Posey, B. M., Ngo, L. B. & Apon, A. W. (2021, April). Viability of Azure IoT Hub for processing high velocity large scale IoT data. In Companion of the ACM/SPEC International Conference on Performance Engineering (pp. 73-76).
- Developing an efficient and low-latency distributed message delivery system for CV applications using a distributed message delivery platform. This strategy enables large-scale ingestion, curation, and transformation of unstructured data (roadway traffic-related and roadway non-traffic-related data) into labeled and customized topics for a large number of subscribers or consumers, such as CVs, mobile devices, and data centers.
- Du, Y., Chowdhury, M., Rahman, M., Dey, K., Apon, A., Luckow, A., & Ngo, L. B. (2017). A distributed message delivery infrastructure for connected vehicle technology applications. IEEE Transactions on Intelligent Transportation Systems, 19(3), 787-801.
- Khan, S. M., Chowdhury, M., Ngo, L. B., & Apon, A. (2020). Multi-class Twitter data categorization and geocoding with a novel computing framework. Cities, 96, 102410.
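The topic-based delivery idea in this project can be sketched with a small in-memory publish/subscribe broker. This is only a single-process stand-in for a distributed message delivery platform; the `TopicBroker` class, the `label_record` rule, and the record fields are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class TopicBroker:
    """Single-process stand-in for a distributed publish/subscribe platform."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a consumer callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def label_record(record):
    """Assign a topic to a raw record (hypothetical rule: speed readings are traffic data)."""
    return "traffic" if "speed_mph" in record else "non_traffic"


def ingest(broker, records):
    """Label each raw record and publish it to the matching topic."""
    for record in records:
        broker.publish(label_record(record), record)


broker = TopicBroker()
vehicle_inbox, datacenter_inbox = [], []
broker.subscribe("traffic", vehicle_inbox.append)         # e.g. a connected vehicle
broker.subscribe("traffic", datacenter_inbox.append)      # e.g. a data center
broker.subscribe("non_traffic", datacenter_inbox.append)

# Made-up records for the example.
ingest(broker, [
    {"segment": "segment-1", "speed_mph": 42},
    {"segment": "segment-1", "weather": "fog"},
])
print(len(vehicle_inbox), len(datacenter_inbox))
```

Decoupling producers from consumers through topics is what lets each class of subscriber receive only the labeled data it asked for.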
Scalable Forward Flux Sampling, ScaFFS: Software platform to study rare events in molecular simulations (2015-2016)
This project develops a novel software platform called Scalable Forward Flux Sampling (ScaFFS) to perform large scale forward flux sampling (FFS) calculations. FFS is an advanced sampling technique developed to enhance the sampling of rare events in molecular simulations. Rare events, by definition, are events that have a very low probability of occurring within typical observation timescales. They often mark an important transition and are processes of considerable interest. In materials science, several important processes, such as nucleation-driven phase transitions, are rare events. Studying such processes with molecular simulation tools is challenging because the waiting time for them to occur is longer than the timescales commonly spanned by straightforward simulations. Consequently, the majority of the computational effort is wasted on the uninteresting part of waiting for the event to occur. ScaFFS combines state-of-the-art techniques in molecular simulations with those from Big Data to enable rare event simulations at massive scales. ScaFFS is designed to be adaptive, data-intensive, and high-performance. ScaFFS is actively used for current research in molecular modeling and computer simulations at Clemson University.
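To make the efficiency argument concrete: in the standard FFS formulation, the transition rate is the flux of trajectories leaving the initial state through the first interface, multiplied by the conditional probabilities of reaching each subsequent interface from the previous one. The sketch below computes that estimate from per-interface trial counts. It illustrates the general FFS estimator only, not the ScaFFS implementation; the function name and all numbers are made up for the example.

```python
import math


def ffs_rate(initial_flux, interface_counts):
    """Estimate a rare-event rate as flux times the product of crossing probabilities.

    initial_flux: rate at which trajectories cross the first interface.
    interface_counts: list of (successes, trials) pairs, one per interface stage,
        where successes is the number of trial runs that reached the next interface.
    """
    probabilities = []
    for successes, trials in interface_counts:
        if trials <= 0:
            raise ValueError("each stage needs at least one trial")
        probabilities.append(successes / trials)
    return initial_flux * math.prod(probabilities)


# Hypothetical counts: three stages with 1000 short trial runs each.
rate = ffs_rate(0.01, [(100, 1000), (50, 1000), (200, 1000)])
print(f"estimated rate: {rate:.3e}")
```

Each stage probability in this example lies between 0.05 and 0.2, so it can be estimated from a modest number of short trajectories, whereas the combined probability of 0.001 would take far more brute-force sampling to resolve.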
Synthetic data generation for the internet of things (2014-2015)
The concept of the Internet of Things (IoT) is rapidly moving from a vision to being pervasive in our everyday lives. This can be observed in the integration of connected sensors from a multitude of devices such as mobile phones, healthcare equipment, and vehicles. There is a need for the development of infrastructure support and analytical tools to handle IoT data, which are naturally big and complex. However, research on IoT data can be constrained by concerns about the release of privately owned data. In this project, funded by our industry partner, we designed and implemented a synthetic IoT data generation framework. The framework enabled research on synthetic data that exhibit the complex characteristics of the original data without compromising proprietary information and personal privacy.
Evaluating the effect of cyberinfrastructure on universities' production process (2012-2015)
This project undertakes an interdisciplinary and novel approach to the problem of measuring the effects of investment in cyberinfrastructure on universities' production processes of research outputs and vital educational services. A decision to support funding of the infrastructure that supports research, or a decision to support funding of focused research activities, is an increasingly critical decision with far-reaching impacts not only on the institutions receiving those funds, but also on national competitiveness. While it is generally agreed that cyberinfrastructure is essential to scholarly inquiry in some science fields, the scope of cyberinfrastructure's broad effects on the growth of knowledge, on the academic enterprise, and on areas of science has not been explicitly quantified. Using new central limit theorems and hypothesis testing techniques, we applied frontier efficiency analysis to examine the impact of cyberinfrastructure policy on the efficiency of institutional research performance. During the project, we utilized HPCC, an alternative data infrastructure, to ingest, curate, and integrate large amounts of higher education data from various sources. We also proposed the adoption of a social media paradigm in the academic research publication environment. The outcomes of the project have been presented and well received at professional venues such as Birds-of-a-Feather sessions at Supercomputing 2013 and 2014, and at the annual Coalition for Academic Scientific Computation meeting.