The STC Cloud Platform is now online and available to support STC-related research. The service provides servers that can be accessed over the internet, and its advantages include larger storage, faster computing, easier management, and lower cost. The computing infrastructure is housed at the GMU Data Center and comprises three computing clusters designed for different purposes. Together they have 600 nodes with 16,800 cores, 75 terabytes of memory, and 600 terabytes of storage.
The GMU Data Center runs a Hadoop cluster based on CDH (Cloudera's Distribution including Apache Hadoop), a High-Performance Computing (HPC) cluster powered by the Message Passing Interface (MPI), and an OpenStack-based Cloud Sharing Platform. CDH provides automated deployment and configuration, customizable monitoring and reporting, robust troubleshooting, and zero-downtime maintenance. CDH can support STC research through distributed storage (HDFS, the Hadoop Distributed File System), resource management (YARN, Yet Another Resource Negotiator), programming-based data processing (MapReduce), in-memory data processing (Spark), query-based data processing (Pig, Hive), a NoSQL database (HBase), machine learning libraries (Mahout, Spark MLlib), searching and indexing (Solr, Lucene), cluster coordination (ZooKeeper), and job scheduling (Oozie). Research already running on the cloud platform includes COVID-19 data collection and maintenance, air quality (AQ), cloud classification, analysis of COVID-19 vaccine-related tweets, and street-level AQ downscaling.
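To give a feel for the MapReduce programming model listed among the CDH components, the following is a minimal sketch in plain Python of a word count over tweets, loosely in the spirit of the tweet analysis mentioned above. It is not code from the platform and does not use Hadoop itself. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample tweets are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(tweet: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one tweet."""
    for word in tweet.lower().split():
        # Trim common punctuation and hashtag marks from both ends.
        yield word.strip(".,!?#"), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key.

    In a real Hadoop job the framework does this between map and reduce.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that were pure punctuation
            grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return key, sum(values)


def word_count(tweets: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for tweet in tweets for pair in map_phase(tweet))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample data, not taken from the platform.
SAMPLE_TWEETS = [
    "Vaccine appointments open today",
    "Got my vaccine today!",
]

counts = word_count(SAMPLE_TWEETS)
```

On a cluster, the map and reduce steps would run in parallel across nodes with HDFS holding the input and output. Here everything runs in one process so that the three stages are easy to follow.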