Funded Projects

ID Name Period
STC-19-00 Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications

Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time and require spatiotemporal principles and thinking to be incorporated into the computing process. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies for exploring the next generation of computing models to address the challenges. This overarching center project analyzes and collaborates with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate application impact, and advance human knowledge and intelligence. This final report summarizes our efforts and achievements in: a) building spatiotemporal infrastructure from theoretical, technological, and application aspects; b) innovating spatiotemporal studies with new tools, systems, and applications; c) educating K-16 and graduate students with proper knowledge; and d) developing a community for spatiotemporal studies from center sites and members at regional, national, and global levels through IAB meetings, symposia, and other venues.

2022-2023
GMU-23-03 ClassX: Automatic Labeling Tool
The ClassX project develops an automatic training-dataset labeling tool and online service to fill the gap of missing high-quality training image datasets. Novel spatiotemporal AI/ML-based capabilities are being developed to automatically classify, label, store, and share training datasets among the users who need them. Specifically: 1) the auto-labeling tool bridges the gap between time-consuming, tedious manual labeling and the demand for enormous amounts of high-quality training datasets; 2) well-labeled image datasets help build accurate and reliable machine learning models across many science and engineering domains; 3) the image labeling tool can automatically classify and label digital images and thus provide image labeling services to various industries; 4) while initially designed to automatically label sea ice images for sea ice research, the ICAP program helped us confirm that the tool and service have potential in many domains such as heliophysics, climate change, and the biomedical industry.
2023-2024
GMU-21-01 Improving ground-level air quality prediction using numerical simulations 

  • Objective: Develop an innovative methodology to retrieve PM2.5 at the global scale and further downscale the spatiotemporal resolution to 1 km and hourly levels in key regions, using artificial intelligence (AI) models.
  • Major tasks: 1. Deep learning for PM2.5 retrieval using satellite remote sensing, model simulation, and ground observation; 2. Deep learning for PM2.5 prediction and downscaling using meteorological data with AOD spatial patterns (see the sketch below).
  • Example: AQ database, prediction, and monitoring systems for the city of Los Angeles
  • Expected results: 1. PM2.5 estimation at the global scale; 2. hourly 1 km × 1 km (500 m × 500 m) PM2.5 for the LA region
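
As an illustration of what the deep-learning tasks above could look like, here is a minimal sketch in PyTorch mapping stacked AOD and meteorological channels to a per-pixel PM2.5 surface. The channel count, patch size, and all names are illustrative assumptions, not the project's actual architecture.

```python
# Minimal sketch (PyTorch): a gridded PM2.5 regressor taking stacked
# AOD + meteorological channels and emitting a per-pixel PM2.5 surface.
# Channel count, patch size, and names are illustrative assumptions.
import torch
import torch.nn as nn

class PM25Net(nn.Module):
    def __init__(self, in_channels=6):  # e.g., AOD + 5 meteorological fields
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # per-pixel PM2.5 estimate
        )

    def forward(self, x):
        return self.net(x)

model = PM25Net()
patch = torch.randn(8, 6, 64, 64)   # a batch of 64x64 gridded input patches
pred = model(patch)                 # (8, 1, 64, 64) PM2.5 surface
# Training would regress pred against station-derived PM2.5 labels:
loss = nn.functional.mse_loss(pred, torch.randn_like(pred))
```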

2022-2023
STC-20-01 Innovating a Computing Infrastructure for Spatiotemporal Studies 

AI-based processing, big data analytics, and simulations demand substantial computing power, and researchers must have access to it. We have created a cross-site, center-wide computing infrastructure to provide these capabilities to center members. The computing infrastructure consists of three computing clusters comprising 600 nodes used for various purposes within the center. This infrastructure offers advanced computing capabilities that all center projects utilize in their research. Continuous maintenance and support for the infrastructure give center members reliable resources that improve their research capabilities.

2022-2023
STC-19-02 Spatiotemporal Innovation Testbed 

Space situational awareness (SSA) concerns current and predictive knowledge of space events, threats, activities, conditions, and space system (space, ground, link) status, capabilities, constraints, and employment. With data collected from telescopes, satellites, and other sources, thousands of space objects are tracked, cataloged, and maintained. However, big observation data must be collected constantly to distill such knowledge, which poses grand challenges to data management systems. The goal of this project is to develop a big space-observation data analytical platform to better assist space situational awareness. The distributed storage layer supports storage of and access to space observation data with parallel I/O. The metadata layer will manage metadata and interact with a smart search engine to provide efficient and accurate data discovery functionalities. The analytical layer serves as an efficient and effective tool to mine spatiotemporal patterns and to detect and predict events in near-Earth space. Finally, the visualization layer presents the orbits of natural and man-made objects in near-Earth space. By distilling knowledge from dispersed observation data, this big data analytical platform is expected to advance space situational awareness across government agencies and scientific communities.

2022-2023
GMU-17-04 Geo-JPSS Flood Detection

Flood detection software has been developed to generate near real-time flood products from VIIRS imagery. SNPP/JPSS VIIRS data show special advantages in flood detection. The major activities, accomplishments, and specific objectives in the reporting period center on the flooding application. The plan for the next reporting period is to 1) improve the current flood product, 2) develop 3-D flood parameters (flood water surface level, flood water depth, and high-resolution flood maps), and 3) further analyze regional flood patterns.

2021-2022
Hvd-20-04 Developing an active-learning based platform to evaluate the Impacts of China’s Belt-Road Initiatives using high-resolution satellite imagery  

Currently, no geospatial dataset tracks the growth or deterioration of roads and railways worldwide. This has been a major obstacle to assessing the effectiveness of transportation development projects, which is important for developing countries making informed decisions on these expensive investments. We propose to develop an active-learning-based platform to generate such data for the benefit of a broad research community interested in transportation evaluation. This system will help human annotators map roads and railways using historical and up-to-date high-resolution satellite imagery. It has three building blocks: first, combine pixel-wise segmentation-based and graph-based neural networks to propose road connections based on existing labels from OpenStreetMap; second, enable annotators to accept or edit correctly predicted roads and reject false positives; third, internalize the annotators' inputs and retrain the model with the new data, reinforcing the model to make better predictions over time. As a proof of concept, the system's first application analyzes the impact on economic development of the Belt and Road Initiative (BRI), which embodies unprecedented transportation upgrade and construction projects across Asia in the past decade. Applying recent developments in remote sensing to satellite imagery from before and after BRI projects were undertaken, we will link the extracted road and rail networks with the expansion of urban areas detected from a larger set of daytime and nighttime imagery, and estimate the impact of BRI investments on the spatial distribution of economic activity.
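
The third building block is a standard active-learning cycle (propose, annotate, retrain). Here is a generic uncertainty-sampling sketch with scikit-learn on synthetic data, where the query step stands in for the human annotator accepting or editing predicted roads; all names and sizes are illustrative assumptions:

```python
# Generic uncertainty-sampling active-learning loop (propose -> annotate ->
# retrain). Data and the labeling oracle are synthetic stand-ins; in the
# platform the "oracle" is the human annotator.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=16, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[:50] = True                        # small seed set (e.g., OSM labels)

model = RandomForestClassifier(n_estimators=100, random_state=0)
for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[~labeled])
    uncertainty = 1.0 - proba.max(axis=1)  # least-confident sampling
    pool = np.flatnonzero(~labeled)
    query = pool[np.argsort(uncertainty)[-100:]]
    labeled[query] = True                  # annotator would supply labels here
print(f"labels used: {labeled.sum()}, accuracy: {model.score(X, y):.3f}")
```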

2021-2022
GMU-20-04 Using Machine Learning Methods to Improve the Categorization of Health Questions  


2021-2022
GMU-22-02 A Formal Study of AI/ML for Air Quality Data Analytics

According to the World Health Organization, climate change and pollutant emissions worsen our air, killing 7 million people yearly. Additionally, one third of deaths from stroke, lung cancer, and heart disease are due to air pollution (https://www.who.int/airpollution/news-and-events/how-air-pollution-is-destroying-our-health).

Economic development and industrial reopening following the COVID-19 pandemic have worsened air quality (AQ) in recent years. Many air pollution and AQ measurement systems are used to collect data about air pollutants such as methane, ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), and particulate matter (PMx). Due to differences among collection methods, the resulting datasets have different resolutions, frequencies, and reliabilities. Quality control and cross-calibration techniques are needed to allow comparison across pollutants and measurement techniques.

Producing comprehensive AQ datasets to estimate ground-level pollutant concentrations has posed a grand challenge for Earth science and information technology research over the past decades. Previous researchers have used AI and machine learning (ML) to analyze and estimate AQ data; however, no systematic study has determined the optimal model/algorithm for each pollutant and training dataset.

We will conduct a formal study of AI/ML and geospatial methods for air pollutant simulation, retrieval, and prediction, utilizing relevant training datasets and pure machine learning (ML) tools to provide event simulation and accurate forecasting. We will prepare pollutant (methane, O3, NO2, SO2, and PMx) and covariate datasets and apply various machine learning models to predict surface-level pollutant concentrations.
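
Such a systematic study amounts to cross-validating a set of candidate models per pollutant and recording which performs best. Below is a minimal sketch with scikit-learn on synthetic stand-ins for the covariate and pollutant datasets; the model list and scoring choice are assumptions:

```python
# Per-pollutant model comparison via cross-validation; data are synthetic
# stand-ins for the covariate and pollutant datasets.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # covariates (meteorology, AOD, ...)
targets = {"NO2": X @ rng.normal(size=10) + rng.normal(size=500)}

models = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gbrt": GradientBoostingRegressor(random_state=0),
}
for pollutant, y in targets.items():
    scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
              for name, m in models.items()}
    best = max(scores, key=scores.get)
    print(pollutant, "->", best, scores)
```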

We will leverage our results to develop and configure an open-source ML package that integrates with the AQACF air quality analytics tool and provides users with optimal machine learning algorithms and parameters for downscaling various pollutants.

This project will further the advancement of cloud computing and new observation systems. George Mason University (GMU) proposes to collaborate with NASA JPL, the City of Los Angeles (LA), the NASA Center for Climate Simulation (NCCS), and the Earth Science Information Partners (ESIP).

2022-2023
Hvd-20-01 Cloud-based Large-scale High-resolution Mangrove Forests Mapping with Satellite Big Data and Machine Learning 

Mangrove forests make up one of the most productive ecosystems on the planet, providing a variety of goods and services from which we benefit. In addition, mangrove forests can sequester four times more carbon dioxide than upland forests, mitigate the impacts of natural hazards on coastal communities, and support biodiversity conservation. However, they are being destroyed at an alarming rate by human activities such as aquaculture, agriculture, and coastal development. To characterize mangrove forest changes, evaluate their impacts, and support relevant protection and restoration decision making by government agencies and NGOs, accurate and up-to-date mangrove forest mapping at large spatial scales is essential.

Available large-scale mangrove forest data products were commonly created from 30 m Landsat imagery, and significant inconsistencies remain among them. With higher-resolution satellite data (e.g., Sentinel-1 and Sentinel-2) open to the public, the availability of high-performance cloud computing, and recent progress in machine learning, it has become feasible to map coastal mangrove forests at large spatial scales with better resolution, accuracy, and frequency.

The objective of this proposed project is to develop a methodology for annually generating 10 m mangrove forest spatial distribution data products for any region across the globe, thereby providing the most accurate information about the spatiotemporal changes of mangrove forests and effectively supporting mangrove ecosystem protection and restoration efforts. Our approach combines satellite big data processing on a cloud platform (e.g., Google Earth Engine) with machine learning algorithms (e.g., Neural Network, Random Forest), based on the knowledge gained from our NASA project. Study areas will be selected from different regions of the globe, accuracy will be assessed quantitatively, and the mangrove forest maps will be compared with existing mangrove data products.
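
For a concrete sense of the proposed pipeline, here is a minimal sketch with the Earth Engine Python API, assuming an authenticated session; the area of interest, band subset, and labeled-points asset are illustrative assumptions, not project assets:

```python
# Sketch of the approach on Google Earth Engine: Sentinel-2 annual composite
# -> Random Forest -> 10 m mangrove map. AOI, bands, and the training-points
# asset ID are illustrative assumptions.
import ee
ee.Initialize()  # assumes Earth Engine credentials are configured

region = ee.Geometry.Rectangle([-81.0, 25.0, -80.0, 26.0])   # example AOI
composite = (ee.ImageCollection("COPERNICUS/S2_SR")
             .filterBounds(region)
             .filterDate("2020-01-01", "2020-12-31")
             .median())
bands = ["B2", "B3", "B4", "B8", "B11", "B12"]

# Hypothetical labeled samples with a 'class' property (1 = mangrove, 0 = other).
points = ee.FeatureCollection("users/example/mangrove_training_points")
training = composite.select(bands).sampleRegions(
    collection=points, properties=["class"], scale=10)

classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=training, classProperty="class", inputProperties=bands)
mangrove_map = composite.select(bands).classify(classifier)
```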

2020-2021
STC-19-00 Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications
Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time and require spatiotemporal principles and thinking to be incorporated into the computing process. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies for exploring the next generation of computing models to address the challenges. This overarching center project analyzes and collaborates with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate application impact, and advance human knowledge and intelligence. Objectives are to: a) build spatiotemporal infrastructure from theoretical, technological, and application aspects; b) innovate spatiotemporal studies with new tools, systems, and applications; c) educate K-16 and graduate students with proper knowledge; and d) develop a community for spatiotemporal studies from center sites and members at regional, national, and global levels through IAB meetings, symposia, and other venues.
2020-2021
GMU-20-01  Improving the Air Quality in the Urban Setting  

Climate change and pollutant emissions continue to worsen the air we breathe, which kills 7 million people every year according to the World Health Organization. One third of deaths from stroke, lung cancer, and heart disease are due to air pollution (https://www.who.int/airpollution/news-and-events/how-air-pollution-is-destroying-our-health). Based on EPA observations, Climate Central found that 40 U.S. cities had at least 20 unhealthy air days since 2015, and many of them have experienced an uptick in unhealthy air days in recent years. For example, over 100 accumulated days with unhealthy air quality (AQ) have been observed in Los Angeles over the past two decades. Timely forecasting of air pollution and dissemination of the results to citizens would help save lives and improve health; filling this gap has long been a goal of atmospheric scientists and urban managers. Fortunately, increasingly available low-cost sensors, the Internet of Things, and satellite observations are starting to provide a new Earth observation system to feed into numerical simulations and enhance the reliability and accuracy of AQ prediction. The emergence of 5G mobile technologies also brings enormous benefits to AQ observation, with higher data transmission speeds and more connected networks. However, it is critical and challenging to integrate spatiotemporally heterogeneous observation data with numerical AQ prediction. Using Los Angeles as an example, we propose to fuse a variety of geoscience observations, from satellites to the ground-based Internet of Things, feed them into numerical AQ simulation models, and output results to be validated by and disseminated to academic geoscientists and citizens.

2020-2021
GMU-20-03 Spatiotemporal Analysis of Medical Resource Deficiencies under COVID-19 Pandemic 

The COVID-19 pandemic swept the entire world in the past five months, and the U.S. became the epicenter with the most confirmed cases. Although many states are reopening their economies, the risk is still high, and there is much debate about a resurgence of the outbreak and high pressure on the medical system from premature reopenings. Sufficient medical equipment and health care professionals are critical to save lives and better prepare our communities. Accurate assessment and prediction of medical resource demands are important to avoid overcommitment (e.g., NYC did not use many of the resources it requested) and undercommitment (e.g., only a few ICU beds were available in the Alabama capital).
We propose to develop a timely assessment system for medical resource demands based on current confirmed cases and hospitalized patients, together with ML/AI-based prediction. The system builds on our current county-level analysis of the spatiotemporal distribution and demand of medical resources in the U.S. during the COVID-19 pandemic. The system dashboard supports monitoring, analyzing, visualizing, and sharing the medical resources and analysis results. The medical resources include county-based summaries of licensed beds and ICU beds from hospitals and medical agencies, and medical staff, specifically critical care staff for COVID-19 treatment. Integrated and analyzed with dynamic active case counts, a medical resource dynamic index is created and calculated in real time to show medical resource deficiencies in the U.S. under the COVID-19 pandemic.
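
The exact index formula is the project's own; purely as an illustrative assumption, a county-level strain ratio of expected ICU demand to ICU capacity could be computed like this (all numbers, column names, and the ICU-rate constant are made up):

```python
# Illustrative county-level "medical resource dynamic index": expected ICU
# demand over ICU capacity. Formula, rates, and numbers are assumptions.
import pandas as pd

counties = pd.DataFrame({
    "fips":          ["36061", "01101"],
    "active_cases":  [12000, 300],
    "icu_beds":      [1200, 6],
    "licensed_beds": [23000, 900],
})
ICU_RATE = 0.05  # assumed share of active cases needing ICU care
counties["index"] = counties["active_cases"] * ICU_RATE / counties["icu_beds"]
counties["deficient"] = counties["index"] > 1.0   # demand exceeds capacity
print(counties[["fips", "index", "deficient"]])
```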

2020-2021
Hvd-20-03  Historical Forest Changes Detection Using Satellite Imagery and Google Earth Engine  

In the southeastern US, over 90% of forests are privately owned and managed. To achieve sustainable timber production from these forest lands, understanding forest change history and continuously monitoring forest land, including harvest and replantation, are essential. The objective of this project is to build a software solution that uses the 35-year history of satellite imagery in Google Earth Engine to recreate the silvicultural history of timberland in the southeastern US. Specifically, this pilot project takes Union County, South Carolina as the pilot study area and tests the effectiveness of methodologies based on time-series satellite data on Google Earth Engine for identifying hardwood, natural or planted pine, and mixed hardwood/pine forest in satellite imagery, and for detecting their silvicultural history, such as clear-cuts, natural growth or replanting, age, and height. The output may support the economical acquisition of under-managed, privately owned timberland in the southeastern United States and the sustainable management of these timberlands.

2020-2021
STC-20-02  Spatiotemporal Analytics of COVID-19’s Second-Order Impacts on Global Vulnerable Urban Areas  

This project addresses the role of spatiotemporal data, including open data, in understanding and mitigating impacts of the global COVID-19 pandemic. The research will focus on possible long-term and second-order impacts of COVID-19 and the responses that have been enacted at multiple scales, from multinational regions to neighborhood levels. Development backsliding during this pandemic is a high risk for developing countries and rapidly growing cities due to new migration patterns, the collapse of informal economies, supply shortages, unequal basic services and health sites, and overcrowded informal settlements. A key goal of this project is to facilitate discussion and conduct research and reporting to inform participatory mapping and open data creation taking place in developing countries to mitigate COVID-19 second-order impacts.

2020-2021
Hvd-21-03 Developing the International Geospatial Health Research Network (IGHRN)
The concept of the International Geospatial Health Research Network (IGHRN) has prompted a series of high-level workshops and symposia on geography and international health research in recent years.  With a focus on Fostering International Geospatial Health Research Collaborations, leading GIScience and health researchers from North America, Asia, Europe, Africa and Latin America identified an interim Steering Committee to develop and sustain an operational IGHRN Network. The IGHRN Secretariat functions and management are jointly handled at the two hub universities, Harvard University and the Chinese University of Hong Kong (CUHK). An International Advisory Committee comprising leading geospatial and health researchers from around the world is also being developed.          

The International Geospatial Health Research Network aims to share new international research and data, help develop geospatial health methods, and support new technologies to foster international collaborations and synergies across borders, and to bridge the gap between GIScience health research and the needs of health practitioners on the ground. 

Following the COVID-19 pandemic, it is clear that an expanded IGHRN has never been needed more. With that in mind, the IGHRN Steering Committee has recently begun to restructure the IGHRN, with multiple university and organizational affiliates involved.

We welcome the engagement, ideas, participation, and funding of the International Geospatial Health Research Network by the NSF, NIH, non-governmental organizations, foundations, and private-sector geospatial and health tech companies, as we develop and expand the IGHRN. 

2021-2022
Hvd-21-02 Assessing household preparedness for Covid-19 in Bangladesh
This project assesses household preparedness for COVID-19 in Bangladesh with special attention to district-level inequalities. The definition of COVID-19-prepared households is based on guidelines from WHO: a household is considered prepared for COVID-19 when it meets five conditions: (1) adequate space for quarantine, (2) adequate sanitation, (3) soap and water available for handwashing, (4) a phone available for communication, and (5) regular exposure to mass media. The main data source is the 2019 Multiple Indicator Cluster Surveys (MICS) for Bangladesh. The study investigates the association between the district-level prevalence of COVID-19 and household preparedness and identifies district-level factors (e.g., population density, economic development, health system performance) associated with household preparedness for COVID-19. Findings from this study will provide policy makers in Bangladesh and other stakeholders with solid evidence for improving the situation in households poorly prepared for COVID-19.
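
Because the definition is a conjunction of five binary conditions, the household indicator reduces to an "all conditions met" flag, which can then be aggregated by district. A minimal pandas sketch with illustrative column names (the actual MICS variable names differ):

```python
# Household preparedness as the conjunction of the five WHO-based conditions,
# aggregated to district rates. Column names are illustrative stand-ins.
import pandas as pd

hh = pd.DataFrame({
    "quarantine_space": [1, 1, 0],
    "sanitation":       [1, 1, 1],
    "soap_and_water":   [1, 0, 1],
    "phone":            [1, 1, 1],
    "mass_media":       [1, 1, 0],
    "district":         ["Dhaka", "Khulna", "Sylhet"],
})
conditions = ["quarantine_space", "sanitation", "soap_and_water",
              "phone", "mass_media"]
hh["prepared"] = hh[conditions].all(axis=1).astype(int)
district_rates = hh.groupby("district")["prepared"].mean()
print(district_rates)   # district-level inequality in preparedness
```
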
2021-2022
Hvd-21-01 Enabling replicable spatiotemporal research with virtual spatial data lab
This project is a continuing effort based on achievements of the Spatial Data Lab project. It is designed to provide a new generation of data services with cutting-edge methodology and technology for reproducible, replicable, and generalizable spatiotemporal research. It will allow researchers to develop case studies with easy-to-use workflow tools and share the case study as a package with others. The project will also support case-based training and teaching programs for multi-disciplinary and inter-disciplinary research in the applications of public health, economics, urban planning, social science, and others. In detail, this project will expand Spatial Data Lab’s capabilities and collaborate with various academic and business partners on the following missions:  

  • Promote Spatial Data Services. Collect and integrate more datasets from partners and various sources and provide standard data services for data access, integration and sharing.    
  • Tools Development for Spatial Data Analysis. Enrich the current workflow platform with spatial data analysis tools, such as hotspot analysis, spatial correlation analysis, geographical regression modeling, and spatiotemporal modeling.  
  • Workflow based Spatial Data Case Studies. Develop easy-to-use workflows to lower the barrier for spatiotemporal data analysis and build a case study repository for reproducible, replicable and generalizable research.   
  • Training Programs for Spatial Data Science. Collaborate with partners to organize a series of training programs on different spatial data science topics, such as urban development, public health, human movement, and environment.  
2021-2022
GMU-21-04 Developing Cloud-based Image Classification Management and Processing Service for High Spatial Resolution Sea Ice Imagery

The creation of comprehensive image datasets with accurate ground-truth labels is both time-consuming and expensive. This project focuses on this issue, aiming to create an image-based labeling tool for machine learning models and assisting in interdisciplinary discoveries.

a) Utilizing web-based technologies, this tool fosters a cooperative environment for multiple users, simplifying the process of building extensive datasets. Satellite or aerial imagery, usually characterized by high pixel count, presents difficulties in web-based labeling. The tool addresses this by automatically cropping images to a default size of 256×256 pixels, while also allowing manual cropping to dimensions of 256×256 or 512×512 as needed.

b) Four advanced segmentation techniques, namely Watershed, SLIC (Simple Linear Iterative Clustering), Quickshift, and Felzenszwalb, are incorporated into the tool (see the sketch after this list). These techniques partition the cropped images into polygons. Users have the option to modify segmentation parameters, which allows for refined segmentation, irrespective of the selected technique. The tool also incorporates additional pre- and post-processing features to further enhance and refine each segmentation.

c) Additionally, the tool integrates an image annotation feature, which allows users to label each segment according to a specific classification schema. An innovative auto-labeling function based on training data complements this manual annotation.

d) The platform also includes visualization features, which permit users to inspect and analyze labeled images. A built-in sharing feature further enhances collaborative work and knowledge exchange within the platform.
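
The four techniques named in (b) are all available in scikit-image, so the segmentation step can be sketched directly; the tile below is a bundled sample image standing in for a 256×256 sea-ice crop, and all parameters are illustrative defaults:

```python
# The four segmentation techniques from (b) applied to one 256x256 crop
# with scikit-image; parameters are illustrative, and the sample image is
# a stand-in for a sea-ice tile.
import numpy as np
from skimage.color import rgb2gray
from skimage.data import astronaut
from skimage.filters import sobel
from skimage.segmentation import felzenszwalb, quickshift, slic, watershed

tile = astronaut()[:256, :256]          # stand-in for a cropped sea-ice image
segments = {
    "slic": slic(tile, n_segments=250, compactness=10),
    "quickshift": quickshift(tile, kernel_size=3, max_dist=6),
    "felzenszwalb": felzenszwalb(tile, scale=100, sigma=0.5, min_size=50),
    "watershed": watershed(sobel(rgb2gray(tile)), markers=250,
                           compactness=0.001),
}
for name, seg in segments.items():
    print(name, "->", len(np.unique(seg)), "segments")
```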

2021-2022
STC-20-01 Innovating a computing infrastructure for spatiotemporal studies  

  • Objective: upgrade our current CI structure to support increasing demands on spatiotemporal innovations and computing 
  • Major task: obtain a 600-node cluster from NASA Goddard CISTO, and evolve the current system management and monitoring web services  
  • Examples: all current center projects need computing support 
  • Expected results: An operational CI to support all current and ongoing research from the ST center and potential external usage needs, especially computationally intensive tasks. Projects from members would also be welcome to use the facility. 
2021-2022
GMU-21-03 Using Machine Learning Methods to Improve the Categorization and Answers of Health Questions  

  • Objective: develop an automatic question answering system for questions regarding health data, information, and knowledge, with better query understanding, ranking, and recommendation. 
  • Major tasks: collect and index Health and Human Services expert knowledge from the historical database; build an HHS knowledge base; implement a question understanding tool; build a smart search engine for user queries 
  • Example: an organization (e.g., the United States Department of Health & Human Services) may receive thousands to hundreds of thousands of related questions from the public on a daily basis 
  • Expected results: a health-specific spatiotemporal question answering portal 
2021-2022
GMU-21-02 Spatiotemporal Open-Source workflow with COVID-19 and Cloud Classification as an example  

In recent times, Deep Learning (DL) has become an important tool to discover patterns and predict Earth science processes. In most cases, open-source code for the DL models is shared to support the research. While it is easy for subject matter experts or tech-savvy users to quickly set up the computing environment and replicate the DL research, non-programmers and beginners find it difficult to utilize open-source code. To bridge this gap, this project aims to develop a formalized process and workflow to effectively publicize and share DL research for Earth system applications so that people from any background can effectively replicate and reproduce the results. The open-source workflow primarily consists of three major phases: (i) open-source software development, (ii) sharing and maintenance, and (iii) reproducible research. Recently, we released the rainy cloud detection deep learning model. The open-source activities for the rainy cloud detection application include (i) testing the cloud classification DL model on various computing platforms supporting CPU, single-GPU, and multi-GPU setups with Windows and Ubuntu OS, (ii) documenting the steps to reproduce the research and creating a tutorial video, and (iii) sharing the deep learning model, training datasets, user guide, tutorial video, and interpretation of the model results with the community.

2021-2022
STC-21-01 Expand campus reopen to a school system by considering population density and human dynamics  

  • Objective: Expand the current school/campus reopening decision support system to accommodate county-based school system reopening decisions 
  • Major task: expand the current school reopening model to predict/simulate COVID-19 case trajectories for multiple schools in a county under specific control strategies, with population density and human dynamics datasets as input (see the sketch after this list) 
  • Examples: school system simulation in Fairfax County, VA 
  • Expected results: An operational web service to assist county-based school system reopening decisions during the COVID-19 pandemic, potentially extended to simulate stadiums for sports, or industry or research campuses like Goddard, for use by chief medical officers or related decision makers 
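
For illustration, a minimal agent-based SEIR loop of the kind such a simulator builds on, with random mixing standing in for the population-density and human-dynamics inputs; every rate and count here is an illustrative assumption:

```python
# Minimal agent-based SEIR loop; random mixing stands in for the mobility
# and density inputs, and all rates/counts are illustrative assumptions.
import random
random.seed(0)

N, STEPS = 2000, 120
S, E, I, R = 0, 1, 2, 3
state = [S] * N
for seed in random.sample(range(N), 5):   # initial infections
    state[seed] = I

P_TRANSMIT, INCUBATION, RECOVERY, CONTACTS = 0.03, 5, 10, 8
timer = [0] * N
for day in range(STEPS):
    infectious = [i for i in range(N) if state[i] == I]
    for i in infectious:
        for j in random.sample(range(N), CONTACTS):   # daily contacts
            if state[j] == S and random.random() < P_TRANSMIT:
                state[j], timer[j] = E, 0             # exposed
    for i in range(N):
        timer[i] += 1
        if state[i] == E and timer[i] >= INCUBATION:
            state[i], timer[i] = I, 0                 # becomes infectious
        elif state[i] == I and timer[i] >= RECOVERY:
            state[i] = R                              # recovers
    if day % 30 == 0:
        print(day, {s: state.count(s) for s in (S, E, I, R)})
```
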
2021-2022
GMU-21-05 PM 2.5 retrieval and spatiotemporal downscaling using earth observation data  

  • Objective: Develop an innovative methodology to retrieve PM2.5 at the global scale and further downscale the spatiotemporal resolution to 1 km and hourly levels in key regions, using artificial intelligence (AI) models. 
  • Major tasks: 1. Deep learning for PM2.5 retrieval using satellite remote sensing, model simulation, and ground observation; 2. Deep learning for PM2.5 prediction and downscaling using meteorological data with AOD spatial patterns. 
  • Example: AQ database, prediction, and monitoring systems for the city of Los Angeles 
  • Expected results: 1. PM2.5 estimation at the global scale; 2. hourly 1 km × 1 km (500 m × 500 m) PM2.5 for the LA region 
2021-2022
GMU-21-01 Improving ground-level air quality prediction by integrating spatiotemporal new observation system datasets and numerical simulations  

  • Objective: Leverage our advanced cyberinfrastructure (CI) projects, such as the EarthCube conceptual design, cloud computing and big data innovations, and big Earth data analytics, to produce an agile, flexible, and sustainable architecture for efficient ingestion and integration of big spatiotemporal data. An interdisciplinary, interoperable model will be refined from our past NSF/NASA-funded investigations of WRF for dust storms, WRF-Chem and CMAQ for NO2, voxel-based cellular automata simulation, and Earth science research on ground-level data collection and integration. 
  • Major tasks: 1. Data preprocessing and spatiotemporal collocation; 2. ML-based data preprocessing and downscaling; 3. AQ model simulation; 4. post-processing; 5. evaluation and testing 
  • Example: AQ database, prediction, and monitoring systems for the city of Los Angeles 
  • Expected results: 1) a robust, high-fidelity ground-level AQ dataset for geoscience research in both the atmospheric and Earth science divisions; 2) an integrated and re-interfaced advanced cyberinfrastructure for fusing and collocating spatiotemporal AQ data from satellite, airborne, ground, and in-situ observations; 3) an improved high-resolution AQ model to facilitate metropolitan-area forecasting 
2021-2022
Center Related Projects (I-Corps & 2yr Associate Degree Students START training)  

  • The I-Corps project and the START program are associated with the NSF Spatiotemporal I/UCRC. The I-Corps project will commercialize the campus reopening system to enable the safe reopening of thousands of college campuses and school systems, as well as many more companies and other organizations with a campus setting. The product will also benefit schools worldwide, especially those in developing countries, in combating the global pandemic. We will also engage community colleges as one of the first steps in providing services to help them safely reopen, a priority identified from our ICAP program interviews.
  • The START program responds to NSF DCL 21-076 (START): the GMU College of Science and the NSF Spatiotemporal I/UCRC will collaborate on providing training to selected 2-year associate degree students from Valdosta State University and community colleges, engaging them in Skills Training in Advanced Research and Technology. Seven faculty members and 10+ Ph.D. students, led by Prof. Chaowei Yang, director of the NSF Spatiotemporal Innovation Center, and assistant director Dr. Hai Lan, as well as 2-yr IHE lead Prof. Jia Lu, will mentor selected students and involve them in current research projects of the center.
2021-2022
HVD-21-09 Predicting Human Insecurity with Multi-faceted Spatiotemporal Factors 

Human security extends beyond the provision of core human needs and protection from acute harm to the creation of supports for home, community, and a sense of hope in the future that contribute to population stability and sustainable development. The threats that contribute to human insecurity are multi-faceted and complex, and underscore the need for multi-disciplinary approaches to policy and programmatic strategies for local and regional contexts across the spectrum of the disaster cycle. This preliminary proposal explores four possible research areas: 1) climate, conflict, and migration prediction; 2) atrocity prevention via early warning and early action; 3) spatial vulnerability and climate predictive models for disaster preparedness; and 4) COVID-19 and conflict. All involve integrating spatiotemporal climate change, conflict, demographic, infrastructural, socio-economic, and resource availability data, as well as quantifiable perception and behavioral data, into predictive models.

2021-2022
HVD-21-08 Mapping of Secondary and Tertiary Boundaries Over Time
Currently there is no open, authoritative global source for primary and secondary administrative boundaries. While some countries provide access to current boundaries, most do not, and fewer still support the ability to see how a given boundary has changed over time. This situation holds despite the fact that small changes in administrative boundaries can have huge impacts on people and their livelihoods. To address this deficiency and design a system for storing and updating such boundaries, we will survey existing efforts to create global boundary datasets and evaluate the strengths and weaknesses of each. Then, to understand the requirements for handling the historic dimension, we will create a historic district boundary dataset for India. Finally, based on the lessons learned from past efforts and our experience building a historical dataset for one country, we will design a platform to support public access to global boundaries and their evolution over time.
2021-2022
HVD-21-07 Developing Workbenches for Spatial Data Science

This project will explore methodologies and establish protocols for developing workbenches for spatial data science research and teaching. Using KNIME, free software developed by a Germany-based company, the project will conduct experiments on workbench development with peer-reviewed case studies, producing at least 60 added nodes for spatial statistics, modeling, and visualization; one Workbook for Quantitative Methods and Socio-Economic Applications; 30 replicable, reproducible, and expandable workflow-based case studies for spatial data science, business applications, and spatial social sciences; 20 online webinars and onsite training workshops for workflow-based data analysis with KNIME; the User Guide for the Workbench; and 4-6 peer-reviewed publications. Results of this project will provide a consistent and compatible platform for spatial data analysis programs developed in R, Python, and Java on different computing environments, and will promote a new generation of workflow data analysis as well as its applications for teaching and research across disciplines.

2021-2022
HVD-21-06 Village Level Spatial Prediction of Health Indicators for India Using Machine Learning and Environmental Remote Sensing and Socioeconomic Data Combination

Health indicators are metrics of population health and development and can be used as effective tools for relevant policy decision making. To support precision policy making regarding population health and development, village-level data science analysis is needed: it provides the highest administrative resolution and can therefore reveal the finest details of the spatial patterns of public health and development conditions. This project aims to improve spatial predictions of health indicators at the village level to support precision policy making regarding population health and development and the implementation planning of the UN SDGs related to public health. While we take India as our study area and child stunting, underweight, and wasting as our case health indicators, the outputs from this study will be expandable to other developing countries and other health indicators.

2021-2022
HVD-21-05 Global Urban Impulse Index:
Monitoring Human Mobility with Internet and Social Media Data
Human mobility plays an important role in understanding global socio-economic networks, epidemic control, and climate change in the context of global urbanization. Social media data have become a timely and massive data source for characterizing human flows, widely used in research on various topics. However, it remains difficult for the public to obtain continuous, instant, and comprehensive social perception analysis based on social media data, which may prevent government agencies from timely decision-making and global cooperation. This project plans to build a set of global city impulse indices on the open KNIME workflow platform, using the social media dataset archived by Harvard CGA and other open Internet data. These indices consist of a multi-scale intercity connectivity index at the core, with ancillary indices based on sentiment text mining. The project will greatly facilitate global cooperation on sustainability, crime morphology, and other regional issues.
2021-2022
HVD-21-04 Development of High-Performance System for Collection and Processing of Data from Sina-Weibo

Social media platforms have made available vast quantities of digital text, providing researchers unique data to investigate human interactions, communication, and well-being. Motivated by this, the Harvard Center for Geographic Analysis (CGA) maintains the Geotweet Archive, a global collection of billions of tweets from 2010 to the present. However, this archive has little information about China because Twitter is not accessible in the country. This creates a significant spatial gap for researchers who are trying to study a global phenomenon. To fill this gap, we need an archive of data from Sina-Weibo, the second largest social media platform in China. Due to its large sample size, Weibo offers a powerful ability to study and track sentiment, behaviors, and communications within the Chinese socio-cultural context. Therefore, Harvard CGA and the Sustainable Urbanization Lab (SUL) at MIT are collaborating to jointly build a high-performance system for the collection, processing, and analysis of data from Sina-Weibo.

2021-2022
GMU-21-06 USDA Climate Impact on Agricultural Output

The climate-change-induced increase in temperature and changes in precipitation extremes have significant impacts on agricultural production and thus threaten food security in the US. A comprehensive analysis of climate change influences on crop yield is essential for decision making and citizens' lives. The objective of this project is to identify and understand how U.S. historical crop production and agricultural productivity were affected by climate variability and climate change. In addition to using ERS national and state-level TFP data, we will collect the annual records of county-level crop acreage and yield for all major crops from USDA survey statistics to conduct a spatiotemporal multifactor analysis of their variations in response to local climate conditions. We will also consider other factors, such as domestic needs for food/feed/fiber commodities, international trade, commodity prices, and government policies, in the modeling. We will use machine learning approaches, especially the "structural equation model" framework, to identify the significant signals and underlying mechanisms linking U.S. crop production and TFP factors to weather conditions (means, extremes, anomalies) in specific periods and regions.

2021-2022
HVD-21-10 Assessing and Enhancing Values and Lifespan of Geospatial Datasets for Global Humanitarian Research

Geospatial data are the backbone of transdisciplinary research and critical to geospatial analysis efforts. The COVID-19 pandemic exposed the importance of geospatial data at the local level for tracking impacts of the virus and exploring causal relationships between health and policy. The plethora of geospatial data developed during the COVID-19 pandemic, such as data dashboards and unique datasets tracking COVID-19-related outcomes, is in danger of disappearing if strategies for maintenance and archiving are not developed. We propose to use two Department of State-sponsored projects to examine this issue: the Secondary Cities Initiative (2C) and the Cities' COVID-19 Mapping Mitigation Program (C2M2). Both projects aim at generating geospatial data at the local level in urban areas of low- and middle-income countries. A key goal of these projects was to ensure accessibility and adopt open data strategies to ensure long-term sharing of these data.

2021-2022
GMU-17-04 Geo-JPSS Flood Detection

Flood detection software has been developed to generate near real-time flood products from VIIRS imagery. SNPP/JPSS VIIRS data show special advantages in flood detection. The major activities, accomplishments, and specific objectives in the reporting period center on the flooding application. The plan for the next reporting period is to 1) improve the current flood product, 2) develop 3-D flood parameters (flood water surface level, flood water depth, and high-resolution flood maps), and 3) further analyze regional flood patterns.

2020-2024
GMU-20-02 Agent-based Multi-scale COVID-19 Outbreak Simulation

The outbreak of the Coronavirus disease 2019 (COVID-19) became a global pandemic, deeply affecting the daily lives of people in China, Spain, Italy, the U.S., and many other countries across the world. Many effective policies and strategies have been devised to slow the spread of COVID-19 in different areas around the world, and these could potentially serve as guidance to prevent possible outbreaks in places and counties that have not yet been seriously affected by the virus. The question, however, is how to identify possible outbreak places from existing observation-based evidence. Agent-based models (ABMs) are widely applied as standalone simulators or integrated with models from related disciplines to strengthen existing studies, including in infectious disease epidemiology. In previous epidemiological studies, ABMs have been adopted to simulate and predict the effectiveness of containment strategies under different policies, the time and place of outbreaks, medical resource deficiencies, and impacts on logistics systems. In our study, we propose to develop a comprehensive ABM-based simulator coupled with multivariate impact factors, such as the spatiotemporal distribution of the coronavirus, human migration and activities, climate conditions and environmental factors, and containment strategies and policies, to reveal and predict the pandemic pattern of COVID-19 at different scales: county level, state level, nationwide, and even global. We will then apply the simulation model to describe multi-scale COVID-19 outbreak patterns with a stated confidence level and to predict the possibility of outbreaks in places like India and South Africa, which may help prevent the pandemic and save lives in areas without outbreaks.

2020-2021
STC-19-00 Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications 

Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time and require spatiotemporal principles and thinking to be incorporated into the computing process. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies for exploring the next generation of computing models to address the challenges. This overarching center project analyzes and collaborates with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate application impact, and advance human knowledge and intelligence. Objectives are to: a) build spatiotemporal infrastructure from theoretical, technological, and application aspects; b) innovate spatiotemporal studies with new tools, systems, and applications; c) educate K-16 and graduate students with proper texts and curricula; and d) develop a community for spatiotemporal studies from center sites and members at regional, national, and global levels through IAB meetings, symposia, and other venues.

2019-2020
STC-19-01 Innovating a Computing Infrastructure for Spatiotemporal Studies  

In Phase I, the spatiotemporal innovation center built a 500-node cloud computing facility, which enables most projects of the center. After five years of operating the infrastructure, we see a need for an upgraded infrastructure with more RAM, faster CPUs, and more storage on each node, and ideally a GPU cluster that can help us address the growing challenges of image/graphics processing and deep learning in the Phase II operation. Based on our IAB's recommendations and center projects, we propose to a) develop and maintain an upgraded computing infrastructure with more computing power and graphics and deep learning capabilities, b) provide spatiotemporal computing coordination and research to all center projects with computing needs by maintaining highly capable research staff to support and optimize the computing infrastructure, c) serve campus computing needs with a spatiotemporal interest to gain broader impact and engagement of scientists and students, and d) adopt and develop advanced spatiotemporal computing technologies to innovate the next generation of computing tools. 

2019-2020
STC-19-02 Spatiotemporal Innovation Testbed 

The first phase of the Spatiotemporal I/UCRC has witnessed many spatiotemporal innovations over the past six years. Like innovations in any other domain and technology area, spatiotemporal innovations follow the hype cycle of maturity, and many have emerged in recent years. The community needs a comprehensive information source on what, when, where, and how much effort is needed for maturing, adopting, and operating these new innovations. To reduce the illusions of the innovation hype cycle and meet this community need, we propose to establish a testbed utilizing the center's infrastructure, as part of the spatiotemporal infrastructure the center envisions implementing in its 15 years of investigation. This project will draw best practices from past investigations, such as the computing infrastructure, big data testbed, cloud testbed, EarthCube testbed, and ESIP testbed, to maintain and automate a testbed environment. The testbed will serve as a platform for members, faculty, students, and the community to validate and verify new technologies and emerging innovations, and to produce white papers, review papers, and evaluation publications for the broadest impact. 

2019-2020
Hvd-19-01 CDL: Developing an online spatial data sharing and management platform  

This project develops an online platform for the creation, management, and sharing of spatiotemporal data, analytical tools, and study cases, nicknamed the Spatial Data Lab (SDL). Currently, Harvard's Dataverse and WorldMap have been integrated with the SDL platform. Data-driven analytical workflows are sharable and accessible from Harvard Dataverse with encrypted links to the SDL platform. This year the project takes COVID-19 as one of its case studies, and the team has been actively building resource repositories for COVID-19 research since January. We provide standardized datasets, executable workflows, and training materials on the SDL platform for collaborating researchers to easily and quickly conduct research, enhance methodology, publish results, and deliver education on COVID-19-related research topics. 

2019-2020
Hvd-19-06 Elevating Research Excellence with Data Repository and AI Ecosystem 

Dataverse is an open-source data repository platform where users can share, preserve, cite, and explore research data. RMDS Lab is a startup company developing transformative technologies for research with big data and AI. This project establishes a collaboration between the two teams and two platforms to create synergy that advances the shared goal of supporting worldwide scholars in data-driven research. The main objective is to explore solutions for applying AI technology to the evaluation of data science studies; to provide measurable references for data scientists on the accuracy, impact, replicability, applicability, and other merit scores of data science study cases; and to promote high-quality data science research through platform development, data sharing, community building, and user training. The Coronavirus crisis has strengthened the collaboration between the two organizations, expanding the project to use datasets from not only the Harvard Dataverse but also the over 60 Dataverse installations worldwide. 

2019-2020
STC-15-02 Dynamic Mapping of Secondary Cities 

Secondary cities are non-primary cities, characterized by population size, function, and/or economic status. They are urban centers of governance, logistics, and production, and are often data poor. This project is a global initiative to address the critical geospatial data needs of secondary cities; the objective is to enhance emergency preparedness, human security, and resilience. The project facilitates partnerships with local organizations for data generation and sharing, using open-source tools, and focuses on applied geography and human geography thematic areas. 

2019-2020
GMU-19-01 Cloud classification 

Cloud types, coverage, and distribution have a significant influence on the characteristics and dynamics of the global climate, and they are directly related to the energy balance of the Earth. Therefore, accurate cloud classification and analysis are essential for research on the atmosphere and climate change. Cloud classification assigns a predetermined label to clouds in an image, e.g., cirrus, altostratus, or altocumulus. With cloud segmentation, satellite imagery can support a series of local mesoscale climate analyses such as rainy cloud detection, cyclone detection, or extreme weather event (e.g., heavy rainfall) prediction. However, distinguishing different clouds in satellite imagery is challenging because of intraclass spectral variations and interclass spectral similarities. 

Traditionally, cloud types are classified using selected features and thresholds such as cloud-top pressure (CTP), cloud optical thickness (COT), brightness temperature (BT), and the multilayer flag (MLF). One drawback is that model accuracy relies heavily on threshold and feature selection. The past years have witnessed successful deep learning applications in automatic feature selection for object detection from images, with the aid of CNN models and their variants such as VGGNet and ResNet. Inspired by these successful applications of deep learning in computer vision, we propose to implement an automatic cloud classification system based on deep neural networks to identify eight kinds of clouds from geostationary and polar-orbiting satellite data, with cloud types from the 2B-CLDCLASS product of the CloudSat CPR as the label reference. 
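
A minimal sketch of such a classifier, assuming a PyTorch/torchvision setup: a standard ResNet with its first convolution widened to accept multispectral bands and an eight-way output matching the 2B-CLDCLASS cloud types; the band count is an assumption.

```python
# Sketch (PyTorch/torchvision): ResNet-18 adapted to multispectral satellite
# patches, 8 output classes matching the CloudSat 2B-CLDCLASS cloud types.
# The band count is an illustrative assumption.
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_BANDS, NUM_CLASSES = 16, 8
model = resnet18(num_classes=NUM_CLASSES)
model.conv1 = nn.Conv2d(NUM_BANDS, 64, kernel_size=7, stride=2,
                        padding=3, bias=False)   # accept multispectral input

patches = torch.randn(4, NUM_BANDS, 64, 64)      # labeled satellite patches
labels = torch.randint(0, NUM_CLASSES, (4,))     # from collocated 2B-CLDCLASS
loss = nn.CrossEntropyLoss()(model(patches), labels)
loss.backward()
```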

2019-2020
GMU-19-03 Planetary Defense  

Programs like NASA's Near-Earth Object (NEO) Survey supply the planetary defense (PD) community with the necessary information for NEO mitigation. However, information about detecting, characterizing, and mitigating NEO threats is still dispersed across different organizations and scientists due to the lack of a structured architecture. This project aims to develop a knowledge base and engine providing discovery of and easy access to PD-related resources by developing 1) a domain-specific Web crawler to automate large-scale, up-to-date discovery of PD-related resources, and 2) a search ranking method to better rank the search results. The Web crawler is based on Apache Nutch, one of the well-recognized, highly scalable web crawlers. In this research, Apache Nutch is extended in three aspects: 1) a semi-supervised approach is developed to create the PD-related keyword list; 2) an improved similarity scoring function is utilized to set the priority of web pages in the crawl frontier; and 3) an adaptive approach is designed to re-crawl/update web pages. The search ranking module is built upon Elasticsearch. Rather than using the basic search relevance function of Elasticsearch, a PageRank-based link analysis and an LDA-based topic modeling approach are developed to better support the ranking of interconnected web pages. 
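
To illustrate the ranking idea on a toy graph: blend link-analysis PageRank (here via networkx) with a per-page topical relevance score, which in the project would come from the LDA topic model; the pages, scores, and equal weighting are illustrative assumptions.

```python
# Toy ranking sketch: PageRank over a small link graph blended with a
# topical relevance score per page (hard-coded here; LDA-derived in the
# project). Pages and the 50/50 weighting are assumptions.
import networkx as nx

g = nx.DiGraph([("neo_survey", "mitigation"), ("mitigation", "deflection"),
                ("blog_post", "neo_survey"), ("neo_survey", "deflection")])
pagerank = nx.pagerank(g, alpha=0.85)
topic_relevance = {"neo_survey": 0.9, "mitigation": 0.8,
                   "deflection": 0.7, "blog_post": 0.2}

score = {p: 0.5 * pagerank[p] + 0.5 * topic_relevance[p] for p in g}
for page, s in sorted(score.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {s:.3f}")
```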

2019-2020
GMU-19-04 Micro-scale Urban Heat Island Spatiotemporal Analytics and Prediction Framework 

As one of the adverse effects of urbanization and climate change, the Urban Heat Island (UHI) effect can harm human health. Most research has relied on remote sensing imagery or sparsely distributed station sensor data and has focused on broad understanding of the meso- or city-scale UHI phenomenon and mitigation support. However, challenges remain at the micro level. This project aims to: 1) build an in-depth investigation of human-weather-climate relations for the urban area; 2) fill the gap between short-term weather effects from buildings, traffic, and human mobility and the long-term microclimate by understanding such relations with real-time urban sensing (IoT) data; 3) establish a machine-learning-enabled ensemble model for fast near-future temperature forecasts that considers the human-weather-climate relationships; and 4) provide guidelines for designing and implementing precautionary local human-activity management strategies according to the forecasts, to reduce public health risks and allow better urban living spaces.

2019-2020
GMU-18-01 Rapid extreme weather events detection and tracking from 4D/5D climate simulations 

Climate simulations provide valuable information representing the state of the atmosphere, ocean, and land. Increasingly advanced computational technologies and Earth observation capabilities have enabled climate models with higher spatial and temporal resolution, providing ever more realistic coverage of the Earth. High spatiotemporal resolution also offers the opportunity to more precisely pinpoint and identify/segment the occurrence of extreme weather events, such as tropical cyclones, which can have dramatic impacts on populations and economies. Deep learning techniques are considered one of the breakthroughs of recent years, achieving compelling results on many practical tasks, including disease diagnosis, facial recognition, and autonomous driving. We propose to apply deep learning techniques to the rapid detection of two extreme weather events: tropical cyclones and dust storms. Deep learning models trained on past climate simulations will inform the effectiveness of the approach on future simulations. Our technological motivation is that high-resolution simulations and observations currently generate more data than researchers, scientists, and organizations can store for their applications. Machine learning methods performing real-time segmentation and classification of relevant features of extreme weather events can generate a list or database of these features, and detailed information can be obtained by rerunning the simulation at high spatiotemporal resolution when needed.
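As a simplified classical stand-in for the learned detector, candidate event regions can be pulled from a 2D simulation field by thresholding and connected-component labeling; the variable, threshold, and scipy-based approach are illustrative assumptions:

import numpy as np
from scipy import ndimage

def candidate_regions(field, threshold):
    """Return bounding boxes of contiguous regions exceeding a threshold,
    e.g., high aerosol optical depth for dust storms (for cyclones one
    would threshold low sea-level pressure instead)."""
    mask = field > threshold
    labeled, n = ndimage.label(mask)       # connected components
    boxes = ndimage.find_objects(labeled)  # one (row-slice, col-slice) per region
    return [(b[0].start, b[0].stop, b[1].start, b[1].stop) for b in boxes]

# Random field standing in for one simulation time step on a 1-degree grid
field = np.random.rand(180, 360)
print(candidate_regions(field, threshold=0.99)[:3])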

2018-2019
GMU-18-02 Climate Indicators downscaling 

Weather conditions are among the factors people care about most in daily life. People may check the forecast every day, or even every few hours, especially for activities highly sensitive to temperature, precipitation, or wind, such as taking flights. However, civil weather forecast data are issued every six hours, which falls far short of actual needs, and the spatial resolution of most weather data, such as precipitation and surface winds, is on the order of several kilometers, too coarse for some regions. This project focuses on weather data downscaling to fulfill the increasing need for short-term forecasts with high spatial and temporal resolution.
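For reference, a naive interpolation baseline for the downscaling task might look like the following, assuming numpy/scipy; the project targets statistical or learned downscaling, against which such a baseline would be compared:

import numpy as np
from scipy.ndimage import zoom

def naive_downscale(field, factor):
    """Bilinear upsampling of a coarse field (e.g., 6-hourly precipitation
    on a ~10 km grid) to a finer grid; a baseline, not the final method."""
    return zoom(field, factor, order=1)  # order=1 -> bilinear

coarse = np.random.rand(60, 60)           # stand-in coarse grid
fine = naive_downscale(coarse, factor=4)  # 4x finer spatial resolution
print(coarse.shape, "->", fine.shape)     # (60, 60) -> (240, 240)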

2018-2019
STC-17-01 Big Data Learning Platform 

The objective is to investigate and advance the technology of a deep learning system that integrates big data, learns hidden knowledge, and discovers new information of significance to human dynamics, the Earth environment, and space detection.
The expected results are to 1) integrate a deep learning system based on hybrid cloud computing for geospatial intelligence, with SOA algorithms/models supported; 2) build an advanced big-data container system to manage and provide fast access to various types of data; 3) test a number of scenarios, including weather, events, new information, and space detection, on the developed system; and 4) check the quality of newly discovered information against a knowledge base and factual information.

 
GMU-17-04 Integration and Applications of Geo-JPSS Flood Detection 

Flood detection software has been developed to generate near real-time flood products from VIIRS imagery; SNPP/JPSS VIIRS data show special advantages in flood detection. The major activities and accomplishments in this reporting period centered on flood applications. The plan for the next reporting period is to 1) improve the current flood product; 2) develop 3-D flood parameters: flood water surface level, flood water depth, and high-resolution flood maps; and 3) further analyze regional flood patterns.

 
STC-16-03 Big Data Deep Learning 

Big data has emerged with unprecedented value for research, development, innovation, and business, and most of it carries a spatiotemporal stamp. However, transforming big data into value poses grand challenges for big data management, spatiotemporal data modeling, and spatiotemporal data mining. To enable this transformation, we propose to develop a deep learning platform based on the current spatiotemporal innovation project. The platform will provide advanced data management and computing technologies to mine valuable knowledge from big spatiotemporal data. More robust models will be built to discover implicit spatiotemporal dynamic patterns in climate, dust storms, and weather from remote sensing and model simulation data, addressing the environmental and health issues of concern. Meanwhile, user-generated data, such as PO.DAAC and social media, will be mined to improve geospatial data discovery and to form a knowledge base for spatiotemporal data. In addition, high-performance computing (e.g., GPU and parallel computing) and cloud computing technologies will be utilized to accelerate the knowledge discovery process. The proposed deep learning platform for big spatiotemporal data will develop/integrate a suite of software for big spatiotemporal data mining and contribute a core component to spatiotemporal innovation.

 
GMU-16-05 Data Container Study for Handling Array-Based Data Using Rasdaman, SciDB, Hive, Spark, and MongoDB 

Geoscience communities have developed various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges of massive Earth observation data management and processing. To examine the readiness of current technologies and tools for supporting big Earth observation data archiving, discovery, access, and processing, we investigated and compared several popular data solutions: Rasdaman, SciDB, Hive, Spark, ClimateSpark, and MongoDB. Using different types of spatial and non-spatial queries, and datasets stored in common scientific data formats (e.g., NetCDF and HDF), the features and performance of these data containers are systematically compared and evaluated. The evaluation metrics focus on performance in discovering and accessing datasets for upper-level geoscience applications. The computing resources (e.g., CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) MongoDB has the best performance for queries involving statistical and operational functions, but does not support the NetCDF data format as well as HDF; 2) ClimateSpark performs better than pure Spark and Hive in most cases, except single-point extraction over a long time series; and 3) Hive is not well suited to querying small datasets, since it uses MapReduce as its processing engine, which incurs considerable overhead. A comprehensive report will detail the experimental results and compare the containers' pros and cons regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.
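For illustration, one of the statistical queries in such a comparison might be expressed in MongoDB via pymongo as follows; the database, collection, and field names are hypothetical:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["climate"]["merra_temperature"]  # hypothetical collection

# Mean temperature over a bounding box and time range, analogous to the
# spatial/statistical queries used in the container evaluation.
pipeline = [
    {"$match": {"lat": {"$gte": 30, "$lte": 50},
                "lon": {"$gte": -120, "$lte": -70},
                "time": {"$gte": "2015-01-01", "$lte": "2015-12-31"}}},
    {"$group": {"_id": None, "mean_temp": {"$avg": "$temperature"}}},
]
print(list(coll.aggregate(pipeline)))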

2015-2016
STC-15-01 Developing and Maintaining a Spatiotemporal Computing Infrastructure Project Space 

Taking demands from our IAB and center projects, this project is to a) develop and maintain a spatiotemporal computing infrastructure by acquiring a high-performance computing facility, b) provide spatiotemporal computing coordination and research to all center projects with computing needs by maintaining highly capable research staff to support and optimize the computing infrastructure, and c) adopt and develop advanced spatiotemporal computing technologies to innovate the next generation of computing tools.

2015-2020
STC-15-02 Dynamic Mapping of Secondary Cities 

Secondary cities are non-primary cities, characterized by population size, function, and/or economic status. They are urban centers of governance, logistics, and production, and are often data-poor. This project is a global initiative to address the critical geospatial data needs of secondary cities, with the objective of enhancing emergency preparedness, human security, and resilience. The project facilitates partnerships with local organizations for data generation and sharing using open source tools, and focuses on applied geography and human geography thematic areas.

2015-2020
GMU-15-01 ClimateSpark: An In-memory Distributed Computing Framework for Big Climate Data Analytics Project Space 

Large collections of observational, reanalysis, and climate model output data are being assembled as part of the work of the Intergovernmental Panel on Climate Change (IPCC). These collections may grow to as large as 100 PB in the coming years, and the NASA Center for Climate Simulation (NCCS) will host much of the data. Ideally, such big data can be provided to scientists with on-demand analytical and simulation capabilities, relieving them of time-consuming computational tasks. Realizing this goal is challenging, however, because processing such big data requires efficient big data management strategies, complex parallel computing algorithms, and scalable computing resources. Based on the extensive experience at NCCS and GMU in big climate data analytics, Hadoop, cloud computing, and other technologies, a high-performance computing framework, ClimateSpark, has been developed to better support big climate data analytics. A hierarchical indexing strategy has been designed and implemented to support efficient management and querying of big multi-dimensional climate data in a scalable environment. A high-performance Taylor diagram service has been developed as a tool to help climatologists evaluate different climate model outputs. A web portal eases remote interaction between users, data, analytic operations, and computing resources via SQL, Scala/Python notebooks, or a RESTful API.
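Setting ClimateSpark's internals aside, the style of interactive query the framework supports can be sketched with plain PySpark; the dataset path and column names are assumptions:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("climate-query").getOrCreate()

# Hypothetical table of decomposed model output: (time, lat, lon, tas)
df = spark.read.parquet("hdfs:///nccs/merra/tas.parquet")

# Area-averaged monthly temperature over a region -- the kind of
# aggregation a hierarchical spatiotemporal index would accelerate.
result = (df.filter(F.col("lat").between(30, 50) &
                    F.col("lon").between(-120, -70))
            .groupBy(F.date_trunc("month", "time").alias("month"))
            .agg(F.avg("tas").alias("mean_tas"))
            .orderBy("month"))
result.show()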

2015-2016
GMU-15-08 Automatic Near-Real-Time Flood Detection using Suomi-NPP/VIIRS Data 

Near real-time satellite-derived flood maps are invaluable to river forecasters and decision-makers for disaster monitoring and relief efforts. With support from the JPSS Proving Ground and Risk Reduction Program, a flood detection package has been developed using SNPP/VIIRS (Suomi National Polar-orbiting Partnership / Visible Infrared Imaging Radiometer Suite) imagery to automatically generate daily near real-time flood maps for National Weather Service (NWS) River Forecast Centers (RFCs) in the USA. In this package, a series of algorithms has been developed, including water detection, cloud shadow removal, terrain shadow removal, minor flood detection, water fraction retrieval, and flood water determination. The package has been running routinely on direct-broadcast SNPP/VIIRS data since 2014. Flood maps were carefully evaluated by river forecasters using airborne imagery and hydraulic observations. Offline validation was also performed via visual inspection of VIIRS false-color composite images on more than 10,000 granules across a variety of scenes, and via comparison with river gauge observations year-round. Evaluation has shown high accuracy, and the product's promising performance has won positive feedback and recognition from end users.
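The water-detection step can be illustrated with a simple normalized-difference water index (NDWI) over VIIRS-like reflectance bands; the operational package uses a more elaborate algorithm chain, and the bands and threshold here are illustrative:

import numpy as np

def ndwi_water_mask(green, nir, threshold=0.0):
    """Flag water pixels where NDWI = (green - nir) / (green + nir)
    exceeds a threshold; cloud and terrain shadow removal would follow."""
    ndwi = (green - nir) / np.clip(green + nir, 1e-6, None)
    return ndwi > threshold

# Stand-in reflectance arrays for one granule
green = np.random.rand(768, 3200)
nir = np.random.rand(768, 3200)
water = ndwi_water_mask(green, nir)
print("water fraction:", water.mean())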

2015-2016
GMU-15-09 Planetary Defense Project Space 

Programs like NASA's Near-Earth Object (NEO) Survey supply the PD community with the information necessary for NEO mitigation. However, information about detecting, characterizing, and mitigating NEO threats remains dispersed across different organizations and scientists for lack of a structured architecture. This project aims to develop a knowledge discovery search engine that provides discovery of and easy access to PD-related resources by developing 1) a domain-specific Web crawler to automate large-scale, up-to-date discovery of PD-related resources, and 2) a search ranking method to better rank the search results. The Web crawler is based on Apache Nutch, a well-recognized, highly scalable web crawler. In this research, Apache Nutch is extended in three aspects: 1) a semi-supervised approach is developed to create a PD-related keyword list; 2) an improved similarity scoring function is used to prioritize the web pages in the crawl frontier; and 3) an adaptive approach is designed to re-crawl/update web pages. The search ranking module is built upon Elasticsearch. Rather than using Elasticsearch's basic relevance function alone, a PageRank-based link analysis and an LDA-based topic modeling approach are developed to better support the ranking of interconnected web pages (see the re-ranking sketch under GMU-19-03 above).

2015-2016
UCSB-15-01 Linked Data for the National Map 

The proposed project aims at providing Linked Data access to National Map vector data, which resides in the ArcGIS Geodatabase format. These data include hydrography, transportation, structures, and boundaries. The project will address the challenge of efficiently making large data volumes available and queryable at the same time. Previous research and the PIs' experience suggest that, in the context of the National Map, offering hundreds of gigabytes of Linked Data via an unrestricted endpoint will not scale. To address this challenge, a variety of methods will be tested to determine the sweet spot between data dumps, i.e., simply storing huge RDF files for download, on the one side, and unrestricted public (Geo)SPARQL endpoints on the other. Methods and combinations of methods will include (Geo)SPARQL-to-SQL rewriting, transparent Web service proxies for WFS, Linked Data Fragments, query optimization, restricted queries via a user interface, and so forth. The sweet spot will be defined as the method (or combination of methods) that best enables common usage scenarios for Linked National Map Data, i.e., that retains as much as possible of the functionality of full Linked Data query access via a public endpoint while keeping server load and average query runtime (for common usage queries) at an acceptable level. A Web-based user interface will expose the resulting data and make them queryable and explorable via the follow-your-nose paradigm.
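A sketch of the kind of restricted query such an endpoint might serve, written with SPARQLWrapper against a hypothetical endpoint URL and vocabulary:

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Linked National Map endpoint; geo:/geof: are the standard
# GeoSPARQL namespaces, the Stream class IRI is made up for illustration.
sparql = SPARQLWrapper("http://example.org/nationalmap/sparql")
sparql.setQuery("""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt WHERE {
  ?feature a <http://example.org/nationalmap/Stream> ;
           geo:hasGeometry/geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt,
    "POLYGON((-78 38, -76 38, -76 40, -78 40, -78 38))"^^geo:wktLiteral))
}
LIMIT 50
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["feature"]["value"])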

2015-2016
Harvard-14-03 Development and application of ontologies for NHD and related TNM data layers 

Feature layers in the US National Map (TNM) are fundamental contexts for spatiotemporal data collection and analysis, but they largely exist independently of each other as map layers. This project will explore the use of formal ontologies and semantic technology to represent functional relationships within and between "wet" hydrography and "dry" landscape layers, expressing the basis for the occurrence of lakes and rivers. It will then test these representations in applications for discovering and analyzing related water science data.

2014-2016
Harvard-14-02 Developing a place name extraction methodology for historic maps 

We propose to develop an approach for automating the extraction and organization of place name information from georeferenced historic map series (in multiple languages), focusing on scales better than 1:250,000. Such information is essential to the spatialization of unstructured text documents and social media. Phase I will be a feasibility study that evaluates the best existing technologies against current extraction costs (including outsourcing) and then recommends next steps for establishing a production system. The options will be: 1) do nothing, as there is currently no cost-effective approach; 2) make use of an existing platform and develop workflows for it; or 3) develop a new system that combines existing technologies and/or develops new ones.

2014-2015
GMU-14-01 Improving geospatial data discovery, access, visualization and analysis for Data.gov, geospatial platform and other systems 

Develop a set of efficient tools to better discover, access, and visualize the data and services from Data.gov and the Geospatial Platform, meeting the following requirements: 1) support discovery using enhanced semantic context and inference to improve recall and precision; 2) provide and enhance an open-source viewer to visualize and analyze different online map services; 3) develop an open-source analytical workbench prototype for incorporation into Data.gov and the Geospatial Platform that enables end-user computational analysis on multiple remote geospatial web services, capturable as services for optional re-execution and yielding analytical data products (data, graphs, maps) from raster and vector overlay; and 4) supply a QoS module to check and track service quality information.

2014-2015
GMU-14-05 Developing a Hadoop-based middleware for handling multi-dimensional NetCDF 

Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for global challenges such as climate change, natural disasters, diseases, and other emergencies. However, this is challenging not only because of the large data volume but also because of the intrinsically high dimensionality of climate data. To tackle this challenge, this project proposes a Hadoop-based middleware to efficiently manage and process big climate data in a highly scalable environment. With this approach, big climate data are stored directly in the Hadoop Distributed File System in their original format, without any special configuration of the Hadoop cluster. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, enabling fast data retrieval with spatiotemporal queries. Based on the index, a data-partitioning algorithm is proposed to enable MapReduce to achieve high data locality and a balanced workload. The approach is evaluated using the NASA MERRA reanalysis climate data. The experimental results show that the Hadoop-based middleware significantly accelerates querying and processing (~10x speedup compared to the baseline test on the same cluster) while keeping the index-to-data ratio small (0.0328%). Its applicability is demonstrated by a climate anomaly detection application deployed on the NASA Hadoop cluster.
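The bridging role of the spatiotemporal index can be sketched as a small lookup from logical array coordinates to physical byte ranges; this is a conceptual illustration, not the middleware's actual data structures:

# Index entry: (variable, time step, latitude band) -> physical location of
# the corresponding chunk inside HDFS, so a spatiotemporal query touches
# only the bytes it needs and MapReduce tasks can be scheduled on the
# nodes that hold them (data locality). File paths and sizes are made up.
index = {
    ("T2M", "2014-01-01T00", "lat_00_30"):
        ("hdfs:///merra/T2M_201401.nc4", 0, 4_194_304),
    ("T2M", "2014-01-01T00", "lat_30_60"):
        ("hdfs:///merra/T2M_201401.nc4", 4_194_304, 4_194_304),
}

def locate(variable, time, lat_band):
    """Resolve a logical query cell to (file, byte offset, length)."""
    return index[(variable, time, lat_band)]

print(locate("T2M", "2014-01-01T00", "lat_30_60"))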

2014-2015
GMU-14-07 Analyzing and visualizing data quality in crowdsourcing environments 

Significant new influences in the geospatial domain include Web 2.0, social media, and user-centered technologies, as well as the generation and use of very large, dynamic datasets. These influences present many new opportunities and challenges, and may require the development of a "new geospatial toolkit" for use with new sources and types of geospatial data. A goal of our research is to help define this new set of tools, techniques, and strategies, and to explore approaches and practical perspectives for integrating new sources of data and knowledge into the geospatial domain. In this project, we developed web- and mobile-based data collection prototypes providing methods for characterizing and assessing data quality in crowdsourcing systems, along with novel ways to visualize data quality metrics. The various data collection methods, hybrid databases, and quality assessment methods for crowdsourced data form the core of the prototype.
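As one toy illustration of a crowdsourced-data quality metric (not necessarily one the prototype implements), an agreement score compares a contribution with nearby contributions:

import math

def agreement_score(point, neighbors, radius_m=50.0):
    """Fraction of nearby contributions (within radius_m meters) reporting
    the same attribute value; one simple quality indicator that a
    crowdsourcing prototype might compute and visualize."""
    def dist_m(a, b):  # equirectangular approximation, fine at small scales
        dx = math.radians(b["lon"] - a["lon"]) * math.cos(math.radians(a["lat"]))
        dy = math.radians(b["lat"] - a["lat"])
        return 6_371_000 * math.hypot(dx, dy)
    near = [n for n in neighbors if dist_m(point, n) <= radius_m]
    if not near:
        return None  # not enough evidence to score this contribution
    return sum(n["value"] == point["value"] for n in near) / len(near)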

2014-2015
STC-14-01 Developing a big spatiotemporal data computing platform (continued by STC-15-01)  

This research aims to design and develop a general computing platform to best utilize cluster, grid, cloud, many-integrated-core (MIC), graphics processing unit (GPU), hybrid GPU/CPU, and volunteer computing for accessing, processing, managing, analyzing, and visualizing big spatiotemporal data. Our computing and optimization techniques and heterogeneous resource integrator support a platform that facilitates a number of applications, e.g., climate simulation, social media analysis, online visual analytics, the geospatial platform, and the GEOSS Clearinghouse.

2014-2015
STC-14-00 Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications 

Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time, and require spatiotemporal principles and thinking to be incorporated into computing processes. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies to explore the next generation of computing models for addressing the challenges. This overarching center project is to analyze and collaborate with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate the application impact, and advance human knowledge and intelligence. Objectives: a) generalize spatiotemporal thinking methodologies; b) produce efficient computing software and tools; c) elevate the application impact; and d) advance human knowledge and intelligence.

2013-2020
