Proposed Projects

ID Name Period
HVD-21-10 Assessing and Enhancing Values and Lifespan of Geospatial Datasets for Global Humanitarian Research

Geospatial data are the backbone of transdisciplinary research and critical to geospatial analysis efforts. The COVID-19 pandemic exposed the importance of local-level geospatial data for tracking impacts of the virus and exploring causal linkages between health and policy. The plethora of geospatial data developed during the COVID-19 pandemic, such as data dashboards and unique datasets tracking COVID-19-related outcomes, is in danger of disappearing if strategies for maintenance and archiving are not developed. We propose to use two Department of State-sponsored projects to examine this issue: the Secondary Cities Initiative (2C) and the Cities’ COVID-19 Mapping Mitigation Program (C2M2). Both projects generate geospatial data at the local level in urban areas of low- and middle-income countries. A key goal of these projects was to ensure accessibility and adopt open data strategies for long-term sharing of these data.

2021-2022
GMU-21-06 USDA Climate Impact on Agricultural Output

The climate-change-induced increase in temperature and changes in precipitation extremes have significant impacts on agricultural production and thus threaten food security in the US. A comprehensive analysis of climate change influences on crop yield is essential for decision making and for citizens’ lives. The objective of this project is to identify and understand how U.S. historical crop production and agricultural productivity were affected by climate variability and climate change. In addition to using ERS national and state-level TFP data, we will collect annual records of county-level crop acreage and yield for all major crops from USDA survey statistics to conduct a spatiotemporal multifactor analysis of their variations in relation to local climate conditions. We will also consider other factors in the modeling, such as domestic needs for food/feed/fiber commodities, international trade, commodity prices, and government policies. We will use machine learning approaches, especially the “structural equation model” framework, to identify the significant signals and underlying mechanisms linking U.S. crop production and TFP factors to weather conditions (means, extremes, anomalies) in specific periods and regions.
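The core of the county-level multifactor analysis is relating yield variation to climate variables. A minimal sketch of one such relationship is shown below; the data, variable names, and the single-predictor setup are purely illustrative stand-ins for the project's far richer structural equation models:

```python
# Minimal sketch: regress county-level yield anomalies on growing-season
# temperature anomalies. All numbers below are hypothetical illustrations.

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical yield anomalies (bu/acre) vs temperature anomalies (deg C)
temp_anom = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
yield_anom = [2.0 - 0.5 * t for t in temp_anom]  # an exact -0.5 bu/acre/degC response

sensitivity = ols_slope(temp_anom, yield_anom)
```

A real analysis would add precipitation, extremes, prices, and policy variables, and fit them jointly rather than one at a time.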

2021-2022
HVD-21-04 Development of High-Performance System for Collection and Processing of data from Sina-Weibo 

Social media platforms have made available vast quantities of digital text, providing researchers unique data to investigate human interactions, communication, and well-being. Motivated by this, the Harvard Center for Geographic Analysis (CGA) maintains the Geotweet Archive, a global collection of billions of tweets from 2010 to the present. However, this archive has little information about China because Twitter is not accessible in the country. This creates a significant spatial gap for researchers who are trying to study a global phenomenon. To fill this gap, we need an archive of data from Sina-Weibo, the second largest social media platform in China. Due to its large sample size, Weibo data offer a powerful means to study and track sentiment, behaviors, and communications within the Chinese socio-cultural context. Therefore, Harvard CGA and the Sustainable Urbanization Lab (SUL) at MIT are collaborating to jointly build a high-performance system for collection, processing, and analysis of data from Sina-Weibo.

2021-2022
HVD-21-05 Global Urban Impulse Index: Monitoring Human Mobility with Internet and Social Media Data

Human mobility plays an important role in understanding global socio-economic networks, epidemic control, and climate change in the context of global urbanization. Within this context, social media data have become a timely and massive source for characterizing human flows, widely used in research on various topics. However, it remains difficult for the public to obtain continuous, instant, and comprehensive social-perception analysis based on social media data, which may prevent government agencies from timely decision-making and global cooperation. This project plans to build a set of global city impulse indices on the open KNIME workflow platform, using the social media dataset archived by Harvard CGA and other open Internet data. These indices have the multi-scale intercity connectivity index at their core, complemented by ancillary indices based on sentiment text mining. The project will greatly facilitate global cooperation on sustainability, crime morphology, and other potential regional issues.
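One way to picture an intercity connectivity index is as a flow count normalized by both cities' overall activity. The sketch below is an illustrative choice of normalization, not the project's published formula, and all counts and city names are hypothetical:

```python
import math

# Hypothetical counts of users posting in both city A and city B in a week.
pair_flows = {("Tokyo", "Osaka"): 120, ("Tokyo", "Boston"): 15, ("Osaka", "Boston"): 5}
# Hypothetical counts of active geotagged users per city.
activity = {"Tokyo": 1000, "Osaka": 400, "Boston": 300}

def connectivity(a, b):
    """Flow normalized by the geometric mean of the two cities' activity."""
    flow = pair_flows.get((a, b), pair_flows.get((b, a), 0))
    return flow / math.sqrt(activity[a] * activity[b])

idx = connectivity("Tokyo", "Osaka")  # higher than Tokyo-Boston despite Tokyo's size
```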

2021-2022
HVD-21-06 Village Level Spatial Prediction of Health Indicators for India Using Machine Learning and Environmental Remote Sensing and Socioeconomic Data Combination

Health indicators are metrics of population health and development and can serve as effective tools for policy decision making. To support precision policy making regarding population health and development, data science analysis at the village level is needed: it provides the highest administrative resolution and can therefore reveal the finest details of the spatial patterns of public health and development conditions. This project aims to improve spatial predictions of health indicators at the village level to support precision policy making regarding population health and development, and implementation planning of the UN SDGs related to public health. While we will take India as our study area and child stunting, underweight, and wasting as our case health indicators, the outputs from this study will be expandable to other developing countries and other health indicators.

2021-2022
HVD-21-07 Developing Workbenches for Spatial Data Science

This project will explore methodologies and establish protocols for developing workbenches for spatial data science research and teaching. Using KNIME, free software developed by a German company, the project will conduct experiments on workbench development with peer-reviewed case studies, producing: at least 60 added nodes for spatial statistics, modeling, and visualization; one workbook for quantitative methods and socio-economic applications; 30 replicable, reproducible, and expandable workflow-based case studies for spatial data science, business applications, and the spatial social sciences; 20 online webinars and onsite training workshops for workflow-based data analysis with KNIME; a user guide for the workbench; and 4-6 peer-reviewed publications. Results of this project will provide a consistent and compatible platform for spatial data analysis programs developed in R, Python, and Java on different computing environments, and promote a new generation of workflow-based data analysis and its applications for teaching and research across disciplines.

2021-2022
HVD-21-08 Mapping of Secondary and Tertiary Boundaries Over Time

Currently there is no open, authoritative global source for primary and secondary administrative boundaries. While some countries provide access to current boundaries, most do not, and fewer still support the ability to see how a given boundary has changed over time. This situation holds true despite the fact that small changes in administrative boundaries can have huge impacts on people and their livelihoods. To address this deficiency and design a system for storing and updating such boundaries, we will survey existing efforts to create global boundary datasets and evaluate the strengths and weaknesses of each. Then, to understand the requirements for handling the historic dimension, we will create a historic district boundary dataset for India. Finally, based on the lessons learned from past efforts and our experience building a historical dataset for one country, we will design a platform to support public access to global boundaries and their evolution over time.

2021-2022
HVD-21-09 Predicting Human Insecurity with Multi-faceted Spatiotemporal Factors

Human security extends beyond the provision of core human needs and protection from acute harm to the creation of supports for home, community, and a sense of hope in the future that contribute to population stability and sustainable development. The range of threats contributing to human insecurity is multi-faceted and complex, underscoring the need for multi-disciplinary approaches to policy and programmatic strategies for local and regional contexts across the spectrum of the disaster cycle. This preliminary proposal explores four possible research areas: 1) climate, conflict, and migration prediction; 2) atrocity prevention via early warning and early action; 3) spatial vulnerability and climate predictive models for disaster preparedness; and 4) COVID-19 and conflict. All involve integrating spatiotemporal climate change, conflict, demographic, infrastructural, socio-economic, and resource availability data, as well as quantifiable perception and behavioral data, into predictive models.

2021-2022
Center Related Projects (I-Corps & 2yr Associate Degree Students START training) 

  • The I-CORPS project and the START program are associated with the NSF Spatiotemporal I/UCRC. The I-CORPS project will commercialize the campus reopening system to enable the safe reopening of thousands of college campuses and school systems, as well as many more companies and other organizations with a campus setting. The product will also benefit schools worldwide, especially those in developing countries, in combating the global pandemic. We will also engage community colleges as one of the first steps in providing services to help them safely reopen, a priority identified from our ICAP program interviews.
  • The START program responds to NSF DCL 21-076 (START): the GMU College of Science and the NSF Spatiotemporal I/UCRC will collaborate to provide training to selected 2-year associate-degree students from Valdosta State University and community colleges, engaging them in Skills Training in Advanced Research and Technology. Seven faculty members and 10+ Ph.D. students, led by Prof. Chaowei Yang, director of the NSF Spatiotemporal Innovation Center, and assistant director Dr. Hai Lan, as well as 2-yr IHE lead Prof. Jia Lu, will mentor the selected students and involve them in current research projects of the center.
2021-2022
GMU-21-01 Improving ground-level air quality prediction by integrating spatiotemporal new observation system datasets and numerical simulations 

  • Objective: Leverage our advanced cyberinfrastructure (CI) projects, such as EarthCube conceptual design, cloud computing and big data innovations, and big Earth data analytics, to produce an agile, flexible, and sustainable architecture for supporting efficient big spatiotemporal data ingesting and integration. An inter-disciplinary interoperable model will be refined from our past investigations funded by NSF/NASA on WRF for dust storms, WRF-Chem, CMAQ for NO2, voxel-based cellular automata simulation and Earth science research on ground-level data collection and integration. 
  • Major task: 1. Data preprocessing and spatiotemporal collocation; 2. ML-based data preprocessing and downscaling; 3. AQ model simulation; 4. post processing; 5. evaluation and testing 
  • Example: AQ database, prediction and monitoring systems of LA city 
  • Expected results: 1) a robust, high-fidelity ground-level AQ dataset for geoscience research of both atmospheric and Earth science divisions; 2) an integrated and re-interfaced advanced cyberinfrastructure for fusing and collocating spatiotemporal AQ data from satellite, airborne, ground, and in-situ observations; 3) an improved high-resolution AQ model to facilitate metropolitan-area forecasting.
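The spatiotemporal collocation in Task 1 amounts to matching each satellite observation to the nearest ground reading within space and time tolerances. A minimal sketch follows; the station names, coordinates, values, and tolerance choices are all hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical ground stations: name -> (lat, lon)
stations = {"downtown": (34.05, -118.25), "pasadena": (34.15, -118.13)}

def collocate(obs_lat, obs_lon, obs_time, readings,
              max_deg=0.05, max_dt=timedelta(minutes=30)):
    """Match a satellite pixel to the nearest station reading within a
    lat/lon box of max_deg degrees and a time window of max_dt."""
    best = None
    for name, t, value in readings:
        lat, lon = stations[name]
        if abs(lat - obs_lat) > max_deg or abs(lon - obs_lon) > max_deg:
            continue
        if abs(t - obs_time) > max_dt:
            continue
        dist = (lat - obs_lat) ** 2 + (lon - obs_lon) ** 2
        if best is None or dist < best[0]:
            best = (dist, name, value)
    return best

readings = [
    ("downtown", datetime(2021, 7, 1, 12, 10), 38.0),  # hypothetical NO2-like value
    ("pasadena", datetime(2021, 7, 1, 12, 5), 41.0),
]
match = collocate(34.06, -118.24, datetime(2021, 7, 1, 12, 0), readings)
```

A production pipeline would use proper great-circle distance and spatial indexing, but the tolerance-window logic is the same.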
2021-2022
GMU-21-05 PM2.5 retrieval and spatiotemporal downscaling using Earth observation data 

  • Objective: Develop an innovative methodology to retrieve PM2.5 at global scale and further downscale the spatiotemporal resolution to 1 km and hourly in key regions, using artificial intelligence (AI) models. 
  • Major task: 1. Deep learning for PM2.5 retrieval using satellite remote sensing, model simulation, and ground observation; 2. Deep learning for PM2.5 prediction and downscaling using meteorological data with AOD spatial patterns. 
  • Example: AQ database, prediction and monitoring systems of LA city 
  • Expected results: 1. PM2.5 estimation at global scale; 2. hourly 1 km × 1 km (500 m × 500 m) PM2.5 for the LA region 
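As a baseline against which an AI downscaler can be judged, the spatial part of the downscaling target can be illustrated with plain bilinear interpolation of a coarse PM2.5 field. The grid values below are hypothetical; the project's deep-learning model would replace this with a learned mapping informed by AOD and meteorology:

```python
def bilinear(grid, x, y):
    """Interpolate a value at fractional cell coordinates (x, y)
    from a coarse 2-D grid (row-major, grid[y][x])."""
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x0 + 1] * dx
    bot = grid[y0 + 1][x0] * (1 - dx) + grid[y0 + 1][x0 + 1] * dx
    return top * (1 - dy) + bot * dy

# Hypothetical coarse PM2.5 field (ug/m3) on a 2x2 grid.
coarse = [[10.0, 20.0],
          [30.0, 40.0]]
center = bilinear(coarse, 0.5, 0.5)  # value interpolated at the cell center
```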
2021-2022
STC-21-01 Expand campus reopen to a school system by considering population density and human dynamics 

  • Objective: Expand the current school/campus reopening decision support system to accommodate county-based school system reopening decisions 
  • Major task: expand the current school reopening model to predict/simulate COVID-19 case trajectories for multiple schools in a county under specific control strategies, with population density and human dynamics datasets as input 
  • Examples: school system simulation in Fairfax County, VA 
  • Expected results: An operational web service to assist county-based school system reopening decisions during the COVID-19 pandemic. Potentially extend to simulate sports stadiums, industry, or research campuses like Goddard, for use by chief medical officers or related decision makers 
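The per-school trajectory prediction can be sketched as a compartmental model whose contact rate is scaled by a local density factor. Everything here (parameters, the density scaling, the SIR structure) is a hypothetical illustration of the kind of core the decision support system would wrap:

```python
def simulate_school(n, i0, beta, gamma, density_factor, days):
    """Discrete-time SIR for one school; transmission rate beta is
    scaled by a hypothetical population-density factor."""
    s, i, r = n - i0, i0, 0
    b = beta * density_factor
    history = [(s, i, r)]
    for _ in range(days):
        new_inf = b * s * i / n     # new infections this day
        new_rec = gamma * i         # new recoveries this day
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((s, i, r))
    return history

# Hypothetical school of 1,000 with 5 seed cases and a denser-than-average setting.
h = simulate_school(n=1000, i0=5, beta=0.3, gamma=0.1, density_factor=1.2, days=60)
```

Running one such model per school, under different control-strategy parameters, gives the county-level trajectories the task describes.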
2021-2022
GMU-21-02 Spatiotemporal Open-Source workflow with COVID-19 and Cloud Classification as an example 

In recent times, Deep Learning (DL) has become an important tool to discover patterns and predict Earth science processes. In most cases, open-source code for the DL models is shared to support research. While it is easy for subject matter experts or tech-savvy users to quickly set up the computing environment and replicate the DL research, non-programmers or beginners find it difficult to utilize open-source code. To bridge this gap, this project aims to develop a formalized process and workflow to effectively publicize and share DL research for Earth System applications so that people from any background can effectively replicate and reproduce the results. The open-source workflow primarily consists of three major phases: (i) open-source software development, (ii) sharing and maintenance, and (iii) reproducible research. Recently, we publicized the rainy cloud detection deep learning model. The open-source activities for rainy cloud detection applications include (i) testing the cloud classification DL model on various computing platforms that support CPU, single-GPU, and multi-GPU with Windows and Ubuntu OS, (ii) documenting the steps to reproduce the research and creating a tutorial video, and (iii) sharing the deep learning model, training datasets, user guide, tutorial video, and interpretation of the model results with the community.
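A small, concrete piece of the sharing-and-maintenance phase is pinning exactly which artifacts a published result came from, so others can verify they are reproducing with the same model and data. A sketch using SHA-256 checksums; the artifact names and in-memory byte strings are hypothetical stand-ins for real weight and data files:

```python
import hashlib
import json

def manifest(artifacts):
    """Build a reproducibility manifest: artifact name -> SHA-256 of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in artifacts.items()}

# Hypothetical in-memory stand-ins for model weights and training data files.
arts = {"model_weights.h5": b"weights-bytes", "train_set.csv": b"csv-bytes"}
m = manifest(arts)
record = json.dumps(m, sort_keys=True)  # ship this alongside the code and tutorial
```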

2021-2022
GMU-21-03 Using Machine Learning Methods to Improve the Categorization and Answers of Health Questions 

  • Objective: develop an automatic question-answering system for questions regarding health data, information, and knowledge, with better query understanding, ranking, and recommendation. 
  • Major task: collect and index Health and Human Services expert knowledge from the historical database; build an HHS knowledge base; implement a question understanding tool; build a smart search engine for user queries 
  • Example: an organization (e.g., the United States Department of Health & Human Services) may receive thousands to hundreds of thousands of health-related questions from the public on a daily basis 
  • Expected results: A health-specific spatiotemporal question answer portal 
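The ranking step of the smart search engine can be pictured as matching a user query against indexed questions by text similarity. The sketch below uses a bag-of-words cosine score over a tiny hypothetical FAQ; a production system would use learned embeddings and the HHS knowledge base instead:

```python
import math
from collections import Counter

# Hypothetical indexed questions.
faq = [
    "how do I renew my medicaid coverage",
    "what vaccines does my child need before school",
    "how to find a local mental health provider",
]

def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(query):
    """Return the indexed question most similar to the user query."""
    return max(faq, key=lambda q: cosine(query, q))

best = answer("which vaccines are required for school")
```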
2021-2022
STC-20-01 Innovating a computing infrastructure for spatiotemporal studies 

  • Objective: upgrade our current CI structure to support increasing demands on spatiotemporal innovations and computing 
  • Major task: obtain a 600-node cluster from NASA Goddard CISTO, and evolve the current system management and monitoring web services  
  • Examples: all current center projects need computing support 
  • Expected results: An operational CI to support all current and ongoing research from the ST center and potential external usage needs, especially computationally intensive tasks. Member projects would also be welcome to use the facility. 
2020-2024
STC-19-02 Spatiotemporal Innovation Testbed 

  • Objective: Artificial Intelligence (AI) is powering numerous applications, including Earth science applications. This computing testbed aims to assess how mature a computing infrastructure, especially GPU-based computing, is for supporting AI-based Earth science applications. 
  • Major task: Test how the DGX GPU cluster could help users better leverage the GPU platform and computing. 
  • Example: data uploading; image mapping (generating a thematic map based on EO data); agriculture location identification for grapes; text classification; ArcCI (detecting sea ice changes and their correlation with climate indicators); downscaling air quality data 
  • Expected results: an assessment of DGX support for Earth science applications; optimization recommendations for DGX operation; a hands-on tutorial to support GPU-enabled Earth science applications. 
2021-2022
GMU-21-04 Developing Cloud-based Image Classification Management and Processing Service for High Spatial Resolution Sea Ice Imagery 

  • Objective: develop and maintain a high-performance image classification service enabling GPU-based rapid sea ice processing for climate and cryosphere research 
  • Major task: expand the current framework to run multiple classification algorithms on a GPU cluster, reduce overfitting under varying lighting contexts and misclassification between thick/thin ice (with ATM elevation data), and deploy to an operational facility 
  • Example: arctic sea ice and NASA cryosphere research, climate change and natural hazards  
  • Expected results: an operational high resolution image classification system for earth science applications 
2021-2022
GMU-17-04 Geo-JPSS Flood Detection

Flood detection software has been developed to generate near real-time flood products from VIIRS imagery. SNPP/JPSS VIIRS data show special advantages in flood detection. The major activities and accomplishments in this reporting period center on the flooding application. The plan for the next reporting period is to 1) improve the current flood product, 2) develop 3-D flood parameters (flood water surface level, flood water depth, and high-resolution flood maps), and 3) further analyze regional flood patterns. 

2021-2022
HVD-21-01 Enabling replicable spatiotemporal research with virtual spatial data lab
This project is a continuing effort based on achievements of the Spatial Data Lab project. It is designed to provide a new generation of data services with cutting-edge methodology and technology for reproducible, replicable, and generalizable spatiotemporal research. It will allow researchers to develop case studies with easy-to-use workflow tools and share the case study as a package with others. The project will also support case-based training and teaching programs for multi-disciplinary and inter-disciplinary research in the applications of public health, economics, urban planning, social science, and others. In detail, this project will expand Spatial Data Lab’s capabilities and collaborate with various academic and business partners on the following missions: 

  • Promote Spatial Data Services. Collect and integrate more datasets from partners and various sources and provide standard data services for data access, integration and sharing.    
  • Tools Development for Spatial Data Analysis. Enrich current workflow platform to build spatial data analysis tools, such as hotspot analysis, spatial correlation analysis, geographical regression modelling, and spatiotemporal modelling.  
  • Workflow based Spatial Data Case Studies. Develop easy-to-use workflows to lower the barrier for spatiotemporal data analysis and build a case study repository for reproducible, replicable and generalizable research.   
  • Training Programs for Spatial Data Science. Collaborate with partners to organize a series of training programs on different spatial data science topics, such as urban development, public health, human movement, and environment.  
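The spatial-correlation tooling in the second mission can be illustrated with global Moran's I, a standard clustering statistic that workflow nodes would wrap. The grid of incidence-like values below is hypothetical, with high values clustered in the top row:

```python
def morans_i(grid):
    """Global Moran's I with rook (edge-sharing) adjacency on a 2-D grid."""
    rows, cols = len(grid), len(grid[0])
    vals = [v for row in grid for v in row]
    n = len(vals)
    mean = sum(vals) / n
    z = {(r, c): grid[r][c] - mean for r in range(rows) for c in range(cols)}
    num = 0.0   # sum of w_ij * z_i * z_j over neighboring pairs
    w_sum = 0   # total weight (each directed neighbor pair counts once)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += z[(r, c)] * z[(rr, cc)]
                    w_sum += 1
    denom = sum(v * v for v in z.values())
    return (n / w_sum) * (num / denom)

# Hypothetical clustered rates: high in the top row, low in the bottom row.
clustered = [[1, 1, 1],
             [0, 0, 0]]
i_stat = morans_i(clustered)  # positive => spatial clustering
```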
2021-2022
HVD-21-02 Assessing household preparedness for COVID-19 in Bangladesh
This project assesses household preparedness for COVID-19 in Bangladesh with special attention to district-level inequalities. The definition of COVID-19-prepared households is based on guidelines from the WHO. A household is considered prepared for COVID-19 when it meets five conditions: (1) adequate space for quarantine, (2) adequate sanitation, (3) soap and water available for handwashing, (4) a phone available for communication, and (5) regular exposure to mass media. The main data source is the 2019 Multiple Indicator Cluster Surveys (MICS) for Bangladesh. The study investigates the association between the district-level prevalence of COVID-19 and household preparation, and identifies district-level factors (e.g., population density, economic development, health system performance) that are associated with household preparation for COVID-19. Findings from this study will provide policy makers in Bangladesh and other stakeholders with solid evidence for improving the situation in households poorly prepared for COVID-19. 
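The five-condition definition translates directly into a classification rule. A sketch over hypothetical survey records follows; the field names are illustrative, not the actual MICS variable names:

```python
# The five WHO-based preparedness conditions (illustrative field names).
CONDITIONS = ("quarantine_space", "sanitation", "soap_and_water",
              "phone", "mass_media")

def is_prepared(household):
    """A household counts as prepared only if all five conditions hold."""
    return all(household.get(c, False) for c in CONDITIONS)

# Hypothetical records: h1 meets all conditions, h2 lacks a phone.
h1 = dict.fromkeys(CONDITIONS, True)
h2 = dict(h1, phone=False)
rate = sum(map(is_prepared, [h1, h2])) / 2  # district-level preparedness share
```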
2021-2022
HVD-21-03 Developing the International Geospatial Health Research Network (IGHRN)
The concept of the International Geospatial Health Research Network (IGHRN) has prompted a series of high-level workshops and symposia on geography and international health research in recent years.  With a focus on Fostering International Geospatial Health Research Collaborations, leading GIScience and health researchers from North America, Asia, Europe, Africa and Latin America identified an interim Steering Committee to develop and sustain an operational IGHRN Network. The IGHRN Secretariat functions and management are jointly handled at the two hub universities, Harvard University and the Chinese University of Hong Kong (CUHK). An International Advisory Committee comprising leading geospatial and health researchers from around the world is also being developed.         

The International Geospatial Health Research Network aims to share new international research and data, help develop geospatial health methods, and support new technologies to foster international collaborations and synergies across borders, and to bridge the gap between GIScience health research and the needs of health practitioners on the ground. 

After the COVID-19 pandemic, it is clear that an expanded IGHRN is needed now more than ever. With that in mind, the IGHRN Steering Committee has recently begun to restructure the IGHRN, with multiple university and organizational affiliates involved. 

We welcome the engagement, ideas, participation, and funding of the International Geospatial Health Research Network by the NSF, NIH, non-governmental organizations, foundations, and private-sector geospatial and health tech companies, as we develop and expand the IGHRN. 

2021-2022
GMU-20-04 Using Machine Learning Methods to Improve the Categorization of Health Questions 


2020-2021
HVD-20-04 Developing an active-learning based platform to evaluate the impacts of China’s Belt and Road Initiative using high-resolution satellite imagery 

Currently, there is not any geospatial dataset that tracks the growth or deterioration of roads and railways worldwide. This has been a major obstacle in assessing the effectiveness of transportation development projects, which is important for developing countries to make informed decisions on these expensive investments. We propose to develop an active learning-based platform to generate such data for the benefit of a broad research community with interest in transportation evaluation. This system will facilitate human annotators to map roads and railways using historical and up-to-date high-resolution satellite imagery. It has three building blocks: First, combine pixel-wise segmentation-based and graph-based neural networks to generate proposed roads connections based on existing labels from Open Street Map; Second, enable annotators to accept or edit correctly predicted roads, reject false positives; Third, internalize the inputs from annotators and retrain the model with the new data, which will reinforce the model to make better predictions over time. As a proof of concept, our first application of the system is analyzing the impact of the Belt and Road Initiative (BRI), which embodies unprecedented transportation upgrade and construction projects in Asia in the past decade, on economic development in related countries. Applying recent developments in remote sensing to satellite imagery before and after BRI projects were undertaken, we will link the extracted road and rail networks with the detected expansion of urban areas detected from a larger set of daytime and nighttime imagery, and estimate the impact of BRI investments on the spatial distribution of economic activity.
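The three building blocks form an uncertainty-driven loop: propose, annotate, retrain. A minimal sketch of the annotation-selection step is below; the "model" is just a stand-in probability per proposed road segment, and all identifiers are hypothetical:

```python
# Minimal active-learning sketch: send the model's least-confident road
# proposals to annotators first, then fold their labels back into training.

def uncertainty(p):
    """Probabilities near 0.5 are least confident, hence most valuable to label."""
    return -abs(p - 0.5)

def select_for_annotation(proposals, k):
    """proposals: list of (segment_id, predicted_probability); pick the k
    proposals the model is least sure about."""
    return sorted(proposals, key=lambda sp: uncertainty(sp[1]), reverse=True)[:k]

# Hypothetical road-segment proposals with predicted probabilities.
proposals = [("seg-a", 0.97), ("seg-b", 0.52), ("seg-c", 0.10), ("seg-d", 0.48)]
batch = select_for_annotation(proposals, k=2)
labeled = {sid: None for sid, _ in batch}  # filled in by annotators, then retrain
```

After annotators accept or correct the batch, the model retrains on the enlarged label set, which is what lets predictions improve over time.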

2020-2021
HVD-20-03 Historical Forest Change Detection Using Satellite Imagery and Google Earth Engine 

In the southeastern US, over 90% of forests are privately owned and managed. To achieve sustainable timber production from these forest lands, understanding the forest change history and continuously monitoring forest land, including harvest and replantation, are essential. The objective of this project is to build a software solution that uses the 35-year history of satellite imagery in Google Earth Engine to recreate the silvicultural history of timberland in the southeastern US. Specifically, this pilot project takes Union County, South Carolina as the study area; tests the effectiveness of methodologies based on time-series satellite data on Google Earth Engine for identifying hardwood, natural or planted pine, and mixed hardwood/pine forest in satellite imagery; and detects their silvicultural history, such as clear-cuts, natural growth or replanting, age, and height. The output may support the economical acquisition of under-managed, privately owned timberland in the southeastern United States and the sustainable management of these timberlands.
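At the level of a single pixel, clear-cut detection reduces to finding the sharpest drop in a vegetation-index time series. The sketch below uses a hypothetical NDVI trajectory and a hypothetical threshold; the project would run logic of this kind at scale on Google Earth Engine rather than in plain Python:

```python
def detect_clearcut(years, ndvi, threshold=0.3):
    """Return the year of the largest year-over-year NDVI drop,
    or None if no drop reaches the threshold."""
    best_year, best_drop = None, 0.0
    for i in range(1, len(ndvi)):
        drop = ndvi[i - 1] - ndvi[i]
        if drop > best_drop:
            best_year, best_drop = years[i], drop
    return best_year if best_drop >= threshold else None

# Hypothetical NDVI trajectory for one timberland pixel: a clear-cut in 2005,
# followed by gradual regrowth of a planted pine stand.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]
ndvi  = [0.82, 0.83, 0.81, 0.84, 0.82, 0.25, 0.35, 0.45]
event = detect_clearcut(years, ndvi)
```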

2020-2021
STC-20-02  Spatiotemporal Analytics of COVID-19’s Second-Order Impacts on Global Vulnerable Urban Areas 

This project addresses the role of spatiotemporal data, including open data, in understanding and mitigating impacts of the global COVID-19 pandemic. The research will focus on possible long-term and second-order impacts of COVID-19 and the responses enacted at multiple scales, from multinational regions to neighborhoods. Development backsliding during this pandemic is a high risk for developing countries and rapidly growing cities due to new migration patterns, collapse of informal economies, lack of supplies, disparities in basic services and health sites, and overcrowded informal settlements. A key goal of this project is to facilitate discussion and conduct research and reporting to inform participatory mapping and open data creation taking place in developing countries to mitigate COVID-19’s second-order impacts.

2020-2021
GMU-20-03 Spatiotemporal Analysis of Medical Resource Deficiencies under COVID-19 Pandemic

The COVID-19 pandemic swept the entire world in the past five months, and the U.S. became the epicenter with the most confirmed cases. Although many states are reopening their economies, the risk is still high and there are many debates about resurgence of the outbreak and high pressure on the medical system from premature openings. Sufficient medical equipment and health care professionals are critical to save lives and better prepare our communities. Accurate assessment and prediction of medical resource demands are important to avoid overcommitment (e.g., NYC did not use many of the resources it asked for) and undercommitment (e.g., there are only a few ICU beds in the Alabama capital).
We propose to develop a timely assessment system of medical resource demands based on current confirmed cases and hospitalized patients, as well as ML/AI-based prediction. This system (shown in the following figure) will be based on our current spatiotemporal distribution and demands of medical resources in the USA at the county level for the COVID-19 pandemic. The system dashboard supports monitoring, analyzing, visualizing, and sharing the medical resources and analysis results. The medical resources include county-based summaries of licensed beds and ICU beds from hospitals and medical agencies, and medical staff, specifically critical care staff for COVID-19 treatment. Integrated and analyzed with dynamic active cases, a medical resource dynamic index is created and calculated in real time to show medical resource deficiencies in the U.S. under the COVID-19 pandemic.
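One way to picture a county-level dynamic index is as projected bed demand from active cases measured against remaining licensed capacity. The formula, rates, and counts below are hypothetical illustrations, not the system's actual index definition, which integrates live hospital feeds:

```python
def deficiency_index(active_cases, hosp_rate, licensed_beds, occupied_beds):
    """Values above 1 mean projected COVID-19 bed demand exceeds the
    county's remaining capacity."""
    demand = active_cases * hosp_rate
    available = max(licensed_beds - occupied_beds, 0)
    return float("inf") if available == 0 else demand / available

# Hypothetical county: 4,000 active cases, 10% of them needing hospitalization,
# 500 licensed beds of which 350 are already occupied.
idx = deficiency_index(4000, 0.10, 500, 350)
```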

2020-2021
GMU-20-01  Improving the Air Quality in the Urban Setting 

Climate change and pollutant emissions continue to worsen the air we breathe, which kills 7 million people every year according to the World Health Organization; one third of deaths from stroke, lung cancer, and heart disease are due to air pollution (https://www.who.int/airpollution/news-and-events/how-air-pollution-is-destroying-our-health). Based on EPA observations, Climate Central found that 40 U.S. cities had at least 20 unhealthy air days since 2015, and many of them experienced an uptick in unhealthy air days in recent years. For example, over 100 accumulated days with unhealthy air quality (AQ) were observed in Los Angeles over the past two decades. Timely forecasting of air pollution and dissemination of the results to citizens would help save lives and improve health; closing this gap has long been a goal of atmospheric scientists and urban managers. Fortunately, increasingly available low-cost sensors, the Internet of Things, and satellite observations are starting to provide a new Earth observation system to feed numerical simulations and enhance the reliability and accuracy of AQ prediction. The emergence of 5G mobile technologies also brings enormous benefits to AQ observation, with higher data transmission speeds and more connected networks. However, it is critical and challenging to integrate spatiotemporally heterogeneous observation data with numerical AQ prediction. Using Los Angeles as an example, we propose to fuse a variety of geoscience observations, from satellites to the ground-based Internet of Things; feed them into numerical AQ simulation models; and output results to be validated by and disseminated to academic geoscientists and citizens.

2020-2021
GMU-20-02 Agent-based multi-scale COVID-19 outbreak simulation

The outbreak of the Coronavirus disease 2019 (COVID-19) has become a global pandemic, deeply affecting the daily lives of people in China, Spain, Italy, the U.S., and many other countries across the world. Many effective policies and strategies have been implemented to slow the spread of COVID-19 in different areas, and these could serve as guidance for preventing outbreaks in places and counties that have not yet been seriously affected by the virus. The question, however, is how to identify possible outbreak locations from existing observation-based evidence. Agent-based models (ABMs) are widely used, either as standalone simulators or integrated with models from related disciplines, to strengthen existing studies, including in infectious disease epidemiology. In previous epidemiological studies, ABMs have been adopted to simulate and predict the effectiveness of containment strategies under different policies, the time and location of outbreaks, medical resource deficiencies, and impacts on logistics systems. In this study, we propose to develop a comprehensive ABM-based simulator coupled with multivariate impact factors, such as the spatiotemporal distribution of the coronavirus, human migration and activities, climate and environmental conditions, and containment strategies and policies, to reveal and predict the pandemic pattern of COVID-19 at different scales: county level, state level, nationwide, and even global. We will then apply the simulation model to describe multi-scale COVID-19 outbreak patterns with stated confidence levels and to predict the possibility of outbreaks in places such as India and South Africa, which may help prevent the pandemic and save lives in areas not yet experiencing outbreaks.
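The core loop of such a simulator can be illustrated with a toy agent-based S-I-R model; the transmission and recovery parameters and the random-mixing contact model below are illustrative stand-ins for the multivariate factors described above.

```python
import random

def step(agents, beta=0.3, gamma=0.1, contacts=5, rng=random):
    """Advance one day: each infected agent contacts a few random others."""
    infected = [i for i, s in enumerate(agents) if s == "I"]
    new_infections = set()
    for i in infected:
        for _ in range(contacts):
            j = rng.randrange(len(agents))
            if agents[j] == "S" and rng.random() < beta:
                new_infections.add(j)
    for i in infected:                 # some infected agents recover
        if rng.random() < gamma:
            agents[i] = "R"
    for j in new_infections:           # newly infected become active tomorrow
        agents[j] = "I"
    return agents

rng = random.Random(42)
agents = ["I"] * 10 + ["S"] * 990      # seed 10 cases in a population of 1,000
for _ in range(60):
    step(agents, rng=rng)
print({s: agents.count(s) for s in "SIR"})
```

Spatial structure (counties, migration flows) would replace the uniform random mixing in a full implementation.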

2020-2021
Hvd-20-01 Cloud-based Large-scale High-resolution Mangrove Forests Mapping with Satellite Big Data and Machine Learning

Mangrove forests make up one of the most productive ecosystems on the planet, providing a variety of goods and services from which we benefit. In addition, mangrove forests can sequester four times more carbon dioxide than upland forests, mitigate the impacts of natural hazards on coastal communities, and support biodiversity conservation. However, they are being destroyed at an alarming rate by human activities such as aquaculture, agriculture, and coastal development. To characterize mangrove forest changes, evaluate their impacts, and support protection and restoration decision making by government agencies and NGOs, accurate and up-to-date mangrove forest mapping at large spatial scales is essential.

Available large-scale mangrove forest data products were commonly created with 30 m Landsat imagery, and significant inconsistencies remain among them. With high-resolution satellite data (e.g., Sentinel-1 and Sentinel-2) open to the public, the availability of high-performance cloud computing, and recent progress in machine learning, it has become feasible to map coastal mangrove forests at large spatial scales with better resolution, accuracy, and frequency.

The objective of this proposed project is to develop a methodology that can be used to generate 10 m mangrove forest spatial distribution data products annually for any region across the globe, thereby providing the most accurate information about the spatiotemporal changes of mangrove forests and effectively supporting mangrove ecosystem protection and restoration efforts. Our approach is to combine satellite big data processing on a cloud platform (e.g., Google Earth Engine) with machine learning algorithms (e.g., Neural Network, Random Forest), based on the knowledge gained from our NASA project. Study areas will be selected from different regions of the globe, accuracy will be assessed quantitatively, and mangrove forest maps will be compared with existing mangrove data products.
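As a small illustration of the band math involved, the sketch below computes NDVI from Sentinel-2 red (B4) and NIR (B8) reflectances with NumPy and applies a fixed vegetation threshold; the actual workflow trains classifiers such as Random Forest on a cloud platform rather than using a single cutoff, and the threshold here is purely illustrative.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    nir, red = nir.astype(float), red.astype(float)
    return (nir - red) / (nir + red + eps)

# Tiny synthetic 2x2 scene: vegetated pixels have high NIR reflectance.
red = np.array([[0.05, 0.30], [0.04, 0.25]])
nir = np.array([[0.45, 0.32], [0.50, 0.27]])
candidate = ndvi(nir, red) > 0.4   # dense-vegetation (mangrove candidate) mask
print(candidate)
```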

2020-2021
STC-19-00 Center Introduction and Progress Report

Many 21st-century challenges to contemporary society, such as natural disasters, happen in both space and time, and require spatiotemporal principles and thinking to be incorporated into computing processes. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies to explore the next generation of computing models for addressing the challenges. This overarching center project is to analyze and collaborate with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate application impact, and advance human knowledge and intelligence. Objectives are to: a) build spatiotemporal infrastructure from theoretical, technological, and application aspects; b) innovate spatiotemporal studies with new tools, systems, and applications; c) educate K-16 and graduate students with proper knowledge; and d) develop a community for spatiotemporal studies across center sites and members at regional, national, and global levels through IAB meetings, symposia, and other venues.

2019-2020
Hvd-19-01 CDL: Developing an online spatial data sharing and management platform

This project is to develop an online platform, nicknamed the Spatial Data Lab (SDL), for the creation, management, and sharing of spatiotemporal data, analytical tools, and study cases. Currently, Harvard's Dataverse and WorldMap have been integrated with the SDL platform. Data-driven analytical workflows are sharable and accessible from Harvard Dataverse with encrypted links to the SDL platform. This year the project takes COVID-19 as one of its case studies. The team has been actively building resource repositories for COVID-19 research since January. We are providing standardized datasets, executable workflows, and training materials on the SDL platform so that collaborating researchers can easily and quickly conduct research, enhance methodology, publish results, and deliver education on COVID-19-related research topics.

2019-2020
Hvd-19-02 Building a geospatial differential privacy server for shared mobility data

This proposal is to build differential privacy into ride-share data, allowing government analysts to make useful queries on telemetry data and learn generalized patterns, while individual-level location data receive the strong guarantees of differential privacy so that no re-identification attack is possible. Moreover, the worst-case amount of individual information that could be leaked by any published results can be precisely and formally measured, so that cumulative privacy loss across all accesses to the system can be monitored.
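The basic mechanism behind such a server is adding calibrated noise to query results. Below is a minimal sketch of the Laplace mechanism for a count query; the epsilon value and trip count are illustrative, and a production server would additionally track cumulative privacy loss across queries.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, rng=random):
    """Epsilon-differentially-private count query."""
    sensitivity = 1.0  # adding/removing one trip changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# e.g., number of ride-share pickups in one city block during one hour
rng = random.Random(7)
print(round(private_count(1200, epsilon=0.5, rng=rng), 1))
```

Smaller epsilon means stronger privacy but noisier answers; the privacy budget spent by each query is what the proposed system would meter.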

2019-2020
Hvd-19-03 Using Internet Remote-Sensing to Estimate High-Precision Connectivity Statistics

The mass adoption of the Internet has boosted the demand for scientific explanations of the effects of digitalization. What is the impact of social media on elections and polarization? What is the effect of digital technologies on economic growth, inequality, or unemployment? How is public health affected by increased access to medical websites?

Official statistics typically provide country-year resolution, but researchers need more precision to account for variation within countries, such as urban versus rural areas, as well as shorter-term effects of seasonal dynamics and shock events. In addition, researchers working with highly precise Internet data must also address the challenges introduced by privacy legislation.

The Internet Connectivity Statistics Dataverse is the most precise dataset of Internet connections available for scientific research, and it helps overcome both the precision and the privacy challenges. First, we analyze global Internet traffic using remote sensing to estimate connectivity at monthly and city-level resolution. Because we rely on direct observation of the Internet, we can produce estimates even in areas where official statistics are not available and data cannot otherwise be retrieved, such as authoritarian regimes or territories experiencing political violence. Second, we estimate connectivity statistics using differential privacy algorithms and test the accuracy of our estimates. Finally, we make the statistics available to the entire research community through the Harvard University Dataverse, the most prominent research data sharing software, maintained by the Institute for Quantitative Social Science.

2019-2020
Hvd-19-04 Scaling K-nearest Neighbor Calculations using Geohash, Spatial Clustering, Indexed Search, and Compression

We propose to develop a practical, cost-effective, easy-to-use platform to perform fast geospatial k-means clustering on big geospatial datasets. The system makes use of mutually reinforcing optimization techniques: geohashing, disk clustering, index-based searches, and data compression, to build a novel system that makes the normally slow and resource-intensive process of spatial clustering faster and less expensive than alternatives. In tests using an input dataset of 180 million point features with K=1000, we achieved an average throughput of 200,000 distance calculations per second to generate 180 billion measurements on a medium-sized Amazon instance. To make the system easy for any analyst to implement, we offer the option of an Amazon AMI deliverable that replicates the entire computation environment and comes with all required libraries preinstalled and configured. Once the AMI is launched, the system is ready for data loading, and all calculation work, including compression and storage of results, is handled automatically.
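The geohashing idea at the core of the system can be illustrated with a minimal encoder: nearby points share hash prefixes, so prefix matching lets an index prune most distance calculations before they are ever computed.

```python
# Minimal geohash encoder (standard base-32 alphabet, bits alternating
# longitude/latitude). Illustrative only; production systems use tuned
# precision and library implementations.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=7):
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, nbits, even, out = 0, 0, True, []
    while len(out) < precision:
        if even:                         # refine longitude
            mid = (lon_lo + lon_hi) / 2
            bits = bits * 2 + (1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:                            # refine latitude
            mid = (lat_lo + lat_hi) / 2
            bits = bits * 2 + (1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        nbits += 1
        if nbits == 5:                   # every 5 bits -> one base-32 character
            out.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(out)

print(geohash(42.6, -5.6, 5))  # → "ezs42"
```

Points inside the same cell share the full hash, and coarser cells share prefixes, which is what makes disk clustering and indexed prefix searches effective.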

2019-2020
Hvd-19-05 Leveraging Geovisual Analytics and Digital Epidemiological Methods for Emerging Outbreaks of Infectious Diseases after Natural Disasters in Developing Regions

Infectious disease outbreaks triggered by natural disasters (e.g., floods, earthquakes) pose great challenges to disease surveillance, especially in developing regions, because of the loss of homes, displacement of populations, damaged health infrastructure, and long reporting delays. Digital epidemiological methods have emerged in the last decade as a complementary alternative that provides near real-time disease activity estimates in the absence of timely and accurate reporting from traditional healthcare-based surveillance systems. Most digital epidemiology efforts to date have focused on the computational modeling challenges of tracking diseases, and only a few have investigated the evident potential of involving humans in the analytical and decision-making processes that surround the use and interpretation of these methods. Here we aim to develop a human-centered real-time disease surveillance system with the goal of improving surveillance of, and response to, emerging outbreaks of infectious diseases caused by natural disasters in developing regions. We plan to focus on the recent cholera outbreaks that emerged after the landfall of cyclones Idai and Kenneth in southeastern African nations such as Mozambique.

2019-2020
Hvd-19-06 Elevating Research Excellence with Data Repository and AI Ecosystem

Dataverse is an open-source data repository platform where users can share, preserve, cite, and explore research data. RMDS Lab is a startup company developing transformative technologies for research with big data and AI. This project establishes a collaboration between the two teams and two platforms to create synergy that will advance the shared goal of supporting scholars worldwide in data-driven research. The main objectives are to explore solutions for applying AI technology to the evaluation of data science studies; to provide measurable references for data scientists on the accuracy, impact, replicability, applicability, and other merit scores of data science study cases; and to promote high-quality data science research through platform development, data sharing, community building, and user training. The Coronavirus crisis has strengthened the collaboration between the two organizations, expanding the project to use datasets not only from the Harvard Dataverse but also from the over 60 Dataverse installations worldwide.

2019-2020
GMU-19-01 Cloud classification

Cloud types, coverage, and distribution have significant influence on the characteristics and dynamics of the global climate and are directly related to the energy balance of the Earth. Therefore, accurate cloud classification and analysis are essential for atmospheric and climate change research. Cloud classification assigns a predetermined label to clouds in an image, e.g., cirrus, altostratus, or altocumulus. With cloud segmentation, satellite imagery can be utilized to support a series of local mesoscale climate analyses such as rainy cloud detection, cyclone detection, or extreme weather event (e.g., heavy rainfall) prediction. However, distinguishing different clouds in satellite imagery is a challenging task because of intraclass spectral variations and interclass spectral similarities.

Traditionally, cloud types are classified using selected features and thresholds such as cloud-top pressure (CTP), cloud optical thickness (COT), brightness temperature (BT), and the multilayer flag (MLF). One drawback is that model accuracy relies heavily on the chosen thresholds and features. The past years have witnessed successful deep learning applications in automatic feature extraction for object detection from images with the aid of CNN models and their variants, such as VGGNet and ResNet. Inspired by these successes in computer vision, we propose to implement an automatic cloud classification system based on deep neural networks to identify eight cloud types from geostationary and polar-orbiting satellite data, with cloud types from the 2B-CLDCLASS product of the CloudSat CPR as the reference labels.
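A deliberately small sketch of such a classifier is shown below in PyTorch; the band count, patch size, and architecture are placeholder assumptions that would be tuned against the 2B-CLDCLASS reference labels.

```python
import torch
import torch.nn as nn

class CloudNet(nn.Module):
    """Tiny CNN mapping multi-band satellite patches to 8 cloud-type logits."""
    def __init__(self, in_bands=16, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_bands, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global average pooling
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return self.head(z)

model = CloudNet()
patch = torch.randn(4, 16, 32, 32)   # a batch of 4 multi-band 32x32 patches
logits = model(patch)
print(logits.shape)                   # torch.Size([4, 8])
```

Training would minimize cross-entropy against the CloudSat-derived labels; the sketch only shows the forward pass.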

2019-2020
GMU-19-02 Big data analytics for space situational awareness

Space situational awareness (SSA) concerns current and predictive knowledge of space events, threats, activities, and conditions, as well as space system (space, ground, link) status, capabilities, constraints, and employment. With data collected from telescopes, satellites, and other sources, thousands of space objects are tracked, cataloged, and maintained; however, observation data must be collected constantly and at scale to distill such knowledge, which poses grand challenges to data management systems. The goal of this project is to develop a big space observation data analytical platform to better support space situational awareness. The distributed storage layer supports storage of and access to space observation data with parallel I/O. The metadata layer will manage metadata and interact with a smart search engine to provide efficient and accurate data discovery functionality. The analytical layer serves as an efficient and effective tool to mine spatiotemporal patterns and to detect and predict events in near-Earth space. Finally, the visualization layer presents the orbits of natural and man-made objects in near-Earth space. By distilling knowledge from dispersed observation data, this big data analytical platform is expected to advance space situational awareness across government agencies and scientific communities.

2019-2020
GMU-19-03 Planetary Defense 

Programs like NASA’s Near-Earth Object (NEO) Survey supply the PD community with necessary information that can be utilized for NEO mitigation. However, information about detecting, characterizing, and mitigating NEO threats is still dispersed across different organizations and scientists due to the lack of a structured architecture. This project aims to develop a knowledge base and engine that provide discovery of and easy access to PD-related resources by developing 1) a domain-specific Web crawler to automate large-scale, up-to-date discovery of PD-related resources, and 2) a search ranking method to better rank the search results. The Web crawler is based on Apache Nutch, one of the well-recognized, highly scalable web crawlers. In this research, Apache Nutch is extended in three aspects: 1) a semi-supervised approach is developed to create a PD-related keyword list; 2) an improved similarity scoring function is utilized to set the priority of the web pages in the crawl frontier; and 3) an adaptive approach is designed to re-crawl/update web pages. The search ranking module is built upon Elasticsearch. Rather than using the basic search relevance function of Elasticsearch, a PageRank-based link analysis and an LDA-based topic modelling approach are developed to better support the ranking of interconnected web pages.
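The frontier prioritization step can be sketched as scoring a fetched page against the PD keyword list with term-frequency cosine similarity; the keyword list and pages below are illustrative, and the project's actual scoring function and Nutch integration are more elaborate.

```python
import math
from collections import Counter

# Illustrative stand-in for the semi-supervised PD keyword list.
PD_KEYWORDS = ["asteroid", "near-earth", "object", "impact",
               "mitigation", "deflection"]

def similarity(text, keywords=PD_KEYWORDS):
    """Cosine similarity between page term frequencies and the keyword list."""
    tf = Counter(text.lower().split())
    kw = Counter(k.lower() for k in keywords)
    dot = sum(tf[t] * kw[t] for t in set(tf) & set(kw))
    norm = (math.sqrt(sum(v * v for v in tf.values()))
            * math.sqrt(sum(v * v for v in kw.values())))
    return dot / norm if norm else 0.0

# Hypothetical pages waiting in the crawl frontier.
pages = {
    "https://example.org/neo-deflection":
        "asteroid impact mitigation via deflection",
    "https://example.org/cooking":
        "slow cooker recipes for busy weeknights",
}
frontier = sorted(pages, key=lambda u: similarity(pages[u]), reverse=True)
print(frontier[0])  # the PD-relevant page is crawled first
```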

2019-2020
GMU-19-04 Micro-scale Urban Heat Island Spatiotemporal Analytics and Prediction Framework

As one of the adverse effects of urbanization and climate change, the Urban Heat Island (UHI) can affect human health. Most research has relied on remote sensing imagery or sparsely distributed station sensor data, focusing on a broad understanding of meso- or city-scale UHI phenomena and mitigation support. However, challenges remain at the micro level. This project aims to: 1) build an in-depth investigation of human-weather-climate relations for urban areas; 2) fill the gap between short-term weather effects from buildings, traffic, and human mobility and long-term microclimate by understanding such relations with real-time urban sensing (IoT) data; 3) establish a machine-learning-enabled ensemble model for fast near-future temperature forecasts that considers the human-weather-climate relationships; 4) provide guidelines for designing and implementing precautionary local-human-activity management strategies according to the forecasts, reducing public health risks and allowing better urban living spaces.

2019-2020
GMU-19-05 Why is my training data never good enough? Quantifying training data representativeness for scaling up Convolutional Neural Networks to large geographic areas.

With the increased availability of affordable, frequent, high-resolution satellite imagery, there has been a proliferation of machine learning methods, notably convolutional neural networks (CNNs), for automated image interpretation. Despite this progress, the biggest challenge remains the insatiable demand for more training data, which is most often produced by human operators, the same human operators who are already overwhelmed by the large satellite data volumes. The research community is grappling with methods to produce training data that are sufficiently representative of the large areas to which they want to scale up their machine learning models. Although much emphasis has been placed on required computing resources and CNN architectures, our research has demonstrated that the structure of the training data is the overriding determinant of model accuracy and regional generalization of CNN classifications. The objective of our research is to explore the relationship between CNN classification accuracy and the representativeness of training data across increasing geographical distance, and to relate this to CNN feature space. To this end we are conducting experiments with automatically generated training data using ancillary datasets (building footprints available from counties and OpenStreetMap, building counts, and high-resolution land cover and percentage imperviousness available for the entire Chesapeake Bay catchment) and 1 m resolution aerial photography data (NAIP). The training datasets and the operational application area will be systematically varied across the Mid-Atlantic region to simulate diverse scenarios and tease out the underlying relationships. In the experiments, CNNs are applied to NAIP image tiles (200 m × 200 m) in the following use cases: (i) classify 1 m resolution land cover, (ii) predict percentage imperviousness, (iii) predict the total building footprint in an image tile, and (iv) predict the number of buildings in an image tile.
The study will be applied to cities and their surrounding areas (30 km buffer) distributed throughout the Mid-Atlantic region, which largely coincides with the Chesapeake Bay catchment. The research will address science questions such as: what is the relationship between the representativeness of training data (measured as dissimilarity in the CNN feature vector) and increasing geographical distance between training and application areas, and what is its influence on CNN classification accuracy? Simply put, can a CNN model trained with data from Fairfax (VA) be applied to multiple other cities at increasing distances away (e.g., Harrisburg, PA), and how is the accuracy of these classifications related to distances in feature space and geographical space? This will help the community develop reasonable expectations for regional machine learning applications based on high-resolution satellite imagery.
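One way to operationalize "distance in feature space" is the cosine dissimilarity between mean CNN feature vectors of the training and application areas, sketched below with random placeholder embeddings standing in for real CNN features.

```python
import numpy as np

def dissimilarity(train_feats, app_feats):
    """1 - cosine similarity between the mean feature vectors of two areas."""
    a = train_feats.mean(axis=0)
    b = app_feats.mean(axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

rng = np.random.default_rng(0)
# Placeholder 128-d "CNN embeddings" for 100 tiles per area.
fairfax = rng.normal(0.0, 1.0, size=(100, 128))
harrisburg = fairfax + rng.normal(0.0, 0.5, size=(100, 128))  # similar area
print(dissimilarity(fairfax, harrisburg))
```

Plotting this measure against geographic distance between city pairs, and against the resulting classification accuracy, is the kind of relationship the experiments are designed to expose.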

2019-2020
STC-15-02 Dynamic Mapping of Secondary Cities

Secondary cities are non-primary cities, characterized by population size, function, and/or economic status. They are urban centers of governance, logistics, and production, and are often data poor. This project is a global initiative to address the critical geospatial data needs of secondary cities. The objective is to enhance emergency preparedness, human security, and resilience. The project facilitates partnerships with local organizations for data generation and sharing using open source tools, and focuses on applied geography and human geography thematic areas.

2019-2020
GMU-18-01 Rapid extreme weather events detection and tracking from 4D/5D climate simulations

Climate simulations provide valuable information representing the state of the atmosphere, ocean, and land. Increasingly advanced computational technologies and Earth observation capabilities have enabled climate models to reach higher spatial and temporal resolution, providing increasingly realistic coverage of the Earth. This high spatiotemporal resolution also provides the opportunity to more precisely pinpoint and identify/segment the occurrence of extreme weather events, such as tropical cyclones, which can have dramatic impacts on populations and economies. Deep learning techniques are considered one of the breakthroughs of recent years, achieving compelling results on many practical tasks, including disease diagnosis, facial recognition, and autonomous driving. We propose to utilize deep learning techniques for the rapid detection of two extreme weather event types: tropical cyclones and dust storms. Deep learning models trained on past climate simulations will inform the effectiveness of the approach on future simulations. Our technological motivation is that current high-resolution simulations and observations generate too much data for researchers, scientists, and organizations to store for their applications. Machine learning methods performing real-time segmentation and classification of relevant features for extreme weather events can generate a list or database of these features, and detailed information can be obtained by rerunning the simulation at high spatiotemporal resolution when needed.

2018-2019
GMU-18-02 Climate Indicators downscaling

Weather conditions are among the factors people care most about in daily life. People may check the weather forecast every day, or even every few hours, especially for activities that are highly sensitive to temperature, precipitation, or winds, such as taking flights. But at present, civil weather forecast data are issued every six hours, which is far from sufficient for actual needs, and the spatial resolutions of most weather data, such as precipitation and surface winds, are several kilometers, too coarse for some regions. This project focuses on weather data downscaling to fulfill the increasing need for short-term forecasts with high spatial and temporal resolution.
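The spatial resampling step of downscaling can be sketched with plain bilinear interpolation, as below; a real downscaling model would add physical predictors (e.g., terrain) and temporal interpolation on top of this resampling.

```python
import numpy as np

def bilinear_downscale(field, factor):
    """Resample a 2-D field onto a grid `factor` times finer per axis."""
    ny, nx = field.shape
    y = np.linspace(0, ny - 1, ny * factor)
    x = np.linspace(0, nx - 1, nx * factor)
    y0 = np.clip(np.floor(y).astype(int), 0, ny - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, nx - 2)
    wy = (y - y0)[:, None]               # fractional weights along y
    wx = (x - x0)[None, :]               # fractional weights along x
    f00 = field[np.ix_(y0, x0)]
    f01 = field[np.ix_(y0, x0 + 1)]
    f10 = field[np.ix_(y0 + 1, x0)]
    f11 = field[np.ix_(y0 + 1, x0 + 1)]
    return (f00 * (1 - wy) * (1 - wx) + f01 * (1 - wy) * wx
            + f10 * wy * (1 - wx) + f11 * wy * wx)

coarse = np.array([[0.0, 10.0], [20.0, 30.0]])  # e.g., a coarse wind field
fine = bilinear_downscale(coarse, 4)
print(fine.shape)  # (8, 8)
```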

2018-2019
UCSB-18-01 The World Geographic Reference System v2 and 3D

A revision of the World Geographic Reference System (Clarke, Dana, and Hastings, 2002) is proposed. This new WGRS v2 is consistent with UTM/MGRS worldwide and further refines the MGRS grids to 1×1 km tiles, which can be individually named and registered. These simple changes facilitate the development of a dynamic, publicly accessible, Web-map-supported gazetteer, the Place Name System (PNS), analogous to the Internet Domain Name System (DNS).

2018-2019
GMU-17-01 Utilizing High Performance Computing to Detect the Relationship Between the Urban Heat Island and Land System Architecture at the Microscale

An urban heat island (UHI) is an urban area that is significantly warmer than its surrounding rural areas due to human activities. UHI combines the results of all surface–atmosphere interactions and energy fluxes between the atmosphere and the ground, and is closely linked to water and energy usage and health-related consequences, including decreased quality of living conditions and increased heat-related injuries and fatalities (Changnon et al., 1996; Patz et al., 2005). Prior studies have demonstrated the correlation between land system architecture and the urban heat island based on medium- or coarse-resolution data. However, these measurement scales may obscure stronger or different relations between land cover and land surface temperature (LST), because the mixture of land covers at coarse resolutions may hide relations at finer resolutions where more urban land cover variability occurs (Zhou et al., 2011; Myint et al., 2013; Jenerette et al., 2016). Consequently, evaluating the urban heat island at micro scales (e.g., < 30 m or even < 10 m) has become an important research goal for improving our understanding of the relationship between UHI and land system architecture (Small 2003; Deng and Wu, 2013; Jenerette et al., 2016; Li et al., 2017). Unfortunately, due to limitations in computing capability and the efficiency of land-cover classification, most of these studies either selected sample sites from the study area or aggregated small patches into larger blocks, which may bias or miss important information in the final discovered relationships (Zhou et al., 2011). Based on the extensive experience at NCCS and GMU with big spatiotemporal data analytics, Spark, cloud computing, and other technologies, we propose to extend the existing high-performance computing framework, ClimateSpark, to detect the relationship between UHI and land system architecture at the microscale.
A convolutional neural network will be utilized to improve the accuracy of land-cover information, and advanced spatial statistics algorithms will be implemented in parallel to provide sufficient computing capability to detect the relationship between UHI and land system architecture at the microscale.
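The statistical core, relating LST to a land-architecture variable, can be illustrated with a least-squares fit on synthetic data; ClimateSpark would run far richer spatial statistics in parallel over the full study area rather than this single regression.

```python
import numpy as np

# Synthetic micro-scale cells: LST rises with impervious-surface fraction.
rng = np.random.default_rng(1)
impervious = rng.uniform(0.0, 1.0, 500)                    # fraction per cell
lst = 25.0 + 8.0 * impervious + rng.normal(0.0, 1.0, 500)  # degrees C

slope, intercept = np.polyfit(impervious, lst, 1)
print(round(slope, 1), round(intercept, 1))  # recovers roughly 8.0 and 25.0
```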

2017-2018
GMU-17-02 Deep Learning for Improving Severe Weather Detection and Anomaly Analysis

Severe weather, including dust storms, hurricanes, and thunderstorms, causes significant loss of life and property every year. The detection and forecasting of severe weather events therefore has an immediate impact on society. Numerical simulations and Earth observations have improved greatly in spatiotemporal resolution and coverage, so that scientists and researchers can better understand and forecast severe weather phenomena. However, it remains challenging to obtain long-term climatologies for different severe weather events and to accurately predict events, even with the most state-of-the-art forecasting models, due to the uncertainties of model forecasting. We propose a cloud-based deep learning system to mine and learn severe weather events (e.g., dust storms, hurricanes, and thunderstorms) and their patterns, as well as to detect anomalies in forecasting results. The deep learning system will be tested on three use cases: dust storms, hurricanes, and thunderstorms, and it will help meteorologists better detect and understand the evolution patterns of severe weather events.

2017-2018
GMU-17-05 Spatiotemporal Innovation Testbed

This project aims to 1) develop methods for real-time, micro-scale data collection with moving sensors; 2) augment and update existing data, and generate new data and new geometries; 3) improve accessibility of public space using data that is nearly universally needed but unavailable; and 4) spread methods, workflows, and knowledge to IAB members.

2017-2018
GMU-17-06 Real-time message georeferencing for geocrowdsourced data integration

This project aims to 1) explore, develop, and demonstrate the use of gazetteer-based geoparsing for generating footprints from text-based location descriptions; 2) develop a library of spatial footprints (simple, complex); 3) use spatial footprints for message mapping; and 4) use spatial footprints for quality assessment of crowdsourced geospatial data.

2017-2018
Hvd-17-01 Evaluating OmniSci, Open Source GPU-powered SQL Database

OmniSci provides a platform that leverages the parallelism and throughput of graphics processing units (GPUs) to achieve orders-of-magnitude speedups over CPU-based systems. Last year the team was challenged to find an economical way to run OmniSci with appropriate hardware resources. To solve the problem, the team collaborated with Harvard’s Research Computing group and deployed OmniSci and PostGIS as public apps on Harvard’s Slurm-based computation cluster, which supports 2.5 million CUDA (GPU) cores. The apps can now be deployed by any Harvard researcher or collaborator. OmniSci’s on-GPU data interoperability currently enables end-to-end workflows with Jupyter, Pandas, and R. The cluster hosts more than 600 laboratories and 1,000 installed applications, making it a rich research ecosystem for developing new use cases and enhancement recommendations, especially given the importance of spatiotemporal analytics in the age of COVID. To further complement this environment, the CGA installed Version 2.0 of the Geotweet Archive, a global, continually updated, spatially and temporally tagged social media dataset.

2017-2018
UCSB-17-01 Siemens: Semantic Application Logic Design for Subject Matter Experts

This project aims to design a semantic application logic for subject matter experts. Four milestones are listed below: 1) conceptualize and implement a framework and interface supporting the import and inclusion of SPIN rules and domain graphs; 2) add logic validation and execution capabilities to the workflow; 3) develop export filters that will convert the logic to non-native (RDF) execution formats, such as RIF or JSON; and 4) integrate and test components.

2017-2018
UCSB-17-02 Forecasting Future Urban Expansion in an African Secondary City, Douala, Cameroon: Transfer of Expertise in GIS and Land Use Change Modeling to Douala University

The goals of this project are 1) to bring visiting scholars from Douala University in Cameroon to a training session on using GIS and remote sensing to map land use and its changes, and to map Douala’s built-up extent at multiple historical time periods; 2) to use the resulting data to create forecasts of long-term urban growth and land use change in the region; and 3) to promote informed and sustainable urban planning.

The project’s success will be measured by the number of people trained, the number of cities mapped and modeled, and the number of reports and papers produced for use in planning and land management.

2017-2018
GMU-16-02 Cloud computing and big data management 2016-2017
GMU-16-03 Computing technology: SmartDrive 2016-2017
GMU-16-04 Health Mapping Incorporating Data Reliability 2016-2017
UCSB-16-01 Applications of High Accuracy and Precision Building Data for Urban Areas 2016-2017
UCSB-16-02 Urban Modeling in Uzbekistan 2016-2017
UCSB-16-03 An Open World Gazetteer 2016-2017
Harvard-16-01 HHyperMap 2016-2017
Harvard-16-02 Semantically enhanced workbench for responsive big data geoprocessing and visualization 2016-2017
Harvard-16-03 Exploring relationships between cancer vulnerability/resilience and emotional condition/environment from social media 2016-2017
GMU-15-02 Upgrade the Delivery of NASA Earth Observing System Data Products for Consumption by ArcGIS

The content and format of NASA EOS data products are defined by their respective Science Teams, stretching back over the past 25 years. Many of these data models are old and difficult to consume with other geospatial tools. Specifically, these tools are, in some cases, unable to read the files and/or unable to properly interpret the data organization inside them, so the data cannot be visualized or analyzed. A solution that can apply to all these data products across NASA data centers would be valuable. We propose a plug-in framework, developed on the GDAL open-source library, to interpret the non-compliant data. The framework should have the advantage of extensibility within EOSDIS, allowing the multiple NASA data centers to construct their own plug-ins to adjust their data products.
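The extensibility idea above can be sketched as a reader-registry pattern: each data center registers a reader for its own product format, and a single entry point dispatches to it. This is a minimal illustration only; product IDs and function names are hypothetical, and a real implementation would register drivers through GDAL rather than a Python dictionary.

```python
# Minimal sketch of an extensible reader plug-in registry.
# All product IDs and readers here are hypothetical placeholders.
from typing import Callable, Dict

_READERS: Dict[str, Callable[[str], dict]] = {}

def register_reader(product_id: str):
    """Decorator: lets each data center register a reader for its product."""
    def wrap(fn: Callable[[str], dict]):
        _READERS[product_id] = fn
        return fn
    return wrap

@register_reader("MOD04_L2")
def read_mod04(path: str) -> dict:
    # Placeholder: would interpret the product's non-standard internal layout.
    return {"product": "MOD04_L2", "path": path}

def open_product(product_id: str, path: str) -> dict:
    """Dispatch to the plug-in registered for this product."""
    if product_id not in _READERS:
        raise ValueError(f"no plug-in registered for {product_id}")
    return _READERS[product_id](path)
```

New product support then requires only a new decorated reader, with no change to the dispatch code.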

2015-2016
GMU-15-03 Analyzing Spatiotemporal Dynamics Using Place-Based Georeferencing

The human world is a world of places, where verbal description and narrative use placenames to describe occurrences, locations, and events. The geospatial, computational, and analytical worlds rely instead on metric georeferencing to place these occurrences, locations, and events on a map. The gazetteer is the linkage between these two worlds, and the means for translating the human world into the computational world. With a new emphasis on social media and crowdsourcing in geospatial data production, gazetteers and the associated techniques of geoparsing and georeferencing are a critical element of an emerging geospatial toolkit. We use gazetteers to validate the contributions of crowdsourced event data contributed by end-users and look at placenaming as a validation tool within quality assessment for geocrowdsourced data. Strategies and best practices for generating and maintaining gazetteer databases for georeferencing crowdsourced data will be explored, determined, and presented.
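At its simplest, gazetteer-based geoparsing matches placenames occurring in text against entries in a placename database. The sketch below is a toy illustration of that lookup step (the gazetteer entries and coordinates are illustrative; real geoparsers also handle ambiguity, abbreviations, and multi-word boundaries).

```python
# Toy gazetteer: placename -> (lat, lon). Entries are illustrative only.
GAZETTEER = {
    "fairfax": (38.85, -77.31),
    "santa barbara": (34.42, -119.70),
}

def geoparse(text: str):
    """Return (placename, (lat, lon)) pairs for gazetteer names found in text."""
    lowered = text.lower()
    return [(name, coords) for name, coords in GAZETTEER.items()
            if name in lowered]
```

In the crowdsourcing context described above, the same lookup can run in reverse: a contributed event whose text placename resolves to coordinates far from its reported location is flagged for quality review.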

2015-2016
GMU-15-04 Using Sonification to Analyze Spatiotemporal Dynamics in High-Dimensional Data

The human senses are paramount in constructing knowledge about the everyday world around us. The human sensory system is also a key to geospatial knowledge discovery, where patterns, trends, and outliers can be detected visually, and explored in more detail. As the complexity and size of geospatial datasets increase, the tools for geographic knowledge discovery need to expand. This research looks at the use of sonification and auditory display systems to expand the visualization toolkit. First, we use sonification as a way of simplifying the exploration of large, multidimensional data, including space-time data, where certain dimensions of data can be removed from the visual domain and represented efficiently with sound, leading to more effective geographic knowledge discovery. Second, we use sonification as a means of redundant display to reinforce cartographic and geospatial aspects of spatial-temporal display in low-vision environments.
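The core of sonification is a mapping from a data dimension to an auditory parameter such as pitch. A minimal sketch, assuming a logarithmic (equal-octave) mapping, which matches how humans perceive pitch; the frequency range is an arbitrary illustrative choice:

```python
def value_to_pitch(value, vmin, vmax, f_lo=220.0, f_hi=880.0):
    """Map a data value onto a log-spaced frequency range in Hz.

    Log spacing means equal data increments sound like equal pitch
    intervals, which aligns with human pitch perception.
    """
    t = (value - vmin) / (vmax - vmin)   # normalize to [0, 1]
    return f_lo * (f_hi / f_lo) ** t     # two octaves, log scale
```

A dimension removed from the visual display (say, uncertainty at each space-time cell) could then be played back as pitch while the user pans across the map.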

2015-2016
GMU-15-05 A Cyberinfrastructure-based Disaster Management System using Social Media Data

During emergencies, it is critical to deliver accurate and useful information to the impacted communities, and to assess damages to properties, people, and the environment, in order to coordinate response and recovery activities, including evacuations and relief operations. Novel information streams from social media are redefining situation awareness and can be used for damage assessment, humanitarian assistance, and disaster relief operations. These streams are diverse, complex, and overwhelming in volume, velocity, and the variety of viewpoints they offer. Negotiating these overwhelming streams is beyond the capacity of human analysts, and an effective framework should be developed to mine and deliver disaster-relevant information in real time.

2015-2016
GMU-15-06 FloodNet: Demonstrating a Flood Monitoring Network Integrating Satellite, Sensor-Web and Social Media for the Protection of Life and Property

Flooding is the most costly natural disaster, striking with regularity, destroying property, agriculture, transportation, communication, and lives. Floods impact developing countries profoundly, but developed nations are hardly immune, with floods claiming thousands of lives every year. The threat is increasing as we build along riverbanks and flood plains, construct dykes and levees that channelize flow, and as climate change brings increased extreme weather events, including floods. The first line of defense for protection of life and property is flood monitoring. Knowledge of floods is truly power when issuing warnings, managing infrastructure, assessing damage, and planning for the future. Information about active floods can be gleaned from satellite sensors, ground stations and sensor-webs, and harvested from social media and citizen scientists. This information is complemented by flood hazard or risk maps, and weather and climate forecasts. These flood information elements exist separately, but would be much more effective at producing actionable flood knowledge if integrated into a seamless flood monitoring network. Therefore, we propose to demonstrate a flood monitoring network (FloodNet) that integrates flood information from satellites, sensor-webs, social media, risk maps, and weather/climate forecasts into a user-focused visualization interface (such as GIS or Google Earth) that enables the production of actionable flood knowledge. We will largely focus on networking existing flood information elements available from government agencies, harvested from social media, and produced by satellite sensors. The demonstration will be performed in a historical context, focused on a few well-known recent flood events in the Mid-Atlantic region, with a vision for global real-time implementation.
We will take advantage of recent advances in cloud computing, visualization tools, and spatial-temporal knowledge toolboxes in the implementation of FloodNet. The resulting flood monitoring network will guide civil protection officials, insurers, and citizens as to current flood hazards and future flooding risks.

2015-2016
GMU-15-07 Benchmarking Timely Decision Support and Integrating Multi-Source Spatiotemporal Environmental Datasets

In the past decade, natural disasters have become more frequent. It is widely recognized that the increasing complexity of environmental problems at local, regional, and global scales needs to be addressed through integrated approaches. Explosive growth in spatiotemporal data and the emergence of social media make it possible, and also emphasize the need, to develop new and computationally efficient geospatial analytics tailored for analyzing big data. This project aims to provide decision support for life and property with maximum accuracy and minimum human intervention by leveraging near-real-time integration of government satellite and model assets using HPC, virtual computing and storage environments, and OGC standard protocols. Additionally, we are going to benchmark the latency and science validity of end-to-end (E2E) solutions using machine-to-machine (M2M) interfaces to exploit NOAA, USGS, and NASA environmental data from satellites, forecast models, and social media to generate more accurate and timely decision support information.

2015-2016
UCSB-15-02 Assessment and Applications of High Accuracy and Precision building data for urban areas

The company Solar Census has developed an unprecedented means by which high resolution (10cm) stereo overhead imagery is processed photogrammetrically to extraordinary levels of accuracy, and then models are applied that orthorectify and extract building footprints and roofs with unprecedented fidelity. Test acquisitions of new imagery have been supported by the Department of Energy for test areas in northern California, and new data are forthcoming for the entire state, and for the State of New York. Solar Census has an application that solves the solar equation across building roofs for identifying optimal locations for the placement of photovoltaic electric panels to generate distributed solar power. The purposes of the collaboration between Solar Census and the UCSB Geography site for the I/UCRC Center for Spatiotemporal Thinking, Computing and Applications are twofold: 1) complete an accuracy assessment to quantify the vertical and horizontal accuracy of the new data; and 2) explore innovative potential new applications of the data that could present new revenue streams and business opportunities for the data, which could potentially be available nation-wide.

2015-2016
Harvard-15-01 A Training-by-Crowdsourcing Approach for Place Name Extraction from Large Volumes of Scanned Maps

We propose to develop a training-by-crowdsourcing approach for automatic extraction of place names in large volumes of georeferenced scanned maps. Place names very often exist only in paper maps and have potential use both for adding semantic content and for providing search and indexing capabilities to the original scanned maps. Moreover, place names can be used to strengthen existing gazetteers (place name databases), which are the foundation to support effective geotagging or georeferencing of many document and media types. The proposed solution will provide a map text extraction service and a web map client interface that accesses the service. The extraction service will consume raw map images from standard WMSs, and output spatiotemporally labeled place names. The client will allow users to curate (i.e., update, delete, insert, and edit) extraction results and share the results with other users. The user curation process will be recorded and sent to the extraction service to train the underlying map processing algorithms for handling map areas where no user training has yet been done.

2015-2016
Harvard-15-02 Building an Open Source, Real-Time, Billion Object Spatio-Temporal Exploration Platform

There is currently no general-purpose platform to support interactive queries and geospatial visualizations against datasets containing even a few million features against which queries return more than ten thousand records. To begin to address this fundamental lack of public infrastructure, we will design and build an open source platform to support search and visualization against a billion spatio-temporal features. The instance will be loaded with the latest billion geotweets (tweets which contain GPS coordinates from the originating device), which the CGA has been harvesting since 2012. The system will run on commodity hardware and well-known software. It will support queries by time, space, keyword, user name, and operating system. The platform will be capable of returning responses to complex queries in less than 2 seconds. Spatial heatmaps will be used to represent the distribution of results returned at any scale, for any number of features. Temporal histograms will be used to represent the distribution over time of results returned at any scale. The system will be capable of generating kernel density visualizations from massive collections of point measurements such as weather, pollution, or other sensor streams.
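The heatmap idea above rests on a simple primitive: aggregating point features into a count grid over a bounding box, so the client renders a small grid instead of millions of points. A minimal sketch of that binning step (in a production system the aggregation would run inside the search index, not in client code):

```python
def heatmap(points, bbox, nx, ny):
    """Bin (x, y) points into an nx-by-ny count grid over bbox.

    bbox is (minx, miny, maxx, maxy); grid[j][i] counts points in
    column i, row j. Points outside the bbox are ignored.
    """
    minx, miny, maxx, maxy = bbox
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if minx <= x < maxx and miny <= y < maxy:
            i = int((x - minx) / (maxx - minx) * nx)
            j = int((y - miny) / (maxy - miny) * ny)
            grid[j][i] += 1
    return grid
```

Because the grid size is fixed regardless of how many features match, response payloads stay constant even when a query hits hundreds of millions of geotweets.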

2015-2016
Harvard-15-03 Addressing the Search Problem for Geospatial Data

We are currently engaged in building a general-purpose, open source, global registry of map service layers on servers across the web. The registry will be made available for search via a public API for anyone to use to find and bind to map layers from within any application. We are developing a basic UI that will integrate with WorldMap http://worldmap.harvard.edu (an open source, general-purpose map collaboration platform) and make registry content discoverable by time, space, and keyword, and savable and sharable online. The system will allow users to visualize the geographic distribution of search results, regardless of the number of layers returned, by rendering heatmaps of overlapping layer footprints. All assets in the system will be map layers that can be used immediately within WorldMap or within any other web or desktop mapping client. Uptime and usage statistics will be maintained for all resources, and these will be used to continually improve search. Core elements of this project are currently funded by a grant from the National Endowment for the Humanities, but there are important aspects which are not supported. For example, the grant focuses on OGC and Esri image services, though there exist many other spatial assets in need of organization, including feature services, processing services, shapefiles, KML/KMZ, and other raster and vector formats. There are also important types of metadata we are not handling. We have developed basic tools for crawling the web using Hadoop and a pipeline to harvest and load results to a fast registry, but there are many ways both crawl and harvest can be improved.

2015-2016
Harvard-15-04 HyperMap: An Ontology-driven Platform for Interactive Big Data Geo-analytics

Sensing technology and the digital traces of human activity are providing us with ever larger spatiotemporally referenced data streams. Computing and automated analysis advances are at the same time decreasing the effort of drawing knowledge from such large data volumes. Still, there is a gap between the ability to run large batch-type data processing tasks and the interactive engagement with analysis that characterizes most research. There appear to be three principal scales (in both volume and task time) of processing tasks: asynchronous summarization of billions of records by space-time and other relevant dimensions, synchronous analysis of the summary data using statistical / functional models, and interactive visual interpretation of model results. The forward workflow is becoming more and more common, but feedback from interpretation to refine the larger-scale process steps is still most often a logistical nightmare. We propose to develop a platform that flexibly links the three stages of geo-analysis using a provenance-orchestration ontology and OGC service interface standards such as Web Processing Service (WPS). The purpose of the platform will be to provide domain experts the tools to explore – iteratively and interactively – extremely large datasets such as the CGA geo-tweet corpus without spending most of their time performing system engineering. Researchers will be able to leverage a semantic description of an analysis workflow to drill back from interesting visual insights to the details of processing and then trigger process refinements by updating the workflow description instead of having to re-write processing codes and scripts. The HyperMap platform is envisioned to support several approaches to big data summarization. Initial design targets include factorization of unstructured data such as geo-tweets, classification of coverages, and recognition of imagery feature hierarchies.

2015-2016
Harvard-15-05 Terrain and Hydrology Processing of High Resolution Elevation Data Sets

Raster data sets representing elevation are being released at increasingly high resolutions. The National Elevation Dataset (NED) has gone from 30m to 10m and is now available in many states at 3m resolution. At the local and state level, LIDAR-based elevation data is available for many locations, particularly coastal areas and those subject to flooding. As horizontal resolution improves, vertical resolution and accuracy are also improving, but while higher resolution is improving the ability to leverage these data sets for modeling hydrological flow, visibility, slope, and other data processing operations, the exponentially larger size of the data sets is presenting significant data processing challenges, even with professional workstation GIS tools. Under this proposal, the project team will develop and implement new algorithms for performing parallel data processing on large raster data sets. The work will leverage the open source Apache Spark and GeoTrellis projects, both based on the Scala functional programming language. It will also take advantage of other open source efforts supporting data processing at scale, including the Hadoop Distributed File System (HDFS) and indexing tools such as Cassandra and Accumulo. The results of the work will be released under a business-friendly Apache2 license, and will be aimed at supporting execution of large elevation data processing operations on clusters of virtual machines. Specific processing operations may include: viewshed, flow accumulation, flow direction, watershed delineation, sink, slope, aspect, and profiling operations. The proposed work will be synergistic with other proposed research projects, including the HyperMap effort to classify terrain types and channel areas based on large, high-resolution elevation data sets.
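To make one of the listed operations concrete: D8 flow direction assigns each cell the direction of its steepest-descending neighbor (diagonal drops are divided by the longer diagonal distance). The sketch below is a per-cell illustration in plain Python, not the parallel Spark/GeoTrellis implementation the project proposes:

```python
def d8_flow_direction(dem, r, c):
    """Return (dr, dc) toward the steepest-descent neighbor of cell
    (r, c) in a 2D elevation grid, or None if the cell is a sink."""
    best, steepest = None, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(dem) and 0 <= cc < len(dem[0]):
                dist = (dr * dr + dc * dc) ** 0.5  # 1 or sqrt(2)
                drop = (dem[r][c] - dem[rr][cc]) / dist
                if drop > steepest:
                    best, steepest = (dr, dc), drop
    return best
```

The parallelization challenge the proposal addresses comes from the halo exchange this neighborhood access implies when the grid is tiled across a cluster.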

2015-2016
Harvard-15-06 Feature Classification Using Terrain and Imagery Data at Scale

Drones, micro-satellites, and other innovations promise to both lower the cost and rapidly increase the amount of available raster imagery data. Initial use of this imagery is currently focused on supporting visualization of geospatial data. However, there is substantial opportunity to provide the ability to extract features from the imagery using simple user interfaces. Feature classification from raster imagery is not a new capability, and it is supported by several commercial workstation products. In addition, contemporary techniques rely not only on the imagery itself, but also leverage elevation data to improve the accuracy of the feature classification. However, the ability to do so with large data sets through a simple browser-based user interface is a significant challenge. Under this proposal, the project team will develop and implement a prototype web-based software tool that will be able to use a combination of elevation and imagery data to enable users to extract vector polygon features with real-time processing speeds. The work will leverage the open source Apache Spark and GeoTrellis projects, both based on the Scala functional programming language. It will also take advantage of other open source efforts supporting data processing at scale, including the Hadoop Distributed File System (HDFS) and indexing tools such as Cassandra and Accumulo. The results of the work will be released under a business-friendly Apache2 license, and will be aimed at supporting execution of large data processing operations on clusters of virtual machines. The proposed work will be synergistic with other proposed research projects, including the HyperMap project to classify terrain types and channel areas based on large, high-resolution elevation data sets and the place name extraction from historic maps project.
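The combination of imagery and elevation evidence can be illustrated with a deliberately simple per-pixel rule: an imagery-derived index (here NDVI) separates vegetation, while a terrain attribute (slope) separates classes the imagery alone confuses. The thresholds and class names below are illustrative only; the proposed tool would learn such decision boundaries from training data rather than hard-code them.

```python
def classify_pixel(ndvi, slope_deg):
    """Toy rule-based classifier combining an imagery index (NDVI)
    with a terrain attribute (slope, degrees). Thresholds are
    illustrative, not calibrated."""
    if ndvi > 0.4:          # strong vegetation signal in the imagery
        return "vegetation"
    if slope_deg > 30:      # bare but steep: terrain disambiguates
        return "cliff"
    return "bare/built"
```

The point of the sketch is the fusion: without the slope input, the last two classes would be indistinguishable from the imagery index alone.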

2015-2016
GMU-14-04 Developing a spatiotemporal cloud advisory system for better selecting cloud services

We propose a web-based cloud advising system to (1) integrate heterogeneous cloud information from different providers, (2) automatically retrieve update-to-date cloud information, (3) recommend and evaluate cloud solutions according to users’ selection preferences.
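The recommendation step in (3) can be sketched as multi-criteria scoring: each offering is described by normalized criterion scores, and the user's preferences become weights. This is a minimal illustration under that assumption; the criterion names and values are hypothetical.

```python
def rank_offerings(offerings, weights):
    """Rank cloud offerings by a weighted sum of normalized criteria.

    offerings: list of (name, {criterion: score in [0, 1]}) pairs.
    weights:   {criterion: user preference weight}.
    Returns offerings sorted best-first.
    """
    def score(attrs):
        return sum(weights.get(k, 0.0) * v for k, v in attrs.items())
    return sorted(offerings, key=lambda o: score(o[1]), reverse=True)
```

A cost-sensitive user would weight "cost" heavily and get a different ranking than a performance-sensitive user, which is the core of preference-driven advising.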

2014-2015
GMU-14-02 Developing an open spatiotemporal analytics platform for big event data

We propose to design a visual analytical platform to systematically perform inductive pattern analysis on real-time volunteered event data. The platform will be built based on tools and methods that were developed by us in previous studies. The accomplished platform will not only enable spatiotemporal pattern exploration of big event data in the short term, but also lay the concrete foundation for using volunteered data for tasks such as urban planning in the long term.

2014-2015
GMU-14-06 Incorporating quality information to support spatiotemporal data and service exploration 2014-2015
Harvard-14-01 Temporal gazetteer for geotemporal information retrieval

Place names are a key part of geographic understanding and carry a full sense of changing perspective over time, but existing gazetteers do not in general represent the temporal dimension. This project will develop, populate, and implement services for a place name model that incorporates realistic complexity in the temporal, spatial, and language elements that form a place name. Additional tools will be developed to conflate and reconcile place name evidence from authoritative, documentary, and social sources.
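The temporal dimension described above amounts to attaching a validity interval (and a language) to each name attested for a place. A minimal sketch of such a model, with hypothetical field names and an illustrative query function:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PlaceName:
    """One attested name for a place, with its period of use."""
    name: str
    language: str
    start_year: int                # first year the name is attested
    end_year: Optional[int] = None # None = still in current use

def names_valid_in(entries: List[PlaceName], year: int) -> List[str]:
    """Return the names attested for a place in the given year."""
    return [e.name for e in entries
            if e.start_year <= year
            and (e.end_year is None or year <= e.end_year)]
```

A full implementation would add fuzzy interval bounds and source attribution, since documentary evidence rarely pins a name change to an exact year.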

2014-2015
Harvard-14-04 Cartographic ontology for semantic map annotation

Map annotation produces highly-relevant, high-value information, whose utility however often critically depends on semantic interoperability. Achieving that requires an ontology-based, semantic web, linked open data approach. We will develop a key missing ingredient, the cartographic annotation ontology, to characterize the complex structures and rich visual, symbolic, and geospatial languages that maps use to represent geographic information.

2014-2015
Harvard-14-05 A Paleo-Event Ontology for Characterizing and Publishing Irish Historical and Fossil Climate Data

Integration of both Big and Little spatio-temporal data from different scientific domains is vital for validating climate models, as a single volcanic eruption, for example, can have a great effect. Yet observation of deep time events, without deep-time observers, means we must discern paleo-events through observation of fossilized, event-proxy, features. Using medieval monastic records, tree-ring data, ice core features and volcanic eruption phenomena to inform our efforts, we will develop a deep-time climate event observation ontology to characterize the nature and relationships of the data.

2014-2015
Harvard-14-06 Emotional City – measuring, analyzing and visualizing citizens’ emotions for urban planning in smart cities

Emotional City provides a human-centered approach for extracting contextual emotional information from technical and human sensor data. The methodology used in this project consists of four steps: 1) detecting emotions using wristband sensors, 2) “ground-truthing” these measurements using a People as Sensors smartphone app, 3) extracting emotion information from crowdsourced data like Twitter, and 4) correlating the measured and extracted emotions. Finally, the emotion information is mapped and fed back into urban management for decision support and for evaluating ongoing planning processes.
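Step 4 above reduces, in its simplest form, to correlating two aligned time series of emotion scores: one from the wristband sensors, one from the crowdsourced text. A minimal sketch using the Pearson coefficient (the alignment and scoring of the two streams is assumed to have happened upstream):

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length score series,
    e.g. sensor-measured vs. text-derived emotion intensities."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A correlation near 1 would suggest the crowdsourced signal tracks the physiological one, supporting its use where sensor coverage is sparse.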

2014-2015
UCSB-14-01 Dismounted navigation 2014-2015
UCSB-14-02 Indoor Mapping Using Multi-Sensor Point Clouds

Develop and evaluate methods for creating 3D indoor maps using point clouds generated by multiple sensor platforms.

2014-2015
UCSB-14-03 Pattern driven Exploratory Interaction with Big Geo Data 2014-2015