Proposed Projects

ID	Name	Period
GMU-23-03	ClassX: Automatic Labeling Tool The ClassX project is based on the development of an automatic training dataset labelling tool and online service to fill the gap of missing high-quality training image datasets. Novel spatiotemporal AI/ML-based capabilities are being developed to automatically classify, label, store, and share training datasets among a group of needed users. Specifically, 1) The auto labelling tool fills the gap between time-consuming, tedious manual labelling and the demands for enormous amounts of high-quality training datasets; 2) Well-labelled image datasets will be able to help build accurate and reliable machine-learning models across many science and engineering domains; 3) The image labelling tool can automatically classify and label digital images and thus provide image labelling services to various industries; 4) While initially designed for sea ice research that can automatically label sea ice images, the ICAP program helped us confirm that the tool and service have potential in many domains such as heliophysics, climate change, and biomedical industry.	2023-2024
DT-GMU-24-01	Geographic Information Extraction Using Foundation Model Media data, encompassing both news and social media, provides near-real-time spatiotemporal information. This data is valuable for extracting geographical locations and associated topics to gain insights into current events. Existing tools can extract topics and geographic locations separately from media content but often struggle to recognize the complex relationships between topics and geographic locations, particularly when multiple topics and numerous geographical features appear within the same paragraph. Large Language Models (LLMs), built on transformer architectures, are adept at capturing interdependencies and connections among diverse elements within data, enabling a nuanced understanding of contextual information. This project aims to leverage LLMs to classify news topics, extract geographic locations, geocode these locations, and extract date information, thereby revealing spatiotemporal patterns in near-real-time events. The techniques developed will be applied to events such as the Sudan conflicts, illegal organ trade, and the Ukraine-Russian conflicts.	2023-2024
GMU-23-02	Exploring the Transformative Potential of Foundation Models in Geoscience In recent years, foundation models, particularly transformer-based models, have emerged as powerful tools for natural language understanding and generation. Their versatility and adaptability have led to transformative impacts across various domains. This proposal aims to investigate the transformative impact of foundation models, particularly transformer-based models, within the field of geoscience. Geoscience, which encompasses diverse areas such as climate modeling, natural disaster prediction, and environmental analysis, can significantly benefit from the capabilities of these advanced AI models. We propose to adapt foundation models to geospatial data, creating a bridge between cutting-edge natural language understanding and complex geospatial analysis. By leveraging pre-trained knowledge and fine-tuning the models for specific geoscience tasks, we anticipate improved accuracy, efficiency, and novel insights in geoscience research. This interdisciplinary endeavor fosters collaboration between geoscientists and AI experts, ensuring that foundation models can address the unique challenges and opportunities within the geoscience domain.	2023-2024
iCorps	Research Translation Projects (I-Corps & Automatic Labeling Tool) The I-CORPS project will commercialize the Automatic Labeling Tool to reduce the time and resource spent on obtaining training datasets by millions of data science professionals, researchers and students, as well as many companies and other organizations. The product will fill the gap of time-consuming, tedious manual labelling and the demands for large amount of high-quality training dataset. While initially designed for sea ice research that can automatically label sea ice images, the ICAP program helps us confirm that the tool and service have potential in many domains. The iCorps will help us further focus on customer sector and the tool is also being translated to serve the heliophysics domain. In our recent endeavor, the GMU Innovation Commercialization Assistance program (ICAP), we conducted 23 interviews, revealing a substantial market interest spanning various domains, from geospatial industries to medical diagnostics. Building on this insight, the next step is to initiate the National Science Foundation’s (NSF) I-CORPS project. This endeavor aims to engage with over 100 potential customers to a) pinpoint a sector poised for early adoption and success, b) articulate a Minimum Viable Product (MVP) from the customer’s viewpoint, c) evaluate the feasibility and strategize for launching a startup led by a diverse team of undergraduate students, graduate students, and professors, and d) formulate well-defined proposals to foster innovation partnerships or secure SBIR/STTR program funding with startups or small businesses.	2023-2024
GMU-23-01	Uncertainty Quantification using PM2.5 Retrieval as an Example Ambient PM2.5 pollution poses a significant public health challenge, making precise monitoring crucial for informed policy decisions. While AI and machine learning (ML) models have demonstrated great potential in improving PM2.5 retrieval, a key challenge lies in understanding the confidence level of these models’ predictions. Standard AI/ML models typically generate point estimates without conveying the uncertainty or reliability of the projections, limiting their applicability in real-world air quality applications. In response to the challenge of lacking uncertainty quantification (UQ) in AI-based PM2.5 retrieval, we develop a systematic framework that integrates UQ methods into the workflow process. By incorporating probabilistic techniques such as Bayesian Neural Networks (BNN) and Monte Carlo Dropout (MCD), the framework provides predictions and quantifies the uncertainties involved. This probabilistic approach allows for more reliable insights into the model’s confidence under varying conditions, enhancing decision-making with a clearer understanding of prediction uncertainty.	2022-2023
STC-23-01	Spatiotemporal Open-Source Workflow with Training Datasets Auto-labeling Deep Learning (DL) has been increasingly adopted by various domains to discern patterns from processes of pressing challenges. While many open-source DL models are readily available for research replication, the replication is often challenging to navigate for non-programmers or novices. This project bridges this gap by establishing a streamlined, comprehensive workflow to make the replication, reproduction, and retrieval of results achievable for high schoolers and novices for broadening impact. The workflow includes three steps: i) open-source software development, ii) software and documents dissemination and maintenance, and iii) videos and tracking for reproducing. This year will include a unique element of auto-labeling training datasets, which is addressing also a AI/ML gap of lacking quality training dataset. The project will produce open source, open data, documents and videos to reproduce the research for open science.	2022-2023
HVD-23-01	Annual Large-Scale High-Resolution Forest Carbon Mapping by Combining Satellite Data, Forest Inventory Data, and Machine LearningForests account for 92% of all terrestrial biomass globally, storing more carbon than the total carbon stock in the atmosphere, with heterogeneous distribution across the Earth. Various factors especially human activities, climate change, and wildfires affect carbon stock in forests. For example, deforestation and forest degradation release much of the carbon stored by trees back into the atmosphere, and climate warming has the potential to cause forests to lose carbon. In addition, different ages and species of trees store different amount of carbon. While a lot of research has been done to estimate forest carbon stock, quantifying forest carbon at the appropriate temporal and spatial resolution to inform CO2 reduction strategies has proven challenging. For example, it is reported that the uncertainty of forest carbon estimates is greater than 30% for most individual 1km pixels. In addition, both high-resolution and high-frequency forest carbon mapping are needed for reducing the uncertainties and depicting the spatiotemporal variations, but are not yet available at large spatial scales. The objective of this project is to integrate satellite data, forest inventory data, and machine learning to achieve more accurate, annual, large-scale high-resolution forest carbon mapping. We will integrate all available satellite data products such as land cover land use, surface temperature, precipitation, elevation, slope, vegetation index, soil properties, etc., and all available forest inventory data that can be used to infer carbon density, and advanced machine learning algorithms including neural network and machine learning ensembles. The outcomes of this project include (1) identification of the set of satellite variables and other variables best suited for large-scale forest carbon mapping, (2) development of machine learning algorithms which can achieve higher forest carbon mapping accuracy, and can be applied to large scale, high resolution, and more frequent mapping, (3) generate forest carbon mapping data products for the area of study selected for this project. The outcomes of this project can be used to inform GHG reduction and climate change mitigation strategies for different countries and international organizations such as the UN REDD+ Program.	2022-2023
HVD-23-02	Heat Stressor Solutions (HS2) for Informal Settlements This proposal addresses the problem of urban heat islands (UHI) of informal settlements in partner cities to create a common pool of strategies, solutions, and tools useful between countries and across cultures. We engage a global network of geospatial experts to engage in an innovative multi-stakeholder process where vulnerable communities identify priorities and define solutions to UHI through a data-for-development approach. One of the integral themes for the 2023 G20 is the principle of “data for development” to reimagine the built environment highlighting the use of digital technologies to address the fight against poverty and the benefits realized when digital access becomes truly inclusive and widely accessible. Our overall framework is threefold: 1) to create a digital geospatial twin of informal settlements through mobile mapping (handheld and airborne devices) for slum upgrading to adapt to heat stress in selected partner cities; 2) to intersect state-of-the-art technologies and local knowledge to identify critical local needs for heat adaptation solutions; and 3) to establish and train local HS2 teams to build capacity for climate adaptation and resilience. The construction of digital twins of informal settlements (the physical place, the virtual counterpart, and the data connections between them) is the basis for long term data capture, maintenance, and monitoring to demonstrate their utility as part of the disaster risk reduction toolkit. This approach operationalizes existing technologies in a comprehensive and innovative way to utilize the best available data (e.g., satellite imagery and demographic data) to identify UHI. Coupled with local knowledge (e.g., access to services, locations of open space) digital twins can be assembled through mobile data collector teams to identify heat stressors (i.e., building type and materials, illegal dumping sites, impervious surfaces) and potential solutions (i.e., resilient building materials such as “cool” roofs, window types, marketplace shades, tree planting). Using the digital twin as the organizational data model, we are poised to implement an innovative mapping program to address heat solutions in informal settlements in partner cities.	2022-2023
STC-20-01	Innovating a Computing Infrastructure for Spatiotemporal StudiesWith AI-based processing, big data analytics, and simulations, researchers must have access to the computing power necessary to fulfill these demands. We have created a cross-site center-wide computing infrastructure to provide these capabilities to center members. The computing infrastructure consists of three computing clusters of 600 nodes used for various purposes within the center. This infrastructure offers advanced computing capabilities that all center projects utilize in their research. Continuous maintenance and support for the infrastructure enable center members to have reliable resources improving their research capabilities.	2022-2023
GMU-23-03	Transformational Adaptation for Climate Change using Geospatial Applications This project aims to identify transformational adaptations for climate change using geospatial applications in informal settlements of urban areas. Specifically, we aim to address how the Sustainable Development Goals (SDGs) can be localized to inform policies for climate adaptations in those places and populations most at risk.	2022-2023
STC-23-02	Transformational Adaptation for Climate Change using Geospatial Applications This project aims to identify transformational adaptations for climate change using geospatial applications in informal settlements of urban areas. Specifically, we aim to address how the Sustainable Development Goals (SDGs) can be localized to inform policies for climate adaptations in those places and populations most at risk.	2022-2023
HVD-22-02	Building a Ukraine Data Lab and Archive Recent events in Ukraine significantly impact the academic research community, yet high-quality, current, open-source spatial data for the Ukraine region is hard to find. Given the pivotal importance Ukraine has come to hold, both for the US and the rest of the world, we must address the gap to support both the academic community and the broader research community. The problem is not a lack of data but that data spreads across many organizations in various formats and structures. Researchers cannot adequately use most datasets because datasets that could be made public are not online. We propose to create a virtual lab that assembles the best current and historical datasets and data streams, together with a set of valuable tools, to accelerate research on this critical region. Collaborations with Harvard, other universities, and organizations inside and outside Ukraine will provide data sourcing, dataset development, and research on various topics	2022-2023
HVD-22-01	Enhancing RIWI Data with a Workflow-based Spatiotemporal Exploration Portal RIWI data refers to the online real-time survey data of more than 200 million people in more than 200 countries around the world based on RIWI’s unique Random Domain Intercept Technology As a global perception data capturing the sentiments of global citizens, it helps to understand citizen behavior, global trend tracking, spatiotemporal economic prediction, etc. This project aims to utilize the workflow-driven features of the KNIME WebPortal product, provide a data exploration platform based on RIWI data, and provide customized data services for different users such as researchers, RIWI customers, and the public. The two-year project is developing a spatiotemporal visualization platform for real-time RIWI data. Data Mining on spatiotemporal features. Exploring platform integrating RIWI data with data from other sources such as Geotweet, remote sensing, and government’s statistical data. Creating a customized data analysis web portal for different types of web products users; Developing a few specific global perception indices. The proposal builds on the KNIME workbench project (Hvd-21-07) and the Spatial Data Lab project(HVD-21-01), focusing on developing spatiotemporal analysis methods and applications and adding value to RIWI’s perception data to provide insights to global events.	2022-2023
GMU-22-01	Integration and Applications of Geo-JPSS Flood DetectionFlood detection software has been developed to generate near real-time flood products from VIIRS imagery. SNPP/JPSS VIIRS data show special advantages in flood detection. The major activities and accomplishments specific objectives in the reporting period is Flooding application. The plan for next reporting period is to 1) improvement current flood product, 2) develop 3-D flood parameters: flood water surface level, flood water depth, high resolution flood maps and 3) Further analysis on regional flood patterns.	2022-2023
HVD-22-04	Enabling Geotweet Research Using High-Performance Computing In recent years, there has been an ever-increasing share of studies using social media data to investigate the impact on human well-being of ongoing global problems, such as Climate Change or COVID-19. Access to geo-coded social media information offers a unique medium to explore multiple behavioral responses of individuals to changes in their natural or political environment. Harvard CGA maintains a Geotweet Archive, a global record of tweets spanning time, geography, and language. The archive consists of about 10 billion tweets from 2010 to the present and is updated hourly. Over time, this dataset has been proven beneficial for geospatial research and research in other fields, including Social Sciences, Public Health, Business, and Humanitarian studies. We propose this project to enable researchers and the wider society to benefit from our archive. This two-year project includes five key activities: (1) Development of a platform for query, analysis, and visualization of the latest 1 billion Geotweets; (2) Enrichment and analysis of the entire Geotweet archive; (3) Enabling extraction across the entire Geotweet archive; (4) Extractions from Twitter Powertrack API; and (5) Prototype a new type of distributed archive based on IPFS. This proposal builds on the achievements of a previous CGA project funded by the Sloan Foundation to create the Billion Object Platform (BOP), a prototype spatiotemporal visualization platform for Geotweets. The key goal is to make our comprehensive collection of geo-located tweets available to the broader research community and lower barriers for scholars who wish to access this dataset.	2022-2023
HVD-22-01	Enhancing RIWI Data with a Workflow-based Spatiotemporal Exploration Portal RIWI data refers to the online real-time survey data of more than 200 million people in more than 200 countries around the world based on RIWI’s unique Random Domain Intercept Technology As a global perception data capturing the sentiments of global citizens, it helps to understand citizen behavior, global trend tracking, spatiotemporal economic prediction, etc.This project aims to utilize the workflow-driven features of the KNIME WebPortal product, provide a data exploration platform based on RIWI data, and provide customized data services for different users such as researchers, RIWI customers, and the public. The two-year project is developing a spatiotemporal visualization platform for real-time RIWI data. Data Mining on spatiotemporal features. Exploring platform integrating RIWI data with data from other sources such as Geotweet, remote sensing, and government’s statistical data. Creating a customized data analysis web portal for different types of web products users; Developing a few specific global perception indices. The proposal builds on the KNIME workbench project (Hvd-21-07) and the Spatial Data Lab project (HVD-21-01), focusing on developing spatiotemporal analysis methods and applications and adding value to RIWI’s perception data to provide insights to global events.	2022-2023
STC-22-01	Spatiotemporal Innovation Testbed With the advancement of Artificial Intelligence (AI) technologies and accumulation of big Earth data, Deep Learning (DL) has become an important method to discover patterns and understand Earth science processes in the past several years. While successful in many Earth science areas, AI/DL applications are often challenging for computing devices. In recent years, Graphics Processing Unit (GPU) devices have been leveraged to speed up AI/DL applications, yet computational performance still poses a major barrier for DL-based Earth science applications. To address these computational challenges, we selected five existing sample Earth science AI applications, revised the DL-based models/algorithms, and tested the performance of multiple GPU-supported computing platforms to support the applications. Application software packages, performance comparisons across different platforms, along with other results, are summarized. This project can help understand how various AI/ML Earth science applications can be supported by GPU computing and help researchers in the Earth science domain better adopt GPU computing (such as supermicro, GPU clusters, and cloud computing-based) for their AI/ML applications, and to optimize their science applications to better leverage the computing device.	2022-2023
GMU-22-03	Automating Training Datasets Labeling for Image Applications Building large image datasets with ground truth labels is labor-intensive and costly. Previously, for labeling datasets, a single group of people must make the datasets for their specific needs with limited classes. This project aims to develop an image-based training dataset labeling tool for ML models to support multidisciplinary discoveries. The project will leverage web-based technologies to create image-labeling tools through which multiple users collaborate and build large datasets. Usually, satellite-based or aerial-based imageries have thousands of pixels, making them challenging to the label on a web platform. Therefore, the labeling tool will have a cropping functionality to crop the images into a default size of 256×256 pixels. Then the cropped images are subjected to a semi-automatic watershed segmentation in which the images are segmented into multiple polygons. Additionally, the labeling tool offers functionality to modify the segment parameters with which the user can refine the segmentation. Finally, the platform will have an image annotation functionality to label each segment based on the classification schema.	2022-2023
Education	Education Projects (2yr Associate Degree Students START training and ASSIP for High School Students) The START program is in response to the NSF DCL START. The NSF Spatiotemporal I/UCRC received START supplement to provide training to selected 2-year associate students from Valdosta State University and Northern Virginia Community College. This project is for promoting the two-year Institution of Higher Education (2-yr IHE) student training in cutting-edge, spatiotemporal computing membership projects. Faculty members and 10+ Ph.D. students led by the center director Prof. Chaowei Yang and education coordinator Prof. Wenying Ji, as well as 2-yr IHE lead Prof. Jia Lu and Prof. Yinjing Cui. 18 students are engaged. The ASSIP project engages 5-10 high school students to conduct research in the center every summer.	2022-2023
STC-19-00	Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time, and require spatiotemporal principles and thinking be incorporated into computing process. A systematic investigation of the principles would advance human knowledge by providing trailblazing methodologies to explore the next generation computing models for addressing the challenges. This overarching center project is to analyze and collaborate with international leaders, and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate the application impact, and advance human knowledge and intelligence. Objectives are to: a) build spatiotemporal infrastructure from theoretical, technological and application aspects, b) innovate the spatiotemporal studies with new tools, systems, and applications, c) educate K-16 and graduate students with proper knowledge, d) develop a community for spatiotemporal studies from center sites and members at regional, national to global levels through IAB meetings, symposium, and other venues. This project also coordinates the operation of the spatiotemporal I/UCRC as well as outreach to collaborators and potential partners. Statistical summary of the center operation is projected on a semi-annual basis including members, funding, projects, personnel, education engagement and other outreach activities.	2019-2022
HVD-21-10	Assessing and Enhancing Values and Lifespan of Geospatial Datasets for Global Humanitarian Research Geospatial data are the backbone of transdisciplinary research and critical to geospatial analysis efforts. The COVID-19 pandemic exposed the importance of geospatial data at the local level to track impacts of the virus and explore linkages of causal relationships between health and policy. The plethora of geospatial data developed from the COVID-19 pandemic such as data dashboards and unique datasets tracking COVID-19-related outcomes are in danger of disappearing if strategies for maintenance and archiving are not developed. We propose to use two Department of State-sponsored projects to examine this issue: the Secondary Cities Initiative (2C) and the Cities’ COVID-19 Mapping Mitigation Program (C2M2). Both projects aim at the generation of geospatial data at the local level in urban areas in low- and middle-income countries. A key goal of these projects was to ensure accessibility and adopt open data strategies to ensure long-term sharing of these data.	2021-2022
GMU-21-06	USDA Climate Impact on Agricultural Output The climate change induced increase of temperature and change in precipitation extremes have significant impacts on agricultural production thus threaten the food security in the US. A comprehensive analysis of the climate change influences on crop yield is essential for decision making and citizen’s life. The objective of this project is to identify and understand how U.S. historical crop production and agricultural productivity were affected by climate variability and climate change. In addition to using ERS national and state-level TFP data, we will collect the annual records of county-level crop acreage and yield for all major crops from USDA survey statistics to conduct a spatiotemporal multifactor analysis of their variations in corresponding to local climate conditions. We will also consider other factors, such as domestic needs for food/feed/fiber commodity, international trades, commodity prices, and government policies in the modeling. We will use machine learning approaches, especially the “structural equation model” framework, to identify the significant signals and underlying mechanisms linking U.S. crop productions and TFP factors to weather conditions (mean, extremes, anomalies) in specific periods and regions.	2021-2022
HVD-21-04	Development of High-Performance System for Collection and Processing of data from Sina-Weibo Social media platforms have made available vast quantities of digital text, providing researchers unique data to investigate human interactions, communication, and well-being. Motivated by this, Harvard Center for Geographic Analysis (CGA) maintains the Geotweet Archive, a global collection of billions of tweets from 2010-present. However, this archive has little information about China because Twitter is not accessible in the country. This creates a significant spatial gap for researchers who are trying to study a global phenomenon. To fill this gap, we need an archive of data from Sina-Weibo, the second largest social media platform in China. Due to its large sample size Weibo has the powerful ability to study and track sentiment, behaviors, and communications within the Chinese socio-cultural context. Therefore, Harvard CGA and Sustainable Urbanization Lab (SUL) at MIT are collaborating to jointly build a high-performance system for collection, processing and analysis of data from Sina-Weibo.	2021-2022
HVD-21-05	Global Urban Impulse Index: Monitoring Human Mobility with Internet and Social Media Data Human mobility plays an important role in understanding global socio-economic networks, epidemic control, and climate change in the context of global urbanization. Wherein, social media data has become a timely and massive data source for characterizing human flows, widely used in research on various topics. However, there are still difficulties for the public to obtain continuous, instant, and comprehensive social perception analysis based on social media data, which may prevent government agencies from timely decision-making and global cooperation. This project plans to build a set of global city impulse indices on the open KNIME workflow platform, using social media dataset archived by Harvard CGA and other open Internet data. These indices consist of the multi-scale intercity connectivity index at the core, and other ancillary indices based on sentiment text mining. The project will greatly facilitate global cooperation on sustainability, crime morphology, and other potential regional issues.	2021-2022
HVD-21-06	Village Level Spatial Prediction of Health Indicators for India Using Machine Learning and Environmental Remote Sensing and Socioeconomic Data Combination Health indicators are metrics for population health and development and can be used as effective tools for relevant policy decision makings. To support precision policy making regarding population health and development, village level data science analysis is needed which provides the highest administrative resolution therefore being able to reveal the ultimate details of the spatial patterns of public health and development conditions. This project aims to improve spatial predictions of health indicators at village level to support precision policy making regarding population health and development and implementation planning of the UN SDGs related to public health. While we will take India as our study area and child stunting, underweight, and wasting as our case health indicators, the outputs from this study will be expandable to other developing countries and other health indicators.	2021-2022
HVD-21-07	Developing Workbenches for Spatial Data Science This project will explore methodologies and establish protocols for developing workbenches for spatial data science research and teaching. Using Knime, a freeware developed by a German based company, the project will conduct experiments on Workbench development with peer-reviewed case studies, producing at least 60 added nodes for spatial statistics, modeling and visualization, one Workbook for Quantitative Methods and Socio-Economic Applications, 30 replicable, reproducible and expandable workflow based case studies for spatial data science, business applications, and spatial social sciences, 20 online webinars and onsite training workshops for workflow based data analysis with Knime, the User Guide for the Workbench, and 4-6 peer-review publications. Results of this project will provide a consistent and compatible platform for spatial data analysis programs developed in R, Python, and JAVA on different computing environment, promote a new generation of workflow data analysis as well as their applications for teaching and research across different disciplines.	2021-2022
HVD-21-08	Mapping of Secondary and Tertiary Boundaries Over Time Currently there is no open, authoritative global source for primary and secondary administrative boundaries. While some countries provide access to current boundaries, most do not. Fewer support the ability to see how a given boundary has changed over time. This situation holds true despite the fact small changes in administrative boundaries can have huge impacts on people and their livelihoods. To address this deficiency and design a system for storing and updating such boundaries, we will take a survey of existing efforts to create global boundary datasets and evaluate the strengths and weaknesses of each. Then, to understand the requirements for handling the historic dimension, we will create a historic district boundary dataset for India. Finally, based on the lessons learned from past efforts, and our experience building a historical dataset for one country, we will design a platform to support public access to global boundaries, and their evolution over time.	2021-2022
HVD-21-09	Predicting Human Insecurity with Multi-faceted Spatiotemporal Factors Human security extends beyond the provision of core human needs and protection from acute harm to the creation of supports for home, community, and a sense of hope in the future that contribute to population stability and sustainable development. The range of threats that contribute to human insecurity are multi-faceted and complex, and underscore the need for multi-disciplinary approaches to addressing policy and programmatic strategies for local and regional contexts across the spectrum of the disaster cycle. This preliminary proposal explores four possible research areas, including 1) climate, conflict and migration prediction; 2) atrocity prevention via early warning and early action; 3) spatial vulnerability and climate predictive models for disaster preparedness; and 4) COVID 19 and conflict. All involve integrating spatiotemporal climate change, conflict, demographic, infrastructural, socio-economic and resource availability data, as well as quantifiable perception and behavioral data into predictive models.	2021-2022
GMU-21-01	Improving ground-level air quality prediction by integrating spatiotemporal new observation system datasets and numerical simulations The objective of this project is to leverage our advanced cyberinfrastructure (CI) projects, such as EarthCube conceptual design, cloud computing and big data innovations, and big Earth data analytics, to produce an agile, flexible, and sustainable architecture for supporting efficient big spatiotemporal data ingesting and integration. An inter-disciplinary interoperable model will be refined from our past investigations funded by NSF/NASA on WRF for dust storms, WRF-Chem, CMAQ for NO2, voxel-based cellular automata simulation and Earth science research on ground-level data collection and integration. Our major tasks includes 1. data preprocessing and spatiotemporal collocation; 2. ML-based data preprocessing and downscaling; 3. AQ model simulation; 4. post processing; 5. evaluation and testing Example: AQ database, prediction and monitoring systems of LA city Expected results: 1) a robust, high-fidelity ground level AQ dataset for geoscience research of both atmospheric and Earth science divisions; 2) an integrated and re-interfaced advanced cyberinfrastructure for fusing and collocating spatiotemporal AQ data from satellite, airborne, ground and in-situ observations; 3) an improved high-resolution AQ model to facilitate metropolitan area forecasting;	2021-2022
GMU-21-05	PM 2.5 retrieval and spatiotemporal downscaling using earth observation data This project objective is to develop an innovative methodology to retrieve the PM2.5 in global scale and further downscale the spatiotemporal resolution to 1 km and hourly level in some key regions, using artificial intelligence (AI) models. Our major task includes deep learning for PM 2.5 retrieval using satellite remote sensing, model simulation and ground observation; 2. Deep learning for PM2.5 prediction and downscaling using meteorological data with AOD spatial pattern. Example: AQ database, prediction and monitoring systems of LA city Expected results: 1. PM 2.5 estimation covering global scale; 2. hourly 1km1km (500m500m) PM2.5 for LA region	2021-2022
STC-21-01	Expand campus reopen to a school system by considering population density and human dynamics The objective of this project is to expand current school/campus reopen decision support system to accommodate county-based school system reopen decision support Our major task includes expanding current school reopen model to predict/simulate COVID-19 cases trajectories for multiple schools in a county under specific control strategies with population density and human dynamics dataset as input Examples: school system simulation in Fairfax County, VA Expected results: An operational web service to assist county-based school system reopen decision support during COVID-19 pandemic. Potentially extend to simulate stadium for sports, industry or research campus like Goddard for using by chief medical officer or related decision makers	2021-2022
GMU-21-02	Spatiotemporal Open-Source workflow with COVID-19 and Cloud Classification as an example In recent times, Deep Learning (DL) has become an important tool to discover patterns and predict earth science processes. In most cases, open-source code for the DL models is shared to perform research. While it is easy for subject matter experts or tech-savvy to quickly set up the computing environment and replicate the DL research, non-programmers or beginners find it difficult to utilize open-source code. To mend this gap, this project aims to develop a formalized process and workflow to effectively publicize and share DL research for Earth System applications so that people from any background can effectively replicate, reproduce, and reobtain the results. The open-source workflow primarily consists of three major phases: (i) open-source software development, (ii) sharing and maintenance, and (iii) reproducible research. Recently, we publicized the rainy cloud detection deep learning model. The open-source activities for rainy cloud detection applications include (i) testing the cloud classification DL model in a various computing platform that supports CPU, single-GPU, and multi-GPU with Windows and Ubuntu OS, (ii) documenting the steps to reproduce the research and creating a tutorial video (ii), sharing the deep learning model, training datasets, user guide, tutorial video, and interpretation of the model results with the community.	2021-2022
GMU-21-03	Using Machine Learning Methods to Improve the Categorization and Answers of Health Questions This project objective includes developing an automatic question answer system for questions regarding health data, information and knowledge with better query understanding, ranking, and recommendation. Our major task includes collection and Index Health and Human Services Experts Knowledge from historical database; Build an HHS knowledge base; Implement a question understanding tool; Build a smart search engine for user queries Example: An organization may (the United States Department of Health & Human Services) receive 1,000s to 100,000s of related questions from the public on a daily basis Expected results: A health-specific spatiotemporal question answer portal	2021-2022
GMU-21-04	Developing Cloud-based Image Classification Management and Processing Service for High Spatial Resolution Sea Ice Imagery This project objective includes developing and maintaining a high-performance image classification service to GPU enable rapid sea ice processing for climate and cryosphere research Our major task includes expand the current framework to enable multi-classification algorithms to be run on GPU cluster and reducing overfitting under lighting contexts, misclassification between thick/thin ice (with ATM elevation data), and deploy to an operational facility Example: arctic sea ice and NASA cryosphere research, climate change and natural hazards Expected results: an operational high resolution image classification system for earth science applications	2021-2022
HVD-21-01	Enabling replicable spatiotemporal research with virtual spatial data lab This project is a continuing effort based on achievements of the Spatial Data Lab project. It is designed to provide a new generation of data services with cutting-edge methodology and technology for reproducible, replicable, and generalizable spatiotemporal research. It will allow researchers to develop case studies with easy-to-use workflow tools and share the case study as a package with others. The project will also support case-based training and teaching programs for multi-disciplinary and inter-disciplinary research in the applications of public health, economics, urban planning, social science, and others. In detail, this project will expand Spatial Data Lab’s capabilities and collaborate with various academic and business partners on the following missions. 1. Promote Spatial Data Services. Collect and integrate more datasets from partners and various sources and provide standard data services for data access, integration and sharing.    2. Tools Development for Spatial Data Analysis. Enrich current workflow platform to build spatial data analysis tools, such as hotspot analysis, spatial correlation analysis, geographical regression modelling, and spatiotemporal modelling.  Workflow based Spatial Data Case Studies. 3. Develop easy-to-use workflows to lower the barrier for spatiotemporal data analysis and build a case study repository for reproducible, replicable and generalizable research.  4. Training Programs for Spatial Data Science. Collaborate with partners to organize a series of training programs on different spatial data science topics, such as urban development, public health, human movement, and environment.	2021-2022
HVD-21-02	Assessing household preparedness for Covid-19 in Bangladesh This project assesses household preparedness for COVID-19 in Bangladesh with special attention to district-level inequalities. The definition of COVID-19 prepared households is based on guidelines from WHO. A household is considered as prepared for COVID-19 when it meets the five conditions: (1) adequate space for quarantine, (2) adequate sanitation, (3) soap and water available for handwashing, (4) phone available for communication, and (5) regular exposure to mass media. The main data source is the 2019 Multiple Indicator Cluster Surveys (MICS) for Bangladesh. The study investigates the association between the district-level prevalence of COVID-19 and household preparation and identify those district-level factors (e.g. population density, economic development, health system performance) that are associated with household preparation for COVID-19. Findings from this study will provide policy makers in Bangladesh and other stakeholders with solid evidence for improving the situation in those households with poor preparedness for COVID-19.	2021-2022
Hvd-21-03	Developing the International Geospatial Health Research Network (IGHRN) The concept of the International Geospatial Health Research Network (IGHRN) has prompted a series of high-level workshops and symposia on geography and international health research in recent years. With a focus on Fostering International Geospatial Health Research Collaborations, leading GIScience and health researchers from North America, Asia, Europe, Africa and Latin America identified an interim Steering Committee to develop and sustain an operational IGHRN Network. The IGHRN Secretariat functions and management are jointly handled at the two hub universities, Harvard University and the Chinese University of Hong Kong (CUHK). An International Advisory Committee comprising leading geospatial and health researchers from around the world is also being developed. The International Geospatial Health Research Network aims to share new international research and data, help develop geospatial health methods, and support new technologies to foster international collaborations and synergies across borders, and to bridge the gap between GIScience health research and the needs of health practitioners on the ground. After the COVID-19 pandemic, it is now clear that the IGHRN is much needed, and that an expanded IGHRN has never been needed more than it is now. With that in mind, the IGHRN Steering Committee has recently begun to restructure the IGHRN, with multiple university and organizational affiliates involved. We welcome the engagement, ideas, participation, and funding of the International Geospatial Health Research Network by the NSF, NIH, non-governmental organizations, foundations, and private-sector geospatial and health tech companies, as we develop and expand the IGHRN.	2021-2022
STC-20-01	Innovating a computing infrastructure for spatiotemporal studies This project objective includes developing an innovative methodology to retrieve the PM2.5 in global scale and further downscale the spatiotemporal resolution to 1 km and hourly level in some key regions, using artificial intelligence (AI) models. Our major task includes deep learning for PM 2.5 retrieval using satellite remote sensing, model simulation and ground observation; 2. Deep learning for PM2.5 prediction and downscaling using meteorological data with AOD spatial pattern. Example: AQ database, prediction and monitoring systems of LA city Expected results: 1. PM 2.5 estimation covering global scale; 2. hourly 1km1km (500m500m) PM2.5 for LA region	2020-2024
GMU-20-04	Using Machine Learning Methods to Improve the Categorization of Health Questions Currently, there is not any geospatial dataset that tracks the growth or deterioration of roads and railways worldwide. This has been a major obstacle in assessing the effectiveness of transportation development projects, which is important for developing countries to make informed decisions on these expensive investments. We propose to develop an active learning-based platform to generate such data for the benefit of a broad research community with interest in transportation evaluation. This system will facilitate human annotators to map roads and railways using historical and up-to-date high-resolution satellite imagery. It has three building blocks: First, combine pixel-wise segmentation-based and graph-based neural networks to generate proposed roads connections based on existing labels from Open Street Map; Second, enable annotators to accept or edit correctly predicted roads, reject false positives; Third, internalize the inputs from annotators and retrain the model with the new data, which will reinforce the model to make better predictions over time. As a proof of concept, our first application of the system is analyzing the impact of the Belt and Road Initiative (BRI), which embodies unprecedented transportation upgrade and construction projects in Asia in the past decade, on economic development in related countries. Applying recent developments in remote sensing to satellite imagery before and after BRI projects were undertaken, we will link the extracted road and rail networks with the detected expansion of urban areas detected from a larger set of daytime and nighttime imagery, and estimate the impact of BRI investments on the spatial distribution of economic activity.	2020-2021
HVD-20-04	Developing an active-learning based platform to evaluate the Impacts of China’s Belt-Road Initiatives using high-resolution satellite imagery Currently, there is not any geospatial dataset that tracks the growth or deterioration of roads and railways worldwide. This has been a major obstacle in assessing the effectiveness of transportation development projects, which is important for developing countries to make informed decisions on these expensive investments. We propose to develop an active learning-based platform to generate such data for the benefit of a broad research community with interest in transportation evaluation. This system will facilitate human annotators to map roads and railways using historical and up-to-date high-resolution satellite imagery. It has three building blocks: First, combine pixel-wise segmentation-based and graph-based neural networks to generate proposed roads connections based on existing labels from Open Street Map; Second, enable annotators to accept or edit correctly predicted roads, reject false positives; Third, internalize the inputs from annotators and retrain the model with the new data, which will reinforce the model to make better predictions over time. As a proof of concept, our first application of the system is analyzing the impact of the Belt and Road Initiative (BRI), which embodies unprecedented transportation upgrade and construction projects in Asia in the past decade, on economic development in related countries. Applying recent developments in remote sensing to satellite imagery before and after BRI projects were undertaken, we will link the extracted road and rail networks with the detected expansion of urban areas detected from a larger set of daytime and nighttime imagery, and estimate the impact of BRI investments on the spatial distribution of economic activity.	2020-2021
HVD-20-03	Historical Forest Changes Detection Using Satellite Imagery and Google Earth Engine In the southeastern US region, over 90% of forests are privately-owned and managed. To achieve sustainable timber production from these forest lands, understanding the forest change history and continuous monitoring of forest land, including harvest and replantation, are essential. The objective of this project is to build a software solution that uses the 35-year history of satellite imagery from Google Earth Engine to recreate the silvicultural history of timberland in the southeastern US region. Specifically, this pilot project takes Union County, South Carolina as the pilot study area, tests the effectiveness of methodologies based on time series satellite data on Google Earth Engine for identifying hardwood, natural or planted pine, and mixed hardwood/pine forest on satellite imagery; and detecting their silvicultural history such as clear-cuts, natural-growth or replanted, age and height. The output may support economic acquisition of under-managed and privately owned timberland in the Southeast United States and sustainably manage the timberlands.	2020-2021
STC-20-02	Spatiotemporal Analytics of COVID-19’s Second-Order Impacts on Global Vulnerable Urban Areas This project addresses the role of spatiotemporal data, including open data, upon the understanding and mitigation of impacts from the global COVID-19 pandemic. The research undertaken will focus on possible long-term and second-order impacts of COVID-19 and the responses that have been enacted at multiple scales, from multinational regions to neighborhood-levels. Development backsliding during this pandemic is a high risk for developing countries and rapidly growing cities due to new migration patterns, a collapse of informal economies, lacking supplies, disparate basic services and health sites, and overcrowded informal settlements. A key goal of this project is to facilitate discussion and conduct research and reporting to inform participatory mapping and open data creation taking place in developing countries to mitigate COVID second order impacts.	2020-2021
GMU-20-03	Spatiotemporal Analysis of Medical Resource Deficiencies under COVID-19 Pandemic The COVID-19 pandemic swept the entire world in the past 5 months and the U.S. became the epic center with the most confirmed cases. Although many states are reopening for the economy, the risk is still high and there are many debates about resurgence of the outbreak and high pressure on the medical system with immature openings. Sufficient medical equipment and health care professionals are critical to save lives and better prepare our communities. Accurately assessment and predication of the medical resource demands are important to avoid over committed (e.g., NYC didn’t use many resources asked for) and under committed (e.g., there’s only a few ICU beds in the Alabama capital). We propose to develop a timely assessment system of medical resource demands based on current confirmed cases and hospitalized patients as well as ML/AI based prediction. This system (shown in the following figure) will be based on our current spatiotemporal distribution and demands of medical resources in USA at county level for COVID-19 pandemic. The system dashboard supports monitoring, analyzing, visualizing, and sharing the medical resource and analyzed results. The medical resource includes county-based summary of licensed beds and ICU beds from hospital and medical agencies, and medical stuff, specifically critical care stuff for COVID-19 treatment. Integrated and analyzed with dynamic active cases, the medical resource dynamic index is created and calculated in real time to show the medical resource deficiencies in the U.S. under COVID-19 Pandemic.	2020-2021
GMU-20-01	Improving the Air Quality in the Urban Setting Climate change and pollutant emissions continue to worsen our breathing air, which is killing 7 million people every year according to World Health Organization. 1/3 of deaths from stroke, lung cancer and heart disease are due to air pollution (https://www.who.int/airpollution/news-and-events/how-air-pollution-is-destroying-our-health). Based on EPA observations, Climate Central found that 40 U.S. cities had at least 20 unhealthy air days since 2015, many of them experienced an uptick in unhealthy air days in recent years. For example, over 100 accumulated days with unhealthy air quality (AQ) in the past two decades are observed in Los Angeles. Timely forecasting air pollution and disseminating the results to citizens would help save lives and improve their health. It has been a long time gap dreamed to be filled by atmospheric scientists and urban managers. Fortunately, the increasingly available low-cost sensors, Internet of Things, and satellite observations start to provide new Earth observation system for feeding into numerical simulations to enhance the reliability and accuracy of AQ prediction. The emergence of 5G mobile technologies also brings enormous benefits to AQ observation with higher data transmission speed and more connected networks. However, it is critical and challenging to integrate spatiotemporally heterogeneous observation data with the numerical AQ prediction. Using Los Angles as an example, we propose to fuse a variety of geoscience observations from satellite to the ground-based Internet of Things, feed into numerical methods-based AQ simulations models, and output the results to be validated by and disseminated to academic geoscientists and citizens.	2020-2021
GMU-20-02	Agent-based multi-scale COVID-19 outbreak simulation The outbreak of the Coronavirus disease 2019 (COVID-19) is becoming a globally pandemic, which affects deeply in daily life of people in China, Spain, Italy, the U.S. and many other countries across the world. Many effective policies and strategies has been made to slow down the spreading of COVID-19 for different areas around the world, which could potentially be considered as guidance to prevent the possible outbreaks of places and counties that have not been serious effected by the virus yet. However, the question will be, how to identify the possible outbreak places with existing observation-based evidence? Agent-based models (ABMs) are widely used media to be applied as standalone simulator or be integrated with models in related disciplines to help strengthen existing studies including the field of infectious disease epidemiology. In previous epidemiological studies, ABMs have been adopted to simulated and predicted the effectiveness containment strategies under different policies, the time and space of outbreaks, medical resource deficiencies, and impact on logistics systems. In our study, we propose to develop a comprehensive ABM-based simulator coupled with multivariate impact factors such as spatiotemporal distribution of coronavirus, human migration and activities, climate conditions and environmental factors, containment strategies and policies to reveal and predict the pandemic pattern of COVID-19 at different scales such as county-level, state-level, nation-wide and even global scale. We will then apply the simulation model to describe the multi-scale COVID-19 outbreak patterns with certain confidence level as well as predict the possibility of outbreak in some places like India and South Africa, which may help prevent the pandemic and save lives in those no outbreak areas.	2020-2021
Hvd-20-01	Cloud-based Large-scale High-resolution Mangrove Forests Mapping with Satellite Big Data and Machine Learning Mangrove forests make up one of the most productive ecosystems on the planet, providing a variety of goods and services from which we benefit. In addition, mangrove forests have the capability to sequester four times more carbon dioxide than upland forests, mitigate the impacts of natural hazards on coastal communities, and support biodiversity conservation. However, they are being destroyed at an alarming rate by human activities such as aquaculture, agriculture, and coastal development. To characterize mangrove forest changes, evaluate their impacts, and support relevant protection and restoration decision makings by government agencies and NGOs, accurate and up-to-date mangrove forests mapping at large spatial scales is essential. Available large-scale mangrove forest data products were created commonly with 30 m Landsat imagery, and significant inconsistencies remain among these data products. With high resolution satellite data (e.g., Sentinel-1 and Sentinel-2) open to the public, the availability of high performance cloud computing, and the recent progresses in machine learning, it has become feasible to map coastal mangrove forests at large spatial scales with better resolution, accuracy, and frequency. The objective of this proposed project is to develop a methodology that can be used for generating 10 m mangrove forest spatial distribution data products for any region across the globe annually, therefore providing the most accurate information about the spatial temporal changes of mangrove forests and effectively supporting mangrove ecosystem protection and restoration efforts. Our approach is to combine satellite big data processing on cloud platform (e.g., Google Earth Engine) and machine learning algorithms (e.g., Neural Network, Random Forest) based on the knowledge gained from our NASA project. Study areas will be selected from different regions of the globe, accuracy will be assessed quantitatively, and mangrove forest maps will be compared with existing mangrove data products.	2020-2021
Hvd-19-01	CDL: Developing an online spatial data sharing and management platform This project is to develop an online platform for the creation, management and sharing of spatiotemporal data, analytical tools and study cases, nicknamed Spatial Data Lab (SDL). Currently, Harvard’s Dataverse and WorldMap have been integrated with the SDL platform. Data-driven analytical workflows are sharable and accessible from Harvard Dataverse with encrypted links to the SDL platform. This year the project takes COVID-19 as one of the case studies. The team has been actively building resource repositories for COVID-19 research since January. We are providing standardized datasets, executable workflows and training materials on the SDL platform for collaborating researchers to easily and quickly conduct research, enhance methodology, publish results, and deliver education, on COVID-19 related research topics.	2019-2020
Hvd-19-02	Building a geospatial differential privacy server for shared mobility data This proposal is to build differential privacy into ride share data to allow government analysts to make useful queries on telemetry data to learn generalized patterns, while enabling individual level location data to obtain the strong privacy guarantees of differential privacy such that no re-identification attack is possible. Moreover, the worst-case amount of individual information possible to be leaked by any published results can be precisely and formally measured, so that cumulative privacy loss across all access to the system can be monitored.	2019-2020
Hvd-19-03	Using Internet Remote-Sensing to Estimate High-Precision Connectivity Statistics The mass adoption of the Internet has boosted the demand for scientific explanations about the effects of digitalization. What is the impact of social media on elections and polarization? What is the effect of digital technologies in economic growth, inequality or unemployment? How is public health affected by increased access to medical websites? Official statistics typically provide a country-year resolution, but researchers need more precision in order to take into account variation inside countries such as urban versus rural areas, and also shorter-term effects of seasonal dynamics and shocking events. In addition, researchers working with highly precise Internet data need to address the challenges introduced by privacy legislation as well. The Internet Connectivity Statistics Dataverse is the most precise dataset of Internet connections for scientific research available and it contributes to overcome both the precision and the privacy related challenges. First, we analyze the global traffic of the Internet using remote sensing to estimate connectivity by months, and down to city resolution. As we rely on direct observation of the Internet, we can get estimates also in areas where official statistics are not available, and data cannot be retrieved, such as in the case of authoritarian regimes or territories experiencing political violence. Second, we estimate connectivity statistics using differential privacy algorithms, and we test the accuracy of our estimates. Finally, we make the statistics available for the entire research community thanks to the Harvard University Dataverse, the most prominent research data sharing software, maintained by the Institute for Quantitative Social Science.	2019-2020
Hvd-19-04	Scaling K-nearest Neighbor Calculations using Geohash Spatial Clustering, Indexed Search, and Compression We propose to develop a practical, cost-effective, easy to use platform to perform fast geospatial k-means clustering on big geospatial datasets. The system makes use of mutually reinforcing optimization techniques: geohashing, disk clustering, index based searches, and data compression, to build a novel system that makes the normally slow and resource intensive process of spatial clustering faster and less expensive than alternatives. In tests using an input dataset of 180 million point features where K=1000, we achieved average throughput of 200,000 distance calculations per second, to generate 180 billion measurements on a medium-sized Amazon instance. To make system easy for any analyst to implement we offer the option of an Amazon AMI deliverable which replicates the entire computation environment and comes with all required libraries preinstalled and configured. Once the AMI is launched, the system is ready for data loading, and all calculation work including compression and storage of results, is handled automatically.	2019-2020
Hvd-19-05	Leveraging Geovisual Analytics and Digital Epidemiological Methods for Emerging Outbreaks of Infectious Diseases after Natural Disasters in Developing Regions Infectious disease outbreaks triggered by natural disasters (e.g. floods, earthquakes) pose great challenges to disease surveillance, especially in developing regions because of the loss of homes, displacement of population, damaged health infrastructure, and long reporting delays. Digital epidemiological methods have emerged in the last decade as a complementary alternative to provide near real-time disease activity estimates, in the absence of timely and accurate reporting from traditional healthcare-based surveillance systems. Most digital epidemiology efforts to date have focused on the computational modeling challenges of tracking diseases, and only a few of these have investigated the evident potential that would emerge from involving humans into the analytical and decision-making processes that emerge from the use and interpretation of these digital epidemiological methods. Here we aim at developing a human-centered real-time disease surveillance system with the goal of improving the surveillance and response to emerging outbreaks of infectious diseases caused by natural disasters in developing regions. We plan on focusing on the recent cholera outbreaks that have emerged after the landing of cyclones Idai and Kenneth in southeastern African nations, such as Mozambique.	2019-2020
Hvd-19-06	Elevating Research Excellence with Data Repository and AI Ecosystem Dataverse is an open source data repository platform where users can share, preserve, cite, and explore research data. RMDS Lab is a startup company developing transformative technologies for research with big data and AI. This project is to establish a collaboration between the two teams and two platforms to create synergy that will advance the shared goal of supporting worldwide scholars in data-driven research. The main objective of this project is to explore solutions to apply AI technology in evaluating data science studies, provide measurable references for data scientists on the accuracy, impactfulness, replicability, applicability, and other merit scores of data science study cases; and to promote high-quality data science research through platform development, data sharing, community building, and user training. The Coronavirus crisis has strengthened the collaboration between the two organizations expanding the project to use datasets from not only the Harvard Dataverse but also the over 60 Dataverse installations worldwide.	2019-2020
GMU-19-01	Cloud classification Cloud types, coverage and distribution have significant influence on the characteristic and dynamic of global climate. They are directly related to the energy balance of the earth. Therefore, accurate cloud classification and analysis are essential for the research of atmosphere and climate change. Cloud classification assigns a predetermined label to cloud in the image, e.g., cirrus, altostratus and altocumulus. With cloud segmentation, satellite imagery can be utilized to support a series of local mesoscale climate analysis like rainy cloud detection, cyclone detection, or extreme weather event (e.g. heavy rainfall) predictions. However, it is a challenging task to distinguish different clouds from satellite imagery because of intraclass spectral variations and interclass spectral similarities. Traditionally, cloud types are classified using selected features and threshold such as cloud-top pressure (CTP), cloud optical thickness (COT), brightness temperature (BT) and multilayer flag (MLF). One drawback is that the model accuracy heavily relies on threshold and feature selection. The past years have witnessed the successful deep learning applications in automatically feature selection for object detection from images with the aid of CNN model and its variants such as VGGNet, ResNet. Inspired by successful applications of deep learning in computer vision, we propose to implement an automatic cloud classification system based on deep neural network to identify the 8 kinds of cloud from geostationary and polar orbit satellite data, with cloud types from 2B-CLDCLASS product of CloudSat-CPR as the reference of label.	2019-2020
GMU-19-02	Big data analytics for space situational awareness Space situational awareness (SSA) learns current and predictive knowledge of space events, threats, activities, conditions and space system (space, ground, link) status capabilities, constraints and employment. With data collected from telescopes, satellites and other sources, thousands of space objects are tracked, cataloged, and maintained, however, big observation data need to be collected constantly to distill such kinds of knowledge, which pose grant challenges to data management system. The goal of this project is to develop a big space observation data analytical platform to better assist space situational awareness. The distributed storage layer supports storage and access to space observation data with parallel I/O. The metadata layer will manage metadata and interact with a smart search engine to provide efficient and accurate data discovery functionalities. The analytical layer serves as an efficient and effective tool to mine spatiotemporal patterns, detect and predict events in near-Earth space. Finally, the visualization layer presents the orbit of natural and manmade objects in the near-Earth space. By distilling knowledge from dispersed observation data, this big data analytical platform is expected to advance space situational awareness across government agencies and scientific communities.	2019-2020
GMU-19-03	Planetary Defense Programs like NASA’s Near-Earth Object (NEO) Survey supply the PD community with the necessary information that can be utilized for NEO mitigation. However, information about detecting, characterizing and mitigating NEO threats is still dispersed throughout different organizations and scientists, due to the lack of structured architecture. This project is aimed to develop a knowledge base and engine to provide discovery and easy access to the PD related resources by developing 1) a domain-specific Web crawler to automate the large-scale up-to-date discovery of PD related resource, and 2) a search ranking method to better rank the search results. The Web crawler is based on Apache Nutch, one of the well-recognized highly scalable web crawlers. In this research, Apache Nutch is extended in three aspects: 1) a semi-supervised approach is developed to create PD-related keyword list; 2) an improved similarity scoring function is utilized to set the priority of the web pages in the crawl frontier; and 3) an adaptive approach is designed to re-crawl/update web pages. The search ranking module is built upon Elasticsearch. Rather than using the basic search relevance function of Elasticsearch, a PageRank based link analysis and an LDA based topic modelling approach are developed to better support the ranking of interconnected web pages.	2019-2020
GMU-19-04	Micro-scale Urban Heat Island Spatiotemporal Analytics and Prediction Framework As one of the adverse effects of urbanization and climate change, Urban Heat Island (UHI) can affect human health. Most researches have been relying on remote sensing imagery or sparsely distributed station sensor data and focusing on the broad understanding of the meso- or city- scale UHI phenomenon and mitigation support. However, challenges remain for the micro-level. This project aims to: 1) build an in-depth investigation of the human-weather-climate relations for the urban area; 2) fill the gap between short-term weather impact effects from buildings, traffics, human mobilities, and long-term microclimate from understanding such relations with real-time urban sensing (IoT) data; 3) establish a machine-learning enabled ensemble model for fast near-future temperature forecasts by considering the human-weather-climate relationships; 4) provide guideline for the precautionary local-human-activity management strategy design and implementation according to the forecasts to reduce public health-related risks, allowing better urban living spaces.	2019-2020
GMU-19-05	Why is my training data never good enough? Quantifying training data representativeness for scaling up Convolutional Neural Networks to large geographic areas. With increased availability of affordable, frequent, high resolution satellite imagery there has been a proliferation of machine learning methods, notably convolutional neural networks (CNN’s), for automated image interpretation. Despite this progress, the biggest challenge remains the insatiable demand for more training data that is most often produced by human operators – the same human operators that are already overwhelmed by the large satellite data volumes. The research community is grappling with methods to produce training data that are sufficiently representative of large areas to which they want to scale up their machine learning models. Although much emphasis has been placed on required computing resources and CNN architectures, our research has demonstrated that the structure of the training data is the overriding determinant of model accuracy and regional generalization of CNN classifications. The objective of our research is to explore to relationship between CNN classification accuracy and the representativeness of training data across increasing geographical distance and relate this to CNN feature space. To this end we are conducting experiments with automatically generated training data using ancillary data sets (building footprints available from counties and Open StreetMaps, building counts, high resolution land cover and percentage imperviousness available for the entire Chesapeake bay catchment) and 1m resolution aerial photography data (NAIP). The training data sets and the operational application area will be systematically varied across the MidAtlantic region to simulate diverse scenarios to tease out the underlying relationships. In the experiments CNN’s are applied to image tiles of NAIP imagery (200m X 200m) used in the following use cases (i) classify 1m resolution land cover, (ii) predict percentage imperviousness, predict total building footprint in image tile and (iii) predict number of buildings in image tile. The study will be applied to cities and their surrounding areas (30km buffer) distributed throughout the MidAtlantic region which largely coincides with the Chesapeake bay catchment. The research will address science questions such as, what is relationship between the representativeness (measured as dissimilarity in CNN feature vector) of training data in relation to increasing geographical distance between training and application areas and what is its influence on CNN classification accuracy? Simply put, can a CNN model trained with data from Fairfax (VA) be applied to multiple other cites at increasing distance away (e.g. Harrisburg MD) and how is the accuracy of these classifications related to distances in feature space and geographical space. This will help the community develop reasonable expectations for regional machine learning applications based on high resolution satellite imagery.	2019-2020
STC-15-02	Dynamic Mapping of Secondary Cities Secondary Cities are non-primary cities, characterized by population size, function and/or economic status. They are urban centers of governance, logistics, and production and are often data poor. This project is a global initiative to address critical geospatial data needs of secondary cities. The objective is to enhance emergency preparedness, human security and resilience. The project facilitates partnership with local organizations for data generation and sharing, using open source tools, and focuses on applied geography – human geography thematic areas.	2019-2020
GMU-18-01	Rapid extreme weather events detection and tracking from 4D/5D climate simulations Climate simulations provide valuable information to represent the situations of the atmosphere, ocean and land. Increasingly advanced computational technologies and Earth observation capabilities have enabled the climate models to have higher spatial and temporal resolution, providing an ever realistic coverage of the Earth. The high spatiotemporal resolution also provides us the opportunity to more precisely pinpoint and identify/segment the occurrence of extreme weather events, such as tropical cyclones, which can have dramatic impacts on populations and economies. Deep learning techniques are considered as one of the breakthroughs in recent years, achieving compelling results on many practical tasks including disease diagnosis, facial recognition, autonomous driving. We propose to utilize deep learning techniques on the rapid detection of two extreme weather events: tropical cyclones and dust storms. Deep learning models trained on past climate simulations will inform the effectiveness of the approach on future simulations. Our technological motivation is that currently high-resolution simulations and observations have been generating too much data for researchers, scientists, and organizations to store for their applications. Machine learning methods performing real-time segmentation and classification of relevant features for extreme weather events can generate such list or database storing these features, and detailed information can be obtained by rerunning the simulation with high spatiotemporal data when needed.	2018-2019
GMU-18-02	Climate Indicators downscaling Weather condition has become one of the most essential factors that people concern about in their daily life. People may want to check the weather forecast every day even every several hours especially in some activities that very sensitive to temperature, precipitation or winds, for example, taking flights, etc. But nowadays, civil weather forecasts data are issued every six hours, which is far insufficient to the actual needs. And the spatial resolutions of most weather data such as precipitation and surface winds are around several kilometers which are too coarse for some regions. This project will focus on weather data downscaling to fulfill the increasing needs for short term forecast with high spatial and temporal resolutions.	2018-2019
UCSB-18-01	The World Geographic Reference System v2 and 3D A revision of the World Geographic Reference System (Clarke, Dana, and Hastings, 2002) is proposed. This new WGRS v2 is consistent with UTM/MGRS worldwide and further refines the MGRS grids to 1×1 km tiles, which can be individually named and registered. These simple changes facilitate the development of a dynamic, publicly accessible, Web-map-supported gazetteer, the Place Name System (PNS), analogous to the Internet Domain Name System (DNS).	2018-2019
GMU-17-01	Utilizing High Performance Computing to Detect the Relationship Between the Urban Heat Island and Land System Architecture at the Microscale An urban heat island (UHI) is an urban area that is significantly warmer than its surrounding rural areas caused by human activities. UHI combines the results of all surface–atmosphere interactions and energy fluxes between the atmosphere and the ground, and closely linked to water, energy usage, and health-related consequences, including decreased quality of living conditions, and increased heat-related injuries and fatalities (Changnon et al., 1996; Patz et al., 2005). The prior studies have demonstrated the correlation between land system architecture and urban heat island based on the mediate or coarse spatial resolution data. However, these measurement scales may obscure stronger or different relations between land cover and land surface temperature (LST) because the mixture of land covers in coarse resolutions may hide the relations at finer resolutions where more urban land cover variability occurs (Zhou et al., 2011; Myint et al., 2013; Jenerette et al., 2016). Consequently, an evaluation of urban heat island at micro scales (e.g. < 30 m or even < 10 m), has become an important research goal to improve the understanding of the relationship between UHI and land system architecture (Small 2003; Deng and Wu, 2013; Jenerette et al., 2016; Li et al., 2017). Unfortunately, due to the limitation of computing capability and the efficiency of land-cover classification, most of these researches either selected sample sites from the study area or aggregated small patches into larger blocks, which may cause the bias or miss importation information in the final discovered relationships (Zhou et al., 2011). Based on the extensive experiences at NCCS and GMU for big spatiotemporal data analytics, Spark, cloud computing, and other technologies, we propose to extend the existing high-performance computing framework, ClimateSpark, to detect the relationship between UHI and land system architecture at the microscale. The convolutional neural network will be utilized to improve the accuracy of land-cover information, and the advanced spatial statistic algorithms will be implemented in parallel to provide the affluent computing capability to detect the relationship between UHI and land system architecture at the microscale.	2017-2018
GMU-17-02	Deep Learning for Improving Severe Weather Detection and Anomaly Analysis Severe weather, including dust storms, hurricanes, and thunderstorms, annually cause significant loss of life and property. The detection and forecast of severe weather events will have an immediate impact to society. Numerical simulations and earth observations have been largely improved in spatiotemporal resolution and coverage, so that scientists and researchers are able to better understand and forecast severe weather phenomena. However, it is challenging to obtain long-term climatology for different severe weather events and to accurately predict events by even the most state-of-the-art forecasting models due to the uncertainties of model forecasting. We propose a cloud-based, deep learning system to mine and learn severe weather events (e.g. dust storms, hurricanes, and thunderstorms) and their patterns, as well as anomaly detections from forecasting results. The deep learning system will be tested using three use cases: dust storm, hurricane, and thunderstorm, and it will help meteorologist better detect and understand the evolution patterns of severe weather events.	2017-2018
GMU-17-05	Spatiotemporal Innovation Testbed This project aims to 1) Develop methods for real-time, micro-scale data collection with moving sensors; 2) Augmentation and update of existing data, and generation of new data, new geometries; 3) Improve accessibility of public space using data that is nearly universally needed but unavailable; 4) Spread methods, workflows, knowledge to IAB members.	2017-2018
GMU-17-06	Real-time message georeferencing for geocrowdsourced data integration This project aims to 1) Explore, develop, and demonstrate the use of gazetteer-based geoparsing for generating footprints from text-based location descriptions; 2) Develop a library of spatial footprints (simple, complex); 3) Spatial footprint used for message mapping; 4) Spatial footprint for quality assessment in crowdsourced geospatial data.	2017-2018
Hvd-17-01	Evaluating OmniSci, Open Source GPU-powered SQL Database OmniSci provides a platform that leverages the parallelism and throughput of graphics processing units (GPUs) to achieve orders of magnitude speedups over CPU-based systems. Last year the team was challenged to find an economical way to run OmniSci with appropriate hardware resources. To solve the problem the team collaborated with Harvard’s Research Computing group and deployed OmniSci and PostGIS as public apps on Harvard’s Slurm-based computation cluster, which supports 2.5 million cuda (GPU) cores. The apps can now be deployed by any Harvard researcher or collaborator. OmniSci’s on-GPU data interoperability currently enables end-to-end workflows with Jupyter, Pandas and R. The cluster hosts more than 600 laboratories and 1000 installed applications, making it a rich research ecosystem for developing new use cases, and enhancement recommendations, especially given the importance of spatio-temporal analytics in the age of COVID. To further compliment this environment, the CGA installed Version 2.0 of the Geotweet Archive, a global, continually updated, spatially and temporally tagged, social media dataset.	2017-2018
UCSB-17-01	Siemens: Semantc Applicaton Logic Design for Subject Mater Experts This project aim to design a semantc applicaton logic for subject mater experts. Four milstones are listed below: 1) Conceptualize and implement a framework and interface supportng the import and inclusion of SPIN rules and domain graphs. 2) Add logic validaton and executon capabilites to the workﬂow. 3) Develop export flters that will convert the logic to non-natve (RDF) executon formats, such as RIF or JSON. 4) Integrate and test components.	2017-2018
UCSB-17-02	Forecasting Future Urban Expansion in an African Secondary City, Douala, Cameroon: Transfer of Expertise in GIS and Land Use Change Modeling to Douala University The goals for this project is 1) to bring visiting scholars from Douala University in Cameroon to a training session in the use of GIS and remote sensing to map land use and its changes, to map Douala’s built-up extent at multiple historical time periods; 2) to use the resulting data to create forecasts of long term urban growth and land use change in the region and 3) to promote informed and sustainable urban planning. The project success will be measured in the number of people trained, the number of cities mapped and modeled, and the number of reports and papers created for use in planning and land management.	2017-2018
GMU-16-02	Cloud computing and big data management	2016-2017
GMU-16-03	Computing technology: SmartDrive	2016-2017
GMU-16-04	Health Mapping Incorporating Data Reliability	2016-2017
UCSB-16-01	Applications of High Accuracy and Precision Building Data for Urban Areas	2016-2017
UCSB-16-02	Urban Modeling in Uzbekistan	2016
-2017
UCSB
-16-03	an Open World Gazetteer	2016-2017
Harvard-16-01	HHyperMap	2016-2017
Harvard-	16-02	Semantic
Harvard-16-03	Exploring relationships between cancer vulnerability/resilience and emotional condition/environment from social media	2016-2017
GMU-15-02	Upgrade the Delivery of NASA Earth Observing System Data Products for Consumption by ArcGIS The content and format of NASA EOS data products are defined by their respective Science Teams, stretching back over the past 25 years. Many of these data models are ancient are difficult to consume with other geospatial tools. Specifically, these tools are, in some cases, unable to read the files and/or unable to interpret properly the data organization inside them so they cannot be visualized or analyzed. A solution that can apply to all these data products across NASA data centers would be valuable. We propose a plug-in framework which is developed based on GDAL open source library to interpret the non-compliant data. The framework should have the advantages of extensibility within the EOSDIS allowing the multiple NASA data centers construct their own plug-ins to adjust their data products.	2015-2016
GMU-15-03	Analyzing Spatiotemporal Dynamics Using Place-Based Georeferencing The human world is a world of places, where verbal description and narrative use placenames to describe occurrences, locations, and events. The geospatial, computational, and analytical rely instead on metric georeferencing to place these occurrences, locations, and events on a map. The gazetteer is the linkage between these two worlds, and the means for translating the human world into the computational world. With a new emphasis on social media and crowdsourcing in geospatial data production, Gazetteers and the associated techniques of geoparsing and georeferences are a critical element of an emerging geospatial toolkit. We use gazetteers to validate the contributions of crowdsourced event data contributed by end-users and look at placenaming as a validation tool within quality assessment for geocrowdsourced data. Strategies and best practices for generating and maintaining gazetteer databases for georeferencing crowdsourced data will be explored, determined, and presented.	2015-2016
GMU-15-04	Using Sonification to Analysis Spatiotemporal Dynamics in High-Dimensional Data The human senses are paramount in constructing knowledge about the everyday world around us. The human sensory system is also a key to geospatial knowledge discovery, where patterns, trends, and outliers can be detected visually, and explored in more detail. As the complexity and size of geospatial datasets increase, the tools for geographic knowledge discovery need to expand. This research looks at the use of sonification and auditory display systems to expand the visualization toolkit. First, we use sonification as a way of simplifying the exploration of large, multidimensional data, including space-time data, where certain dimensions of data can be removed from the visual domain and represented efficiently with sound, leading to more effective geographic knowledge discovery. Second, we use sonification as a means of redundant display to reinforce cartographic and geospatial aspects of spatial-temporal display in low-vision environments.	2015-2016
GMU-15-05	A Cyberinfrastructure-based Disaster Management System using Social Media Data During emergencies, it is of significance to deliver accurate and useful information to the impacted communities, and to assess damages to properties, people and the environment, in order to coordinate responses and recovery activities, including evacuations and relief operations. Novel information streams from social media are redefining situation awareness and can be used for damage assessment, humanitarian assistance and disaster relief operations. These streams are diverse, complex and overwhelming in volume, velocity and in the variety of viewpoints they offer. Negotiating these overwhelming streams is beyond the capacity of human analysts and an effective framework should be developed to mine and deliver disaster relevant information in a real-time fashion.	2015-2016
GMU-15-06	FloodNet: Demonstrating a Flood Monitoring Network Integrating Satellite, Sensor-Web and Social Media for the Protection of Life and Property Flooding is the most costly natural disaster, striking with regularity, destroying property, agriculture, transportation, communication and lives. Floods impact developing countries profoundly, but developed nations are hardly immune with floods claiming thousands of lives every year. The threat is increasing as we build along riverbanks and flood plains, construct dykes and levees that channelize flow, and as climate change brings increased extreme weather events including floods.The first line defense for protection of life and property is flood monitoring. Knowledge of floods is truly power when issuing warnings, managing infrastructure, assessing damage, and planning for the future. Information about active floods can be gleaned from satellite sensors, ground stations and sensor-webs, and harvested from social media and citizen scientists. This information is complemented by flood hazard or risk maps, and weather and climate forecasts. These flood information elements exist separately, but would be much more effective at producing actionable flood knowledge if integrated into a seamless flood monitoring network.Therefore, we propose to demonstrate a flood monitoring network that integrates flood information from satellites, sensor-webs, social media, risk maps, and weather/climate forecasts into a user-focused visualization interface (such as GIS or Google-Earth) that enables the production of actionable flood knowledge (FloodNet). We will largely focus on networking existing flood information elements available from government agencies, harvested from social media, and produced by satellite sensors. The demonstration will be performed in a historical context, focused on a few well-known recent flood events in the Mid-Atlantic region, with a vision for global real-time implementation. We will take advantage of recent advances in cloud computing, visualization tools, and spatial-temporal knowledge toolboxes in the implementation of FloodNet.The resulting flood monitoring network will guide civil protection officials, insurers and citizens as to current flood hazards and future flooding risks.	2015-2016
GMU-15-07	Benchmarking Timely Decision Support and Integrating Multi-Source Spatiotemporal Environmental Datasets In the past decade, natural disasters have become more frequent. It is widely recognized that the increasing complexity of environmental problems at local, regional, and global scales need to be attacked by integrated approaches. Explosive growth in spatiotemporal data and emergence of social media make it possible and also emphasize the need for developing new and computationally efficient geospatial analytics tailored for analyzing big data. This project aims to provide decision support for life and property with maximum accuracy and minimum human intervention by leveraging near-real time integration of government satellite and model assets using HPC, virtual computing and storage environments, OGC standard protocols. Additionally, we are going to benchmark latency and science validity of end-to-end (E2E) solutions using machine-to-machine (M2M) interfaces to exploit NOAA, USGS, NASA environmental data from satellites, forecast models and social media data to generate more accurate and timely decision support information.	2015-2016
UCSB-15-02	Assessment and Applications of High Accuracy and Precision building data for urban areas The company Solar Census has developed an unprecedented means by which high resolution (10cm) stereo overhead imagery is processed photogrammetrically to extraordinary levels of accuracy, and then models are applied that orthorectify and extract building footprints and roofs with unprecedented fidelity. Test acquisitions of new imagery have been supported by the Department of Energy for test areas in northern California, and new data are forthcoming for the entire state, and for the State of New York. Solar Census has an application that solves the solar equation across building roofs for identifying optimal locations for the placement of photovoltaic electric panels to generate distributed solar power. The purposes of the collaboration between Solar Census and the UCSB Geography site for the I/UCRC Center for Spatiotemporal Thinking, Computing and Applications are twofold: 1) complete an accuracy assessment to quantify the vertical and horizontal accuracy of the new data; and 2) explore innovative potential new applications of the data that could present new revenue streams and business opportunities for the data, which could potentially be available nation-wide.	2015-2016
Harvard-15-01	A Training-by-Crowdsourcing Approach for Place Name Extraction from Large Volumes of Scanned Maps We propose to develop a training-by-crowdsourcing approach for automatic extraction of place names in large volumes of georeferenced scanned maps. Place names very often exist only in paper maps and have potential use both for adding semantic content and for providing search and indexing capabilities to the original scanned maps. Moreover place names can be used to strengthen existing gazetteers (place name databases), which are the foundation to support effective geotagging or georeferencing of many document and media types. The proposed solution will provide a map text extraction service and web map client interface that accesses the service. The extraction service will consume raw map images from standard WMSs, and output spatiotemporally labeled place names. The client will allow users to curate (i.e., update, delete, insert, and edit) extraction results and share the results with other users. The user curation process will be recorded and sent to the extraction service to train the underlying map processing algorithms for handling map areas where no user training has yet been done.	2015-2016
Harvard-15-02	Building an Open Source, Real-Time, Billion Object Spatio-Temporal Exploration Platform There is currently no general purpose platform to support interactive queries and geospatial visualizations against datasets containing even a few million features against which queries return more than ten thousand records. To begin to address this fundamental lack of public infrastructure, we will design and build an open source platform to support search and visualization against a billion spatio-temporal features. The instance will be loaded with the latest billion geotweets (tweets which contain GPS coordinates from the originating device), and which the CGA has been harvesting since 2012. The system will run on commodity hardware and well known software. It will support queries by time, space, keyword, user name, and operating system. The platform will be capable of returning responses to complex queries in less than 2 seconds. Spatial heatmaps will be used to represent the distribution of results returned at any scale, for any number of features. Temporal histograms will be used to represent the distribution over time of results returned at any scale. The system will be capable of generating kernal density visualizations from massive collections of point measurements such as weather, pollution, or other sensor streams.	2015-2016
Harvard-15-03	Addressing the Search Problem for Geospatial Data We are currently engaged in building a general purpose, open source, global registry of map service layers on servers across the web. The registry will be made available for search via a public API for anyone to use to find and bind to map layers from within any application. We are developing a basic UI that will integrate with WorldMap http://worldmap.harvard.edu (open source general purpose map collaboration platform) and make registry content discoverable by time, space, and keyword, and savable and sharable online. The system will allow users to visualize the geographic distribution of search results regardless of number of layer returned by rendering heatmaps of overlapping layer footprints. All assets in the system will be map layers that can used immediately within WorldMap or within any other web or desktop mapping client. Uptime and usage statistics will be maintained for all resources and these will be used to continually improve search. Core elements of this project are currently funded by a grant from the National Endowment for the Humanities, but there are important aspects which are not supported. For example, the grant focuses on OGC and Esri image services, though there exist many other spatial assets in need of organization, including feature services, processing services, shapefiles, KML/KMZ, and other raster and vector formats. There are also important types of metadata we are not handling. We have developed basic tools for crawling the web using Hadoop and a pipeline to harvest and load results to a fast registry, but there are many ways both crawl and harvest can be improved.	2015-2016
Harvard-15-04	HyperMap: An Ontology-driven Platform for Interactive Big Data Geo-analytics Sensing technology and the digital traces of human activity are providing us with ever larger spatiotemporally referenced data streams. Computing and automated analysis advances are at the same time decreasing the effort of drawing knowledge from such large data volumes. Still, there is a gap between the ability to run large batch-type data processing tasks and the interactive engagement with analysis that characterizes most research. There appear to be three principal scales (in both volume and task time) of processing tasks: asynchronous summarization of billions of records by space-time and other relevant dimensions, synchronous analysis of the summary data using statistical / functional models, and interactive visual interpretation of model results. The forward workflow is becoming more and more common, but feedback from interpretation to refine the larger-scale process steps is still most often a logistical nightmare. We propose to develop a platform that flexibly links the three stages of geo-analysis using a provenance-orchestration ontology and OGC service interface standards such as Web Processing Service (WPS). The purpose of the platform will be to provide domain experts the tools to explore – iteratively and interactively – extremely large datasets such as the CGA geo-tweet corpus without spending most of there time in performing system engineering. Researchers will be able to leverage a semantic description of an analysis workflow to drill back from interesting visual insights to the details of processing and then trigger process refinements by updating the workflow description instead of having to re-write processing codes and scripts. The HyperMap platform is envisioned to support several approaches to big data summarization. Initial design targets include factorization of unstructured data such as geo-tweets, classification of coverages, and recognition of imagery feature hierarchies.	2015-2016
Harvard-15-05	Terrain and Hydrology Processing of High Resolution Elevation Data Sets Raster data sets representing elevation are being released at increasingly high resolutions. The National Elevation Dataset (NED) has gone from 30m to 10m and is now available in many states at 3m resolution. At the local and state level, LIDAR-based elevation data is available for many locations, particularly coastal areas and those subject to flooding. As horizontal resolution improves, vertical resolution and accuracy are also improving, but while higher resolution is improving the ability to leverage these data sets for modeling hydrological flow, visibility, slope and other data processing operations, the exponentially larger size of the data sets is presenting significant data processing challenges, even with professional workstation GIS tools. Under this proposal, the project team will develop and implement new algorithms for performing parallel data processing on large raster data sets. The work will leverage the open source Apache Spark and GeoTrellis projects, both based on the Scala functional programming language. It will also take advantage of other open source efforts supporting data processing at scale, including the Hadoop Distributed File System (HDFS) and indexing tools such as Cassandra and Accumulo. The results of the work will be released under a business-friendly Apache2 license, and will be aimed at supporting execution of large elevation data processing operations on clusters of virtual machines. Specific processing operations may include: viewshed, flow accumulation, flow direction, watershed delineation, sink, slope, aspect, and profiling operations. The proposed work will be synergistic with other proposed research projects, including the HyperMap effort to classify terrain types and channel areas based on large, high resolutions elevation data sets.	2015-2016
Harvard-15-06	Feature Classification Using Terrain and Imagery Data at Scale Drones, micro-satellites, and other innovations promise to both lower the cost and rapidly increase the amount of available raster imagery data. Initial use of this imagery is currently focused on supporting visualization of geospatial data. However, there is substantial opportunity to provide the ability to extract features from the imagery using simple user interfaces. Feature classification from raster imagery is not a new capability, and it is supported by several commercial workstation products. In addition, contemporary techniques rely not only on the imagery itself, but also leverage elevation data to improve the accuracy of the feature classification. However, the ability to do so with large data sets through a simple browser-based user interface is a significant challenge. Under this proposal, the project team will develop and implement a prototype web-based software tool that will be able to use a combination of elevation and imagery data to enable users to extract vector polygon features with real-time processing speeds. The work will leverage the open source Apache Spark and GeoTrellis projects, both based on the Scala functional programming language. It will also take advantage of other open source efforts supporting data processing at scale, including the Hadoop Distributed File System (HDFS) and indexing tools such as Cassandra and Accumulo. The results of the work will be released under a business-friendly Apache2 license, and will be aimed at supporting execution of large data processing operations on clusters of virtual machines. The proposed work will be synergistic with other proposed research projects, including the HyperMap project to classify terrain types and channel areas based on large, high resolutions elevation data sets and the Place Name extraction from historic maps project.	2015-2016
GMU-14-04	Developing a spatiotemporal cloud advisory system for better selecting cloud services We propose a web-based cloud advising system to (1) integrate heterogeneous cloud information from different providers, (2) automatically retrieve update-to-date cloud information, (3) recommend and evaluate cloud solutions according to users’ selection preferences.	2014-2015
GMU-14-02	Developing an open spatiotemporal analytics platform for big event data We propose to design a visual analytical platform to systematically perform inductive pattern analysis on real-time volunteered event data. The platform will be built based on tools and methods that were developed by us in previous studies. The accomplished platform will not only enable spatiotemporal pattern exploration of big event data in the short term, but also lay the concrete foundation for using volunteered data for tasks such as urban planning in the long term.	2014-2015
GMU-14-06	Incorporating quality information to support spatiotemporal data and service exploration	2014-2015
Harvard-14-01	Temporal gazetteer for geotemporal information retrieval Place names are a key part of geographic understanding and carry a full sense of changing perspective over time, but existing gazetteers do not in general represent the temporal dimension. This project will develop, populate, and implement services for a place name model that incorporates realistic complexity in the temporal, spatial, and language elements that form a place name. Additional tools will be developed to conflate and reconcile place name evidence from authoritative, documentary, and social sources.	2014-2015
Harvard-14-04	Cartographic ontology for semantic map annotation Map annotation produces highly-relevant, high-value information, whose utility however often critically depends on semantic interoperability. Achieving that requires an ontology-based, semantic web, linked open data approach. We will develop a key missing ingredient, the cartographic annotation ontology, to characterize the complex structures and rich visual, symbolic, and geospatial languages that maps use to represent geographic information.	2014-2015
Harvard-14-05	A Paleo-Event Ontology for Characterizing and Publishing Irish Historical and Fossil Climate Data Integration of both Big and Little spatio-temporal data from different scientific domains is vital for validating climate models, as a single volcanic eruption, for example, can have a great effect. Yet observation of deep time events, without deep-time observers, means we must discern paleo-events through observation of fossilized, event-proxy, features. Using medieval monastic records, tree-ring data, ice core features and volcanic eruption phenomena to inform our efforts, we will develop a deep-time climate event observation ontology to characterize the nature and relationships of the data.	2014-2015
Harvard-14-06	Emotional City – measuring, analyzing and visualizing citizens’ emotions for urban planning in smart cities Emotional City contributes provides a human-centered approach for extracting contextual emotional information from technical and human sensor data. The methodology used in this project consists of four steps: 1) detecting emotions using wristband sensors, 2) “ground-truthing” these measurements using a People as Sensors smartphone app, 3) extracting emotion information from crowdsourced data like Twitter, 4) correlating the measured and extracted emotions. Finally, the emotion information is mapped and fed back into urban management for decision support and for evaluating ongoing planning processes.	2014-2015
UCSB-14-01	Dismounted navigationIndoor Mapping Using Multi-Sensor Point Clouds	2014-2015
UCSB-14-02	Indoor Mapping Using Multi-Sensor Point Clouds Develop and evaluate method for creating 3D indoor maps using point clouds generated by multiple sensor platforms.	2014-2015
UCSB-14-03	Pattern driven Exploratory Interaction with Big Geo Data	2014-2015