Proposed Projects

ID Name Period
STC-19-00 Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications

Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time, and require spatiotemporal principles and thinking to be incorporated into the computing process. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies to explore the next generation of computing models for addressing the challenges. This overarching center project will analyze and collaborate with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate the application impact, and advance human knowledge and intelligence. Objectives are to: a) build spatiotemporal infrastructure from theoretical, technological, and application aspects, b) innovate spatiotemporal studies with new tools, systems, and applications, c) educate K-16 and graduate students with proper texts and curricula, and d) develop a community for spatiotemporal studies spanning center sites, members, and regional, national, and global levels through IAB meetings, symposia, and other venues.

STC-19-01 Innovating a Computing Infrastructure for Spatiotemporal Studies

In phase I, the spatiotemporal innovation center built a 500-node cloud computing facility, which enables most projects of the center. After five years of operating the infrastructure, we see a need for an upgraded infrastructure with more RAM, faster CPUs, and more storage on each node, ideally with a GPU cluster that can help us address the growing challenges of image/graphics processing and deep learning in phase II. Based on our IAB’s recommendation and center projects, we propose to a) develop and maintain an upgraded computing infrastructure with more computing power and with graphics and deep learning capabilities, b) provide spatiotemporal computing coordination and research to all center projects with computing needs by maintaining highly capable research staff support to optimize the computing infrastructure, c) serve campus computing needs with a spatiotemporal interest to gain broader impact and engagement of scientists and students, and d) adopt and develop advanced spatiotemporal computing technologies to innovate the next generation of computing tools.

STC-19-02 Spatiotemporal Innovation Testbed

The first phase of the Spatiotemporal I/UCRC has witnessed many spatiotemporal innovations over the past six years. Like innovations in any other domain and technology area, spatiotemporal innovations follow the hype cycle of maturity, and many of them have emerged in recent years. The community needs a comprehensive information source on what, when, where, and how much effort is needed for maturing, adopting, and operating these innovations. To reduce the inflated expectations of the innovation hype cycle and meet this community need, we propose to establish a testbed that uses the center’s infrastructure, as part of the spatiotemporal infrastructure the center envisions implementing over its 15 years of investigation. This project will draw best practices from past investigations, such as the computing infrastructure, big data testbed, cloud testbed, EarthCube testbed, and ESIP testbed, to maintain and automate a testbed environment. The testbed will serve as a platform for members, faculty, students, and the community to validate and verify new technologies and emerging innovations, and to produce white papers, review papers, and evaluation publications for the broadest impact.

Hvd-19-01 China Data Lab: Developing an online spatial data sharing and management platform

This project is to develop an open-source online platform for spatial data discovery, sharing, and management, specializing in China data. The platform will allow researchers to share their spatial data with others online, conduct spatial data analysis with geospatial and statistical tools on the cloud, develop data-driven case studies, and share data and results as a package with others. The platform will also support case-based training programs on spatiotemporal data analysis for economic, social, public health, urban planning, and other research subjects, focused on China.

Hvd-19-02 Building a geospatial differential privacy server for shared mobility data

This proposal is to build differential privacy into ride share data to allow government analysts to make useful queries on telemetry data to learn generalized patterns, while enabling individual level location data to obtain the strong privacy guarantees of differential privacy such that no re-identification attack is possible. Moreover, the worst-case amount of individual information possible to be leaked by any published results can be precisely and formally measured, so that cumulative privacy loss across all access to the system can be monitored.
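As a rough illustration of the two ideas in this abstract (noisy query answers and a running measure of cumulative privacy loss), the following is a minimal sketch rather than the project's implementation. It assumes a Laplace mechanism for counting queries and simple sequential composition, where the epsilon values of all answered queries add up. The class name `PrivateCounter`, the trip records, and the budget values are hypothetical, and the sensitivity of 1 assumes each individual contributes at most one record.

```python
import random


class PrivacyBudgetExceeded(Exception):
    """Raised when a query would push cumulative privacy loss past the cap."""


class PrivateCounter:
    """Answers counting queries with Laplace noise and tracks epsilon spent.

    Hypothetical helper for illustration only.
    """

    def __init__(self, total_epsilon, seed=None):
        self.total_epsilon = total_epsilon  # overall privacy budget
        self.spent_epsilon = 0.0            # cumulative privacy loss so far
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponential variates with mean `scale`
        # follows a Laplace(0, scale) distribution.
        return (self._rng.expovariate(1.0 / scale)
                - self._rng.expovariate(1.0 / scale))

    def count(self, records, predicate, epsilon):
        """Return a noisy count of records matching `predicate`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent_epsilon + epsilon > self.total_epsilon:
            raise PrivacyBudgetExceeded("query would exceed the privacy budget")
        # Sequential composition: privacy losses of successive queries add up.
        self.spent_epsilon += epsilon
        true_count = sum(1 for record in records if predicate(record))
        # Assumption: one record per individual, so the count has sensitivity 1
        # and the Laplace noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


if __name__ == "__main__":
    trips = [{"zone": "A"}, {"zone": "B"}, {"zone": "A"}]  # toy data
    counter = PrivateCounter(total_epsilon=1.0, seed=42)
    noisy = counter.count(trips, lambda t: t["zone"] == "A", epsilon=0.5)
    print("noisy count:", noisy, "epsilon spent:", counter.spent_epsilon)
```

A smaller epsilon per query gives stronger privacy but noisier answers. In real ride share data one rider may appear in many trips, so the sensitivity assumption would need to be revisited.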

Hvd-19-03 Using Internet Remote-Sensing to Estimate High-Precision Connectivity Statistics

The mass adoption of the Internet has boosted the demand for scientific explanations about the effects of digitalization. What is the impact of social media on elections and polarization? What is the effect of digital technologies in economic growth, inequality or unemployment? How is public health affected by increased access to medical websites?

Official statistics typically provide a country-year resolution, but researchers need more precision in order to account for variation inside countries, such as urban versus rural areas, and for the shorter-term effects of seasonal dynamics and shock events. In addition, researchers working with highly precise Internet data need to address the challenges introduced by privacy legislation.

The Internet Connectivity Statistics Dataverse is the most precise dataset of Internet connections available for scientific research, and it helps overcome both the precision and the privacy challenges. First, we analyze the global traffic of the Internet using remote sensing to estimate connectivity by month, down to city resolution. Because we rely on direct observation of the Internet, we can also obtain estimates in areas where official statistics are not available and data cannot be retrieved, such as authoritarian regimes or territories experiencing political violence. Second, we estimate connectivity statistics using differential privacy algorithms, and we test the accuracy of our estimates. Finally, we make the statistics available to the entire research community through the Harvard University Dataverse, the most prominent research data sharing software, maintained by the Institute for Quantitative Social Science.

Hvd-19-04 Scaling K-nearest Neighbor Calculations using Geohash

Spatial Clustering, Indexed Search, and Compression
We propose to develop a practical, cost-effective, easy-to-use platform to perform fast geospatial k-means clustering on big geospatial datasets. The system combines mutually reinforcing optimization techniques (geohashing, disk clustering, index-based searches, and data compression) to make the normally slow and resource-intensive process of spatial clustering faster and less expensive than alternatives. In tests using an input dataset of 180 million point features where K=1000, we achieved an average throughput of 200,000 distance calculations per second, generating 180 billion measurements on a medium-sized Amazon instance. To make the system easy for any analyst to implement, we offer the option of an Amazon AMI deliverable, which replicates the entire computation environment and comes with all required libraries preinstalled and configured. Once the AMI is launched, the system is ready for data loading, and all calculation work, including compression and storage of results, is handled automatically.
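To make the role of geohashing concrete, the sketch below encodes coordinates with the standard geohash scheme, buckets points by cell, and restricts nearest-neighbor distance calculations to points sharing the query's cell. It is an illustrative assumption about how such an index can cut down distance calculations, not the project's system: the function names are hypothetical, and a practical version would also search the neighboring cells, since the true nearest point may lie just across a cell boundary.

```python
import heapq
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_index(points, precision=5):
    """Group (lat, lon) points into buckets keyed by geohash cell."""
    index = defaultdict(list)
    for point in points:
        index[geohash_encode(point[0], point[1], precision)].append(point)
    return index


def k_nearest_in_cell(index, query, k, precision=5):
    """Return up to k points nearest to `query` within the query's own cell.

    Simplification: neighboring cells are not searched.
    """
    cell = geohash_encode(query[0], query[1], precision)
    candidates = index.get(cell, [])
    return heapq.nsmallest(k, candidates, key=lambda p: haversine_km(query, p))


if __name__ == "__main__":
    pts = [(42.3736, -71.1097), (42.3601, -71.0589), (48.8566, 2.3522)]
    idx = build_index(pts, precision=1)
    print(k_nearest_in_cell(idx, (42.37, -71.11), k=2, precision=1))
```

The precision parameter trades bucket size against the number of candidates per query: longer hashes mean smaller cells and fewer distance calculations, at the cost of more boundary cases.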

Hvd-19-05 Leveraging Geovisual Analytics and Digital Epidemiological Methods for Emerging Outbreaks of Infectious Diseases after Natural Disasters in Developing Regions

Infectious disease outbreaks triggered by natural disasters (e.g. floods, earthquakes) pose great challenges to disease surveillance, especially in developing regions, because of the loss of homes, displacement of populations, damaged health infrastructure, and long reporting delays. Digital epidemiological methods have emerged in the last decade as a complementary alternative that provides near real-time disease activity estimates in the absence of timely and accurate reporting from traditional healthcare-based surveillance systems. Most digital epidemiology efforts to date have focused on the computational modeling challenges of tracking diseases, and only a few have investigated the potential of involving humans in the analytical and decision-making processes that arise from the use and interpretation of these methods. Here we aim to develop a human-centered real-time disease surveillance system, with the goal of improving surveillance of and response to emerging outbreaks of infectious diseases caused by natural disasters in developing regions. We plan to focus on the recent cholera outbreaks that emerged after cyclones Idai and Kenneth made landfall in southeastern African nations, such as Mozambique.

Hvd-19-06 Collaboration Between RMDS and Dataverse

Dataverse is an open source online data repository platform where users can share, preserve, cite, explore, and analyze research data. RMDS Lab is a startup company developing transformative technologies for research with big data and AI. This project is to establish a collaboration between the two teams and two platforms to create synergy that will advance the shared goal of supporting scholars worldwide in data-driven research. The main objectives of this project are to explore solutions for applying AI technology to the evaluation of data science studies; to provide measurable references for data scientists on the accuracy, impactfulness, replicability, applicability, and other merit scores of data science study cases; and to promote high-quality data science research through platform development, data sharing, community building, and user training.

GMU-19-01 Cloud classification

Cloud types, coverage, and distribution have a significant influence on the characteristics and dynamics of the global climate. They are directly related to the energy balance of the Earth. Therefore, accurate cloud classification and analysis are essential for atmospheric and climate change research. Cloud classification assigns a predetermined label to each cloud in an image, e.g., cirrus, altostratus, or altocumulus. With cloud segmentation, satellite imagery can support a range of local mesoscale climate analyses such as rainy cloud detection, cyclone detection, or extreme weather event (e.g., heavy rainfall) prediction. However, distinguishing different clouds in satellite imagery is challenging because of intraclass spectral variations and interclass spectral similarities.

Traditionally, cloud types are classified using selected features and thresholds, such as cloud-top pressure (CTP), cloud optical thickness (COT), brightness temperature (BT), and multilayer flag (MLF). One drawback is that model accuracy relies heavily on threshold and feature selection. Recent years have witnessed successful applications of deep learning to automatic feature selection for object detection in images, with the aid of the CNN model and its variants such as VGGNet and ResNet. Inspired by these successes in computer vision, we propose to implement an automatic cloud classification system based on a deep neural network to identify the 8 kinds of cloud from geostationary and polar-orbiting satellite data, using cloud types from the 2B-CLDCLASS product of CloudSat-CPR as reference labels.

GMU-19-02 Big data analytics for space situational awareness

Space situational awareness (SSA) concerns current and predictive knowledge of space events, threats, activities, conditions, and space system (space, ground, link) status, capabilities, constraints, and employment. With data collected from telescopes, satellites, and other sources, thousands of space objects are tracked, cataloged, and maintained. However, big observation data must be collected constantly to distill such knowledge, which poses grand challenges to data management systems. The goal of this project is to develop a big space observation data analytical platform to better assist space situational awareness. The distributed storage layer supports storage of and access to space observation data with parallel I/O. The metadata layer will manage metadata and interact with a smart search engine to provide efficient and accurate data discovery functionality. The analytical layer serves as an efficient and effective tool to mine spatiotemporal patterns and to detect and predict events in near-Earth space. Finally, the visualization layer presents the orbits of natural and manmade objects in near-Earth space. By distilling knowledge from dispersed observation data, this big data analytical platform is expected to advance space situational awareness across government agencies and scientific communities.

GMU-19-03 Planetary Defense 

Programs like NASA’s Near-Earth Object (NEO) Survey supply the planetary defense (PD) community with the information needed for NEO mitigation. However, information about detecting, characterizing, and mitigating NEO threats is still dispersed across different organizations and scientists because of the lack of a structured architecture. This project aims to develop a knowledge discovery search engine that provides discovery of and easy access to PD-related resources by developing 1) a domain-specific Web crawler to automate the large-scale, up-to-date discovery of PD-related resources, and 2) a search ranking method to better rank the search results. The Web crawler is based on Apache Nutch, a well-recognized, highly scalable web crawler. In this research, Apache Nutch is extended in three aspects: 1) a semi-supervised approach is developed to create a PD-related keyword list; 2) an improved similarity scoring function is used to set the priority of the web pages in the crawl frontier; and 3) an adaptive approach is designed to re-crawl/update web pages. The search ranking module is built upon Elasticsearch. Rather than using the basic search relevance function of Elasticsearch, a PageRank-based link analysis and an LDA-based topic modelling approach are developed to better support the ranking of interconnected web pages.
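The link-analysis part of the ranking can be sketched with a textbook PageRank power iteration. This is a generic illustration only: the abstract does not say how the PageRank scores are combined with the LDA topic model or with Elasticsearch relevance, so none of that is shown, and the page names, damping factor, and iteration count below are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to. Pages that
    only appear as link targets are included automatically.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical crawl graph of PD-related pages.
    graph = {
        "neo-survey": ["impact-risk"],
        "impact-risk": ["mitigation"],
        "mitigation": ["neo-survey"],
        "blog-post": ["neo-survey"],
    }
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

Pages that many other pages link to accumulate higher scores, which is the property a link-based ranking signal relies on for interconnected web pages.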

GMU-19-04 Micro-scale Urban Heat Island Spatiotemporal Analytics and Prediction Framework

As one of the adverse effects of urbanization and climate change, the Urban Heat Island (UHI) can affect human health. Most research has relied on remote sensing imagery or sparsely distributed station sensor data and has focused on a broad understanding of the meso- or city-scale UHI phenomenon and on mitigation support. However, challenges remain at the micro level. This project aims to: 1) conduct an in-depth investigation of the human-weather-climate relations for the urban area; 2) fill the gap between short-term weather impacts from buildings, traffic, and human mobility and the long-term microclimate by understanding such relations with real-time urban sensing (IoT) data; 3) establish a machine-learning-enabled ensemble model for fast near-future temperature forecasts that considers the human-weather-climate relationships; 4) provide guidelines for designing and implementing precautionary local human-activity management strategies according to the forecasts, to reduce public health-related risks and allow better urban living spaces.

GMU-19-05 Why is my training data never good enough? Quantifying training data representativeness for scaling up Convolutional Neural Networks to large geographic areas.

With the increased availability of affordable, frequent, high resolution satellite imagery, there has been a proliferation of machine learning methods, notably convolutional neural networks (CNNs), for automated image interpretation. Despite this progress, the biggest challenge remains the insatiable demand for more training data, which is most often produced by human operators, the same human operators who are already overwhelmed by large satellite data volumes. The research community is grappling with methods to produce training data that are sufficiently representative of the large areas to which they want to scale up their machine learning models. Although much emphasis has been placed on required computing resources and CNN architectures, our research has demonstrated that the structure of the training data is the overriding determinant of model accuracy and regional generalization of CNN classifications. The objective of our research is to explore the relationship between CNN classification accuracy and the representativeness of training data across increasing geographical distance, and to relate this to CNN feature space. To this end we are conducting experiments with automatically generated training data using ancillary data sets (building footprints available from counties and OpenStreetMap, building counts, and high resolution land cover and percentage imperviousness available for the entire Chesapeake Bay catchment) and 1m resolution aerial photography data (NAIP). The training data sets and the operational application area will be systematically varied across the Mid-Atlantic region to simulate diverse scenarios and tease out the underlying relationships. In the experiments, CNNs are applied to image tiles of NAIP imagery (200m x 200m) in the following use cases: (i) classify 1m resolution land cover, (ii) predict percentage imperviousness, (iii) predict total building footprint in the image tile, and (iv) predict the number of buildings in the image tile.
The study will be applied to cities and their surrounding areas (30km buffer) distributed throughout the Mid-Atlantic region, which largely coincides with the Chesapeake Bay catchment. The research will address science questions such as: what is the relationship between the representativeness of training data (measured as dissimilarity in CNN feature vectors) and increasing geographical distance between training and application areas, and what is its influence on CNN classification accuracy? Simply put, can a CNN model trained with data from Fairfax (VA) be applied to multiple other cities at increasing distances (e.g. Harrisburg MD), and how is the accuracy of these classifications related to distances in feature space and geographical space? This will help the community develop reasonable expectations for regional machine learning applications based on high resolution satellite imagery.
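One way to picture the feature-space versus geographic-space comparison is sketched below. The abstract does not name the dissimilarity metric, so cosine dissimilarity between feature vectors is an assumption here, as is the use of a Pearson correlation to relate it to distance. The feature vectors and distances in the usage example are made-up toy values, not results.

```python
import math


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Toy values for illustration only: a mean CNN feature vector for the
    # training city and for three application cities at growing distance.
    training_features = [0.9, 0.1, 0.3]
    application_features = [
        [0.8, 0.2, 0.3],
        [0.6, 0.4, 0.5],
        [0.2, 0.9, 0.4],
    ]
    distances_km = [50.0, 120.0, 200.0]

    dissimilarities = [
        cosine_dissimilarity(training_features, f) for f in application_features
    ]
    print("dissimilarities:", dissimilarities)
    print("correlation with distance:", pearson(distances_km, dissimilarities))
```

In an actual experiment the same comparison would be repeated with classification accuracy in place of, or alongside, feature dissimilarity.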

STC-15-02 Dynamic Mapping of Secondary Cities

Secondary Cities are non-primary cities, characterized by population size, function and/or economic status.  They are urban centers of governance, logistics, and production and are often data poor. This project is a global initiative to address critical geospatial data needs of secondary cities. The objective is to enhance emergency preparedness, human security and resilience. The project facilitates partnership with local organizations for data generation and sharing, using open source tools, and focuses on applied geography – human geography thematic areas.

GMU-18-01 Rapid extreme weather events detection and tracking from 4D/5D climate simulations

Climate simulations provide valuable information for representing the state of the atmosphere, ocean, and land. Increasingly advanced computational technologies and Earth observation capabilities have enabled climate models to reach higher spatial and temporal resolution, providing an ever more realistic coverage of the Earth. The high spatiotemporal resolution also gives us the opportunity to more precisely pinpoint and identify/segment the occurrence of extreme weather events, such as tropical cyclones, which can have dramatic impacts on populations and economies. Deep learning techniques are considered one of the breakthroughs of recent years, achieving compelling results on many practical tasks including disease diagnosis, facial recognition, and autonomous driving. We propose to apply deep learning techniques to the rapid detection of two extreme weather events: tropical cyclones and dust storms. Deep learning models trained on past climate simulations will inform the effectiveness of the approach on future simulations. Our technological motivation is that high-resolution simulations and observations currently generate too much data for researchers, scientists, and organizations to store for their applications. Machine learning methods that perform real-time segmentation and classification of relevant features for extreme weather events can generate a list or database storing these features, and detailed information can be obtained by rerunning the simulation with high spatiotemporal resolution when needed.

GMU-18-02 Climate Indicators downscaling

Weather conditions have become one of the most essential factors that people consider in their daily lives. People may want to check the weather forecast every day, or even every few hours, especially for activities that are very sensitive to temperature, precipitation, or wind, such as taking flights. Currently, however, civil weather forecast data are issued every six hours, which falls far short of actual needs. In addition, the spatial resolution of most weather data, such as precipitation and surface winds, is around several kilometers, which is too coarse for some regions. This project will focus on weather data downscaling to meet the increasing need for short-term forecasts with high spatial and temporal resolution.

UCSB-18-01 The World Geographic Reference System v2 and 3D

A revision of the World Geographic Reference System (Clarke, Dana, and Hastings, 2002) is proposed. This new WGRS v2 is consistent with UTM/MGRS worldwide and further refines the MGRS grids to 1×1 km tiles, which can be individually named and registered. These simple changes facilitate the development of a dynamic, publicly accessible, Web-map-supported gazetteer, the Place Name System (PNS), analogous to the Internet Domain Name System (DNS).

GMU-17-01 Utilizing High Performance Computing to Detect the Relationship Between the Urban Heat Island and Land System Architecture at the Microscale

An urban heat island (UHI) is an urban area that is significantly warmer than its surrounding rural areas because of human activities. UHI combines the results of all surface–atmosphere interactions and energy fluxes between the atmosphere and the ground, and is closely linked to water, energy usage, and health-related consequences, including decreased quality of living conditions and increased heat-related injuries and fatalities (Changnon et al., 1996; Patz et al., 2005). Prior studies have demonstrated the correlation between land system architecture and the urban heat island based on medium or coarse spatial resolution data. However, these measurement scales may obscure stronger or different relations between land cover and land surface temperature (LST), because the mixture of land covers at coarse resolutions may hide the relations at finer resolutions where more urban land cover variability occurs (Zhou et al., 2011; Myint et al., 2013; Jenerette et al., 2016). Consequently, evaluating the urban heat island at micro scales (e.g. < 30 m or even < 10 m) has become an important research goal for improving the understanding of the relationship between UHI and land system architecture (Small 2003; Deng and Wu, 2013; Jenerette et al., 2016; Li et al., 2017). Unfortunately, because of limited computing capability and the inefficiency of land-cover classification, most of these studies either selected sample sites from the study area or aggregated small patches into larger blocks, which may introduce bias or miss important information in the relationships finally discovered (Zhou et al., 2011). Building on the extensive experience at NCCS and GMU with big spatiotemporal data analytics, Spark, cloud computing, and other technologies, we propose to extend the existing high-performance computing framework, ClimateSpark, to detect the relationship between UHI and land system architecture at the microscale.
A convolutional neural network will be used to improve the accuracy of land-cover information, and advanced spatial statistics algorithms will be implemented in parallel to provide ample computing capability for detecting the relationship between UHI and land system architecture at the microscale.

GMU-17-02 Deep Learning for Improving Severe Weather Detection and Anomaly Analysis

Severe weather, including dust storms, hurricanes, and thunderstorms, causes significant loss of life and property every year. The detection and forecasting of severe weather events will have an immediate impact on society. Numerical simulations and earth observations have improved greatly in spatiotemporal resolution and coverage, so that scientists and researchers are better able to understand and forecast severe weather phenomena. However, it is challenging to obtain long-term climatology for different severe weather events and to accurately predict events, even with state-of-the-art forecasting models, because of the uncertainties of model forecasting. We propose a cloud-based deep learning system to mine and learn severe weather events (e.g. dust storms, hurricanes, and thunderstorms) and their patterns, and to detect anomalies in forecasting results. The deep learning system will be tested using three use cases: dust storms, hurricanes, and thunderstorms. It will help meteorologists better detect and understand the evolution patterns of severe weather events.

GMU-17-05 Spatiotemporal Innovation Testbed

This project aims to 1) develop methods for real-time, micro-scale data collection with moving sensors; 2) augment and update existing data, and generate new data and new geometries; 3) improve accessibility of public space using data that is nearly universally needed but unavailable; and 4) spread methods, workflows, and knowledge to IAB members.

GMU-17-06 Real-time message georeferencing for geocrowdsourced data integration

This project aims to 1) explore, develop, and demonstrate the use of gazetteer-based geoparsing for generating footprints from text-based location descriptions; 2) develop a library of spatial footprints (simple, complex); 3) use spatial footprints for message mapping; and 4) use spatial footprints for quality assessment in crowdsourced geospatial data.

Harvard-17-01 Building GPU-accelerated spatiotemporal big data applications with cloud-based MapD open source

For this project we are evaluating the open source core of the MapD analytics platform for geospatial data analysis and exploration applications. MapD provides a SQL database that leverages the parallelism and throughput of graphics processing units (GPUs) to achieve orders-of-magnitude speedups over CPU-based systems and enables querying of big datasets with very low latency. Although simple position-coordinate-based spatial queries can be made, full standards-based spatial datatype and operator functionality and wide geospatial interoperability with other ecosystem components have not yet been implemented within MapD. One focus of this project will be to identify the most important capabilities to develop and to help navigate a roadmap for addressing them through successive testing in real world applications.

UCSB-17-01 Siemens: Semantic Application Logic Design for Subject Matter Experts

This project aims to design a semantic application logic for subject matter experts. Four milestones are listed below: 1) Conceptualize and implement a framework and interface supporting the import and inclusion of SPIN rules and domain graphs. 2) Add logic validation and execution capabilities to the workflow. 3) Develop export filters that will convert the logic to non-native (RDF) execution formats, such as RIF or JSON. 4) Integrate and test components.

UCSB-17-02 Forecasting Future Urban Expansion in an African Secondary City, Douala, Cameroon: Transfer of Expertise in GIS and Land Use Change Modeling to Douala University

The goals for this project are 1) to bring visiting scholars from Douala University in Cameroon to a training session in the use of GIS and remote sensing to map land use and its changes, and to map Douala’s built-up extent at multiple historical time periods; 2) to use the resulting data to create forecasts of long-term urban growth and land use change in the region; and 3) to promote informed and sustainable urban planning.

The project success will be measured in the number of people trained, the number of cities mapped and modeled, and the number of reports and papers created for use in planning and land management.

GMU-16-02 Cloud computing and big data management 2016-2017
GMU-16-03 Computing technology: SmartDrive 2016-2017
GMU-16-04 Health Mapping Incorporating Data Reliability 2016-2017
UCSB-16-01 Applications of High Accuracy and Precision Building Data for Urban Areas 2016-2017
UCSB-16-02 Urban Modeling in Uzbekistan 2016
-16-03 an Open World Gazetteer 2016-2017
Harvard-16-01 HHyperMap 2016-2017
Harvard-16-02 Semantically enhanced workbench for responsive big data geoprocessing and visualization 2016-2017
Harvard-16-03 Exploring relationships between cancer vulnerability/resilience and emotional condition/environment from social media 2016-2017
GMU-15-02 Upgrade the Delivery of NASA Earth Observing System Data Products for Consumption by ArcGIS

The content and format of NASA EOS data products are defined by their respective Science Teams, stretching back over the past 25 years. Many of these data models are dated and difficult to consume with other geospatial tools. Specifically, these tools are, in some cases, unable to read the files and/or unable to properly interpret the data organization inside them, so the data cannot be visualized or analyzed. A solution that applies to all these data products across NASA data centers would be valuable. We propose a plug-in framework, built on the open source GDAL library, to interpret the non-compliant data. The framework should offer extensibility within EOSDIS, allowing the multiple NASA data centers to construct their own plug-ins to adapt their data products.

GMU-15-03 Analyzing Spatiotemporal Dynamics Using Place-Based Georeferencing

The human world is a world of places, where verbal description and narrative use placenames to describe occurrences, locations, and events. The geospatial, computational, and analytical world relies instead on metric georeferencing to place these occurrences, locations, and events on a map. The gazetteer is the linkage between these two worlds, and the means for translating the human world into the computational world. With a new emphasis on social media and crowdsourcing in geospatial data production, gazetteers and the associated techniques of geoparsing and georeferencing are a critical element of an emerging geospatial toolkit. We use gazetteers to validate crowdsourced event data contributed by end-users and look at placenaming as a validation tool within quality assessment for geocrowdsourced data. Strategies and best practices for generating and maintaining gazetteer databases for georeferencing crowdsourced data will be explored, determined, and presented.

GMU-15-04 Using Sonification to Analyze Spatiotemporal Dynamics in High-Dimensional Data

The human senses are paramount in constructing knowledge about the everyday world around us. The human sensory system is also a key to geospatial knowledge discovery, where patterns, trends, and outliers can be detected visually, and explored in more detail. As the complexity and size of geospatial datasets increase, the tools for geographic knowledge discovery need to expand. This research looks at the use of sonification and auditory display systems to expand the visualization toolkit. First, we use sonification as a way of simplifying the exploration of large, multidimensional data, including space-time data, where certain dimensions of data can be removed from the visual domain and represented efficiently with sound, leading to more effective geographic knowledge discovery. Second, we use sonification as a means of redundant display to reinforce cartographic and geospatial aspects of spatial-temporal display in low-vision environments.
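A very simple form of sonification maps one data dimension to pitch so that it can be heard rather than seen. The sketch below is only an illustration of that idea under stated assumptions: a linear mapping from data values to MIDI note numbers, a note range of 48 to 84, and the standard equal-temperament conversion from MIDI note to frequency (A4 = note 69 = 440 Hz). The function names and the sample series are hypothetical, and producing actual audio is left out.

```python
def value_to_midi(value, vmin, vmax, low_note=48, high_note=84):
    """Linearly map a data value in [vmin, vmax] to a MIDI note number."""
    if vmax == vmin:
        return low_note  # flat series: use a single pitch
    fraction = (value - vmin) / (vmax - vmin)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
    return round(low_note + fraction * (high_note - low_note))


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def sonify(series):
    """Return one frequency per data value; higher values get higher pitch."""
    vmin, vmax = min(series), max(series)
    return [midi_to_hz(value_to_midi(v, vmin, vmax)) for v in series]


if __name__ == "__main__":
    # Toy time series, e.g. one attribute sampled at successive time steps.
    temperatures = [12.0, 15.5, 21.0, 18.2, 9.4]
    for value, freq in zip(temperatures, sonify(temperatures)):
        print(f"{value:5.1f} -> {freq:7.2f} Hz")
```

Moving one dimension to sound in this way frees the visual display for the remaining dimensions, which is the kind of trade-off the abstract describes.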

GMU-15-05 A Cyberinfrastructure-based Disaster Management System using Social Media Data

During emergencies, it is important to deliver accurate and useful information to the impacted communities, and to assess damage to property, people and the environment, in order to coordinate response and recovery activities, including evacuations and relief operations. Novel information streams from social media are redefining situation awareness and can be used for damage assessment, humanitarian assistance and disaster relief operations. These streams are diverse, complex and overwhelming in volume, velocity and the variety of viewpoints they offer. Negotiating them is beyond the capacity of human analysts, so an effective framework should be developed to mine and deliver disaster-relevant information in real time.

GMU-15-06 FloodNet: Demonstrating a Flood Monitoring Network Integrating Satellite, Sensor-Web and Social Media for the Protection of Life and Property

Flooding is the most costly natural disaster, striking with regularity, destroying property, agriculture, transportation, communication and lives. Floods impact developing countries profoundly, but developed nations are hardly immune, with floods claiming thousands of lives every year. The threat is increasing as we build along riverbanks and flood plains, construct dykes and levees that channelize flow, and as climate change brings increased extreme weather events including floods. The first line of defense for protection of life and property is flood monitoring. Knowledge of floods is truly power when issuing warnings, managing infrastructure, assessing damage, and planning for the future. Information about active floods can be gleaned from satellite sensors, ground stations and sensor-webs, and harvested from social media and citizen scientists. This information is complemented by flood hazard or risk maps, and weather and climate forecasts. These flood information elements exist separately, but would be much more effective at producing actionable flood knowledge if integrated into a seamless flood monitoring network. Therefore, we propose to demonstrate a flood monitoring network (FloodNet) that integrates flood information from satellites, sensor-webs, social media, risk maps, and weather/climate forecasts into a user-focused visualization interface (such as GIS or Google Earth) that enables the production of actionable flood knowledge. We will largely focus on networking existing flood information elements available from government agencies, harvested from social media, and produced by satellite sensors. The demonstration will be performed in a historical context, focused on a few well-known recent flood events in the Mid-Atlantic region, with a vision for global real-time implementation. We will take advantage of recent advances in cloud computing, visualization tools, and spatial-temporal knowledge toolboxes in the implementation of FloodNet. The resulting flood monitoring network will guide civil protection officials, insurers and citizens as to current flood hazards and future flooding risks.

GMU-15-07 Benchmarking Timely Decision Support and Integrating Multi-Source Spatiotemporal Environmental Datasets

In the past decade, natural disasters have become more frequent. It is widely recognized that the increasing complexity of environmental problems at local, regional, and global scales needs to be addressed with integrated approaches. Explosive growth in spatiotemporal data and the emergence of social media both enable and demand new, computationally efficient geospatial analytics tailored to big data. This project aims to provide decision support for the protection of life and property with maximum accuracy and minimum human intervention by leveraging near-real-time integration of government satellite and model assets using HPC, virtual computing and storage environments, and OGC standard protocols. Additionally, we will benchmark the latency and scientific validity of end-to-end (E2E) solutions that use machine-to-machine (M2M) interfaces to exploit NOAA, USGS, and NASA environmental data from satellites, forecast models, and social media to generate more accurate and timely decision support information.

UCSB-15-02 Assessment and Applications of High Accuracy and Precision building data for urban areas

The company Solar Census has developed a means by which high resolution (10 cm) stereo overhead imagery is processed photogrammetrically to extraordinary levels of accuracy, and then models are applied that orthorectify and extract building footprints and roofs with unprecedented fidelity. Test acquisitions of new imagery have been supported by the Department of Energy for test areas in northern California, and new data are forthcoming for the entire state, and for the State of New York. Solar Census has an application that solves the solar equation across building roofs to identify optimal locations for the placement of photovoltaic electric panels to generate distributed solar power. The purposes of the collaboration between Solar Census and the UCSB Geography site for the I/UCRC Center for Spatiotemporal Thinking, Computing and Applications are twofold: 1) complete an accuracy assessment to quantify the vertical and horizontal accuracy of the new data; and 2) explore potential new applications of the data, which could become available nationwide, that could open new revenue streams and business opportunities.

Harvard-15-01 A Training-by-Crowdsourcing Approach for Place Name Extraction from Large Volumes of Scanned Maps

We propose to develop a training-by-crowdsourcing approach for automatic extraction of place names in large volumes of georeferenced scanned maps. Place names very often exist only in paper maps and have potential use both for adding semantic content and for providing search and indexing capabilities to the original scanned maps. Moreover, place names can be used to strengthen existing gazetteers (place name databases), which are the foundation for effective geotagging or georeferencing of many document and media types. The proposed solution will provide a map text extraction service and a web map client interface that accesses the service. The extraction service will consume raw map images from standard WMSs and output spatiotemporally labeled place names. The client will allow users to curate (i.e., update, delete, insert, and edit) extraction results and share the results with other users. The user curation process will be recorded and sent to the extraction service to train the underlying map processing algorithms for handling map areas where no user training has yet been done.

Harvard-15-02 Building an Open Source, Real-Time, Billion Object Spatio-Temporal Exploration Platform

There is currently no general purpose platform to support interactive queries and geospatial visualizations against datasets containing even a few million features, where queries return more than ten thousand records. To begin to address this fundamental lack of public infrastructure, we will design and build an open source platform to support search and visualization against a billion spatio-temporal features. The instance will be loaded with the latest billion geotweets (tweets which contain GPS coordinates from the originating device), which the CGA has been harvesting since 2012. The system will run on commodity hardware and well-known software. It will support queries by time, space, keyword, user name, and operating system. The platform will be capable of returning responses to complex queries in less than 2 seconds. Spatial heatmaps will be used to represent the distribution of results returned at any scale, for any number of features. Temporal histograms will be used to represent the distribution over time of results returned at any scale. The system will be capable of generating kernel density visualizations from massive collections of point measurements such as weather, pollution, or other sensor streams.
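The heatmap and histogram summaries described above can be illustrated with a small in-memory sketch. It assumes points arrive as (longitude, latitude) pairs in degrees and timestamps as Unix seconds; the function names and the fixed-size grid are hypothetical, and a billion-feature platform would compute such aggregates inside its search index rather than in a Python loop.

```python
import math
from collections import Counter


def heatmap_counts(points, cell_deg):
    """Count points per grid cell of cell_deg x cell_deg degrees.

    Returns a Counter keyed by (row, col), with row 0 at latitude -90
    and col 0 at longitude -180. Hypothetical helper for illustration.
    """
    counts = Counter()
    for lon, lat in points:
        col = math.floor((lon + 180.0) / cell_deg)
        row = math.floor((lat + 90.0) / cell_deg)
        counts[(row, col)] += 1
    return counts


def temporal_histogram(timestamps, bucket_seconds):
    """Count events per fixed-width time bucket.

    Keys are the bucket start times in Unix seconds, in ascending order.
    """
    counts = Counter()
    for ts in timestamps:
        counts[int(ts // bucket_seconds) * bucket_seconds] += 1
    return dict(sorted(counts.items()))
```

Because both functions return counts per bucket rather than individual records, the size of the response depends on the number of cells or time buckets, not on the number of matching features, which is what lets such summaries be drawn at any scale.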

Harvard-15-03 Addressing the Search Problem for Geospatial Data

We are currently engaged in building a general purpose, open source, global registry of map service layers on servers across the web. The registry will be made available for search via a public API for anyone to use to find and bind to map layers from within any application. We are developing a basic UI that will integrate with WorldMap (an open source, general purpose map collaboration platform) and make registry content discoverable by time, space, and keyword, and savable and sharable online. The system will allow users to visualize the geographic distribution of search results, regardless of the number of layers returned, by rendering heatmaps of overlapping layer footprints. All assets in the system will be map layers that can be used immediately within WorldMap or within any other web or desktop mapping client. Uptime and usage statistics will be maintained for all resources and these will be used to continually improve search. Core elements of this project are currently funded by a grant from the National Endowment for the Humanities, but there are important aspects which are not supported. For example, the grant focuses on OGC and Esri image services, though there exist many other spatial assets in need of organization, including feature services, processing services, shapefiles, KML/KMZ, and other raster and vector formats. There are also important types of metadata we are not handling. We have developed basic tools for crawling the web using Hadoop and a pipeline to harvest and load results to a fast registry, but there are many ways both crawl and harvest can be improved.

Harvard-15-04 HyperMap: An Ontology-driven Platform for Interactive Big Data Geo-analytics

Sensing technology and the digital traces of human activity are providing us with ever larger spatiotemporally referenced data streams. Computing and automated analysis advances are at the same time decreasing the effort of drawing knowledge from such large data volumes. Still, there is a gap between the ability to run large batch-type data processing tasks and the interactive engagement with analysis that characterizes most research. There appear to be three principal scales (in both volume and task time) of processing tasks: asynchronous summarization of billions of records by space-time and other relevant dimensions, synchronous analysis of the summary data using statistical / functional models, and interactive visual interpretation of model results. The forward workflow is becoming more and more common, but feedback from interpretation to refine the larger-scale process steps is still most often a logistical nightmare. We propose to develop a platform that flexibly links the three stages of geo-analysis using a provenance-orchestration ontology and OGC service interface standards such as Web Processing Service (WPS). The purpose of the platform will be to give domain experts the tools to explore, iteratively and interactively, extremely large datasets such as the CGA geo-tweet corpus without spending most of their time on systems engineering. Researchers will be able to leverage a semantic description of an analysis workflow to drill back from interesting visual insights to the details of processing and then trigger process refinements by updating the workflow description instead of having to rewrite processing codes and scripts. The HyperMap platform is envisioned to support several approaches to big data summarization. Initial design targets include factorization of unstructured data such as geo-tweets, classification of coverages, and recognition of imagery feature hierarchies.

Harvard-15-05 Terrain and Hydrology Processing of High Resolution Elevation Data Sets

Raster data sets representing elevation are being released at increasingly high resolutions. The National Elevation Dataset (NED) has gone from 30m to 10m and is now available in many states at 3m resolution. At the local and state level, LIDAR-based elevation data is available for many locations, particularly coastal areas and those subject to flooding. As horizontal resolution improves, vertical resolution and accuracy are also improving. Higher resolution improves the ability to leverage these data sets for modeling hydrological flow, visibility, slope and other data processing operations, but the much larger size of the data sets presents significant data processing challenges, even with professional workstation GIS tools. Under this proposal, the project team will develop and implement new algorithms for performing parallel data processing on large raster data sets. The work will leverage the open source Apache Spark and GeoTrellis projects, both based on the Scala functional programming language. It will also take advantage of other open source efforts supporting data processing at scale, including the Hadoop Distributed File System (HDFS) and indexing tools such as Cassandra and Accumulo. The results of the work will be released under a business-friendly Apache 2.0 license, and will be aimed at supporting execution of large elevation data processing operations on clusters of virtual machines. Specific processing operations may include: viewshed, flow accumulation, flow direction, watershed delineation, sink, slope, aspect, and profiling operations. The proposed work will be synergistic with other proposed research projects, including the HyperMap effort to classify terrain types and channel areas based on large, high-resolution elevation data sets.
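To make one of the listed operations concrete, the sketch below computes flow direction with the common D8 rule, in which each cell drains to whichever of its eight neighbours gives the steepest downhill slope. This is an assumption for illustration only: the abstract does not say which flow-direction algorithm will be used, and this serial, plain-Python version is not the project's Spark/GeoTrellis implementation. The names `d8_flow_direction` and `NEIGHBOURS` are hypothetical.

```python
import math

# The eight neighbour offsets (row, col) examined by the D8 rule.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_flow_direction(dem, cell_size=1.0):
    """Return, for each cell, the (d_row, d_col) offset of steepest descent.

    dem is a rectangular list of lists of elevations. Cells with no
    lower neighbour (sinks and flat areas) are marked None.
    """
    rows, cols = len(dem), len(dem[0])
    directions = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope, best = 0.0, None
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    distance = cell_size * math.hypot(dr, dc)
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best_slope, best = slope, (dr, dc)
            directions[r][c] = best
    return directions
```

Each output cell depends only on its 3x3 neighbourhood, so a large raster can in principle be split into tiles with a one-cell overlap and processed independently, which is the property that makes operations like this candidates for cluster execution.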

Harvard-15-06 Feature Classification Using Terrain and Imagery Data at Scale

Drones, micro-satellites, and other innovations promise to both lower the cost and rapidly increase the amount of available raster imagery data. Initial use of this imagery is currently focused on supporting visualization of geospatial data. However, there is substantial opportunity to provide the ability to extract features from the imagery using simple user interfaces. Feature classification from raster imagery is not a new capability, and it is supported by several commercial workstation products. In addition, contemporary techniques rely not only on the imagery itself, but also leverage elevation data to improve the accuracy of the feature classification. However, the ability to do so with large data sets through a simple browser-based user interface is a significant challenge. Under this proposal, the project team will develop and implement a prototype web-based software tool that will be able to use a combination of elevation and imagery data to enable users to extract vector polygon features with real-time processing speeds. The work will leverage the open source Apache Spark and GeoTrellis projects, both based on the Scala functional programming language. It will also take advantage of other open source efforts supporting data processing at scale, including the Hadoop Distributed File System (HDFS) and indexing tools such as Cassandra and Accumulo. The results of the work will be released under a business-friendly Apache 2.0 license, and will be aimed at supporting execution of large data processing operations on clusters of virtual machines. The proposed work will be synergistic with other proposed research projects, including the HyperMap project to classify terrain types and channel areas based on large, high-resolution elevation data sets and the Place Name extraction from historic maps project.

GMU-14-04 Developing a spatiotemporal cloud advisory system for better selecting cloud services

We propose a web-based cloud advising system to (1) integrate heterogeneous cloud information from different providers, (2) automatically retrieve up-to-date cloud information, and (3) recommend and evaluate cloud solutions according to users’ selection preferences.

GMU-14-02 Developing an open spatiotemporal analytics platform for big event data

We propose to design a visual analytical platform to systematically perform inductive pattern analysis on real-time volunteered event data. The platform will build on tools and methods that we developed in previous studies. The completed platform will not only enable spatiotemporal pattern exploration of big event data in the short term, but also lay a concrete foundation for using volunteered data for tasks such as urban planning in the long term.

GMU-14-06 Incorporating quality information to support spatiotemporal data and service exploration 2014-2015
Harvard-14-01 Temporal gazetteer for geotemporal information retrieval

Place names are a key part of geographic understanding and carry a full sense of changing perspective over time, but existing gazetteers do not in general represent the temporal dimension. This project will develop, populate, and implement services for a place name model that incorporates realistic complexity in the temporal, spatial, and language elements that form a place name. Additional tools will be developed to conflate and reconcile place name evidence from authoritative, documentary, and social sources.

Harvard-14-04 Cartographic ontology for semantic map annotation

Map annotation produces highly relevant, high-value information, whose utility, however, often depends critically on semantic interoperability. Achieving that requires an ontology-based, semantic web, linked open data approach. We will develop a key missing ingredient, the cartographic annotation ontology, to characterize the complex structures and rich visual, symbolic, and geospatial languages that maps use to represent geographic information.

Harvard-14-05 A Paleo-Event Ontology for Characterizing and Publishing Irish Historical and Fossil Climate Data

Integration of both Big and Little spatio-temporal data from different scientific domains is vital for validating climate models, as a single volcanic eruption, for example, can have a great effect. Yet observation of deep time events, without deep-time observers, means we must discern paleo-events through observation of fossilized, event-proxy, features. Using medieval monastic records, tree-ring data, ice core features and volcanic eruption phenomena to inform our efforts, we will develop a deep-time climate event observation ontology to characterize the nature and relationships of the data.

Harvard-14-06 Emotional City – measuring, analyzing and visualizing citizens’ emotions for urban planning in smart cities

Emotional City provides a human-centered approach for extracting contextual emotional information from technical and human sensor data. The methodology used in this project consists of four steps: 1) detecting emotions using wristband sensors, 2) “ground-truthing” these measurements using a People as Sensors smartphone app, 3) extracting emotion information from crowdsourced data like Twitter, and 4) correlating the measured and extracted emotions. Finally, the emotion information is mapped and fed back into urban management for decision support and for evaluating ongoing planning processes.

UCSB-14-01 Dismounted navigation 2014-2015
UCSB-14-02 Indoor Mapping Using Multi-Sensor Point Clouds

Develop and evaluate a method for creating 3D indoor maps using point clouds generated by multiple sensor platforms.

UCSB-14-03 Pattern driven Exploratory Interaction with Big Geo Data 2014-2015