Below are highlights of course projects I completed during my studies at OSU.
TransMapper, a lightweight web mapping application targeting ‘big’ census data
- TransMapper aims to transform how census data are mapped by providing a cartogram-based visualizer and a simple natural language interface for filtering attributes.
Genome Sequence Clustering Analysis
- Engineered a streamlined pipeline for DNA sequence analysis using natural language processing and machine learning algorithms.
- Worked with biologists to develop methods for translating DNA sequences into valid protein sequences.
- Implemented annotators using UIMA and persisted the results in the CAS (Common Analysis Structure).
- Developed dynamic programming algorithms (Needleman-Wunsch and Hirschberg) to align large numbers of protein sequences and detect genetic similarity.
- Implemented k-means clustering to group a large collection of protein sequences.
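For readers unfamiliar with the alignment step above, here is a minimal Python sketch of Needleman-Wunsch global alignment. It is an illustration, not the project's code, and it assumes a simple scoring scheme (match +1, mismatch -1, linear gap -1). Protein work would more typically use a substitution matrix such as BLOSUM62. The sketch keeps the full O(nm) score table. Hirschberg's algorithm produces an optimal alignment in linear space by recursively splitting the problem, which is not shown here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where "-" marks a gap.
    Time and memory are both O(len(a) * len(b)).
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches and one gap.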
Geographic Question Answering System, Ph.D. dissertation
- Performed natural language analysis of a geographic question corpus.
- Experimented with various classic machine learning techniques for question classification.
- Proposed a new classification method based on dynamic programming and a voting algorithm.
- Built spatial ontologies to disambiguate geographic terms in natural language.
- Used a PostGIS backend to answer advanced geographic questions.
Web-based GIS for Political Redistricting, Master’s thesis project
- Proposed an open source framework for implementing a Web-based GIS for public participatory political redistricting.
- Developed a redistricting algorithm that devises districts from multiple census levels and merges units into districts through geometric processing.
- Aimed to promote transparency in political decision-making through use of the system.
High Performance Computing Experiments on OSC Oakley Cluster
- Algorithmically determined the validity of loop permutation (interchange), unrolling, and tiling (loop splitting followed by loop permutation) based on data dependences.
- Improved matrix computation performance by fitting the working data into the cache.
- Sped up matrix computation by about 20-30 times through loop transformations and OpenMP parallelism.
- Sped up matrix computation by about 200-300 times with CUDA, using shared memory and tiling.
- Implemented an OpenMPI program that found all prime numbers below one billion within a minute in a distributed multi-core environment.
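Loop tiling, mentioned above, can be illustrated with a small Python sketch. This is not the project's OpenMP or CUDA code. Each loop is split into tile-sized strips and the strip loops are moved outward, so the innermost loops work on blocks small enough to stay in cache. The value `tile=32` is an arbitrary assumption. Pure Python will not show the cache speedup; the point is the loop structure. The transformation is legal here because the only cross-iteration dependence is the accumulation into `C[i][j]`, and the tiled order still adds those terms in the same order as the untiled loop.

```python
def matmul_naive(A, B):
    """Reference i-p-j matrix multiply: C = A x B."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a, row_b, row_c = A[i][p], B[p], C[i]
            for j in range(m):
                row_c[j] += a * row_b[j]
    return C


def matmul_tiled(A, B, tile=32):
    """Same computation with every loop strip-mined into tiles of size `tile`."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile of A, B, and C.
    for ii in range(0, n, tile):
        for pp in range(0, k, tile):
            for jj in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_c = C[i]
                    for p in range(pp, min(pp + tile, k)):
                        a, row_b = A[i][p], B[p]
                        for j in range(jj, min(jj + tile, m)):
                            row_c[j] += a * row_b[j]
    return C
```

In the OpenMP and CUDA settings the same tiles become units of work for threads, or blocks staged in shared memory.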
Taxi Map Android App
- Implemented the backend functions of an Android app for hailing taxis.
- Featured spatial search and attribute search to improve usability.
- Implemented functions to update drivers’ locations from GPS every 10 seconds.
- Improved response time by using AsyncTask and background threads in Android.
- Added internationalization support for English and Chinese.
An Expert System for Automatically Judging Resumes
- Implemented an NLP component to automatically judge HTML-based resumes.
- Defined filters for criteria such as GPA, years of experience, skill keywords, and degree.
- Used CLIPS, a rule-based programming language, to perform inference and scoring.
Spatially Augmented Twitter Search
- Implemented an enterprise Java-based web application to extend Twitter search capabilities.
- Featured two major use cases: searching tweets by keyword and searching tweets by location.
- Downloaded real-time tweets into a database using the Twitter Search API.
- Divided the entire United States into 2,500 grid cells for retrieving tweets, each covering an estimated area of 4,000 square km.
- Implemented an NLP component to enable semantic rather than keyword-based search.
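As a quick consistency check on the grid figures above, the cells together cover about ten million square kilometers. This is in line with the total area of the United States, roughly 9.8 million km² (an outside figure, not taken from the project):

```latex
2500 \times 4000\ \text{km}^2 = 1.0 \times 10^{7}\ \text{km}^2 \approx 9.8 \times 10^{6}\ \text{km}^2
```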
Paper Abstraction System
- Developed a natural language processing program to automatically summarize paper abstracts.
- Predicted and extracted the essential logical sections of a paper abstract (background, objective, data, methods, results, and conclusion) based on indicator words and run-on patterns.
- Enabled automatic generation of a brief literature review.
- Developed a probabilistic model to split sections and a machine learning component that improved classification accuracy from ~50% to ~80%.
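The indicator-word idea above can be sketched as follows. This is a hypothetical illustration, not the project's program. The indicator lists are invented. "Run-on" handling is interpreted here, as an assumption, as letting a sentence with no indicator words inherit the previous sentence's section. The probabilistic model and the machine learning component are not shown.

```python
import re

# Hypothetical indicator words per section (illustrative only).
INDICATORS = {
    "objective": ["aim", "purpose", "objective", "we propose"],
    "data": ["data", "dataset", "collected", "sample"],
    "methods": ["method", "approach", "algorithm", "model"],
    "results": ["result", "show", "found", "indicate"],
    "conclusion": ["conclude", "conclusion", "suggest", "implication"],
}


def label_sentences(abstract, default="background"):
    """Label each sentence with the section whose indicator words it mentions most."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]
    labeled = []
    previous = default
    for sentence in sentences:
        text = sentence.lower()
        best, best_hits = None, 0
        for section, words in INDICATORS.items():
            hits = sum(text.count(word) for word in words)
            if hits > best_hits:  # ties keep the earlier section
                best, best_hits = section, hits
        # No indicator words: treat the sentence as continuing the previous section.
        label = best if best is not None else previous
        labeled.append((label, sentence))
        previous = label
    return labeled
```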
Study of the Use of GIS in 30 Academic Disciplines Using a Bayesian Hierarchical Linear Model and Literature Data
- Collected 1,113 literature records containing the keyword “GIS”.
- Implemented a Bayesian hierarchical model to capture two levels of covariates: the reference level (observations) and the subject level (clusters).
- Built a Bayesian linear regression model to analyze differences in times cited across and between disciplines.
- Results showed that, among the top four disciplines using GIS, environmental science had the highest times cited for its publications.
- By comparison, publications from geography had a relatively lower impact.
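One common way to write a two-level model like the one described above is the varying-intercept form below. The symbols are illustrative assumptions, not the study's exact specification: y_ij is the times cited of publication i in discipline j, x_ij is an observation-level covariate, and z_j is a discipline-level covariate. A Bayesian fit would additionally place priors on the γ and β coefficients, σ, and τ. Here τ² captures variation between disciplines and σ² variation within them.

```latex
\begin{aligned}
y_{ij} &= \beta_{0j} + \beta_{1} x_{ij} + \varepsilon_{ij}, & \varepsilon_{ij} &\sim \mathcal{N}(0, \sigma^{2}) \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} z_{j} + u_{j}, & u_{j} &\sim \mathcal{N}(0, \tau^{2})
\end{aligned}
```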
Multithreaded Restaurant Simulation
- Used monitors and threads in Java to simulate running a restaurant.
- Simulated cooks taking and preparing orders with a limited number of machines.
- Achieved the goal of minimizing the average waiting time across all customers.
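The threading structure described above can be sketched in Python; the project itself used Java monitors and threads. In this sketch, cooks are threads pulling orders from a shared queue, and a semaphore stands in for the limited pool of machines. The function name, parameters, and first-come-first-served order are assumptions. No waiting-time optimization is attempted here.

```python
import queue
import threading
import time


def run_restaurant(orders, num_cooks=3, num_machines=2, cook_time=0.01):
    """Simulate cooks (threads) preparing orders that each need one of a few machines.

    Returns (completed_orders, peak_machines_in_use).
    """
    order_queue = queue.Queue()  # thread-safe FIFO of pending orders
    for order in orders:
        order_queue.put(order)

    machines = threading.Semaphore(num_machines)  # at most num_machines in use at once
    lock = threading.Lock()                       # guards the shared bookkeeping below
    completed = []
    in_use = 0
    peak = 0

    def cook():
        nonlocal in_use, peak
        while True:
            try:
                order = order_queue.get_nowait()  # take the next waiting order
            except queue.Empty:
                return  # no orders left: this cook stops
            with machines:  # blocks until a machine is free
                with lock:
                    in_use += 1
                    peak = max(peak, in_use)
                time.sleep(cook_time)  # stand-in for preparing the order
                with lock:
                    in_use -= 1
                    completed.append(order)

    cooks = [threading.Thread(target=cook) for _ in range(num_cooks)]
    for t in cooks:
        t.start()
    for t in cooks:
        t.join()
    return completed, peak
```

The semaphore plays the role a Java monitor's `wait`/`notify` would: a cook that finds every machine busy blocks until another cook releases one.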
Phong Illumination Based on Ray Tracing
- Implemented a ray tracing algorithm to illuminate 3D objects.
- Simulated the physical behavior of light through reflection, refraction, and diffusion.
- Improved performance with spatial indexing and used supersampling for anti-aliasing.
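At each ray-surface intersection, a ray tracer of this kind typically evaluates the Phong reflection model, I = k_a i_a + k_d (L·N) i_d + k_s (R·V)^α i_s, with R = 2(L·N)N - L. The Python sketch below evaluates that formula for a single light with scalar intensities. The function and parameter names are illustrative. Shadow, reflection, and refraction rays are not shown.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_intensity(normal, to_light, to_viewer, ka, kd, ks, shininess,
                    ambient=1.0, diffuse=1.0, specular=1.0):
    """Phong shading at one surface point for a single light (scalar intensities)."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)
    n_dot_l = dot(n, l)
    intensity = ka * ambient  # ambient term is always present
    if n_dot_l > 0:  # light is in front of the surface
        intensity += kd * n_dot_l * diffuse
        # Reflect L about N: R = 2 (N . L) N - L
        r = tuple(2 * n_dot_l * nc - lc for nc, lc in zip(n, l))
        r_dot_v = dot(r, v)
        if r_dot_v > 0:  # viewer is inside the specular lobe
            intensity += ks * (r_dot_v ** shininess) * specular
    return intensity
```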