Work

Below are highlights of course projects I completed during my studies at OSU.

TransMapper, a lightweight web mapping application for ‘big’ census data

  • Used six JavaScript libraries: d3.v2.min.js, cartogram.js, natural.js, topojson.js, jquery-1.10.2.min.js, and colorbrewer.js.
  • TransMapper aims to change how census data is mapped by providing a cartogram-based visualizer and a simple natural language interface for filtering attributes.

Genome Sequence Clustering Analysis

  • Engineered a streamlined pipeline for DNA sequence analysis using natural language processing and machine learning algorithms.
  • Worked with biologists to develop methods to transform DNA sequences into valid protein sequences.
  • Implemented annotators using UIMA and persisted the results in the CAS (Common Analysis Structure).
  • Developed dynamic programming algorithms (the Needleman-Wunsch and Hirschberg algorithms) to align a large number of protein sequences and detect genetic similarity.
  • Implemented k-means clustering to group large collections of protein sequences.
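
The alignment step can be illustrated with a minimal Needleman-Wunsch sketch. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap -1) in place of the substitution matrices (such as BLOSUM62) usually used for proteins, and the function name and parameters are illustrative rather than taken from the project. Hirschberg's algorithm computes the same global alignment in linear space by recursively splitting the problem; this quadratic-space version does not attempt that.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Scoring parameters are illustrative assumptions, not the project's values.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("AC", "A")` returns `(0, "AC", "A-")`: one match (+1) and one gap (-1).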

Geographic Question Answering System, Ph.D. dissertation

  • Performed natural language analysis of a corpus of geographic questions.
  • Experimented with various classic machine learning techniques for classifying questions.
  • Proposed a new classification method based on dynamic programming and a voting algorithm.
  • Built spatial ontologies to disambiguate geographic terms in natural language.
  • Used a PostGIS backend to answer advanced geographic questions.

Web-based GIS for Political Redistricting, Master’s thesis project

  • Proposed an open source framework for implementing a Web-based GIS for public participatory political redistricting.
  • Developed a redistricting algorithm that devises districts based on multiple census levels and merges units into districts through geometric processing.
  • Aimed to make political decision making more transparent through use of the system.

High Performance Computing Experiments on OSC Oakley Cluster

  • Algorithmically determined the validity of loop permutation (interchange), unrolling, and tiling (loop splitting plus permutation) based on data dependencies.
  • Improved matrix computation performance by fitting the working data into the cache.
  • Sped up matrix computation by roughly 20-30x through loop transformations and OpenMP parallelism.
  • Sped up matrix computation by roughly 200-300x using shared memory and tiling in CUDA.
  • Implemented an Open MPI program that finds all prime numbers below one billion within a minute in a distributed multi-core environment.
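
Loop tiling can be sketched in Python for clarity rather than speed. The tile size of 32 is an arbitrary assumption (in practice it would be tuned so the active tiles of A, B, and C fit in cache), and the function names are hypothetical. The inner loops use the i-k-j order, an interchange that is legal here because each C[i][j] only accumulates a sum; with floating-point data, reordering additions can change rounding slightly.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(A: Matrix, B: Matrix) -> Matrix:
    """Reference triple loop (i-j-k order), used to check the tiled version."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(A: Matrix, B: Matrix, tile: int = 32) -> Matrix:
    """Blocked matrix multiply; tile size is an assumed tuning parameter."""
    n, k, m = len(A), len(B), len(B[0])
    if len(A[0]) != k:
        raise ValueError("inner dimensions do not match")
    C = [[0] * m for _ in range(n)]
    # Outer loops step from tile to tile; inner loops stay inside one tile so
    # the touched parts of A, B and C are reused while they are still cached.
    for ii in range(0, n, tile):
        for kk in range(0, k, tile):
            for jj in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = A[i], C[i]
                    for p in range(kk, min(kk + tile, k)):
                        a_ip, row_b = row_a[p], B[p]
                        # Innermost loop walks along rows of B and C, which are
                        # contiguous in a row-major C/CUDA memory layout.
                        for j in range(jj, min(jj + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return C
```

Tile sizes that do not divide the matrix dimensions are handled by the `min(...)` bounds, so edge tiles are simply smaller.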

Taxi Map Android App

  • Implemented the backend functions of an Android app for hailing taxis, including spatial search and attribute search to improve usability.
  • Implemented functions that update drivers’ locations from GPS every 10 seconds.
  • Improved response time by running work in AsyncTask instances and background threads.
  • Added internationalization support for English and Chinese.

An Expert System for Automatically Judging Resumes

  • Implemented an NLP component to automatically judge HTML-based resumes.
  • Defined filters for selection criteria such as GPA, years of experience, skill keywords, and degree requirements.
  • Used CLIPS, a rule-based programming language, to perform inference and score resumes.

Spatially Augmented Twitter Search

  • Implemented an enterprise Java web application that extends Twitter’s search capabilities.
  • Supported two major use cases: searching tweets by keyword and searching tweets by location.
  • Downloaded real-time tweets into a database using the Twitter Search API.
  • Divided the United States into 2,500 grid cells for retrieving tweets, each covering an estimated 4,000 square km.
  • Implemented an NLP component to enable semantic rather than keyword-based search.
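
The grid division could be implemented along the following lines. This sketch assumes a uniform 50 × 50 latitude/longitude grid (50 × 50 = 2,500 cells) over a hypothetical bounding box for the contiguous United States; the project's actual cell layout is not described. Cells in a lat/lon grid shrink toward the north, so an equal-area projection would be needed if every cell must cover the same area.

```python
from typing import List, Optional, Tuple

# Hypothetical bounding box for the contiguous United States, in degrees:
# (min_lat, max_lat, min_lon, max_lon).
BBOX = (24.5, 49.5, -125.0, -66.5)


def cell_centers(bbox, rows: int, cols: int) -> List[Tuple[float, float]]:
    """Centers of all grid cells, ordered row by row (index = r * cols + c)."""
    min_lat, max_lat, min_lon, max_lon = bbox
    dlat = (max_lat - min_lat) / rows
    dlon = (max_lon - min_lon) / cols
    return [(min_lat + (r + 0.5) * dlat, min_lon + (c + 0.5) * dlon)
            for r in range(rows) for c in range(cols)]


def cell_of(lat: float, lon: float, bbox, rows: int, cols: int) -> Optional[int]:
    """Index of the cell containing (lat, lon), or None if outside the box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return None
    # Clamp so points exactly on the north/east edge fall in the last cell.
    r = min(int((lat - min_lat) / (max_lat - min_lat) * rows), rows - 1)
    c = min(int((lon - min_lon) / (max_lon - min_lon) * cols), cols - 1)
    return r * cols + c
```

Each cell center could serve as the query point for a location-based search, and `cell_of` assigns a geotagged tweet back to its cell.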

Paper Abstraction System

  • Developed a natural language processing program that automatically summarizes paper abstracts.
  • Predicted and extracted the essential logical sections of an abstract (background, objective, data, methods, results, and conclusion) based on indicator words and run-on patterns.
  • Used the program to automatically generate brief literature reviews.
  • Developed a probabilistic model to split sections and a machine learning component that improved classification accuracy from ~50% to ~80%.

Studying the Use of GIS in 30 Academic Disciplines with a Bayesian Hierarchical Linear Model and Literature Data

  • Collected 1,113 literature records containing the keyword “GIS”.
  • Implemented a Bayesian hierarchical model to capture two levels of covariates: the reference level (observations) and the subject level (clusters).
  • Built a Bayesian linear regression model to analyze differences in times cited within and between disciplines.
  • Results showed that, among the top four disciplines using GIS, environmental science had the highest citation counts for its publications.
  • By comparison, publications from geography had relatively lower impact.

Restaurant Simulation

  • Used monitors and threads in Java to simulate running a restaurant.
  • Simulated cooks taking and preparing orders with a limited number of machines.
  • Achieved the objective of minimizing the average waiting time across all customers.
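
The monitor pattern behind the simulation can be sketched with Python's `threading.Condition`, which plays the role of a Java monitor (`synchronized` plus `wait`/`notify`). The `MachinePool` class, the `run_kitchen` function, and the cook and machine counts are hypothetical; the sketch only shows cooks blocking until a machine is free and does not model the scheduling policy used to minimize waiting time.

```python
import threading
import time
from collections import deque


class MachinePool:
    """Monitor guarding a fixed number of kitchen machines (hypothetical model)."""

    def __init__(self, count: int):
        self._capacity = count
        self._free = count
        self._cond = threading.Condition()
        self.max_in_use = 0  # peak number of machines in use, for checking

    def acquire(self) -> None:
        with self._cond:
            while self._free == 0:      # wait until another cook releases one
                self._cond.wait()
            self._free -= 1
            self.max_in_use = max(self.max_in_use, self._capacity - self._free)

    def release(self) -> None:
        with self._cond:
            self._free += 1
            self._cond.notify()         # wake one waiting cook


def run_kitchen(orders, cooks: int = 3, machines: int = 2):
    """Cooks pull orders from a shared queue; each order needs one machine."""
    pending = deque(orders)
    lock = threading.Lock()
    done = []
    pool = MachinePool(machines)

    def cook() -> None:
        while True:
            with lock:
                if not pending:
                    return
                order = pending.popleft()
            pool.acquire()
            try:
                time.sleep(0.001)       # stand-in for preparing the order
            finally:
                pool.release()
            with lock:
                done.append(order)

    threads = [threading.Thread(target=cook) for _ in range(cooks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, pool.max_in_use
```

With more cooks than machines, some cooks wait inside `acquire`, so the peak number of machines in use never exceeds the pool size.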

Phong Illumination based on Ray Tracing

  • Implemented a ray tracing algorithm to illuminate 3D objects.
  • Simulated the paths of light based on reflection, refraction, and diffusion.
  • Improved performance with spatial indexing and used supersampling for anti-aliasing.
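
The Phong reflection model evaluated at each ray-object intersection can be sketched as follows. The coefficients (`ka`, `kd`, `ks`, `shininess`) and the single white light are illustrative assumptions; a full ray tracer would also cast shadow rays and recurse on reflected and refracted rays.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def phong(normal: Vec, to_light: Vec, to_viewer: Vec,
          ka: float = 0.1, kd: float = 0.7, ks: float = 0.2,
          shininess: float = 32, ambient: float = 1.0, light: float = 1.0) -> float:
    """Intensity = ka*Ia + kd*max(N.L, 0)*Il + ks*max(R.V, 0)^shininess*Il."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    diffuse = max(n_dot_l, 0.0)
    specular = 0.0
    if diffuse > 0.0:
        # Mirror the light direction about the normal: R = 2(N.L)N - L.
        r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
        specular = max(dot(r, v), 0.0) ** shininess
    return ka * ambient + kd * diffuse * light + ks * specular * light
```

When the light is behind the surface (N·L ≤ 0), only the ambient term remains, which is why the specular term is skipped in that case.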

My GitHub
