Jia is a PhD student in the Computer Science department, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, where he is also a member of the Data Systems Lab. Jia's research focuses on database systems and geospatial data management. In particular, he has worked on distributed data management systems, database indexing, data visualization, and code generation with JIT execution. He is the main contributor to several open-source research projects such as GeoSpark, a cluster computing framework for processing big spatial data.
My Curriculum Vitae is here: Jia Yu CV
Arizona State University, Tempe, Arizona, USA [SEPTEMBER 2013 - PRESENT]
Ph.D. Student, COMPUTER SCIENCE
Northwest A & F University, Yangling, Shaanxi, PRC [SEPTEMBER 2009 - JULY 2013]
BACHELOR OF ENGINEERING, SOFTWARE ENGINEERING. Outstanding graduate (top 3% of students)
GeoSpark is an in-memory cluster computing system for processing large-scale spatial data. It extends Apache Spark to support spatial data types and operations. For more details, please visit the GeoSpark Project Website. Source code is available at the GeoSpark GitHub Repository. GeoSpark is listed as an Infrastructure Project on the Apache Spark Official Third Party Project Page. GeoSpark has over 100K page visits and over 20K downloads. GeoSpark users and contributors come from Apple, Facebook, Uber and numerous startup companies. Watch this demo video from Gyana, a British startup company that uses GeoSpark in its location intelligence dashboard.
Hippo: A Fast, yet Scalable Database Indexing Approach [Ongoing Project]
In contrast to existing tree index structures, Hippo avoids storing a pointer to each tuple in the indexed table, which reduces the storage space occupied by the index. Hippo only stores pointers to the disk pages that hold the indexed database table and maintains summaries for the pointed-to pages. The summaries are brief histograms that represent the data distribution of one or more pages. For more details, please visit the Hippo Project Website and the Hippo GitHub Repository, and watch the Hippo Video Demonstration (on YouTube) and the Hippo VLDB Paper (Pre)Presentation.
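The idea of page-level summaries can be illustrated with a minimal Python sketch. This is not the actual Hippo implementation: the in-memory `pages` list, the fixed bucket `boundaries`, the `density` threshold, and all function names are assumptions made for illustration. Each index entry keeps only a page range and a bitmap of the histogram buckets that occur in those pages, so a range query can skip every entry whose bitmap shares no bucket with the query.

```python
from bisect import bisect_right


def bucket_of(value, boundaries):
    """Return the index of the histogram bucket that holds `value`."""
    return bisect_right(boundaries, value)


def build_index(pages, boundaries, density=0.5):
    """Group consecutive pages into index entries.

    Each entry is (first_page, last_page, bitmap), where bit i of the
    bitmap is set if some value in those pages falls into bucket i.
    An entry is closed once its bitmap covers at least `density` of all
    buckets (a hypothetical threshold chosen for this sketch).
    """
    num_buckets = len(boundaries) + 1
    entries = []
    start, bitmap = 0, 0
    for page_id, page in enumerate(pages):
        for value in page:
            bitmap |= 1 << bucket_of(value, boundaries)
        if bin(bitmap).count("1") / num_buckets >= density:
            entries.append((start, page_id, bitmap))
            start, bitmap = page_id + 1, 0
    if start < len(pages):
        # Remaining pages that never reached the density threshold.
        entries.append((start, len(pages) - 1, bitmap))
    return entries


def range_query(pages, entries, boundaries, low, high):
    """Return all values v with low <= v <= high, skipping page ranges
    whose summary shares no bucket with the query range."""
    query_bitmap = 0
    for b in range(bucket_of(low, boundaries), bucket_of(high, boundaries) + 1):
        query_bitmap |= 1 << b
    result = []
    for first, last, bitmap in entries:
        if (bitmap & query_bitmap) == 0:
            continue  # no value in these pages can match
        # Possible match: inspect the pages and filter out false positives.
        for page in pages[first:last + 1]:
            result.extend(v for v in page if low <= v <= high)
    return result
```

In this sketch the index holds one small bitmap per page range rather than one pointer per tuple, which is where the storage saving comes from. The price is that matching page ranges must still be scanned to discard false positives.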
Geospatial data management in Apache Spark: A tutorial (Tutorial)[PDF]
Jia Yu, Mohamed Sarwat. In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2019, Macau, China, April 2019
Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond (Research paper)[PDF]
Jia Yu, Zongsi Zhang, Mohamed Sarwat. Geoinformatica Journal, 2018
GeoSparkViz: A Scalable Geospatial Data Visualization Framework in the Apache Spark Ecosystem (Research paper)[PDF]
Jia Yu, Zongsi Zhang, Mohamed Sarwat. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM 2018, Bolzano-Bozen, Italy, July 2018
Geospatial Visual Analytics Belongs to Database Systems: The BABYLON approach (Extended abstract) [PDF] With $500 NSF Travel Grant and $500 Microsoft Travel Grant
3rd place of ACM SIGSPATIAL Student Research Competition (graduate category)
[$200 cash award, a bronze medal and a certificate][photo with other winners][photo with my PhD advisor]
Jia Yu. In Proceedings of ACM International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017, Redondo Beach, CA, USA, November 2017
Indexing the Pick-up and Drop-off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road (Research paper)[PDF]
Jia Yu, Mohamed Sarwat. In Proceedings of the International Symposium on Spatial and Temporal Databases, SSTD 2017, Washington D.C., USA, August 2017
Hippo in Action: Scalable Indexing of a Billion New York City Taxi Trips and Beyond (DEMO paper)[PDF] With $1000 NSF Travel Grant
Jia Yu, Raha Moraffah, Mohamed Sarwat. In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 2017
Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems (Research paper)[PDF]
Jia Yu, Mohamed Sarwat. In Proceedings of the 43rd International Conference on Very Large Data Bases, VLDB 2017, Munich, Germany, August 2017
A Demonstration of GeoSpark: A Cluster Computing Framework for Processing Big Spatial Data (DEMO paper)[PDF] With €500 Travel Grant
Jia Yu, Jinxuan Wu, Mohamed Sarwat. In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 2016
GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data (Short paper)[PDF] With $500 NSF Travel Grant
Jia Yu, Jinxuan Wu, Mohamed Sarwat. In Proceedings of ACM International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2015, Seattle, WA, USA, November 2015
IBM Research - Almaden, San Jose, California, USA [MAY 2018 - AUGUST 2018]
Research Intern, Exploratory Database Group
Mentor: Vijayshankar Raman; Manager: Fatma Ozcan, Berthold Reinwald
- I worked on fast scans over compressed database tables using LLVM. In particular, I explored code generation issues on compressed database tables and implemented a preliminary code generator for the IBM HTAP system, with JIT execution using LLVM.
- I also collaborated with Yingjun Wu, Ronald Barber, Richard Sidle, and Yuanyuan Tian. This eventually resulted in another research paper, which is under review.
- 3rd place in IBM Research-Almaden AI Hackathon. See [our photo] with Almaden Lab Director, IBM VP, Dr. Jeff Welser: from 2nd Left to 2nd Right, Vijayshankar Raman, Sicong Liu, David C Martin, Jeff Welser, Jia Yu, Chuan Lei, Yingjun Wu.
Apple, Cupertino, California, USA [JUNE 2016 - AUGUST 2016]
Mentor: Huang-Hsiang Cheng; Manager: Alex Radeski
I worked on deploying and improving distributed computing frameworks and resource management systems such as Apache Spark and Apache Mesos. My work is expected to assist large-scale spatial analysis.
ViviBowl Co. Ltd, Tempe, Arizona, USA [SEPT 2014 - SEPT 2015]
On-campus food ordering and delivery service; hundreds of daily active users on the Arizona State University Tempe campus
- Distributed data management
- Database management systems
- Geographic Information System (GIS)
- Apache Spark, PostgreSQL kernel
- My GitHub repositories
- Readings in Database Systems, by Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker
- Database reading list maintained by Reynold Xin
- I contribute to JTS Topology Suite, a geometry library used in many well-known software products, such as ESRI, H2 DBMS, PostGIS-JDBC, and Spark Notebook