LocLok: Protecting Location Privacy in Smartphones

Concerns about location privacy arise frequently with the rapid development of GPS-enabled devices and location-based applications. However, there are no official privacy-preserving cell phone apps that protect users' location privacy while still enabling GPS-based services. In academia, the popular method is to replace a latitude-longitude position with a randomly generated area, a technique called spatial cloaking. For more details, please go to http://forum.loclok.com.

Click here to view a demo system.

Our technique can rigorously protect location privacy even when the attacker is well acquainted with the user, for example when the attacker knows exactly the user's movement habits and previously visited places. Even then, we can prove that the user's current location is protected under state-of-the-art differential privacy. Furthermore, we prove that our technique is the optimal solution for satisfying the differential privacy guarantee.

Here is an example. Figure 1 shows a real trajectory on a real map, where the two axes represent longitude and latitude. The user traveled over 500 timestamps. Figure 2 shows the same trajectory on the corresponding grid map. Figure 3 shows the trajectory released by the existing method, while Figure 4 shows the trajectory released by our method. Our method provides more utility, which Figure 5 confirms by plotting the distances between the true and released locations. LM is the existing method, the Laplace Mechanism; PIM is our method, the Planar Isotropic Mechanism. The formal proof and experiment settings can be found in our paper.
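
For intuition about the LM baseline, here is a minimal Python sketch of a Laplace-style perturbation of a single location. It is not the Planar Isotropic Mechanism, and it is not necessarily the exact LM variant evaluated in the paper. The function names, the `sensitivity` parameter, and the example coordinates are assumptions made for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_location(x, y, epsilon, sensitivity, rng=None):
    """Return a noisy copy of (x, y); a smaller epsilon means more noise.

    `sensitivity` is an assumed bound on how much the true location can
    differ (same units as x and y); it is a hypothetical parameter.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return x + laplace_noise(scale, rng), y + laplace_noise(scale, rng)


# Example: perturb one grid position (the coordinates are made up).
noisy_x, noisy_y = perturb_location(12.0, 7.0, epsilon=1.0, sensitivity=2.0)
print(noisy_x, noisy_y)
```

The distance between the true and the released position, averaged over a trajectory, is the kind of utility measure that Figure 5 compares.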

DPCube: Releasing Differentially Private Data Cubes for Health Information

Health data is highly private. For example, a patient's disease should never be disclosed to anyone without the patient's consent. On the other hand, health data is very useful for improving clinical treatment and for medical research. The question is how to release health data so that the sensitive part stays protected while the useful part is released. More generally, if a database contains both private and useful information, how can we use the data without breaching anyone's privacy?

DPCube is a practical and rigorous method to tackle this problem. It satisfies the state-of-the-art differential privacy guarantee while still providing useful information about the data. The main components of DPCube are described as follows.

MATLAB code implementing DPCube is available upon request. The following figure shows the interface for a data release scenario.

Adaptive Differentially Private Data Release

I dug further into the theory of differential privacy and investigated a general method for releasing data with differential privacy. This method is still under development. Click here for an introduction to this project, which is supported by NSF.

An RSA Cryptosystem

RSA is one of the first practical and widely accepted public-key encryption methods. It is widely used in digital signatures, identity verification, and secure communication and transmission. The key components of RSA include:

n: modulus;

e: public key exponent;

d: private key exponent;

p: initial prime number;

q: initial prime number;

dmp1: e*dmp1 = 1 (mod (p-1))

dmq1: e*dmq1 = 1 (mod (q-1))

iqmp: q*iqmp = 1 (mod p )

where n and e form the public key, and n and d form the private key. In practice, public keys are mostly used to encrypt communication, while private keys are mostly used to create digital signatures.
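
The relationships among these components can be checked with a toy key. The sketch below uses deliberately tiny primes (p=61, q=53, e=17), which are illustrative assumptions and far too small for real use. The function name `rsa_components` is hypothetical, and this is not the OpenSSL-based code of this system. It needs Python 3.8 or later for the modular inverse `pow(a, -1, m)`.

```python
from math import gcd


def rsa_components(p, q, e):
    """Derive the RSA key components listed above from two primes and e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    d = pow(e, -1, phi)        # e*d = 1 (mod (p-1)*(q-1))
    return {
        "n": n,
        "e": e,
        "d": d,
        "p": p,
        "q": q,
        "dmp1": d % (p - 1),   # e*dmp1 = 1 (mod (p-1))
        "dmq1": d % (q - 1),   # e*dmq1 = 1 (mod (q-1))
        "iqmp": pow(q, -1, p), # q*iqmp = 1 (mod p)
    }


key = rsa_components(p=61, q=53, e=17)

# Check the three congruences from the list above.
assert (key["e"] * key["dmp1"]) % (key["p"] - 1) == 1
assert (key["e"] * key["dmq1"]) % (key["q"] - 1) == 1
assert (key["q"] * key["iqmp"]) % key["p"] == 1
print(key)
```

The last three values are the ones that allow private-key operations to be sped up with the Chinese Remainder Theorem.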

This system uses OpenSSL for encryption and decryption. For example, the abbreviation of my Chinese name is "xyh". Encrypting it takes less than 1 ms, as shown below. The ciphertext of "xyh" is 0xc3609dc4.

Decrypting the ciphertext "0xc3609dc4" takes about 31 ms, and the decrypted text is (of course) "xyh".

This system also supports signature verification. Let the public key be KU={e,n} and the private key be KR={d,p,q}. After choosing these parameters, we divide the original file into blocks, each represented as an integer x where 0<x<n.

To sign a file, for each block x, compute: y=Sig(x)=x^d (mod n)

To verify a signature, compute x'=y^e (mod n) and check whether x'=x.
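
The two formulas can be tried out with a toy key. The following is a minimal sketch of textbook RSA signing, one byte per block, assuming the tiny key p=61, q=53, e=17, d=2753. The key, the function names, and the one-byte block size are assumptions for illustration. It is not the OpenSSL-based program of this system, and it omits the hashing and padding that a real signature scheme needs.

```python
# Toy key (assumption for illustration only): p=61, q=53.
N = 61 * 53      # modulus n = 3233
E = 17           # public exponent e
D = 2753         # private exponent d, since 17 * 2753 = 1 (mod 3120)


def sign_blocks(data, d, n):
    """Sign each byte of `data` as one block x: y = x^d (mod n)."""
    return [pow(x, d, n) for x in data]


def verify_blocks(data, signature, e, n):
    """Recover x' = y^e (mod n) for every block and compare it with x."""
    if len(data) != len(signature):
        return False
    return all(pow(y, e, n) == x for x, y in zip(data, signature))


signature = sign_blocks(b"xyh", D, N)
print(verify_blocks(b"xyh", signature, E, N))   # True
print(verify_blocks(b"xyz", signature, E, N))   # False
```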

To download the code, please click here. Note that you will need to install OpenSSL to run the program.

A P2P Video Sharing System

Many of us watch online videos every day, on YouTube or other video providers. The technique behind video sharing is not hard, though. Here I show a simple video sharing system along with its original code.

My system has three major parts: a tracker, a super peer, and many peers. The tracker is a MySQL database service that contains all the information about the available videos. When a user (a peer) wants to browse the videos in the system, the peer sends a query request to the tracker, which returns all the channels. Each channel has a super peer, which holds the detailed information of the channel. When a user selects a channel, the peer opens a TCP connection to that channel's super peer. The super peer accepts the connection and adds the user to the channel's audience neighbors.
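
The browse-and-join flow described above can be sketched in a few lines of Python. This is only an illustration of the roles: the class and method names are hypothetical, a dictionary stands in for the MySQL tracker, and a direct method call stands in for the TCP connection. The original program is written in C++ and is organized differently.

```python
class SuperPeer:
    """Holds the details of one channel and the peers watching it."""

    def __init__(self, channel):
        self.channel = channel
        self.audience = []

    def accept(self, peer_id):
        # Stand-in for accepting a TCP connection from a peer.
        if peer_id not in self.audience:
            self.audience.append(peer_id)


class Tracker:
    """Stand-in for the database-backed tracker: maps channels to super peers."""

    def __init__(self):
        self._super_peers = {}

    def register(self, super_peer):
        self._super_peers[super_peer.channel] = super_peer

    def list_channels(self):
        return sorted(self._super_peers)

    def super_peer_for(self, channel):
        return self._super_peers[channel]


class Peer:
    def __init__(self, peer_id, tracker):
        self.peer_id = peer_id
        self.tracker = tracker

    def browse(self):
        # Query the tracker for all available channels.
        return self.tracker.list_channels()

    def join(self, channel):
        # Look up the channel's super peer and "connect" to it.
        super_peer = self.tracker.super_peer_for(channel)
        super_peer.accept(self.peer_id)
        return super_peer


tracker = Tracker()
tracker.register(SuperPeer("sports"))
tracker.register(SuperPeer("news"))
viewer = Peer("peer-1", tracker)
print(viewer.browse())
print(viewer.join("sports").audience)
```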

Over the TCP connection, the video content is transferred in units of data packets. As shown in the following figure, a peer has a data buffer area that holds these packets.
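
One simple way to picture such a buffer is a fixed-capacity store keyed by packet sequence number that hands packets to the player in order. The Python sketch below is an assumption about how a buffer like this could work; the `PacketBuffer` class, its methods, and the use of sequence numbers are hypothetical and are not taken from the original C++ code.

```python
class PacketBuffer:
    """Fixed-capacity buffer that releases packets in sequence order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._packets = {}    # sequence number -> payload
        self._next_seq = 0    # next sequence number the player needs

    def add(self, seq, payload):
        """Store a packet; return False if it is late, duplicated, or the buffer is full."""
        if seq < self._next_seq or seq in self._packets:
            return False
        if len(self._packets) >= self.capacity:
            return False
        self._packets[seq] = payload
        return True

    def pop_ready(self):
        """Remove and return the longest in-order run starting at the next needed packet."""
        ready = []
        while self._next_seq in self._packets:
            ready.append(self._packets.pop(self._next_seq))
            self._next_seq += 1
        return ready


buffer = PacketBuffer(capacity=4)
buffer.add(1, b"second")      # arrives early, must wait for packet 0
print(buffer.pop_ready())     # nothing can be played yet
buffer.add(0, b"first")
print(buffer.pop_ready())     # both packets, in order
```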

The classes used in this program are summarized as follows.

To download the original code written in C++, please click the following: super peer, tracker01, tracker01.client and tracker02.

A Content-Based Video Information Retrieval System

This system retrieves video information using a video query. That is, given two videos A and B, we would like to retrieve from video B the segments that are similar to a segment of A. For example, the following two videos show a ping-pong game.

What features can we use to represent a video segment or frame from these videos? I use player position detection, table position detection, and ball position detection to summarize a frame. The framework is shown in the following picture.

To detect the table position, I transform the RGB colors to the HSV representation and then use the H value (hue) to find the table. As shown in the bottom-left picture, the table is successfully detected.
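
The idea of selecting pixels by hue can be sketched with Python's standard `colorsys` module. This is a pure-Python stand-in rather than the OpenCV program; the function name `hue_mask`, the hue range of 200 to 260 degrees (a blue table is assumed), and the saturation and brightness thresholds are all illustrative assumptions.

```python
import colorsys


def hue_mask(image, hue_min, hue_max, min_saturation=0.3, min_value=0.2):
    """Mark pixels whose hue (in degrees) lies in [hue_min, hue_max].

    `image` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Pixels that are too gray or too dark are rejected because their hue is
    unreliable.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            in_range = hue_min <= h * 360.0 <= hue_max
            mask_row.append(in_range and s >= min_saturation and v >= min_value)
        mask.append(mask_row)
    return mask


# A tiny 2x2 "frame": pure blue, white, red, and a darker blue.
frame = [
    [(0, 0, 255), (255, 255, 255)],
    [(200, 30, 30), (20, 40, 200)],
]
print(hue_mask(frame, hue_min=200, hue_max=260))
```

Thresholding on hue rather than on raw RGB values makes the detection less sensitive to lighting changes, since brightness is carried mostly by the V channel.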

To find the players' positions, I noticed that between two consecutive frames the players' movement produces the largest change. I therefore use change detection (CD) between two frames to find the two players. After morphological filtering and noise removal, the players can be detected as well. The bottom-middle and bottom-right pictures show the two players' positions. The solid lines mark the partition of the frame.
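
Change detection between two frames can be illustrated with simple frame differencing. The sketch below works on small grayscale frames stored as nested lists and leaves out the morphological filtering step. The function names and the threshold of 30 are assumptions for illustration, not values from the OpenCV program.

```python
def change_mask(prev_frame, curr_frame, threshold=30):
    """Mark pixels whose gray level changed by more than `threshold`."""
    return [
        [abs(curr - prev) > threshold for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the changed pixels, or None if nothing changed."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, changed in enumerate(row)
        if changed
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Two tiny 3x3 frames: something moved in the lower-right area.
previous = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
current = [[10, 10, 10], [10, 200, 10], [10, 10, 180]]
print(bounding_box(change_mask(previous, current)))
```

In a frame split into a left and a right half, the bounding box of the changed region in each half would give a rough position for each player.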

The most difficult part is detecting the position of the ball, which is very small and moves very fast. I use both color and motion features to find the ball. First, I transform the RGB colors to the YCrCb representation and use the Y value to detect the change between two consecutive frames. Next, I use a shape feature (a ball should appear as a circle in a 2D picture) to confirm the ball's position. The top-left picture shows the ball's position in a red circle.

Finally, combining the three features (table, players, and ball), we can find any similar video segment in B given a video frame from A. For example, if the query is the following frame in A, then we find the following two results in B, with the red circle indicating the ball's position.

The program was written with OpenCV and is available upon request.

My Previous Work in China

I worked at IVO in China from 2005 to 2007. The company manufactures computer monitors and other displays, with thousands of machines running day and night. My job was to investigate the manufacturing status, such as input/output, yield, and equipment efficiency, from millions of operation records generated by an MES (Manufacturing Execution System), which controls and tracks thousands of operations across thousands of machines.

To deal with data at this scale, I built a large database server, which was used to analyze factory efficiency and to discover patterns that could lead to low throughput. I ran hundreds of SQL queries and stored procedures every day to analyze and diagnose factory performance. For confidentiality, I cannot discuss the details here. To give a glimpse, the following figure shows the OEE (Overall Equipment Effectiveness) of one workshop in the factory on one day. Note that I removed some sensitive keywords.
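
For readers unfamiliar with OEE, it is commonly computed as availability * performance * quality. The Python sketch below uses that common definition with made-up numbers. The parameter names are hypothetical, and this is not the factory's actual computation, which remains confidential.

```python
def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    """Overall equipment effectiveness under the common three-factor definition.

    Times share one unit (for example minutes); `ideal_cycle_time` is the
    ideal time needed to produce one unit.
    """
    availability = run_time / planned_time
    performance = (ideal_cycle_time * total_count) / run_time
    quality = good_count / total_count
    return availability * performance * quality


# Made-up shift: 480 planned minutes, 420 minutes actually running,
# 1.0 minute ideal cycle time, 380 units produced, 361 of them good.
print(round(oee(480, 420, 1.0, 380, 361), 4))
```

A low OEE can then be traced back to whichever of the three factors is lowest, which is the kind of diagnosis the SQL analysis supported.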

Click here (right click "save as") to see a snippet of the SQL queries, which was used to get the input/output of a workshop in the factory. Again, for confidentiality reasons, I replaced the keywords with "XX".

A Facility Layout Planning System

This software was developed in 2005, when I was still a college student and very young (^_^). Some components were written in Chinese; my apologies in advance.

Facility layout planning mainly focuses on factory layout, plant layout, and material handling. The goal is to achieve a highly efficient production system by placing all manufacturing units in their best positions. Systematic Layout Planning (SLP), proposed by the well-known industrial engineer Richard Muther, provides a complete set of analysis methods with rigorous logic. Following the basic principles of SLP, this software, written in C++, serves as a decision support system for facility layout planning.

The following figure shows a use case diagram.

This software allows users to input the manufacturing details of a factory, such as raw materials, their topology, and quantities. For example, a manufacturing procedure, including all the assembled parts, is shown below.

Following the SLP analysis process, the software performs Production-Quantity (P-Q) analysis, logistics analysis, and work unit correlation analysis in turn, and finally produces the position correlation chart and the area correlation chart. The following two figures show a logistics analysis and a correlation analysis, respectively.
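
As a small illustration of the first step, P-Q analysis is commonly described as ranking products by production quantity and looking at their cumulative share. The Python sketch below shows that ranking only. The function name, the data, and the output format are assumptions; the original software is written in C++ and covers the full SLP process.

```python
def pq_analysis(quantities):
    """Rank products by quantity and attach the cumulative share of total output.

    `quantities` maps a product name to its production quantity.
    Returns a list of (product, quantity, cumulative_share) tuples.
    """
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total quantity must be positive")
    ranked = sorted(quantities.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for product, quantity in ranked:
        cumulative += quantity
        rows.append((product, quantity, cumulative / total))
    return rows


# Made-up product mix.
for row in pq_analysis({"B": 30, "A": 50, "C": 20}):
    print(row)
```

Products at the top of such a ranking dominate material flow, so they usually get the most attention in the later logistics analysis.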

Users can adjust the output based on experience, design principles, and real conditions to support decisions about the factory floor layout. For example, two proposed layouts are shown below.

To download the code, please click here. The class diagram used in this software is also shown below.

Machine Learning Certificate

I obtained the Machine Learning certificate from Stanford University.

I-Corps Innovation Program

I was very lucky to participate in the I-Corps program supported by the National Science Foundation in 2016. Although my startup did not work out after exploring the business opportunities, I am grateful to the I-Corps program, shown in the following image.

I-Corps Image

Yonghui (Yohu) Xiao


Hi, I have been an engineer at Google since May 2017. Before joining Google, I was a Ph.D. student in the CS department at Emory University. Prior to Emory, I received three bachelor's degrees from Xi'an Jiaotong University in 2005. After graduation, I worked at IVO in China. In 2008, I became a graduate student in the CS department at Tsinghua University, where I was lucky to join the collaborative Tsinghua-Emory research project. After three years of research work, I felt that I should focus on this promising area and follow my passion for solving real-world problems. In 2011, with the kind help of Li, I became a Ph.D. student at Emory University. I interned at Samsung Research America (SRA) in summer 2014.

Click here to download my CV.

Profile Image

My Research

My research mainly focuses on data privacy protection. For example, do you worry that your location data might be stolen while you use Yelp to find the nearest restaurants, or that your browsing history might be exposed while you search on Google? While concerns about data privacy arise frequently, privacy preservation techniques still lag behind. The catch is that you have to provide your information to get the service. For example, you must give your location to Yelp in order to get the nearest restaurants. So how can we balance privacy and utility? Differential privacy, the most promising answer to this problem, is still under debate in academia, and practical privacy-preserving techniques still have a long way to go.

I am happy to work in this area, which has enormous opportunities and so many possibilities. I believe doing is much more important than talking, which is why I am also developing an iPhone app ("LocLok") with state-of-the-art techniques from this area (^_^).

Professional Service


I love math, especially vector spaces and matrix computations. I am fascinated by black hole theory, string theory, and the 11-dimensional universe (or maybe multiverse). I like to play guitar, though I am not very good at it. I like to jog at the gym, which helps me calm down and relax.

My Publications


  • Amazon graduate research symposium, 2017
  • IEEE S&P student PC travel award, 2017
  • NSF I-Corps award as entrepreneur lead, 2016
  • CCS travel award, 2015
  • NSF ICDE scholarship, 2012
  • Ph.D. Fellowship, Laney Graduate School of Emory University, 2011-Present
  • Scholarship of Foxconn at Tsinghua University, Beijing, China, 2009

Contact Information

Feel free to contact me if you have any questions (^_^).
My contact information is below.

To Contact me

Email: yhandxiao AT gmail dot com

Office: Mountain View office of Google

Phone: 404-772-0x1c2d9401 where the last four digits were encrypted by RSA. You are welcome to call me if you can crack those last four digits.