Building a Personalized, Auto-Calibrating Eye Tracker from User Interactions

July 24, 2019

Good morning, I would like to welcome you all to this morning's session on eye gaze. Let us welcome our first speaker, Michael Wong.

Thank you, thank you all for coming here, and thank you to the session chair for the introduction. Our paper is about building a personalized, auto-calibrating eye tracker from user interactions. I'm Michael, a final-year PhD student; this work was done in our lab at Hong Kong Polytechnic University, and these are my dear colleagues.

I don't think anybody here needs an introduction to eye tracking or what it is useful for, so here are just some examples. Some of the most accurate eye trackers nowadays are wearable devices; some are screen-based. They can be very precise and accurate, but they are often too expensive, so it is not likely that the average consumer would own one. As a result, gaze research is currently mostly limited to lab studies.

Our motivation for this paper is to build an accurate webcam-based eye tracker. If we can do that, gaze-aware studies can take place in situ, and gaze-aware applications will become possible for everybody.

But what are our key challenges? The biggest one is data: data from specialized devices is extremely different from webcam data. Here is a sample image captured by an infrared camera; the important eye features are so clear that accurate gaze tracking and estimation is much easier. In contrast, a sample from a webcam is of much lower quality and much noisier, and in real-world use we also need to handle occlusions, refractions, and head pose variance. Basically, if we want to build a gaze tracker from webcam data, we need to handle different eye shapes, textures, and angles, under different head orientations, head-to-screen distances, and illumination conditions. That means we need a huge amount of data to get a well-performing eye tracker. The big question is: how can we get that amount of data in a feasible manner,
without requiring the user to go through a lot of calibration?

So this is what we thought. Nowadays web cameras are everywhere; practically every computer system comes with one. People spend a lot of time on their computers, which means there is a lot of interesting data being generated every single day. Based on our observations in a pilot study, there seemed to be a strong correlation between gaze and the interaction cue, i.e., the location of the typing caret or the mouse cursor. This brings us to the question: can we use the mouse clicks and key presses that the computer collects?

Previous work gives us some comforting information on the supporting side. Rodden and colleagues reported strong gaze-cursor alignment during active mouse usage, such as using the mouse to follow the eye movement or to mark a particular result, and Sugano and colleagues built an eye tracker from mouse clicks, based on the assumption that users are looking at where they click. On the other side, Huang and colleagues found that, yes, there is a certain correlation between gaze and cursor, but substantial variation also exists, depending on the time spent on the page, personal browsing habits, and the current cursor behavior. Similarly, Liebling and colleagues reported that gaze does lead the mouse, but that gaze-mouse coordination is very complex in real-life scenarios.

This suggests that if we want to make use of interaction data to build an eye tracker, we need to understand when the interaction and the gaze are actually aligned, spatially and temporally. Previous work, and common sense, tell us that many factors complicate the assumption that users are looking at where they click or type. The question is: how do these factors affect gaze-interaction consistency?

We ran an experiment to find out. We recorded 31 subjects: their gaze position, collected using a Tobii eye tracker, and their facial images, captured by a webcam. We asked them to
perform some everyday interaction tasks, such as clicking on long links, short links, and photo links. (Yes, I intended to stay on this slide a little bit longer before talking, because it's just cute.) We also asked them to highlight text with mouse drags, and to type.

What we want to find is the probability that the distance between the gaze location and the interaction cue is less than 60 pixels, which means they are well aligned. So here is the moment the interaction happens, and the x-axis shows the time in seconds preceding the event. We can see that the likelihood of gaze-interaction alignment peaks at different moments across different activities, which means the user is not always looking at where they click or type.

But this is only the overall probability mass function; let's look at the distribution of all the instances. Again, the x-axis is the time preceding the interaction event, and the y-axis shows the distance between gaze and interaction. The green line shows the median, the blue region the 25th through 75th percentiles, the gray region the 1st through 99th, and the red points indicate outliers. You can see that the gaze-interaction distance at the event moment can be very large, reaching 1,000 pixels here, which is basically two-thirds of our screen width. So this result tells us that although users are generally looking at where they click or type, much variation exists across individual events.

Why does this happen? This simple cartoon illustrates one possibility: by the time the user actually clicks on a target, their gaze may already have been drawn away by something else, the Panic Monster in this case. So if we are going to use the interaction cue for gaze learning, we need to identify the moment of the highest gaze-interaction alignment, and to look for this alignment, the key is to identify fixations and smooth pursuits.
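The alignment measure used in this analysis, the probability that gaze and interaction cue are within 60 pixels of each other at a given offset before the event, can be sketched in a few lines of Python. The data layout here (each event as a dictionary mapping time offsets to gaze and cue coordinates) is a toy format assumed for illustration, not the study's actual logging format.

```python
import math

ALIGN_THRESHOLD_PX = 60  # "well aligned" cutoff used in the analysis

def alignment_probability(events, offsets):
    """Estimate, for each offset in seconds before an interaction event,
    the probability that the gaze point and the interaction cue are
    within ALIGN_THRESHOLD_PX of each other.

    Each event is a dict mapping offset -> ((gaze_x, gaze_y), (cue_x, cue_y)).
    """
    probs = []
    for t in offsets:
        aligned = sum(
            1 for ev in events
            if math.hypot(ev[t][0][0] - ev[t][1][0],
                          ev[t][0][1] - ev[t][1][1]) < ALIGN_THRESHOLD_PX
        )
        probs.append(aligned / len(events))
    return probs

# Two simulated events, sampled 1 s and 0 s before the click.
events = [
    {0.0: ((500, 300), (510, 305)), 1.0: ((100, 100), (510, 305))},
    {0.0: ((200, 200), (190, 210)), 1.0: ((195, 205), (190, 210))},
]
print(alignment_probability(events, [1.0, 0.0]))  # -> [0.5, 1.0]
```

Plotting these probabilities against the offset would reproduce the kind of curves discussed above, with alignment peaking near (but not exactly at) the event moment.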
During these periods, the user is focusing on some target just before the interaction happens. Here is an example: the x-axis shows time, and the y-axis shows eye position. As time goes on, we see one fixation, then a saccade to another location, then a smooth pursuit while the user is probably reading a line, followed by another saccade, and this is when the mouse event happens. The smooth pursuit is the moment of the highest gaze-interaction alignment in this case, and this is what we want to identify, automatically, from the webcam images.

We use a behavior-informed validation to establish the temporal reliability of the training instances. What we want is to find the stationary periods that correspond to fixations or smooth pursuits within a three-second window preceding an event. First, we consider each feature separately and identify candidate sequences with only small temporal change. Then we look at all the features together: if there is a common candidate sequence that spans all the core features, that is our stationary period. The point of the highest gaze-interaction alignment is defined as the last frame of the stationary period, and we construct the gaze feature vector at that moment.

Our behavior-informed validation ensures that the user's gaze is stable, but what if the user isn't looking at the interaction cue at all? Here is an example: the user is watching a YouTube video, and the heat map shows the gaze location, while the mouse rests on the pause button, about to capture the screen. Obviously the mouse target is not the gaze point, but our behavior-informed validation doesn't know that. We therefore add a data-driven validation to address this situation, by verifying the spatial reliability of the training instances. Using the feature vector extracted by the behavior-informed validation, we can calculate a predicted gaze point.
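The stationary-period search just described could be sketched roughly as follows. The per-feature change threshold, the minimum run length, and the feature names are illustrative assumptions, not the parameters actually used in this work.

```python
def stationary_flags(seq, max_delta):
    """Flag frames whose change from the previous frame is small."""
    flags = [True]  # the first frame can start a run
    for prev, cur in zip(seq, seq[1:]):
        flags.append(abs(cur - prev) <= max_delta)
    return flags

def find_alignment_frame(features, max_delta, min_len=3):
    """features: dict of feature name -> per-frame values inside the
    three-second window preceding the event.  Returns the index of the
    last frame of the latest stationary run spanning all features,
    i.e. the assumed moment of highest gaze-interaction alignment,
    or None if no run is long enough."""
    per_feature = [stationary_flags(seq, max_delta) for seq in features.values()]
    common = [all(frame) for frame in zip(*per_feature)]
    best_end, run = None, 0
    for i, ok in enumerate(common):
        run = run + 1 if ok else 0
        if run >= min_len:
            best_end = i
    return best_end

# Hypothetical features: stable for four frames, then a jump (a saccade).
features = {
    "pupil_x": [0.00, 0.05, 0.10, 0.12, 2.00, 2.05],
    "pupil_y": [1.00, 1.02, 1.05, 1.06, 1.08, 1.10],
}
print(find_alignment_frame(features, max_delta=0.1))  # -> 3
```

The returned index plays the role of the "last frame of the stationary period" described above, the frame from which the gaze feature vector would be built.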
If this predicted gaze point is too far from the interaction event, the instance is classified as potentially bad data and is not used to update the next gaze model.

So how well does it work? We ran an experiment using the Tobii eye tracker as the gold standard. The yellow dashed line here shows the true gaze trajectory. The red dot corresponds to the location of gaze at the click moment, so this is the performance we get with the naive method. The blue dot corresponds to the shortest distance between gaze and click, so this is the absolute lower bound, the best performance we could possibly get. The green dot corresponds to the moment of the highest gaze-interaction alignment as identified by our validation mechanism. We can see that our method does significantly better than simply assuming that people are looking at where they click or type.

But does it work in a real usage context? We recruited ten subjects for a field study. They were asked to work on several applications that would generate a diverse range of interaction behaviors, such as browsing, coding, writing, drawing, and gaming. Our gaze model is a random forest, and here is its performance under the naive "people are looking at where they click" assumption, with the gaze model updated and retrained every 150 interaction events. The figures on the left show the correlation coefficients between the ground truth and the predicted x and y gaze coordinates, and the figure on the right shows the visual error in degrees. From these results we can conclude that training on the data at the event moment, that is, the data extracted using the naive method, fails to improve the performance.

And here is the performance of our gaze model if we feed it only the data that has been validated by our process; in other words, this gaze model is trained using only the aligned moments. The correlation is a lot higher, and the visual error is much lower.
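The overall update loop, behavior-informed extraction followed by data-driven filtering and periodic retraining, might be sketched like this. The `MeanModel` stand-in for the random forest, the rejection radius, and the tiny retraining interval in the demo are illustrative assumptions, not the system's actual components or parameters.

```python
from math import hypot

class MeanModel:
    """Toy stand-in for the random forest gaze model: predicts the
    centroid of the training cue locations regardless of features."""
    def fit(self, X, y):
        self.cx = sum(p[0] for p in y) / len(y)
        self.cy = sum(p[1] for p in y) / len(y)
    def predict(self, x):
        return self.cx, self.cy

class AutoCalibrator:
    """Accumulates validated instances and retrains the gaze model
    after every `retrain_every` accepted events."""
    def __init__(self, model, retrain_every=150, dist_limit=200):
        self.model = model
        self.retrain_every = retrain_every  # the talk retrains every 150 events
        self.dist_limit = dist_limit        # rejection radius in pixels (assumed)
        self.X, self.y = [], []
        self.fitted = False

    def add_instance(self, feature_vec, cue_xy):
        # Data-driven validation: once a model exists, reject instances
        # whose predicted gaze lands too far from the interaction cue.
        if self.fitted:
            px, py = self.model.predict(feature_vec)
            if hypot(px - cue_xy[0], py - cue_xy[1]) > self.dist_limit:
                return False
        self.X.append(feature_vec)
        self.y.append(cue_xy)
        if len(self.X) % self.retrain_every == 0:
            self.model.fit(self.X, self.y)
            self.fitted = True
        return True

# Demo with a tiny retraining interval.
calib = AutoCalibrator(MeanModel(), retrain_every=2, dist_limit=50)
calib.add_instance([0.1, 0.2], (100, 100))         # accepted: no model yet
calib.add_instance([0.1, 0.3], (110, 110))         # accepted; triggers first fit
print(calib.add_instance([0.2, 0.2], (500, 500)))  # -> False (rejected as bad data)
print(calib.add_instance([0.1, 0.2], (120, 100)))  # -> True
```

In the real system the model would be the random forest over eye-appearance features, and the accepted instances would come from the behavior-informed alignment moments rather than raw event timestamps.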
The correlation also goes up and the visual error decreases over time, meaning that the more data we get, the better the model becomes. Our results are competitive with those of the best-performing systems, but unlike theirs, our method is completely non-intrusive to the user.

For future work, we plan to take into account the impact of human affect, and to investigate methods to accelerate the gaze learning process, as this is currently our major limitation.

In summary, we conducted a user behavior study to investigate gaze-interaction consistency across different interactions, and we proposed a non-intrusive, adaptive, interaction-informed method to identify gaze-interaction alignment from daily human-computer interactions. We also showed that our method is effective across diverse interactive tasks. I think that's it. Thank you for your attention, and I'm happy to take any questions.

Hi, I'm Mélodie Vidal. Thank you very much for the presentation; I particularly enjoyed the copious amounts of cats. I'm just wondering if you have ideas for applications, for what to do once you have the gaze of the user. You're using mouse and keyboard, so what do you plan on using the calibrated gaze for?

We want to have a better understanding of human behavior, so we will try to investigate this channel to understand users, or maybe their facial expressions, using only off-the-shelf devices such as, as you say, the webcam, keyboard, and mouse. We hope that our program can be distributed at a very large scale, and then we can understand how people interact with the computer in different tasks. So, for usability purposes, yes. Thank you for your question.

Somewhat following the last question, from the perspective of one of the major providers in the eye-tracking industry: I wonder, from your experience, given what you have achieved, how close is it to applying such techniques to something like MAGIC pointing for real?

Yeah, thank you for
your question. That is exactly what we hope to do next, because MAGIC pointing is also very interesting and useful, especially when there are multiple displays. Our current limitation in applying this to such a scenario is the process of building up the user-dependent model, because, as our data shows, it requires around 1,050 data points to train a well-performing model, and these data points have to come from interaction data, for example mouse clicks and key presses. For key presses it would be okay, because keystrokes come a lot faster, but most of the time mouse clicks are more reliable than key presses, so collecting enough good data takes a long time. That is our biggest limitation compared to commercial products, because with, for example, an infrared camera, you probably need only a few seconds to do the calibration.

Could you bootstrap your training with previously collected data from a group of users before you do the adaptation to the individual, so that you still have a reasonably good start?

That is part of our planned future work; we are very interested in how to make use of existing data from different users. This paper's focus is only the user-dependent model, but we are actually working on something based on that kind of idea to accelerate the process. Thank you.

I'm from Texas State University; we also have a Tobii EyeX in our laboratory. I liked your plot of the distance, in visual angle, between the target and the gaze; it was about 2.5 degrees. But the Tobii EyeX has an accuracy of about one degree and a precision of about half a degree. How is that taken into account in your research?

You mean the performance of our method compared to the Tobii, and the noise that the equipment itself introduces into our analysis when we use the Tobii as ground truth? Yes, that is one of the
concerns we had when we ran this experiment. Thank you for your question; we were indeed also concerned about that. In a future study we will probably spend more time on the evaluation part, and probably ask the subjects to gaze at calibration points, the most traditional way to do the evaluation.

Maybe introducing a confidence interval could help you in this case?

Yes, thank you. Thank you.

This paper also received a Best Paper Award, so congratulations!
