Google Vision is a machine learning service for image detection.
For a week, I recorded myself from my laptop. Every hour, a screenshot of the recording was taken and sent to Google Vision. Google Vision would analyze the image and output labels for what it detected. These labels covered my expression, general content (couch, hair, etc.), and even whether any content was considered "adult", "medical", "racy", or "spoof".
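To give a sense of what one of those hourly requests could look like, here is a minimal sketch, assuming the google-cloud-vision Python client and already configured credentials. It is not the code used for this project; the function names `summarize` and `annotate_screenshot` and the shape of the returned dict are hypothetical, chosen only for illustration.

```python
# Likelihood names in the order of the Vision API's Likelihood enum (0 to 5).
LIKELIHOOD = (
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
    "POSSIBLE", "LIKELY", "VERY_LIKELY",
)


def summarize(labels, faces, safe):
    """Flatten the annotations for one screenshot into a plain dict.

    Hypothetical helper: it only reads attributes, so it can be tried
    with stand-in objects instead of a real API response.
    """
    return {
        # General content labels such as "Couch" or "Hair".
        "labels": [label.description for label in labels],
        # One entry per detected face, with expression likelihoods.
        "expressions": [
            {
                "joy": LIKELIHOOD[int(face.joy_likelihood)],
                "sorrow": LIKELIHOOD[int(face.sorrow_likelihood)],
                "anger": LIKELIHOOD[int(face.anger_likelihood)],
                "surprise": LIKELIHOOD[int(face.surprise_likelihood)],
            }
            for face in faces
        ],
        # The four content categories mentioned above.
        "safe_search": {
            name: LIKELIHOOD[int(getattr(safe, name))]
            for name in ("adult", "medical", "racy", "spoof")
        },
    }


def annotate_screenshot(path):
    """Send one screenshot to Google Vision and summarize the result."""
    # Imported here so summarize() works without the client library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()  # assumes credentials are set up
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())

    labels = client.label_detection(image=image).label_annotations
    faces = client.face_detection(image=image).face_annotations
    safe = client.safe_search_detection(image=image).safe_search_annotation
    return summarize(labels, faces, safe)
```

Calling `annotate_screenshot("screenshot.png")` (a placeholder filename) once an hour would give one dict per screenshot, which could then be set beside the written note for that hour.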
Every hour, I would also write down exactly what I was doing so I could compare it against what Google Vision interpreted me to be doing.
While Google Vision outputs text labels, it is an image detection model, so it was trained on images, not text. So when Google Vision analyzes one of the screenshots, it is really looking at other images to describe what it is seeing.
As a result, I chose to turn the "labels" (the things it detected) into a collage of images, so that you can visualize what Google Vision was seeing and detecting.
These images were pulled from Google image search, since that is the data Google Vision is trained on.
Hover over each collage to discover what I was actually doing in that moment.