
Disentangled Variational Autoencoders

One of the main purposes of creating an artificial intelligence, or any machine learning program, is to predict. We ask one machine to predict across many domains: transformations, rotations, classification, and time. The machines at the frontier of the field are the ones attempting to predict through the domain of time. One of our own main purposes as humans is to predict, and we use generalization to do it. Generalizing across time is difficult precisely because it is where a real understanding of the world has to come in, and there are not many routes a machine can take to get there. Some examples of generalizing and predicting through time are Tesla's Autopilot, natural language generation, and predicting the movement of objects. This is not the classical learning of the perceptron; it introduces the machine to a fourth dimension: time. Take speech as an example of change over time. Speech is a perfect, but also hard, example for predicting change through time. Honestly, there is not much 'real world' application for predicting changes in speech over time beyond developing a better understanding of speech itself. Essentially, the idea is to look at previous trends and train the model on those past examples so that it is well enough informed to make a good prediction about the future.

One of my favorite things about deep learning and machine learning is that these techniques are far from fully developed. "AI began with an ancient wish to forge the gods" is a perfect quote from Pamela McCorduck. I choose to look at AI and DL from this perspective: as we work to recreate God's perfection in our imperfect world, we will never achieve perfection, only closer and closer models of it. Therefore, there will always be a path for people who see things the right way to improve AI. Back to prediction in time: what if we broke a phenomenon down to its most fundamental representation, and then used that representation to recreate it at a future position? How can we force a model to break something down to a more fundamental representation? The answer is an autoencoder, specifically a disentangled variational autoencoder. The architecture of an autoencoder is a sideways hourglass, consisting of an encoder and a decoder. The encoder forces the input data into a bottleneck of a few neurons, and the decoder reconstructs the data back into its original format. Essentially, this forces the model to build a low dimensional conceptualization of the data that still has the property of expanding back into the original input. The bottleneck is thought to extract fundamental features of the data, but those features are often entangled in a confusing way that does not reflect the true underlying factors. To solve this problem, we use a disentangled variational autoencoder, which mostly provides a structure so these fundamental features can be disentangled and represented with as few neurons as possible.
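To make that hourglass concrete, here is a minimal sketch in PyTorch of a β-VAE, the most common disentangled variational autoencoder variant. The layer sizes, the β value, and the class and method names are my own illustrative choices, not a reference implementation of anything in particular.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Sideways-hourglass architecture: encoder -> small bottleneck -> decoder."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=8, beta=4.0):
        super().__init__()
        self.beta = beta
        # Encoder squeezes the input down toward the bottleneck.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.to_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        # Decoder expands the bottleneck back to the original format.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

    def loss(self, x):
        x_hat, mu, logvar = self(x)
        recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
        # The KL term pushes the latent code toward an independent unit Gaussian;
        # beta > 1 weights this pressure more heavily, which is what encourages
        # the dimensions of the bottleneck to disentangle.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + self.beta * kl
```

The only structural difference from a plain VAE is the β weight on the KL term: pushing the latent distribution harder toward an independent Gaussian is what nudges each bottleneck neuron to capture a separate fundamental feature.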

The real question is: is prediction through time easier if we have the data in a low dimensional form that represents its fundamental structure? I believe it is, for a few reasons. High dimensional data is very complex; although it carries more information, it is much harder to interpret effectively and to learn complex behaviors through time from it. I think there is a real relation to our psychology and neurology here: our intuition is the best predictor of the future, and intuition, at least to me, feels like a subconscious mechanism. My theory is that the brain converts sensory data into a lower dimensional form that only the subconscious can process; after the subconscious comes to a decision or prediction about the environment, it sends a signal back to the conscious mind in the form of intuition. In the model, then, we should compute the loss function from the bottleneck against a signal a small step ahead in time of the input signal (a sketch of this follows below). This way we are effectively training the machine to make predictions from the low dimensional compression of the data instead of from the high dimensional data itself. One idea I had, definitely not perfect, was to create a structure of interconnected low dimensional representations, instead of repeatedly translating between high and low dimensional data.
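Here is a rough sketch of what I mean by computing a loss from the bottleneck against a slightly later signal. It assumes the BetaVAE from the earlier sketch, treats the data as a sequence of frames in [0, 1], and uses a single linear predictor head and a one-step horizon purely as illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vae = BetaVAE(input_dim=784, latent_dim=8)
predictor = nn.Linear(8, 784)  # maps the latent code to a guess at the next frame
optimizer = torch.optim.Adam(
    list(vae.parameters()) + list(predictor.parameters()), lr=1e-3
)

def training_step(x_t, x_next):
    """x_t: current frame; x_next: the same signal one small step later."""
    h = vae.encoder(x_t)
    mu, logvar = vae.to_mu(h), vae.to_logvar(h)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    # Reconstruction + KL keep the bottleneck a faithful, disentangled code...
    recon = F.binary_cross_entropy(vae.decoder(z), x_t, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    # ...while the prediction loss is computed from the bottleneck against the
    # future signal, so the low dimensional code is forced to carry whatever
    # is needed to anticipate the next step.
    pred = torch.sigmoid(predictor(z))
    future = F.binary_cross_entropy(pred, x_next, reduction="sum")

    loss = recon + vae.beta * kl + future
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design point is simply that the prediction error back-propagates through the bottleneck, not around it, so the compressed representation is the thing being trained to look forward in time.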

