
Introduction to convex and non-convex optimization problems

  • johnmcgaughey255
  • Mar 21, 2022
  • 5 min read

A convex set is a subset of a Euclidean space (more generally, an affine space) in which the line segment between any two points of the subset lies entirely within the subset. The purpose of defining such a set is to describe what kinds of operations we can perform inside it. Another way of interpreting a convex set is to think of protrusions as disruptions to convexity. Circles, cubes, spheres, and hyper-cubes are all examples of convex sets; stars and other oddly shaped objects may not be. The reason for defining the convex set in such a rigorous and abstract way is to generalize it to large dimensions, the dimensionality we will eventually be considering.

Many problems can be translated into finding a set of parameters that minimizes (or maximizes) some unknown function. One of the simplest convex optimization problems is finding the minimum of a parabola in one dimension. We assume the parabola, f, is continuous and differentiable; the function is otherwise unknown, and the only other thing we know about it is that it is convex. We want to find the value of x that minimizes f(x), and we can do so by starting with a random x and making better and better guesses. Given a random x, we evaluate the derivative f'(x) and move a small step α in the opposite direction of that derivative. Iterating this process until f'(x) is sufficiently small will converge to the global optimum from any initial x, provided that f is convex and differentiable and that α is sufficiently small. The same holds for a convex function defined on a convex set, even when it is a function of a very large number of parameters. The method I just introduced is called gradient descent. It is not actually the best way to find the minimum of a parabola in one dimension; however, as the dimensionality of the problem increases, it becomes increasingly useful.
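
To make the iteration concrete, here is a minimal Python sketch of one-dimensional gradient descent. The particular parabola f(x) = (x − 3)², the step size α = 0.1, and the stopping tolerance are assumptions chosen purely for illustration, not part of the discussion above.

```python
# Minimal 1-D gradient descent sketch.
# Illustrative assumptions: f(x) = (x - 3)^2, alpha = 0.1, tolerance 1e-8.

def f(x):
    return (x - 3.0) ** 2          # a convex parabola with its minimum at x = 3

def f_prime(x):
    return 2.0 * (x - 3.0)         # derivative of f

def gradient_descent(x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < tol:           # stop once the derivative is sufficiently small
            break
        x -= alpha * g             # step in the opposite direction of the derivative
    return x

print(gradient_descent(x0=-25.0))  # converges to ~3.0 from any starting point
```

Because f is convex and the step size is small, the same minimum is reached no matter which x0 we start from.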

Convex optimization is the building block for more complex problems that are not so easily solvable. If I were asked to describe a convex optimization problem in a coherent and simple way, I would say it is one in which, no matter where you start, you are guided to the same optimal answer. There is one best answer, and any other answer gets worse the further it is from that optimum. These problems are easy, but boring. Convex optimization problems also become increasingly uncommon as the number of parameters you are able to change grows. Let me illustrate this through an example.

My ultimate goal is to optimize how cool I look, and an app can tell me how cool I look at any given moment and how to slightly change my appearance to look a tiny bit cooler. In my example, coolness is an objective measurement that does not change from the perspective of any observer. Let's say that I'm only able to change the color of my shoes. I start with a random color and get a response from my app. After receiving this response, I change my shoes in the direction it tells me to, and I keep changing the color until I arrive at the shoe color that optimizes coolness. After I went to school in my cool shoes, I was told that although my shoes were cool, they didn't match my other clothes at all, so my whole outfit was uncool.

I went back home and tried this again. This time I would change my shirt, pants, and shoes; for the purpose of simplicity, I am only varying the color of the clothes. I started the whole process over: I put on a random shirt, pair of pants, and pair of shoes. The app's initial response told me to make my shirt brighter, my shoes browner, and my pants bluer. After many iterations of changing my outfit based on the app's feedback, it finally told me I was at an 'optimal' coolness. I went to school the next day and got a more favorable response from the people around me. However, I came home from school that day perplexed. I saw people wearing completely different outfits from what I wore, and my app said that they were as cool as me, some even cooler! My first thought was that my app had lied to me. After this, I followed the directions on the app again with a completely different initial set of clothes, and got a completely different, but equally cool, outfit.

This kind of problem is non-convex: there are many ways to attain an equally cool set of clothes. The process that the human-and-app system uses to converge on an 'optimal' outfit is gradient descent (or, since we are maximizing, gradient ascent). Gradient descent/ascent treats every optimization problem as if it were a convex optimization problem, and it finds a local optimum. It treats your initial condition as a starting point and searches in the direction that improves the coolness of your clothes. What does it mean to 'converge' on a set of clothes? It means that if we were to vary our choice of clothes in any direction, the coolness metric would get worse. The gradient (derivative) in this system tells the human which way to change their outfit to increase the coolness factor. The gradient directs the person's choice of outfit, and at a local optimum the gradient is equal to zero, so the human has no incentive to change anything. Let's say that we have 20 pairs of shoes, 10 pairs of pants, and 30 shirts.
We know that some combination of these three items will result in a global maximum of coolness. Without a method like gradient descent, we could try an exhaustive search of all 20 × 10 × 30 = 6,000 combinations and surely find the best one. However, the number of combinations grows exponentially with the number of variables (and multiplies with the number of options in each category), and the computation quickly becomes intractable. Gradient descent with an appropriate step size (how much the human changes their outfit each iteration) can find a near-optimal solution in much less time and scales to more degrees of freedom. The field of non-convex optimization is so interesting because it introduces probabilistic considerations and more intelligent solutions to these problems. The field of deep learning is built entirely on non-convex optimization, because neural networks contain many degrees of freedom that must be optimized to fit an unknown function.
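
To see how the same procedure behaves on a non-convex landscape, here is a small Python sketch of gradient ascent on a made-up 'coolness' function with several local maxima. The function, the step size, and the starting points are all invented for illustration; they stand in for the app's hidden scoring, not for it. Different starting points settle into different, but comparably good, local optima, just like the different outfits.

```python
import math

# A made-up non-convex "coolness" function of one variable, with several local maxima.
def coolness(x):
    return math.sin(3.0 * x) + 0.1 * math.cos(10.0 * x) - 0.05 * x ** 2

def coolness_grad(x, h=1e-5):
    # numerical gradient: in the story we only ever get local feedback from the app
    return (coolness(x + h) - coolness(x - h)) / (2.0 * h)

def gradient_ascent(x0, alpha=0.01, steps=5_000):
    x = x0
    for _ in range(steps):
        x += alpha * coolness_grad(x)   # move in the direction that increases coolness
    return x

for start in (-2.0, 0.5, 2.0):
    opt = gradient_ascent(start)
    print(f"start {start:+.1f} -> local optimum x = {opt:+.3f}, coolness = {coolness(opt):.3f}")

# By contrast, exhaustive search over the outfit example would need
# 20 * 10 * 30 = 6,000 evaluations, and that count blows up with every new variable.
```

Each starting point converges to the nearest peak, where the gradient is zero and there is no local incentive to change, which is exactly the 'equally cool but different outfits' situation from the story.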

Problems that naturally live in high dimensions tend to be non-convex and lie beyond the intuition of ordinary spatial reasoning. The next several blogs will be directed towards viewing deep learning and its problems as non-convex optimization problems, and I will discuss why viewing them this way is helpful for discovering new solutions.

 
 
 
