By johnmcgaughey255

Traffic Lights


Cop lights, flashlights, spotlights, strobe lights, street lights… out of all the lights we could talk about today, I want to focus on one type specifically: traffic lights. These lights regulate traffic flow, ensuring you get to work on time and that none of us die on the way. In ISM 2 this year, I want to try to optimize these complex traffic systems through the regulation of stop lights. There is a lot of potential and unexplored territory here. The avenue I want to pursue is some kind of reinforcement learning algorithm, where stop lights are the acting agents, each maximizing a score that represents the efficiency of traffic flow.

There are a few main branches of reinforcement learning we could drift into; I will briefly brush over each of them. First, what I am most practiced in: policy optimization. A policy is a rule that tells the agent which action to take given some observation from the environment, and policy optimization is the process of improving that rule. Having a parameterizable policy is ideal, because we can then figure out how to change those parameters to most efficiently earn a better score in the game. Another type of RL that could be used for this problem is value-function or action-value-function approximation. Whereas in policy optimization the game was to optimize the mapping from observation to action, the goal in the value-function game is to always move toward the best possible state. The agent learns which states lead to a good score, using a parameterized value function. The action-value function, also called a Q-function, predicts which action will lead to the future state of highest quality. Both methods work well for different problems; from here on I will stick with what I know best: policy optimization.
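To make the policy-optimization idea concrete, here is a minimal sketch of a policy-gradient (REINFORCE-style) update on a toy two-action problem. The environment, reward values, and learning rate are all illustrative stand-ins, not anything from the actual traffic setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "the game": a 2-armed bandit where action 1 pays 1.0
# and action 0 pays nothing, so the optimal policy always picks action 1.
REWARDS = np.array([0.0, 1.0])

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def reinforce_step(theta, lr=0.5):
    """One policy-gradient update on the policy parameters theta."""
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = REWARDS[action]
    # grad of log pi(action) for a softmax policy: one_hot(action) - probs
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    return theta + lr * reward * grad_log_pi

theta = np.zeros(2)
for _ in range(500):
    theta = reinforce_step(theta)

print(softmax(theta))  # probability mass shifts onto the rewarding action
```

The key point is that the parameters are nudged in the direction that makes rewarded actions more probable, which is exactly the "change the parameters to get a better score" loop described above.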

At first I wanted to optimize traffic light timing in a multi-agent RL setting, then I realized how unscalable and inefficient that would be. A better idea is to use the symmetry of road systems to my advantage and give every traffic light the same policy. If you think about it, there is no reason one stop light block should follow different rules than another: they should share a similar value function of the environment and a common objective function. Not only does this make the approach more scalable, but the larger you scale the simulation, the faster the policy can converge, because a bigger system means more cars interacting with traffic lights (in simulation, of course). So this is my first selling point: the system scales to larger and larger networks because it takes advantage of the symmetries in traffic systems.
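The weight-sharing idea above can be sketched in a few lines: one set of parameters, consulted by every intersection. The observation size, action count, and class name here are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

class SharedPolicy:
    """One parameter set shared by every intersection; updating it once
    changes the behaviour of the whole network."""

    def __init__(self, obs_dim, n_actions):
        self.w = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def act(self, observation):
        logits = observation @ self.w
        return int(np.argmax(logits))

# e.g. 2 actions standing in for light phases: 0 = NS-green, 1 = EW-green
policy = SharedPolicy(obs_dim=4, n_actions=2)

# Hypothetical local observations from three different intersections;
# all three agents consult the *same* weights.
intersections = [rng.normal(size=4) for _ in range(3)]
actions = [policy.act(obs) for obs in intersections]
print(actions)
```

Every car-light interaction anywhere in the simulated network generates experience for this one shared policy, which is why scaling the system up should speed convergence rather than slow it.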

Down to the details. Honestly, the details are very broad right now, not quite formulated. There are a few steps to formulating them. First is the conceptual formulation: being able to write down, in plain English, what I want to happen. Second is the mathematical formulation: describing all the relationships between the agents, the environment, and the policies in numerical form. Last, I have to code it, writing it in some form that can be compiled and processed by a CPU. There is a concept of graphs that I want to nail down a bit more in my head: a network of nodes and edges, where the edges are roads. These edges carry information such as the length of the road, or perhaps whether there is a traffic delay that day. The nodes are intersections, and can also carry important information, such as a car crash or other things I have not thought of yet. One of the core ideas I think I'm going to keep in this implementation is signals. Not traffic signals, but a signal sent out to all lights of connected intersections. Hopefully, the agent will be able to learn which signals apply to it and how to act on them. The signals are sent out because they have real consequences in the environment. For example, if intersection A sends a signal to all of its neighbors that cars are turning left from it, the intersection to the left of A (intersection B) will soon be hit with this traffic. Of course, some initial information will be transmitted along with the "cars are turning left" signal. This information is embedded in the roads: the length of the roads, how much traffic turned at light A, and other details such as the speed limits. Notice how all of this information pertains to how the agents at intersection B will have to handle the incoming traffic.
It makes a difference to intersection B whether intersection A is half a mile away or three miles away, because traffic takes longer to arrive over the longer distance. To keep the policy self-similar across all intersections, you have to supply this environmental information so that each copy of the policy can factor it into its own calculations. We do this so the policy can generalize across all situations; given some initial conditions, it can then make the correct decisions.
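The graph-plus-signals picture above can be sketched as plain data: intersections as nodes, roads as edges with attributes, and a broadcast that attaches the road information the receiver needs. The intersection names, attribute keys, and the idea of converting distance into an arrival estimate are all illustrative choices of mine, not settled design:

```python
# Roads as directed edges with attributes; nodes are the intersections.
roads = {
    ("A", "B"): {"length_miles": 0.5, "speed_limit_mph": 35},
    ("A", "C"): {"length_miles": 3.0, "speed_limit_mph": 45},
}

def send_signal(source, event, graph):
    """Broadcast an event from one intersection to its neighbours,
    attaching the edge attributes the receiver needs to react."""
    messages = []
    for (u, v), attrs in graph.items():
        if u == source:
            # travel time tells the receiver *when* the traffic arrives
            eta_min = attrs["length_miles"] / attrs["speed_limit_mph"] * 60
            messages.append({"to": v, "event": event,
                             "eta_min": round(eta_min, 1)})
    return messages

msgs = send_signal("A", "cars_turning_left", roads)
print(msgs)
```

Here B learns the left-turn traffic is under a minute away while C has about four minutes, which is exactly the half-mile-versus-three-miles distinction the policy needs in order to stay identical everywhere yet act locally.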

In many successful deep learning algorithms, we see inspiration from nature: some observation of how nature optimizes itself that we then copy in our algorithms. Take the generative adversarial network, which uses competition as a device for growth. I have never seen an algorithm like mine implemented or even proposed, though of course I will do my research and read the papers. In most RL settings there is one agent, like a robot or a video game player. Mine is not the typical RL setting: there are multiple identical agents, all part of one traffic system. Each agent's policy function, value function, and loss function will be the same, which is to say that when a policy update is made, it is made identically for every single agent across the system. As far as my research shows, no one has really tried this before, and I think it has a lot of potential. I will go deeper later into why I believe the idea has merit, but essentially it is because it exploits the symmetry in the design. I call the policy self-competitive, because the only thing working against the policy's success is another copy of the same policy's success. Take a single intersection: there are four traffic light blocks, each an agent, and each a direct copy of the same agent. Because each wants to maximize its own objective function, they must in some way compete against each other. But now we ask: are they competing against each other, or against themselves? It is kind of both, leaning more strongly toward themselves. Built into this model is the idea that actions always have consequences for the actor.
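The "one update, applied to every copy" step can be sketched as averaging per-agent gradient estimates into a single change of the shared parameters. The four observation vectors and the stand-in gradient function are purely illustrative:

```python
import numpy as np

# Shared parameters for all four light blocks of one intersection.
theta = np.zeros(3)  # illustrative size

def agent_gradient(theta, local_obs):
    # stand-in for each agent's local policy-gradient estimate
    return local_obs - theta

observations = [np.array([1.0, 0.0, 0.0]),
                np.array([0.0, 1.0, 0.0]),
                np.array([0.0, 0.0, 1.0]),
                np.array([1.0, 1.0, 1.0])]

grads = [agent_gradient(theta, obs) for obs in observations]
# One averaged update; every agent's behaviour changes identically,
# since they are all views of the same theta.
theta = theta + 0.1 * np.mean(grads, axis=0)
print(theta)
```

This is where the "self-competitive" framing shows up mechanically: an agent's push on theta that helps its own block also reshapes the very policy its neighbours (its own copies) use against it.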
