Optimal Bike Lane Placement

Sarah Anoke, Shiang Fang, and Ben Wong for AM207 (May 2014)

Towards preservation of the environment, ideally all local destinations would be accessible by bike. However, most streets in the Boston area were constructed with only cars in mind, and it is financially and practically infeasible to add bike lanes to all streets. Focusing on Somerville we attempt to answer the question where is the optimal location
to construct a new bike lane? Because we’re interested in an extremum, not an entire distribution, the method of optimization used is simulated annealing. The screencast associated with this project can be found on YouTube. Our poster can be viewed here.

Keywords facility placement, bike lane, transportation, simulated annealing, point-referenced data, geospatial analysis

Introduction

Facility Placement

Facility placement is a general class of problems involving the optimal placement of a new facility within a network of existing facilities. Construction of the objective function that characterizes optimality is highly dependent on the type of facility under consideration. For example, considerations required in the placement of a new McDonald’s (which would include profit and distribution of competitors) differ from the considerations in the placement of a new hospital (which would include accessibility and patient waiting time).

Our Problem

The problem we have is similar, except we want to place bike lanes within an existing structure of bike and car lanes. Towards preservation of the environment, it would be ideal if all local destinations were accessible by bike lane. However, most streets in Boston were constructed with only cars in mind, and it is financially and practically infeasible to go back and add bike lanes to all streets. So we asked the question, "where is the most ideal location to construct a new bike lane?" We describe our process towards a solution first using synthetic data, then using Somerville as a particular case. Because we’re interested in an extremum, not an entire distribution, the method of optimization we’ve selected is simulated annealing.

Our goal is to answer the question,
where is the most ideal location to construct a new bike lane?

Overview

Our overall strategy was to start by building a relatively simple city and optimization model. From there, we would understand and resolve the more nuanced issues of our problem, then apply our algorithm to real data. In the following text, we describe this process by first providing a brief background on previous work and our optimization method. We then describe the model, data collection, and the results. We conclude with a discussion of further extensions.

Motivation

Previous Work

The U.S. Department of Transportation Federal Highway Administration (FHWA) has a Bicycle & Pedestrian Program that “promotes bicycle and pedestrian transportation use, safety, and accessibility”. Each state has a Program representative that, among other responsibilities, helps craft appropriate legislation and distribute federal funding towards bike (and pedestrian) lane construction. The FHWA also funds the University of North Carolina Highway Safety Research Center, a "national clearinghouse of pedestrian and bicycle information".

In addition, several state and national organizations have taken up the goal of multimodal transportation development. To name a few, Maryland State Highway Administration, Washington State Department of Transportation (chapter 1520), and the American Association of State Highway and Transportation Officials have all published documentation to guide constituent regions in the proper construction of bicycle lanes and pedestrian paths. “Proper construction” includes lane widths/shared lane requirements, signage, demarcation, pavement quality, shoulder edge treatment, and storm drain grate allowances. Boston has been particularly progressive in this regard, having published a 30-year bike plan to build a comprehensive network of bike lanes throughout the city.

While there is an abundance of information on how to properly install a bike lane, to our knowledge there is no information about where to start. All cities we’ve reviewed seem to be employing an ad hoc method of placement in the most convenient areas, with the long-term goal of eventually covering the whole geographical area. Our project is novel in that we attempt to integrate several practical considerations into a method that selects the optimal placement of a new bike lane, given the existing bike and car lane structure. As mentioned above, this optimal placement is determined using simulated annealing.

Simulated Annealing

The simulated annealing (SA) approach was selected because we are interested in finding a global optimum over a large search space, rather than characterization of an entire distribution. The methodology borrows its name from annealing in metallurgy, where a material is heated then slowly cooled to alter its physical properties.

In the algorithmic setting, there is also a measure of temperature in the simulation of this heating process. A solution is proposed, and during the initial periods of high temperature, the algorithm is more likely to accept a proposal that is worse than the current solution. This property allows us to explore the solution space, and move away from local optima if needed. Over the course of the algorithm, the temperature is gradually cooled, reducing the probability of accepting a worse solution and restricting exploration of the solution space, which allows the algorithm to focus on the portion of the solution space that is close to the global optimum. The algorithm stops when the system is sufficiently cool; at this point we’ve reached thermal equilibrium (or, in MCMC language, a stationary state). For a more detailed description of how this algorithm works, we direct you to a brief introduction and existing literature (Kirkpatrick et al., 1983; Cerny, 1985).

A reasonable alternative to SA is the use of a genetic algorithm (GA; Holland, 1992), a method of optimization that mimics the process of natural selection. Instead of maintaining a current solution, in GA a pool of solutions is maintained. Proposed solutions are proposed by “mutation” (essentially what is done in SA) but also by “recombination” of two proposal solutions. Probabilistic “survival of the fittest” criteria allow the “best” solution to reproduce and replace “less fit” solutions. While algorithm performance was not a goal of this project, it would be a worthwhile exercise to estimate the optimum and compare it (and other elements of the solution pool) to the optimum we calculated using SA.

Model

Formulation of Simple System

People arise from sources (i.e., their residences) and uptake is into sinks (i.e., potential destinations). Ideally, sources would be individual houses but, limited by data availability, sources are US Census block groups. Sinks are major destinations: hospitals, universities, high schools, and MBTA train stations. We're assuming that the city is a closed system, so all the destinations and individual would want to visit are within the city, and no one new is entering the city.

As depicted in the image on the right,the houses denote sources, and the U, HS, and H symbols are university, high school, and hospital sinks, respectively. The city itself is represented as a grid where the intersections are all one unit apart and residents may travel horizontally or vertically, but not diagonally. An additional simplifying assumption is that all vertical roads have bicycle lanes, as do the horizontal top and bottom city limits. Feel free to click the image for a larger view.

Given a list of coordinates corresponding to a fixed number of sources generating $N$ persons and an additional list of coordinates corresponding to a fixed number of sinks, we want to find the optimal place to build one bike lane such that we minimize the total distance traveled by all $N$ persons. In this simplified model, we’re assuming that cycling is the best mode of transportation (i.e., the only other mode of transportation is walking, and a person’s walking speed is so low that the individual would be better off taking a detour than getting off their bike and pushing it on roads where there exists no bike lanes) and that cycling is only possible on roads with bike lanes because it would otherwise be too dangerous to cycle alongside cars.

Simplifying Assumptions

In the table below, we discuss several model considerations and the assumptions we made regarding those considerations. Some of these simplifying assumptions limit the generalizability of our model but were made so that our optimization problem could be executed feasibly (e.g., city roads as a grid). Other assumptions were more natural (e.g., the entire journey is spent traveling by bike) which also helped in simplifying the model and ease coding.

Consideration	Explanation
Flow of bikers	Each street has a certain flow of bikers, and we want to maximize the average flow from source to destination (averaged over the residents of Somerville). This is accounted for under Accessibility, where we took into account the number of bikers simultaneously using a bike lane.
Flow of cars	Each street has a maximum flow of cars, and we want to maximize the average flow from source to destination (averaged over the residents of Somerville). It turns out that this factor is not an important factor in the choice of bike lane placement because flow of cars is not greatly affected by construction of bike lanes. Currently, there are many roads where cyclists and drivers can both use simultaneously, and the flow of cars will not be disrupted if the lanes are made narrower when bike lanes are constructed. Hence, we do not consider this.
Cost of bike lane installation	Proper installation of a bike lane includes demarcation, labeling, and signage for motorists. The lanes are usually four to six feet wide, which limits which streets are eligible for lane placement. The average cost of a bike lane installation is $25/linear foot. Also included in this cost function will be some measure of street width, either actual width or a measure of width by street type (e.g., city versus residential). We want to minimize this quantity. In our model, the distance between each intersection on the grid is of equal length, hence for this study, if we assume that the number of bike lanes to install is fixed, then cost is fixed across all possible placements so we do not consider this.
Travel time on bike lane	Normalized by total distance traveled, this is a measure of the proportion of a bicyclist's total trip that is on the bike lane, and we want to maximize this quantity. We stated earlier our assumption that a person's walking speed is so low that one would be better off taking a detour than getting off one's bike and pushing it on roads where cycling is not possible or safe. As such, the entire trip is spent on the bike lane.
Accessibility	We want to place the bike lane where the population density is greatest and/or where the greatest number of bikers will benefit from its installation. Therefore, in our calculations, the distance from source to sink is weighted by the number of bikers that ply that particular route.

Objective Function

The objective function $f(\cdot)$ is a quantitative summary of the most important considerations when evalu- ating a proposed bike lane placement. In Table 1 we detail these considerations, as well as justification for inclusion in or exclusion from our objective function. After making simplifying assumptions as discussed in the table above, our objective is to minimize $f(\text{grid configuration}) = \sum_{i=1}^N D_i$ where $D_i$ is the distance from individual $i$ ’s source to their sink/destination. For a given grid configuration, we will refer to the quantity $f(\text{grid configuration})$ as the energy of that system.

Data

Demographic Data

Although population density information by household is the ideal, the US Census Bureau provides such information by blocks ( $\approx 150$ people), which is approximately equal to a city block. The next largest geographical unit is a block group ( $\approx 1,000$ ), then tract ( $\approx 4,000$ ), zip code, school district, and city. We selected the resolution of block group and manually created a dataset of 69 sources (consisting of population counts and latitude/longitude coordinates) using USA.com and Google Maps.

Information on the 11 Somerville sinks was also collected manually, using Google Maps to determine the latitude and longitude coordinates of the 7 area high schools, 1 university, 2 hospitals, and Davis Square. The left image shows the positioning of the 69 sources and 11 sinks, superimposed on a map of Somerville. This map was created using Google Maps; feel free to click for a larger version.

Bike Lane Data

The right image shows the distribution of existing bicycle lanes, as determined using Google Maps; feel free to click for a larger version. We rotated this map approximately $30^\circ$ clockwise, then manually created a grid approximation of the existing bike lane structure. Although we relied on Google Maps data, we note that the Massachusetts Department of Transportation (MassDOT) also provides a Bicycle Facility Inventory for the entire state.

Street Data

MassDOT also has a Road Inventory for the entire state. Our priority was to construct working code, so we made the simplifying assumption that all roads were of equal width and bidirectional.

Implementation

Calculating the Objective Function

Within the grid representation of our city, there are several routes that each of the $N$ persons could take to travel from home to destination. Because we don’t know the exact destination of each person, we created an exhaustive list of all possible source/sink pairings. Our objective function becomes $f(\text{grid configuration}) = \sum_{i=1}^N w_i D_i$ where weight $w_i$ is the product of the population density at individual $i$ 's location and the assigned density at individual $i$ 's destination. The assignment of weights to destinations was somewhat arbitrary, but included practical considerations. A weight of 0.07 was assigned to each of the seven high schools, 0.21 to Davis Square, 0.20 to Somerville Hospital, and 0.10 to a smaller health center.

For a given grid configuration, we calculate the minimum distance from source to sink for each of the $69 \times 11= 759$ matchings, and take the weighted sum of these $759$ paths. This minimum distance is determined by comparison to the maximum trivial distance, an upper bound on the travel distance. This path involves going towards the top or bottom of the grid, then moving horizontally, and then down or up towards the sink. The minimum of these 2 paths serves as an upper bound for our shortest path, so that if our biker has travelled a distance greater than this upper bound, this path is discarded and a new path is sought.

Next, define a node to be any intersection point on the grid that lies on a horizontal bike lane. Alternatively, we can think of a horizontal bike lane as joining two (or more) nodes. By definition, all intersections on the top and bottom edge of the grid are nodes. Computer-generated depiction of our city model is presented on the right; feel free to click the image to see a larger version. The yellow-green circles denote nodes that connect preexisting bike lanes. The red nodes denote sources, and the blue nodes denote sinks. In this simple example, the sources and sinks are paired are labeled. The green edge denotes an initial bike lane placement, and the semitransparent adjacent edges denote proposal bike lane placements.

From any starting point, our biker can move in a maximum of four directions. Since our starting points are intersections, it is guaranteed that he will have a minimum of two possible directions initially. If his starting point is also a node, and hence connected to a bike lane, he may have a third or fourth possible direction. He can now move in any of these four directions, and to any node along the direction chosen. These directions are chosen at random and repeated several times so that we ensure that we covered all possible paths. Finally, we sum our minimum distance over all bikers and their respective sources and sinks, given the position of nodes. This gives us the total minimum distance, $f(\text{grid configuration})$ .

Simulated Annealing

The change in energy associated with moving from current grid configuration $D^{\text{curr}}$ to proposed grid configuration $D^*$ is $\Delta E = f( D^* ) - f(D^{\text{curr}}).$ We accept a proposed grid configuration with probability $\exp\{-\beta\Delta E\}$ , where $\beta \propto 1/T$ . From this formulation we see that a higher temperature corresponds to a higher acceptance probability, as desired.

There are several other considerations involved in setting up the simulated annealing algorithm, all determined by experimentation.

The maximum temperature set at the beginning of the algorithm was determined by trial-and-error, and eventually set to $\beta = 0.003$ .
We run our algorithm for nums $\times$ bin_size = 30,000 iterations, and decrease the temperature by 4% every bin_size = 100 iterations.
The final consideration is the strategy for proposing updates, and this is described within the summary of our algorithm below.

Initialize by randomly choosing a position at which to build a new bike lane.
Calculate the objective function for this new configuration.
Propose a new position by shifting the current position of the bike lane one unit to the left, right, up, or down. For this proposal, calculate the objective function.
If the value of the objective function decreases, we accept the proposed configuration with probability 1. If the value of the objective function remains unchanged or increases, then we accept it with probability $\exp\{-\beta\Delta E\}$ or discard it.
Repeat the above two steps until the system has sufficiently cooled.

Scratch work that shows how we followed the progress of our algorithm to make sure things were operating as expected.

Scratch work that shows how we followed the progress of our algorithm to make sure things were operating as expected. The dark blue 'filled' lines are the existing bike lanes we included in our grid approximation.

We kept all of these considerations in mind when we wrote our algorithm in Python. As mentioned earlier, we optimized over a grid approximation of Somerville. To make sure our algorithm was operating as expected, we used a real map of Somerville (figure on the left) to trace its progress. Feel free to click on the image to see a larger version of our scratch work.

Results

Toy Data

We first tested our algorithm on a toy dataset, which consisted of three sources, three destinations, and a preexisting existing bike lane structure (see figure at left). The red edge denotes the optimal placement selected by our algorithm, which is obviously the “correct” choice.

Somerville Data

We also applied our algorithm to real data. We obtained population counts by US Census "block groups" in the Somerville area, as well as corresponding latitude and longitude coordinates. We also found the coordinates of 6 high schools, 1 university, 1 hospital, 1 health center and Davis Square. We then performed an approximately 45 degree rotation of the source and sink coordinates so that the main streets (Somerville Avenue, Summer Street, Highland Avenue, and Broadway) can be modeled as the vertical lines in a grid (see second figure of the Implementation section. Using Google Maps, we obtained information on the roads where it is possible and safe to travel by bike (see second figure of the Data section).

Several streets between Highland Avenue and Broadway had portions which were bike-friendly and portions which were not. We did not include these streets when inputing the coordinates of bike lanes in our model, because a horizontal street is deemed accessible by bike only if bikers could travel from one vertical road to another through that street. These include Central Street and Cedar Street.

The aforementioned destinations were chosen because they were the most likely destinations within Somerville, and we are modeling Somerville as a closed system. We assume that an equal number of students cycle to each of the 8 educational institutions, and that overall 49% of bikers (under consideration and under the closed city model) cycle to school, 21% to Davis Square, 20% to Somerville Hospital and 10% to Central Street Health Center. These numbers were determined arbitrarily, and are reasonable estimates, as actual data were not made available by the city government.

Our simulated annealing algorithm tells us that the optimal location for installing a new bike lane is on School Street between Highland Avenue and Broadway. This is marked in the map below, where we show the approximate position of the optimal bike lane location, added to a map that displays the existing bike lanes in the area. Note that it is currently possible to cycling along School Street between Somerville Avenue and Highland Avenue, so this placement makes intuitive sense.

Schools are , hospitals are , and Davis Square is .

Discussion

Our optimal placement is in agreement with what we might expect because intuitively the optimal solution should be, in some ways, an extension of a bike lane that is already present. As we saw in the toy example, it would not make sense to install a bike lane one unit above or below the red line because it would not reduce travel time much more than taking the "maximum trivial path" of going towards the edges of the grid.

It is also understandable why this would be the optimal solution, because three of the seven high schools are located on or close to School Street (hence the name), and we can expect that about half of the students would use this new lane. In addition, this proposed new lane also enables bikers from the east (of the tilted grid) to find a faster route to Somerville Hospital via Highland Avenue and to Central Street Health Center via Summer Street.

The initial total distance travelled by our population, given the current bike lanes, is 47,047,474 units. The optimal total distance was found to be 43,978,710 units, after placing our new bike lane at the corresponding location (School Street between Highland Avenue and Broadway). This is a reduction of 3,068,764 units of distance. The second most optimal bike lane placement (Lowell Street between Highland Avenue and Broadway) was found to give a total distance of 44,600,728 units, or equivalently a reduction of 2,446,746 units. Hence, our most optimal placement gives a distance reduction that is 25.4% greater than the distance reduction given by the second most optimal placement, and this further justifies selecting School Street over Lowell Street.

Therefore, if the city government wishes to build exactly one new bike lane, they should do so on School Street between Highland Ave and Broadway.

Limitations

In the table contained in Simplifying Assumptions of the Model section, we discuss several model considerations and the assumptions we made regarding those considerations. Some of these simplifying assumptions limit the generalizability of our model but were made so that our optimization problem could be executed feasibly (e.g., city roads as a grid). Other assumptions were more natural (e.g., the entire journey is spent traveling by bike) which also helped in simplifying the model and ease coding.

Future Directions

This project can be extended in several directions. For example, future work could include the placement of more than one bike lane, or placement of a single multiunit bike lane. We could also include elevation information, weighting grid edges by steepness. Another possible extension is to consider, in addition to bike lane placement adjacent to car lines, is the conversion of "car only" lanes to "shared", where both cars and bikes are legally allowed to share the entire roadway.