Solving the traveling salesman problem with Postgres recursive CTEs
- Blog
- Tech Talk
Many SQL implementations don’t have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs! We’ll use them to solve the Traveling Salesman Problem and find the shortest…
Many SQL implementations don’t have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!
We’ll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.
With recursive
Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.
Recursive CTEs are more powerful – they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a for loop in other programming languages.
These CTEs have two parts — an anchor member and a recursive member. The anchor member selects the starting rows for the recursive steps.
The recursive member generates more rows for the CTE by first joining against the anchor rows, and then joining against rows created in previous recursions. The recursive member comes after a union all in the CTE definition.
Here’s a simple recursive CTE that generates the numbers 1 to 10. The anchor member selects the value 1, and the recursive member adds to it up to the number 10:
with recursive incrementer(prev_val) as (
select 1 -- anchor member
union all
select -- recursive member
incrementer.prev_val + 1
from incrementer
where prev_val
The first time the recursive CTE runs it generates a single row 1 using the anchor member. In the second execution, the recursive member joins against the 1 and outputs a second row, 2. In the third execution the recursive step joins against both rows 1 and 2 and adds the rows 2 (a duplicate) and 3.
Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.
Notice how the CTE specifies its output as the named value prev_val. This lets us refer to the output of the previous recursive step.
And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!
Under the hood, the database is building up a table named after this recursive CTE using unions:
Recursive CTEs can also have many parameters. Here’s one that takes the sum, double, and square of starting values of 1, 2 and 3:
with recursive cruncher(inc, double, square) as (
select 1, 2.0, 3.0 -- anchor member
union all
select -- recursive member
cruncher.inc + 1,
cruncher.double * 2,
cruncher.square ^ 2
from cruncher
where inc
With recursive CTEs, we can solve the Traveling Salesman Problem.
Finding the shortest path
There are many algorithms for finding the shortest round-trip path through several cities. We’ll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We’ll then sort to find the shortest.
First, a list of cities with our customers, along with their latitudes and longitudes:
create table places as (
select
'Seattle' as name, 47.6097 as lat, 122.3331 as lon
union all select 'San Francisco', 37.7833, 122.4167
union all select 'Austin', 30.2500, 97.7500
union all select 'New York', 40.7127, 74.0059
union all select 'Boston', 42.3601, 71.0589
union all select 'Chicago', 41.8369, 87.6847
union all select 'Los Angeles', 34.0500, 118.2500
union all select 'Denver', 39.7392, 104.9903
)
And we’ll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com):
create or replace function lat_lon_distance(
lat1 float, lon1 float, lat2 float, lon2 float
) returns float as $$
declare
x float = 69.1 * (lat2 - lat1);
y float = 69.1 * (lon2 - lon1) * cos(lat1 / 57.3);
begin
return sqrt(x * x + y * y);
end
$$ language plpgsql
Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:
with recursive travel(places_chain, last_lat, last_lon,
total_distance, num_places) as (
select -- anchor member
name, lat, lon, 0::float, 1
from places
where name = 'San Francisco'
union all
select -- recursive member
-- add to the current places_chain
travel.places_chain || ' -> ' || places.name,
places.lat,
places.lon,
-- add to the current total_distance
travel.total_distance +
lat_lon_distance(last_lat, last_lon, places.lat, places.lon),
travel.num_places + 1
from
places, travel
where
position(places.name in travel.places_chain) = 0
)
The parameters in the CTE are:
- places_chain: The list of places visited so far, which will be different for each instance of the recursion
- last_lat and last_lon: The latitude and longitude of the last place in the places_chain
- total_distance: The distance traveled going from one place to the next in the places_chain
- num_places: The number of places in places_chain — we’ll use this to tell which routes are complete because they visited all cities
In the recursive member, the where clause ensures that we never repeat a place. If we’ve already visited Denver, position(…) will return a number greater than 0, invalidating this instance of the recursion.
We can see all possible routes by selecting all 8-city chains:
select * from travel where num_places = 8
We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco’s lat/lon, but a join is more elegant. Once that’s done we sort by distance and show the smallest:
select
travel.places_chain || ' -> ' || places.name,
total_distance + lat_lon_distance(
travel.last_lat, travel.last_lon,
places.lat, places.lon) as final_dist
from travel, places
where
travel.num_places = 8
and places.name = 'San Francisco'
order by 2 -- ascending!
limit 1
Even though this query is significantly more complicated than the incrementer query earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE’s rows, the bottom branch is the final join and sort:
Run this query and you’ll see the shortest route takes 6671 miles and visits the cities in this order:
San Francisco -> Seattle -> Denver -> Chicago -> Boston -> New York -> Austin -> Los Angeles -> San Francisco
Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!