Skip to content

Benchmark tests for large simulations

Using the Manhattan config file.

Maybe Manhattan as a whole has somewhere around 400K trips?? That would translated into about 300 trips per minute.

The number of vehicle hours in Manhattan CBD is (Schaller) about 100K. If the number of vehicles on the road is constant, each driving 24 hours then the number of vehicles on the road at any one time is about 4K. If they each drive two hours that would be 50K per day. The mayor’s report says an average of about 50K vehicles per day.

The NYC mayor says there are about 80K unique vehicles in NYC, averaging about 250 monthly trips each (that’s about 10 trips per day).

Converting stat lists to numpy arrays

Summary: the stat list conversion does not help. At large scales, once rides start being taken, statistics collection is not a major time contributor.

I would presume that maintaining individual trip and driver information is the challenge. Perhaps we can speed those up.

The stat list conversion was done in the “numpy” branch, while python lists were used in the “dev” branch.

CPU use is typically only around 20%, and memory is only about 30MB, even though overall memory consumption is at about 60%. Which is strange and makes me wonder if it is swapping out data to disk, even though it does not need to.

numpy

| Period | Time | Period length | |- | 0 | 2020-08-23 21:51:02 | 0

1 2020-08-23 21:51:07 0:05 2 2020-08-23 21:51:12 0:05 3 2020-08-23 21:51:17 0:05 4 2020-08-23 21:51:21 0:04 5 2020-08-23 21:51:24 0:03 6 2020-08-23 21:51:29 0:05 7 2020-08-23 21:51:39 0:10 8 2020-08-23 21:51:55 0:16 9 2020-08-23 21:52:19 0:26 10 2020-08-23 21:52:59 0:40 11 2020-08-23 21:53:39 0:40 12 2020-08-23 21:54:17 0:38 13 2020-08-23 21:55:17 1:00 14 2020-08-23 21:56:19 1:02 15 2020-08-23 21:58:04 1:45 16 2020-08-23 21:59:43 1:39 17 2020-08-23 22:01:45 2:32 18 2020-08-23 22:05:15 3:30 19 2020-08-23 22:08:53 3:38 20 2020-08-23 22:13:42 3:49 21 2020-08-23 22:18:35 4:53 22 2020-08-23 22:25:55 7:20 23 2020-08-23 22:35:29 9:34 24 2020-08-23 22:44:58 9:29 Elapsed 53:56  

Second run from numpy branch:

Period Time
0 2021-08-24 12:50:41
24 2020-08-24 13:38:32
Elapsed 47:51

dev (Python lists)

Period Time Period length
0 2020-08-24 09:14:01 0
1 2020-08-24 09:14:11 0:10
2 2020-08-24 09:14:18 0:06
3 2020-08-24 09:14:26 0:08
4 2020-08-24 09:14:33 0:07
5 2020-08-24 09:14:43 0:10
6 2020-08-24 09:14:56 0:13
7 2020-08-24 09:15:06 0:10
8 2020-08-24 09:15:24 0:18
9 2020-08-24 09:15:51 0:27
10 2020-08-24 09:16:44 0:53
11 2020-08-24 09:17:47 1:03
12 2020-08-24 09:18:33 0:46
13 2020-08-24 09:20:06 1:33
14 2020-08-24 09:21:52 1:46
15 2020-08-24 09:22:54 1:02
16 2020-08-24 09:24:16 1:22
17 2020-08-24 09:27:31 3:15
18 2020-08-24 09:30:19 2:48
19 2020-08-24 09:34:15 3:56
20 2020-08-24 09:38:10 3:55
21 2020-08-24 09:42:48 4:38
22 2020-08-24 09:47:23 4:35
23 2020-08-24 09:52:56 5:33
24 2020-08-24 09:58:55 5:59
Elapsed 44:54  

Second run from dev branch:

Period Time
0 2021-08-24 10:17:36
24 2020-08-24 11:03:10
Elapsed 46:34

Memory profiling

Here is how to install and run memory profiling.

conda install memory_profiler
python -m memory_profiler ridehail.py -@ config/manhattan.config

CProfile

With 15 periods of Manhattan run, here is the CProfile output, simplified and sorted:

120524852 function calls (120524822 primitive calls) in 433.610 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
0 .000 0.000 433.610 433.610 ridehail.py:201(main)
15 0.079 0.005 433.548 28.903 ridehail\simulation.py:122(nextperiod)
0 .000 0.000 433.548 433.548 ridehail\simulation.py:108(simulate)
162 378.815 2.338 379.043 2.340 ridehail\simulation.py:435(collectgarbage)
15 0.064 0.004 52.725 3.515 ridehail\simulation.py:210(assigndrivers)
4576 5.848 0.001 52.520 0.011 ridehail\simulation.py:235(assigndriver)
8114691 7.221 0.000 14.114 0.000 ridehail\atom.py:321()
8123691 9.203 0.000 13.650 0.000 ridehail\atom.py:297(distance)

So the major work is done in:

  • garbage collection (this could be avoided)
  • assigning drivers
  • computing distances

Looks like garbage collection was being called inside a driver loop. Terrible! Took it out of the loop and then fixed it to be only every now and then. For 25 periods:

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 84.263 84.263 ridehail.py:201(main)
1 0.000 0.000 84.204 84.204 ridehail\simulation.py:109(simulate)
25 0.137 0.005 84.204 3.368 ridehail\simulation.py:123(nextperiod)
25 0.110 0.004 75.643 3.026 ridehail\simulation.py:211(assigndrivers)
22017 7.125 0.000 75.321 0.003 ridehail\simulation.py:236(assigndriver)
8131803 6.634 0.000 40.407 0.000 ridehail\atom.py:312(traveldistance)
8131803 9.827 0.000 18.034 0.000 ridehail\atom.py:321()
22017 15.773 0.001 15.773 0.001 ridehail\simulation.py:246()
8146803 10.718 0.000 15.767 0.000 ridehail\atom.py:297(distance)
4895 3.966 0.001 11.983 0.002 random.py:264(shuffle)
17108293 6.509 0.000 8.630 0.000 types.py:164(get)
8293315 5.634 0.000 8.193 0.000 random.py:224(randbelow)
25 5.581 0.223 5.585 0.223 ridehail\simulation.py:436(collectgarbage)

So now it is down to assigning drivers and calculating travel distances.

With display, assigning drivers is still the big time consumer, and in that the effort of collecting a list of available drivers is the biggest contributor.

Each time a driver is assigned, the list of available drivers is computed, which means many times per period. Let’s compute it once per period and then update the list during the assignment phase.