Benchmark tests for large simulations
Using the Manhattan config file.
- Manhattan has about 2500 blocks, which is 50 * 50
- NYC overall has 120,000 blocks (according to the man who walked them all), which is more like 350 * 350.
- Average traffic speed in midtown Manhattan is 5 mph.
- There are about 12 city blocks to the mile, so that’s about one block per minute.
- In a day, there are 1440 minutes.
- In NYC, ridehailing did about 750K rides per day pre-pandemic (https://toddwschneider.com/dashboards/nyc-taxi-ridehailing-uber-lyft-data/)
- In Dec 2017, Bruce Schaller reported that 200K trips per day started or ended in the Manhattan Central Business District (http://www.schallerconsult.com/rideservices/emptyseats.pdf)
Maybe Manhattan as a whole has somewhere around 400K trips?? That would translated into about 300 trips per minute.
The number of vehicle hours in Manhattan CBD is (Schaller) about 100K. If the number of vehicles on the road is constant, each driving 24 hours then the number of vehicles on the road at any one time is about 4K. If they each drive two hours that would be 50K per day. The mayor’s report says an average of about 50K vehicles per day.
The NYC mayor says there are about 80K unique vehicles in NYC, averaging about 250 monthly trips each (that’s about 10 trips per day).
Converting stat lists to numpy arrays
Summary: the stat list conversion does not help. At large scales, once rides start being taken, statistics collection is not a major time contributor.
I would presume that maintaining individual trip and driver information is the challenge. Perhaps we can speed those up.
The stat list conversion was done in the “numpy” branch, while python lists were used in the “dev” branch.
CPU use is typically only around 20%, and memory is only about 30MB, even though overall memory consumption is at about 60%. Which is strange and makes me wonder if it is swapping out data to disk, even though it does not need to.
numpy
| Period | Time | Period length | |- | 0 | 2020-08-23 21:51:02 | 0
Second run from numpy branch:
| Period | Time |
|---|---|
| 0 | 2021-08-24 12:50:41 |
| 24 | 2020-08-24 13:38:32 |
| Elapsed | 47:51 |
dev (Python lists)
| Period | Time | Period length |
|---|---|---|
| 0 | 2020-08-24 09:14:01 | 0 |
| 1 | 2020-08-24 09:14:11 | 0:10 |
| 2 | 2020-08-24 09:14:18 | 0:06 |
| 3 | 2020-08-24 09:14:26 | 0:08 |
| 4 | 2020-08-24 09:14:33 | 0:07 |
| 5 | 2020-08-24 09:14:43 | 0:10 |
| 6 | 2020-08-24 09:14:56 | 0:13 |
| 7 | 2020-08-24 09:15:06 | 0:10 |
| 8 | 2020-08-24 09:15:24 | 0:18 |
| 9 | 2020-08-24 09:15:51 | 0:27 |
| 10 | 2020-08-24 09:16:44 | 0:53 |
| 11 | 2020-08-24 09:17:47 | 1:03 |
| 12 | 2020-08-24 09:18:33 | 0:46 |
| 13 | 2020-08-24 09:20:06 | 1:33 |
| 14 | 2020-08-24 09:21:52 | 1:46 |
| 15 | 2020-08-24 09:22:54 | 1:02 |
| 16 | 2020-08-24 09:24:16 | 1:22 |
| 17 | 2020-08-24 09:27:31 | 3:15 |
| 18 | 2020-08-24 09:30:19 | 2:48 |
| 19 | 2020-08-24 09:34:15 | 3:56 |
| 20 | 2020-08-24 09:38:10 | 3:55 |
| 21 | 2020-08-24 09:42:48 | 4:38 |
| 22 | 2020-08-24 09:47:23 | 4:35 |
| 23 | 2020-08-24 09:52:56 | 5:33 |
| 24 | 2020-08-24 09:58:55 | 5:59 |
| Elapsed | 44:54 |
Second run from dev branch:
| Period | Time |
|---|---|
| 0 | 2021-08-24 10:17:36 |
| 24 | 2020-08-24 11:03:10 |
| Elapsed | 46:34 |
Memory profiling
Here is how to install and run memory profiling.
conda install memory_profiler
python -m memory_profiler ridehail.py -@ config/manhattan.config
CProfile
With 15 periods of Manhattan run, here is the CProfile output, simplified and sorted:
120524852 function calls (120524822 primitive calls) in 433.610 seconds
| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) |
|---|---|---|---|---|---|
| 0 | .000 | 0.000 | 433.610 | 433.610 | ridehail.py:201(main) |
| 15 | 0.079 | 0.005 | 433.548 | 28.903 | ridehail\simulation.py:122(nextperiod) |
| 0 | .000 | 0.000 | 433.548 | 433.548 | ridehail\simulation.py:108(simulate) |
| 162 | 378.815 | 2.338 | 379.043 | 2.340 | ridehail\simulation.py:435(collectgarbage) |
| 15 | 0.064 | 0.004 | 52.725 | 3.515 | ridehail\simulation.py:210(assigndrivers) |
| 4576 | 5.848 | 0.001 | 52.520 | 0.011 | ridehail\simulation.py:235(assigndriver) |
| 8114691 | 7.221 | 0.000 | 14.114 | 0.000 | ridehail\atom.py:321( |
| 8123691 | 9.203 | 0.000 | 13.650 | 0.000 | ridehail\atom.py:297(distance) |
So the major work is done in:
- garbage collection (this could be avoided)
- assigning drivers
- computing distances
Looks like garbage collection was being called inside a driver loop. Terrible! Took it out of the loop and then fixed it to be only every now and then. For 25 periods:
| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) |
|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 84.263 | 84.263 | ridehail.py:201(main) |
| 1 | 0.000 | 0.000 | 84.204 | 84.204 | ridehail\simulation.py:109(simulate) |
| 25 | 0.137 | 0.005 | 84.204 | 3.368 | ridehail\simulation.py:123(nextperiod) |
| 25 | 0.110 | 0.004 | 75.643 | 3.026 | ridehail\simulation.py:211(assigndrivers) |
| 22017 | 7.125 | 0.000 | 75.321 | 0.003 | ridehail\simulation.py:236(assigndriver) |
| 8131803 | 6.634 | 0.000 | 40.407 | 0.000 | ridehail\atom.py:312(traveldistance) |
| 8131803 | 9.827 | 0.000 | 18.034 | 0.000 | ridehail\atom.py:321( |
| 22017 | 15.773 | 0.001 | 15.773 | 0.001 | ridehail\simulation.py:246( |
| 8146803 | 10.718 | 0.000 | 15.767 | 0.000 | ridehail\atom.py:297(distance) |
| 4895 | 3.966 | 0.001 | 11.983 | 0.002 | random.py:264(shuffle) |
| 17108293 | 6.509 | 0.000 | 8.630 | 0.000 | types.py:164(get) |
| 8293315 | 5.634 | 0.000 | 8.193 | 0.000 | random.py:224(randbelow) |
| 25 | 5.581 | 0.223 | 5.585 | 0.223 | ridehail\simulation.py:436(collectgarbage) |
So now it is down to assigning drivers and calculating travel distances.
With display, assigning drivers is still the big time consumer, and in that the effort of collecting a list of available drivers is the biggest contributor.
Each time a driver is assigned, the list of available drivers is computed, which means many times per period. Let’s compute it once per period and then update the list during the assignment phase.