Baseline - Predicting Replaced (2nd choice) mode with logistic regression #1087

Abby-Wheelis · 2024-09-04T21:49:01Z

There are two main components to predicting mode choice with a choice model:

- The choice model itself, representing people's preferences, e.g. Abby's preferences for travel are: (time: -2, cost: -5, fun: 10)
- The possible modes for the trip, e.g. [Car: (cost=$15, time=10min, fun=0), E-bike: (cost=$1, time=15min, fun=1), Walk: (cost=$0, time=60min, fun=-1)]

With these factors, where each mode's score is the sum of preference weight × attribute value, we can predict that Abby would choose e-bike (-25). Without the e-bike she would choose car (-95), and she wouldn't choose walk (-130), approximately.
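The arithmetic behind those scores can be reproduced with a small sketch. It assumes a simple linear utility (weight × attribute, summed), which matches the numbers in the example; the function and variable names are only illustrative.

```python
# Preference weights and mode attributes are taken from the example above.
PREFERENCES = {"time": -2, "cost": -5, "fun": 10}

MODES = {
    "car": {"cost": 15, "time": 10, "fun": 0},
    "e-bike": {"cost": 1, "time": 15, "fun": 1},
    "walk": {"cost": 0, "time": 60, "fun": -1},
}


def utility(preferences, attributes):
    """Linear utility: sum of weight * attribute value."""
    return sum(weight * attributes[name] for name, weight in preferences.items())


def rank_modes(preferences, modes):
    """Return (mode, utility) pairs sorted from most to least preferred."""
    scored = {mode: utility(preferences, attrs) for mode, attrs in modes.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


ranking = rank_modes(PREFERENCES, MODES)
chosen_mode = ranking[0][0]    # first choice
replaced_mode = ranking[1][0]  # second choice, i.e. what the first choice replaced
print(ranking)
```

The second entry in the ranking is the "replaced" mode: the one Abby would have used if her first choice were not available.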

As a baseline, we want to build a logistic regression model, since that is what is most commonly used in research and planning to model mode choice (e.g. what would the ridership returns on this transit investment be like?).
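For reference, a logistic regression over several alternatives is commonly written as a multinomial logit, where the choice probabilities are a softmax over the utilities. A minimal sketch, assuming that form and reusing the example scores (-25, -95, -130) from above; in a fitted model the weights would be estimated from the replaced-mode data rather than picked by hand.

```python
import math


def choice_probabilities(utilities):
    """Multinomial-logit probabilities: softmax over per-mode utilities."""
    best = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {mode: math.exp(u - best) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


# Example utilities from the text above (not fitted values).
utilities = {"car": -95, "e-bike": -25, "walk": -130}
probs = choice_probabilities(utilities)
most_likely = max(probs, key=probs.get)
print(most_likely, probs)
```

With utilities this far apart, nearly all of the probability lands on one mode; estimated coefficients would usually give less extreme splits.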

We have ground truth data about 2nd choice modes, through the replaced mode collected by programs that have a mode of interest, often e-bike. This is used to show the impact of the mode of interest, through things like emissions savings/reductions which we map on the public dashboard.

To build up the alternatives, we'll need a few different pieces of data, which could be complex to figure out:

- what modes are available:
  - initial demographic survey asks people about their options: "do you have a license", "what modes are available to you"
  - we can check for transit availability: NTD? What does MEP use? Google or other API?
    - same method as Jack implemented for carbon/energy? (No: there are buses in Golden that could get me to work but not to the climbing gym, so we need routing.)
- cost:
  - cars: reimbursement rate to account for amortized ownership/maintenance cost
  - transit: what does MEP use? Does NTD have cost data? Google or other API?
  - bikes/e-bikes?
  - shared micromobility?
- time:
  - use overpass to query OSM?
  - what does MEP use?
  - general approximation factors?
- any other factors? Likely something to pay attention to in the literature
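One possible shape for the "general approximation factors" option is sketched below. Everything in it is hypothetical: the `ModeAssumptions` fields, the `build_alternatives` helper, and especially the numbers, which are placeholders for illustration and not sourced rates, fares, or speeds.

```python
from dataclasses import dataclass


@dataclass
class ModeAssumptions:
    """Hypothetical per-mode approximation factors (placeholders, not sourced)."""
    cost_per_km: float  # e.g. a reimbursement rate for cars
    flat_cost: float    # e.g. a transit fare
    speed_kmh: float    # rough door-to-door speed


def build_alternatives(distance_km, available_modes, assumptions):
    """Return {mode: {"cost": ..., "time": ...}} for modes the traveler can use."""
    alternatives = {}
    for mode in available_modes:
        if mode not in assumptions:
            continue  # no approximation factors for this mode yet
        a = assumptions[mode]
        alternatives[mode] = {
            "cost": a.flat_cost + a.cost_per_km * distance_km,
            "time": 60 * distance_km / a.speed_kmh,  # minutes
        }
    return alternatives


# Placeholder numbers purely for illustration.
ASSUMPTIONS = {
    "car": ModeAssumptions(cost_per_km=0.5, flat_cost=0.0, speed_kmh=40.0),
    "transit": ModeAssumptions(cost_per_km=0.0, flat_cost=2.0, speed_kmh=20.0),
    "walk": ModeAssumptions(cost_per_km=0.0, flat_cost=0.0, speed_kmh=5.0),
}

# The available modes would come from the demographic survey answers.
alts = build_alternatives(10.0, ["car", "walk"], ASSUMPTIONS)
print(alts)
```

This ignores routing entirely, so it cannot capture the "bus gets me to work but not to the climbing gym" case; it only shows where survey answers and per-mode factors would plug in.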

@shankari @jpfleischer for visibility


shankari · 2024-09-04T21:53:47Z

FYI, I think that the uprm-civic also has replaced mode (scootershare)

jpfleischer · 2024-09-05T18:59:54Z

Hi everyone!
You currently use overpass-api.de, but @Abby-Wheelis, you said you need routing.
Here is a Transitland route in Denver, CO: https://www.transit.land/routes/r-9xj3-h
Is this what we need?

"public transport routing ... requires timetable data to work properly, and OSM doesn't have that."
https://www.reddit.com/r/openstreetmap/comments/v914h0/comment/ibttco1/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

EDIT: I found that OSM does do this, for example with this Overpass query:

```
[out:json];
area[name="Gainesville"]->.searchArea;
(
  relation["route"="bus"](area.searchArea);
);
out body;
>;
out skel qt;
```

https://overpass-turbo.eu
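The same query can also be sent programmatically. A minimal sketch using only the Python standard library, assuming the public `https://overpass-api.de/api/interpreter` endpoint; the helper names `bus_route_query` and `fetch_bus_routes` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint (assumed; any Overpass instance should work).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def bus_route_query(area_name):
    """Build the Overpass QL query for bus route relations in a named area."""
    return (
        "[out:json];\n"
        f'area[name="{area_name}"]->.searchArea;\n'
        "(\n"
        '  relation["route"="bus"](area.searchArea);\n'
        ");\n"
        "out body;\n"
        ">;\n"
        "out skel qt;\n"
    )


def fetch_bus_routes(area_name, timeout=60):
    """POST the query to Overpass and return the decoded JSON response."""
    data = urllib.parse.urlencode({"data": bus_route_query(area_name)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


query = bus_route_query("Gainesville")
print(query)
```

Calling `fetch_bus_routes("Gainesville")` would hit the network and return a dict whose `"elements"` list holds the route relations and their members. That is route geometry only; it does not include timetables or travel times.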

Realtime GTFS

Some places provide realtime data, such as Boston's MBTA: https://www.mbta.com/developers/gtfs-realtime
Can we find a free, open-source resource that lists all available GTFS Realtime data sources worldwide?

https://github.com/MobilityData/awesome-transit?tab=readme-ov-file
https://mobilitydatabase.org/feeds/mdb-1602
https://mobilitydata.github.io/mobility-feed-api/SwaggerUI/index.html
https://docs.opentripplanner.org/en/latest/

shankari · 2024-09-05T21:08:33Z

@jpfleischer I meant we need routing in the sense of:

I went from my house to the drugstore by car. I want to be able to run a query (ideally via API) that will give me the time and cost of the alternatives (e.g. the equivalent of this

but with cost included).

OSM has transit data, and we use transit data from it via overpass for mode detection (look at `emission/net/ext_services`), but it doesn't do routing. OSM-based routing services such as OSRM or GraphHopper typically do not support transit, so we cannot use them to find transit alternatives.

There is an open-source routing engine that takes transit into account (Open Trip Planner)
https://opentransitsoftwarefoundation.org/

We are friends with the OTP folks and have tried using their software before. But for us to use this in a production system, somebody still needs to run the software, load the data, keep it updated, etc. Ideally, there would be an overpass-like system that we could use for routing and that we could pay for if needed. But I am not sure that such a Google Maps alternative exists.

> Can we find a free opensource resource that gives all available GTFS Realtime data sources worldwide?

transit.land is intended to do that, at least for the US. But somebody needs to load that data

shankari · 2024-09-05T21:18:26Z

One final comment on this: wrt the framing of this problem, we have discussed how there are people's preferences (which are related to the person) and the alternatives (which are related to the environment).

So the same person may make different choices in a different environment (e.g. @jpfleischer taking transit in Boston but not in FL) even though their internalized preferences have not changed.

Just wanted to highlight the flip side of that, which is that different people can have different preferences. While @jpfleischer would not ever take the bus in FL, there are clearly people who do (otherwise, the bus system would have shut down).

For the replaced mode project, we want to understand individual or group preferences, specifically as a set of factors that influence their (assumed rational) choices. We can then apply those preferences to a different set of alternatives (new transit line, no e-bike available, parking restrictions...) and get a sense of how they will behave, and by extension, what the impact of the modification to the alternatives is.
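A small sketch of that idea: keep the preferences fixed, change the set of alternatives, and compare the predicted choices. Abby's weights and the three modes come from the example at the top of the thread; the second preference set (`COST_SENSITIVE`) is a made-up person added only to show that the same change can replace different modes for different people.

```python
def utility(preferences, attributes):
    """Linear utility: sum of weight * attribute value."""
    return sum(w * attributes[k] for k, w in preferences.items())


def predict_choice(preferences, alternatives):
    """Pick the alternative with the highest utility (assumed rational choice)."""
    return max(alternatives, key=lambda m: utility(preferences, alternatives[m]))


def without(alternatives, mode):
    """Return a modified environment with one mode removed."""
    return {m: a for m, a in alternatives.items() if m != mode}


ALTERNATIVES = {
    "car": {"cost": 15, "time": 10, "fun": 0},
    "e-bike": {"cost": 1, "time": 15, "fun": 1},
    "walk": {"cost": 0, "time": 60, "fun": -1},
}

ABBY = {"time": -2, "cost": -5, "fun": 10}            # from the example above
COST_SENSITIVE = {"time": -1, "cost": -10, "fun": 0}  # hypothetical person

no_ebike = without(ALTERNATIVES, "e-bike")
for name, prefs in [("Abby", ABBY), ("cost-sensitive", COST_SENSITIVE)]:
    print(name, predict_choice(prefs, ALTERNATIVES), "->", predict_choice(prefs, no_ebike))
```

Both people pick the e-bike when it is available, but removing it sends Abby to the car and the hypothetical cost-sensitive traveler to walking, so the impact of the e-bike differs by person.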

Abby-Wheelis · 2024-09-09T21:18:04Z

@jpfleischer Here is a PR related to the NTD data processing and integration for energy and emissions; maybe similar methods would allow us to extract transit cost? e-mission-common PR

I think the notebooks in metrics/footprint/.archive could be a good place to start

jpfleischer · 2024-09-09T22:26:22Z

@Abby-Wheelis
Average fare collected per passenger is a column here https://data.transportation.gov/Public-Transit/2022-NTD-Annual-Data-Metrics/ekg5-frzt/explore
