Baseline - Predicting Replaced (2nd choice) mode with logistic regression #1087

Open
Abby-Wheelis opened this issue Sep 4, 2024 · 6 comments

Comments

@Abby-Wheelis
Member

Abby-Wheelis commented Sep 4, 2024

There are two main components to predicting mode choice with a choice model:

  1. The choice model itself, representing people's preferences, e.g., Abby's travel preferences are (time: -2, cost: -5, fun: 10)
  2. The possible modes for the trip, e.g., [Car: (cost=$15, time=10min, fun=0), E-bike: (cost=$1, time=15min, fun=1), Walk: (cost=$0, time=60min, fun=-1)]

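With these factors, we multiply each attribute by its weight and sum to get a utility for each mode. We can then roughly predict that Abby would choose the e-bike (-25), would choose the car (-95) if the e-bike were unavailable, and would not choose to walk (-130).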
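The arithmetic behind these numbers can be checked with a short Python sketch. Utility is the weighted sum from the example; the `logit_probabilities` helper is an addition (not from the issue) showing how a logit model would turn the same utilities into choice probabilities rather than a single pick.

```python
import math

# Preference weights from the example above: utility per minute, per dollar,
# and per unit of "fun".
WEIGHTS = {"time": -2, "cost": -5, "fun": 10}

# The three alternatives from the example (time in minutes, cost in dollars).
ALTERNATIVES = {
    "car": {"time": 10, "cost": 15, "fun": 0},
    "e-bike": {"time": 15, "cost": 1, "fun": 1},
    "walk": {"time": 60, "cost": 0, "fun": -1},
}


def utility(weights, attributes):
    """Linear utility: sum of weight * attribute value."""
    return sum(weights[k] * attributes[k] for k in weights)


def logit_probabilities(weights, alternatives):
    """Multinomial logit choice probabilities (a softmax over utilities)."""
    utils = {m: utility(weights, a) for m, a in alternatives.items()}
    top = max(utils.values())  # subtract the max so exp() cannot overflow
    exps = {m: math.exp(u - top) for m, u in utils.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


utils = {m: utility(WEIGHTS, a) for m, a in ALTERNATIVES.items()}
print(utils)  # {'car': -95, 'e-bike': -25, 'walk': -130}

# Second choice: remove the e-bike and take the best remaining alternative.
without_ebike = {m: a for m, a in ALTERNATIVES.items() if m != "e-bike"}
print(max(without_ebike, key=lambda m: utils[m]))  # car
```

With weights this large, the gaps between utilities are tens of units, so the logit probabilities come out very close to 0 or 1 and match the "pick the highest utility" reading.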

As a baseline, we want to build a logistic regression model, since that is the approach most commonly used in research and planning to model mode choice (e.g., to estimate the ridership return on a transit investment).
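A minimal sketch of estimating preference weights from observed choices as a conditional (McFadden) logit: one weight per attribute, shared across modes, with each mode contributing its own attribute values, which is the structure of the example above. It uses plain gradient ascent in standard-library Python for transparency; the toy trips are made up, and a real baseline would more likely use a dedicated discrete-choice package such as Biogeme.

```python
import math


def choice_probs(beta, alts):
    """Logit probabilities for one choice situation; alts is a list of feature vectors."""
    utils = [sum(b * x for b, x in zip(beta, xs)) for xs in alts]
    top = max(utils)
    exps = [math.exp(u - top) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(beta, observations):
    """Sum of log P(chosen alternative) over all observations."""
    return sum(math.log(choice_probs(beta, alts)[chosen]) for alts, chosen in observations)


def fit_conditional_logit(observations, n_features, lr=0.001, iters=2000):
    """Fit shared preference weights by plain gradient ascent on the log-likelihood.

    observations: list of (alts, chosen), where alts holds one feature vector per
    available mode (so the choice set may differ between trips) and chosen is the
    index of the mode that was actually used.
    """
    beta = [0.0] * n_features
    for _ in range(iters):
        grad = [0.0] * n_features
        for alts, chosen in observations:
            probs = choice_probs(beta, alts)
            for k in range(n_features):
                # d logL / d beta_k = x_chosen[k] - sum_j P_j * x_j[k]
                expected = sum(p * xs[k] for p, xs in zip(probs, alts))
                grad[k] += alts[chosen][k] - expected
        beta = [b + lr * g / len(observations) for b, g in zip(beta, grad)]
    return beta


# HYPOTHETICAL toy trips; features are (time in minutes, cost in dollars).
observations = [
    ([[10, 15], [15, 1], [60, 0]], 1),  # car, e-bike, walk -> e-bike
    ([[10, 15], [60, 0]], 0),           # car, walk         -> car
    ([[12, 15], [20, 2], [45, 0]], 1),  # car, e-bike, walk -> e-bike
    ([[8, 10], [30, 0]], 1),            # car, walk         -> walk
]
beta = fit_conditional_logit(observations, n_features=2)
print(beta, log_likelihood(beta, observations))
```

The learning rate is kept small because the features are in raw minutes and dollars; rescaling them (e.g., time in hours) would allow larger steps. Alternative-specific constants (a fixed preference for a mode beyond its time and cost) are a common addition that this sketch leaves out.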

We have ground-truth data about second-choice modes: programs with a mode of interest (often the e-bike) collect the replaced mode, i.e., the mode the trip would otherwise have used. This is used to show the impact of the mode of interest, for example through emissions reductions, which we display on the public dashboard.
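Because the replaced mode is a labeled second choice, one way to use it as ground truth is to drop the mode the trip actually used and check whether the model's best remaining alternative matches the reported replaced mode. A minimal sketch; the tuple layout is hypothetical and does not correspond to the e-mission data model.

```python
def predict_replaced_mode(utilities, chosen_mode):
    """Predicted 2nd choice: the highest-utility mode once the chosen mode is removed."""
    remaining = {m: u for m, u in utilities.items() if m != chosen_mode}
    if not remaining:
        return None  # nothing left that the trip could have replaced
    return max(remaining, key=remaining.get)


def replaced_mode_accuracy(trips):
    """Share of trips whose predicted 2nd choice matches the reported replaced mode.

    trips: iterable of (utilities_by_mode, chosen_mode, reported_replaced_mode).
    """
    hits = total = 0
    for utilities, chosen, reported in trips:
        total += 1
        if predict_replaced_mode(utilities, chosen) == reported:
            hits += 1
    return hits / total if total else None


# HYPOTHETICAL labeled trips, reusing the utilities from the example above.
abby = {"car": -95, "e-bike": -25, "walk": -130}
trips = [(abby, "e-bike", "car"), (abby, "e-bike", "walk")]
print(replaced_mode_accuracy(trips))  # 0.5
```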

To build up the alternatives, we'll need a few different pieces of data, which could be complex to figure out:

  • what modes are available:
    • the initial demographic survey asks people about their options: "Do you have a license?", "What modes are available to you?"
    • we can check for transit availability: NTD? What does MEP use? Google or another API?
      • the same method Jack implemented for carbon/energy? (No, because there are buses in Golden that could get me to work but not to the climbing gym; we need routing.)
  • cost:
    • cars - reimbursement rate to account for amortized ownership/maintenance cost
    • transit - what does MEP use? Does NTD have cost data? Google or other API?
    • bikes/ebikes?
    • shared micromobility?
  • time:
    • use overpass to query OSM?
    • what does MEP use?
    • general approximation factors?
  • any other factors? - likely something to pay attention to in the literature
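The availability and cost questions above could feed a choice-set builder along these lines. This is only a sketch: the argument names, the `scootershare` mode key, and every number in `CostParams` are hypothetical placeholders, and treating walking, personal bikes, and personal e-bikes as free is an assumption the list above leaves open.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    # All values are PLACEHOLDERS to be replaced with sourced numbers.
    car_rate_per_km: float   # e.g. derived from a mileage reimbursement rate
    transit_fare: float      # e.g. an average fare per passenger
    share_unlock_fee: float  # shared micromobility: fee to start a ride
    share_per_minute: float  # shared micromobility: per-minute price


def available_modes(has_license, reported_modes, transit_reachable):
    """Combine survey answers with a transit check to get the choice set."""
    modes = set(reported_modes) | {"walk"}  # assumption: walking is always possible
    if not has_license:
        modes.discard("car")
    if not transit_reachable:
        modes.discard("transit")  # e.g. a routing check found no usable route
    return modes


def trip_cost(mode, distance_km, minutes, p):
    """Approximate out-of-pocket cost in dollars for one trip by one mode."""
    if mode == "car":
        return p.car_rate_per_km * distance_km
    if mode == "transit":
        return p.transit_fare
    if mode == "scootershare":
        return p.share_unlock_fee + p.share_per_minute * minutes
    return 0.0  # walk, own bike, own e-bike: treated as free in this sketch
```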

@shankari @jpfleischer for visibility

@shankari
Contributor

shankari commented Sep 4, 2024

FYI, I think that uprm-civic also collects replaced mode (scootershare).

@jpfleischer

jpfleischer commented Sep 5, 2024

Hi everyone!
You currently use overpass-api.de, but @Abby-Wheelis, you said you need routing.
Here is a Transitland route in Denver, CO: https://www.transit.land/routes/r-9xj3-h
Is this what we need?

"public transport routing ... requires timetable data to work properly, and OSM doesn't have that."
https://www.reddit.com/r/openstreetmap/comments/v914h0/comment/ibttco1/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

EDIT: I found that OSM does this, for example:

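```
[out:json];
// NOTE: several places are named "Gainesville"; this matches all of them
area[name="Gainesville"]->.searchArea;
(
  // bus route relations inside the area
  relation["route"="bus"](area.searchArea);
);
out body;
// recurse down to the member ways and nodes of those relations
>;
out skel qt;
```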

https://overpass-turbo.eu
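To run the query above from code rather than overpass-turbo, it can be POSTed to the Overpass API interpreter as the `data` form field. A minimal standard-library sketch; the quote check is a simplification rather than full Overpass QL escaping, and, as in the original query, `Gainesville` matches every area with that name.

```python
import json
import urllib.parse
import urllib.request

# The public Overpass instance mentioned above.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def bus_route_query(area_name):
    """Build the bus-route query above for any area name."""
    if '"' in area_name or "\\" in area_name:
        raise ValueError("area name must not contain quotes or backslashes")
    return (
        "[out:json];\n"
        f'area[name="{area_name}"]->.searchArea;\n'
        '(relation["route"="bus"](area.searchArea););\n'
        "out body;\n>;\nout skel qt;\n"
    )


def fetch_bus_routes(area_name, timeout=180):
    """POST the query and return only the route relations (name/ref are in their tags)."""
    body = urllib.parse.urlencode({"data": bus_route_query(area_name)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=body, timeout=timeout) as resp:
        elements = json.load(resp)["elements"]
    return [e for e in elements if e["type"] == "relation"]
```

For example, `for rel in fetch_bus_routes("Gainesville"): print(rel["tags"].get("ref"), rel["tags"].get("name"))` would list the routes. As noted in the reply below, this yields route geometry, not timetables or routing.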

Realtime GTFS

Some places provide realtime data, such as Boston's MBTA: https://www.mbta.com/developers/gtfs-realtime
Can we find a free opensource resource that gives all available GTFS Realtime data sources worldwide?

https://github.com/MobilityData/awesome-transit?tab=readme-ov-file
https://mobilitydatabase.org/feeds/mdb-1602
https://mobilitydata.github.io/mobility-feed-api/SwaggerUI/index.html
https://docs.opentripplanner.org/en/latest/
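As the Reddit comment above notes, transit routing needs timetables, which static GTFS feeds (as opposed to GTFS Realtime) publish in `stop_times.txt`. A minimal sketch of reading a scheduled in-vehicle time from that file using its standard columns (`trip_id`, `arrival_time`, `departure_time`, `stop_id`, `stop_sequence`); it deliberately ignores walking to the stop, waiting, and transfers, which is the work a routing engine such as OpenTripPlanner does.

```python
import csv
import io


def gtfs_time_to_seconds(hms):
    """GTFS times are H:MM:SS or HH:MM:SS and may exceed 24:00:00 after midnight."""
    h, m, s = (int(part) for part in hms.strip().split(":"))
    return 3600 * h + 60 * m + s


def scheduled_ride_minutes(stop_times_csv, trip_id, from_stop, to_stop):
    """Scheduled in-vehicle minutes between two stops of one trip in stop_times.txt.

    Returns None if the trip does not serve both stops in that order or if the
    needed times are blank (GTFS allows blank times at non-timepoint stops).
    If a trip visits a stop twice (loop routes), only the last visit is kept.
    """
    visits = {}
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        if row["trip_id"] == trip_id and row["stop_id"] in (from_stop, to_stop):
            visits[row["stop_id"]] = row
    if from_stop not in visits or to_stop not in visits:
        return None
    start, end = visits[from_stop], visits[to_stop]
    if int(end["stop_sequence"]) <= int(start["stop_sequence"]):
        return None
    if not start["departure_time"] or not end["arrival_time"]:
        return None
    depart = gtfs_time_to_seconds(start["departure_time"])
    arrive = gtfs_time_to_seconds(end["arrival_time"])
    return (arrive - depart) / 60
```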

@shankari
Contributor

shankari commented Sep 5, 2024

@jpfleischer I meant we need routing in the sense of:

I went from my house to the drugstore by car. I want to be able to run a query (ideally via an API) that will give me the time and cost of the alternatives, i.e., the equivalent of the screenshot below, but with cost included:

[Screenshot 2024-09-05 at 1:57:53 PM]

OSM has transit data, and we use it via Overpass for mode detection (look at `emission/net/ext_services`), but OSM itself doesn't do routing. OSM-based routing services such as OSRM or GraphHopper typically do not support transit, so we cannot use them to find transit alternatives.

[Screenshot 2024-09-05 at 1:59:53 PM]

There is an open-source routing engine that takes transit into account (OpenTripPlanner):
https://opentransitsoftwarefoundation.org/

We are friends with the OTP folks and have tried using their software before. But for us to use it in a production system, somebody still needs to run the software, load the data, keep it updated, etc. Ideally, there would be an Overpass-like routing service that we could use, and pay for if needed, but I am not sure that such a Google Maps alternative exists.
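The query described above could sit behind a small interface, so that whichever backend ends up answering it (a self-hosted OTP instance or a paid service) can be swapped in without touching the choice model. A minimal sketch with a crude fallback; `RoutingProvider`, `StraightLineProvider`, the 1.3 detour factor, and all speeds and costs are hypothetical, and the fallback cannot produce transit alternatives, which is exactly the gap discussed here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

LatLon = Tuple[float, float]


@dataclass
class RoutedAlternative:
    mode: str
    time_min: float
    cost_usd: float


class RoutingProvider(Protocol):
    """Hypothetical interface for any backend (self-hosted OTP, a paid API, ...)
    that returns time and cost per alternative for one origin/destination pair."""

    def alternatives(self, origin: LatLon, destination: LatLon) -> List[RoutedAlternative]:
        ...


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


class StraightLineProvider:
    """Fallback with no routing engine: straight-line distance times a detour
    factor, with placeholder speeds and per-km costs. It cannot model transit."""

    def __init__(self, speed_kmh: Dict[str, float], cost_per_km: Dict[str, float],
                 detour_factor: float = 1.3):  # 1.3 is a placeholder, not a sourced value
        self.speed_kmh = speed_kmh
        self.cost_per_km = cost_per_km
        self.detour_factor = detour_factor

    def alternatives(self, origin: LatLon, destination: LatLon) -> List[RoutedAlternative]:
        km = haversine_km(origin, destination) * self.detour_factor
        return [
            RoutedAlternative(mode, 60.0 * km / speed, self.cost_per_km.get(mode, 0.0) * km)
            for mode, speed in self.speed_kmh.items()
        ]


def plan(provider: RoutingProvider, origin: LatLon, destination: LatLon) -> List[RoutedAlternative]:
    # Downstream code depends only on the interface, so the backend can be swapped.
    return provider.alternatives(origin, destination)
```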

> Can we find a free opensource resource that gives all available GTFS Realtime data sources worldwide?

transit.land is intended to do that, at least for the US, but somebody needs to load that data.

@shankari
Contributor

shankari commented Sep 5, 2024

One final comment on the framing of this problem: we have discussed how there are people's preferences (which are related to the person) and the alternatives (which are related to the environment).

So the same person may make different choices in a different environment (e.g. @jpfleischer taking transit in Boston but not in FL) even though their internalized preferences have not changed.

Just wanted to highlight the flip side of that, which is that different people can have different preferences. While @jpfleischer would not ever take the bus in FL, there are clearly people who do (otherwise, the bus system would have shut down).

For the replaced mode project, we want to understand individual or group preferences, specifically as a set of factors that influence their (assumed rational) choices. We can then apply those preferences to a different set of alternatives (new transit line, no e-bike available, parking restrictions...) and get a sense of how they will behave, and by extension, what the impact of the modification to the alternatives is.
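This workflow (estimate preferences once, then change the alternatives) can be sketched as follows: the same weights are applied to the baseline choice sets and to modified ones, such as no e-bike or a parking charge, and the expected mode shares are compared. The weights, trips, and scenario functions are hypothetical. Note that a plain logit shifts probability proportionally among the remaining modes when one is removed (the independence of irrelevant alternatives property), which may or may not be realistic for replaced-mode predictions.

```python
import math


def mode_shares(weights, trips):
    """Expected mode shares: logit choice probabilities averaged over trips.

    weights: {attribute: coefficient}, i.e. the (group's) preferences.
    trips: list of {mode: {attribute: value}}, i.e. each trip's alternatives.
    """
    shares = {}
    for alts in trips:
        utils = {m: sum(weights[k] * a[k] for k in weights) for m, a in alts.items()}
        top = max(utils.values())
        exps = {m: math.exp(u - top) for m, u in utils.items()}
        total = sum(exps.values())
        for m, e in exps.items():
            shares[m] = shares.get(m, 0.0) + e / total / len(trips)
    return shares


def apply_scenario(trips, change):
    """Modify the environment (alternatives) while leaving preferences untouched."""
    return [change({m: dict(a) for m, a in alts.items()}) for alts in trips]


def remove_ebike(alts):
    return {m: a for m, a in alts.items() if m != "e-bike"}


def parking_fee(alts, fee=5.0):
    if "car" in alts:
        alts["car"]["cost"] += fee  # safe: apply_scenario passes a copy
    return alts


# HYPOTHETICAL group preferences and trips, for illustration only.
weights = {"time": -0.1, "cost": -0.3, "fun": 0.5}
trips = [
    {"car": {"time": 10, "cost": 15, "fun": 0},
     "e-bike": {"time": 15, "cost": 1, "fun": 1},
     "walk": {"time": 60, "cost": 0, "fun": -1}},
    {"car": {"time": 20, "cost": 8, "fun": 0},
     "e-bike": {"time": 25, "cost": 1, "fun": 1},
     "walk": {"time": 90, "cost": 0, "fun": -1}},
]
baseline = mode_shares(weights, trips)
no_ebike = mode_shares(weights, apply_scenario(trips, remove_ebike))
with_fee = mode_shares(weights, apply_scenario(trips, parking_fee))
```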

@Abby-Wheelis
Member Author

@jpfleischer Here is a PR related to NTD data processing and integration for energy and emissions; maybe similar methods would allow us to extract transit cost? e-mission-common PR

I think the notebooks in metrics/footprint/.archive could be a good place to start

@jpfleischer

@Abby-Wheelis
Average fare collected per passenger is a column here https://data.transportation.gov/Public-Transit/2022-NTD-Annual-Data-Metrics/ekg5-frzt/explore
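If several agencies serve a study area, their per-passenger figures could be combined as below. A sketch only: the dictionary keys are hypothetical stand-ins for the corresponding NTD fields. If that column is computed per unlinked passenger trip (each boarding), it may understate what a rider with a transfer pays for the whole trip.

```python
def ridership_weighted_fare(agencies):
    """Average fare per passenger trip across several agencies.

    agencies: iterable of dicts with HYPOTHETICAL keys "fare_revenue" (dollars)
    and "unlinked_trips" (boardings). Summing revenue and trips before dividing
    weights each agency by its ridership, unlike averaging per-agency averages.
    """
    agencies = list(agencies)
    revenue = sum(a["fare_revenue"] for a in agencies)
    trips = sum(a["unlinked_trips"] for a in agencies)
    return revenue / trips if trips else None
```

For a large agency (900 dollars over 1000 trips) and a small one (20 dollars over 10 trips), this gives 920 / 1010 rather than the unweighted mean of 0.90 and 2.00.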
