full scale test #19

Open
fscottfoti opened this issue Sep 5, 2014 · 5 comments

Comments

@fscottfoti
Contributor

Should probably synthesize the population of the Bay Area and solve any issues that come up. If it's fast enough we should go for the whole county (why not?).

@fscottfoti
Contributor Author

@jiffyclub I gave this a shot. There was at least one block group that needed 15K iterations in the IPU. When I upped the limit to 20K iterations, the Bay Area completed successfully. The full run currently takes about 40 minutes. Checking the results seems like the next order of business.

@jiffyclub
Member

Nice! A couple thoughts:

  • I wonder if the convergence criteria in the IPU could be loosened a bit without affecting the final results.
  • I wonder if addressing the zero-cell thing would make it easier to reach convergence in the IPU.
  • If we want it to be even faster we can experiment with numba and cython.
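To make the first two points concrete, here is a toy iterative proportional updating loop with an explicit convergence tolerance, an iteration cap, and a guard for zero cells. This is only a sketch under stated assumptions, not SynthPop's implementation: the function names, the stopping rule (change in mean relative error below `tolerance`), the positive-targets assumption, and the tiny incidence table are all made up for illustration.

```python
def fit_quality(weights, incidence, targets):
    """Mean absolute relative error between weighted totals and targets.

    Assumes every target is strictly positive.
    """
    errors = []
    for j, target in enumerate(targets):
        total = sum(w * row[j] for w, row in zip(weights, incidence))
        errors.append(abs(total - target) / target)
    return sum(errors) / len(errors)


def ipu_sketch(incidence, targets, tolerance=1e-4, max_iterations=20000):
    """Toy IPU: rescale household weights one constraint at a time."""
    weights = [1.0] * len(incidence)
    previous = fit_quality(weights, incidence, targets)
    for iteration in range(1, max_iterations + 1):
        for j, target in enumerate(targets):
            total = sum(w * row[j] for w, row in zip(weights, incidence))
            if total == 0:
                # Zero cell: no sampled household contributes to this
                # constraint, so there is nothing to rescale.
                continue
            ratio = target / total
            # Only households that contribute to constraint j are adjusted.
            weights = [
                w * ratio if row[j] > 0 else w
                for w, row in zip(weights, incidence)
            ]
        current = fit_quality(weights, incidence, targets)
        # Looser tolerance -> earlier exit; this is the knob to experiment with.
        if abs(previous - current) < tolerance:
            return weights, current, iteration
        previous = current
    raise RuntimeError("IPU did not converge within max_iterations")


# Hypothetical sample: columns are [household type A, household type B, persons].
incidence = [
    [1, 0, 1],
    [1, 0, 2],
    [0, 1, 3],
]
targets = [10, 5, 30]

weights, fit, iterations = ipu_sketch(incidence, targets)
print(weights, fit, iterations)
```

This toy case settles almost immediately because the targets are exactly reachable. The interesting experiment is on real block groups: rerun with a larger `tolerance` and compare both the iteration count and the resulting fit.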

As we work on the validation we'll want to track everything we do so it can be publicized. I dunno if maybe a separate repo would be good for that, or if we keep it in this one somewhere.

@fscottfoti
Contributor Author

I wonder these things too - we can definitely try it and see.

I agree on publicized validation - I vote for keeping it in this repo - maybe with a notebook (or more than one) that's well annotated, I would guess in a separate directory.

@waddell
Member

waddell commented Sep 8, 2014

Nice progress. Does the 40 minutes include sampling the household and person records and writing out the resulting synthetic population, or does it only run through the IPU step?

I also like the publicized validation approach, and keeping that in the same repo sounds good.


@darebrawley

Hi -- I'm trying to use SynthPop as part of a research project and am running into runtime issues.
I'm applying the synthesizer to Mecklenburg County, NC, and am getting the following runtime for a single block group. Any suggestions?

I was super encouraged to see that @waddell was able to do the full Bay Area in 40 minutes.

Time to run ipu: 390.129s
IPU weights:
count 3.687000e+03
mean 1.933344e-01
std 4.484030e-01
min 3.711018e-11
25% 4.032434e-06
50% 7.556055e-05
75% 1.988441e-01
max 7.685979e+00
dtype: float64
Fit quality:
4.872272957062106
Number of iterations:
234
Drawing 620 households

The output above was produced by running:

from synthpop.recipes.starter2 import Starter
from synthpop.synthesizer import synthesize_all, enable_logging
import os
import pandas as pd
enable_logging()

# Set the Census API key (placeholder shown here; substitute your own key)
os.environ["CENSUS"] = "YOUR_CENSUS_API_KEY"
starter = Starter(os.environ["CENSUS"], "NC", "Mecklenburg County")
ind = pd.Series(["37", "119", "005706", "4"], index=["state", "county", "tract", "block group"])
output = synthesize_all(starter, indexes=[ind])
output.to_csv("data/test_synth_output.csv")
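The log above ends with "Drawing 620 households", and the weight summary shows that most of the 3,687 weights are close to zero. As a rough illustration of what that implies for the draw, here is a sketch that samples household IDs with probability proportional to weight. It assumes weighted sampling with replacement, which may not match what SynthPop actually does; the five weights are the rounded max, 75%, 50%, 25%, and min values from the summary above, and the household IDs are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Rounded max, 75%, 50%, 25%, and min from the IPU weight summary.
weights = [7.69, 0.199, 7.6e-05, 4.0e-06, 3.7e-11]
household_ids = list(range(len(weights)))  # hypothetical IDs 0..4

# Draw 620 households with probability proportional to weight
# (sampling with replacement is an assumption of this sketch).
drawn = random.choices(household_ids, weights=weights, k=620)

# Count how often each household was picked.
counts = {hid: drawn.count(hid) for hid in household_ids}
print(counts)
```

With weights this skewed, nearly every draw lands on the heaviest household, so a long IPU run can still end up replicating only a handful of sample records.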
