full scale test #19

fscottfoti · 2014-09-05T19:14:08Z

Should probably synthesize the population of the Bay Area and solve any issues that come up. If it's fast enough we should go for the whole country (why not?).

fscottfoti · 2014-09-07T22:51:47Z

@jiffyclub I gave this a shot. At least one block group needed 15K iterations in the IPU. When I upped the limit to 20K iterations the Bay Area completed successfully. Right now it runs in about 40 minutes. Checking the results seems like the next order of business.

jiffyclub · 2014-09-07T23:18:15Z

Nice! A couple thoughts:

I wonder if the convergence criteria in the IPU could be loosened a bit without affecting the final results.
I wonder if addressing the zero-cell thing would make it easier to reach convergence in the IPU.
If we want it to be even faster we can experiment with numba and cython.
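To make the convergence question concrete, here is a minimal sketch of an iterative proportional updating loop with the two knobs discussed above: a tolerance on the change in fit and a cap on the number of iterations. This is a toy written for illustration, not SynthPop's implementation; the name `ipu_sketch`, its parameters, and the fit measure (mean absolute relative error) are all assumptions, and targets are assumed to be strictly positive.

```python
import numpy as np


def ipu_sketch(freq, targets, tol=1e-6, max_iterations=20000):
    """Toy IPU loop (hypothetical helper, not SynthPop's code).

    freq: (n_households, n_constraints) array of non-negative counts.
    targets: (n_constraints,) array of strictly positive marginal totals.
    Returns (weights, fit, iterations, converged).
    """
    freq = np.asarray(freq, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.ones(freq.shape[0])

    def fit_quality(w):
        # Mean absolute relative error across all constraints.
        return np.mean(np.abs(freq.T @ w - targets) / targets)

    fit = fit_quality(weights)
    for iteration in range(1, max_iterations + 1):
        for j in range(freq.shape[1]):
            rows = freq[:, j] > 0  # households contributing to constraint j
            current = freq[rows, j] @ weights[rows]
            if current > 0:
                # Rescale contributing households so constraint j is matched.
                weights[rows] *= targets[j] / current
        new_fit = fit_quality(weights)
        if abs(fit - new_fit) < tol:
            # Stop once a full sweep no longer changes the fit noticeably.
            return weights, new_fit, iteration, True
        fit = new_fit
    return weights, fit, max_iterations, False


# Four households, each contributing to one of constraints 0/1 and one of 2/3.
freq = np.array([[1, 0, 1, 0],
                 [1, 0, 0, 1],
                 [0, 1, 1, 0],
                 [0, 1, 0, 1]])
targets = np.array([30, 20, 35, 15])
weights, fit, iterations, converged = ipu_sketch(freq, targets)
```

In a loop shaped like this, loosening the criterion means raising `tol`, and the 15K versus 20K question corresponds to `max_iterations`. Whether a looser tolerance changes the drawn households would still have to be checked against real block groups.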

As we work on the validation we'll want to track everything we do so it can be publicized. I dunno if maybe a separate repo would be good for that, or if we keep it in this one somewhere.

fscottfoti · 2014-09-07T23:27:38Z

I wonder these things too - we can definitely try it and see.

I agree on publicized validation - I vote for keeping it in this repo - maybe with a notebook (or more than one) that's well annotated, I would guess in a separate directory.

waddell · 2014-09-08T00:10:18Z

Nice progress. Does the 40 minutes include sampling the household and person records and writing the resulting synthetic population out, or does it only cover the run through the IPU step?

I also like the publicized validation approach, and keeping that in the same repo sounds good.


darebrawley · 2019-03-03T14:26:58Z

Hi -- I'm trying to use SynthPop as part of a research project and am encountering runtime issues.
I'm applying the synthesizer to Mecklenburg County, NC and am getting the following runtime for a single block group. Any suggestions?

I was super encouraged to see that @fscottfoti was able to do the full Bay Area in 40 minutes.

Time to run ipu: 390.129s
IPU weights:
count 3.687000e+03
mean 1.933344e-01
std 4.484030e-01
min 3.711018e-11
25% 4.032434e-06
50% 7.556055e-05
75% 1.988441e-01
max 7.685979e+00
dtype: float64
Fit quality:
4.872272957062106
Number of iterations:
234
Drawing 620 households

The output above was produced by the following script:

from synthpop.recipes.starter2 import Starter
from synthpop.synthesizer import synthesize_all, enable_logging
import os
import pandas as pd
enable_logging()

# setting API Key
os.environ["CENSUS"] = "d95e144b39e17f929287714b0b8ba9768cecdc9f"
starter = Starter(os.environ["CENSUS"], "NC", "Mecklenburg County")
ind = pd.Series(["37", "119", "005706", "4"], index=["state", "county", "tract", "block group"])
output = synthesize_all(starter, indexes=[ind])
output.to_csv("data/test_synth_output.csv")
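The log above reports about 390 s for 234 IPU iterations, roughly 1.7 s per iteration, so one way to narrow this down is to profile a single block group and see where the time goes. The sketch below uses only the standard library's `cProfile` and `pstats`; `profile_call` and `busy_work` are hypothetical names introduced here, and the commented-out call assumes the `starter` and `ind` objects from the script above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def busy_work(n):
    # Stand-in workload so the sketch runs without SynthPop or a Census key.
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 1000)
print(report)

# With the script above, the equivalent call would be (assumption, untested):
# output, report = profile_call(synthesize_all, starter, indexes=[ind])
```

If most of the cumulative time sits inside the IPU loop itself, that would point toward the convergence settings or a compiled inner loop rather than data download or sampling.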
