Gathers single audits from the federal Single Audit Warehouse (and selected state file repositories), assigns them human-readable names and posts them to a public repository.
Python
FAC_parms.txt
IL_parms.txt
Illinois Entities.xlsx
README.md
SingleAuditees.xlsx
conf.ini
get_AK.py
get_AZ.py
get_FAC.py
get_FAC_downloadpart.py
get_FAC_rename_upload_part.py
get_FL.py
get_GA.py
get_IA.py
get_IL.py
get_IN.py
get_LA.py
get_ME.py
get_MN.py
get_NC.py
get_ND.py
get_NE.py
get_NY.py
get_UT.py
get_VA.py
get_WA.py
requirements.txt
utils.py

README.md

SingleAuditRepo

The goal of this project is to provide a comprehensive, free and regularly updated directory of US local government audited financial statements. The main source is the Federal Audit Clearinghouse, but this will be supplemented from state repositories and potentially other sources.

The files are being stored at http://www.govwiki.info/pdfs. The file naming convention is [SS EEEEE YYYY.pdf] where:
SS = Two position state code
EEEEE = Name of entity (variable number of positions)
YYYY = Fiscal Year
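As an illustration of the naming convention, here is a minimal sketch that builds such a file name. The function name build_audit_filename and the whitespace cleanup are assumptions made for this example; they are not taken from the project's scripts.

```python
import re


def build_audit_filename(state, entity, fiscal_year):
    """Return a file name of the form 'SS EEEEE YYYY.pdf'.

    Hypothetical helper: the project's scripts may build names differently.
    """
    state_code = state.strip().upper()
    if len(state_code) != 2 or not state_code.isalpha():
        raise ValueError("state must be a two-letter code")

    # Assumption: runs of whitespace in the entity name are collapsed.
    entity_name = re.sub(r"\s+", " ", entity).strip()
    if not entity_name:
        raise ValueError("entity name must not be empty")

    year = int(fiscal_year)
    if not 1000 <= year <= 9999:
        raise ValueError("fiscal year must have four digits")

    return "{} {} {}.pdf".format(state_code, entity_name, year)


print(build_audit_filename("il", "City of  Springfield", 2016))
# prints: IL City of Springfield 2016.pdf
```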

The files are divided into folders for General Purpose governments (cities, counties and states), School Districts, Community College Districts, Public Higher Education and Special Districts. Because many single audit filers are private not-for-profit organizations, we have also included a Non-Profit folder. Due to classification errors in the Federal Single Audit data set and other technology problems, the classification is imperfect at this time.

Following are descriptions of the download scripts.

get_FAC.py

Script for downloading zip files from the Federal Audit Clearinghouse, extracting the PDFs from them, then renaming the files and uploading them via FTP.

Installation

The script requires Python 3.5 or later.
It depends on selenium, pyvirtualdisplay, BeautifulSoup4 and openpyxl, which can be installed with:
pip install -U selenium pyvirtualdisplay BeautifulSoup4 openpyxl

The script also depends on geckodriver, which can be downloaded from
https://github.com/mozilla/geckodriver/releases

Don't forget to fill in the FAC_parms.txt file with correct values.

Note: you can either run get_FAC_downloadpart.py and then get_FAC.py with todownload set to 0 in FAC_parms.txt,
in which case get_FAC.py processes the zip files already stored in dir_downloads,
or run get_FAC.py alone with todownload set to 1 in FAC_parms.txt.
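To make the "process zip files stored in dir_downloads" step concrete, here is a rough sketch of extracting PDFs from already-downloaded zip files using only the standard library. It is not the code in get_FAC.py: the function name extract_pdfs is hypothetical, and flattening folders inside the archives is an assumption. The directory names follow the dir_downloads and dir_pdfs settings mentioned in this README.

```python
import os
import zipfile


def extract_pdfs(dir_downloads, dir_pdfs):
    """Copy every .pdf member of every .zip in dir_downloads into dir_pdfs.

    Hypothetical helper, shown only to illustrate the processing step.
    Returns the list of extracted file paths.
    """
    os.makedirs(dir_pdfs, exist_ok=True)
    extracted = []
    for name in sorted(os.listdir(dir_downloads)):
        if not name.lower().endswith(".zip"):
            continue
        with zipfile.ZipFile(os.path.join(dir_downloads, name)) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".pdf"):
                    continue  # skip spreadsheets and other non-PDF members
                # Assumption: folder structure inside the archive is dropped.
                target = os.path.join(dir_pdfs, os.path.basename(member))
                with archive.open(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                extracted.append(target)
    return extracted
```

A call such as extract_pdfs("downloads", "pdfs") would leave the zip files in place and return the paths of the PDFs it wrote.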

If some of the PDF files are not renamed, you can use get_FAC_rename_upload_part.py.
First place the unrenamed files in dir_pdfs and prepare a merged FileNameCrossReferenceList.xlsx
that combines all the partial FileNameCrossReferenceList.xlsx files from the zip files.
The merged FileNameCrossReferenceList.xlsx should be placed in the dir_pdfs directory.

get_IL.py

Script for downloading PDFs from the Illinois Comptroller's Warehouse, merging partial PDFs when a report is split up in the warehouse,
then renaming the files (and eventually uploading them via FTP).

Installation

The script depends on openpyxl and pdftk.
pip install -U openpyxl

Don't forget to fill in IL_parms.txt with correct values.

pdftk is used for merging PDF files when there is more than one.
On Linux it can be installed with: sudo apt install pdftk
On Windows, download and install the executable from https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
pdftk.exe must be runnable from dir_pdfs.
To test it on Windows, run the following in a terminal in dir_pdfs: pdftk.exe file1.pdf file2.pdf cat output newfile.pdf
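The same pdftk invocation can be driven from Python. The sketch below only builds and runs the "cat output" command shown in the testing example; the function names are hypothetical and this is not necessarily how get_IL.py calls pdftk.

```python
import shutil
import subprocess


def build_pdftk_merge_command(parts, output, executable="pdftk"):
    """Build the argument list for: pdftk part1 part2 ... cat output OUTPUT."""
    if len(parts) < 2:
        raise ValueError("merging needs at least two input files")
    return [executable] + list(parts) + ["cat", "output", output]


def merge_pdfs(parts, output, executable="pdftk"):
    """Run pdftk to merge the given files (hypothetical helper)."""
    if shutil.which(executable) is None:
        raise FileNotFoundError("{} was not found on PATH".format(executable))
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(build_pdftk_merge_command(parts, output, executable), check=True)


print(build_pdftk_merge_command(["file1.pdf", "file2.pdf"], "newfile.pdf"))
# prints: ['pdftk', 'file1.pdf', 'file2.pdf', 'cat', 'output', 'newfile.pdf']
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.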

get_VA.py

Script for downloading PDFs from the Virginia Local Government Reports web page.

Usage:

python get_VA.py --year YEAR --category CATEGORY_NAME

Both arguments are optional.
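For readers unfamiliar with optional command-line flags, here is a minimal argparse sketch that accepts the same two options. It is an illustration only: what get_VA.py does when an option is omitted is not stated here, so treating a missing value as None ("no filter") is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse --year and --category, both optional (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="get_VA.py")
    # Assumption: an omitted option is represented as None.
    parser.add_argument("--year", type=int, default=None,
                        help="fiscal year to download")
    parser.add_argument("--category", default=None,
                        help="category name to download")
    return parser.parse_args(argv)


args = parse_args(["--year", "2016"])
print(args.year, args.category)
# prints: 2016 None
```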

Installation

pip install -r requirements.txt

get_GA.py

Script for downloading PDFs from the Georgia Local Government Reports web page.

Usage:

python get_GA.py START_YEAR END_YEAR

Both arguments are required.
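The two positional arguments can be read as a range of years. The sketch below shows one way to validate them and expand the range; the function name parse_year_range is hypothetical, and treating END_YEAR as inclusive is an assumption, not something this README confirms.

```python
def parse_year_range(argv):
    """Turn ['START_YEAR', 'END_YEAR'] into a list of years (sketch)."""
    if len(argv) != 2:
        raise SystemExit("usage: python get_GA.py START_YEAR END_YEAR")
    start_year, end_year = int(argv[0]), int(argv[1])
    if start_year > end_year:
        raise SystemExit("START_YEAR must not be later than END_YEAR")
    # Assumption: END_YEAR itself is included in the range.
    return list(range(start_year, end_year + 1))


print(parse_year_range(["2015", "2017"]))
# prints: [2015, 2016, 2017]
```

In a real script the list passed in would typically be sys.argv[1:].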

get_WA.py

Script for downloading PDFs from the Washington State Local Government Reports web page.

Usage:

python get_WA.py START_YEAR END_YEAR

Both arguments are required.

get_AZ.py

Script for downloading PDFs from the Arizona Local Government Reports web page.

Usage:

python get_AZ.py

get_FL.py

Script for downloading PDFs from the Florida Local Government Reports web page.

Usage:

python get_FL.py START_YEAR END_YEAR

Both arguments are required.

get_AK.py

Script for downloading PDFs from the Alaska Local Government Reports web page.

Usage:

python get_AK.py YEAR

The YEAR argument is required.

get_UT.py

Script for downloading PDFs from the Utah Local Government Reports web page.

Usage:

python get_UT.py YEAR

The YEAR argument is required.

Licence

GPL