Download Module
The Download module provides a set of functions to download and manage census data, geodata, and administrative boundaries. Its main functions download data from specific links, organize the files into folder structures, and automatically extract and delete ZIP archives.
download_base(data_link, data_file_path_destination, data_folder, destination_folder)
Download a file from a URL, display a progress bar, and extract the resulting ZIP archive.
This function handles the complete workflow for downloading an archive (typically
ZIP format) from an HTTP/HTTPS URL. It saves the file to a local path, displays
a progress bar using tqdm, extracts the content to the destination folder, and
finally removes the compressed file.
| PARAMETER | DESCRIPTION |
|---|---|
| data_link | URL from which to download the file (e.g., a census data ZIP archive). |
| data_file_path_destination | Complete local path where the downloaded compressed file will be saved. |
| data_folder | Folder where the compressed file content will be extracted. |
| destination_folder | Logical destination folder associated with the downloaded dataset. This path is returned at the end for consistency with the calling workflow. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the downloaded and extracted data. |
| RAISES | DESCRIPTION |
|---|---|
| Exception | If the server returns an HTTP status code other than 200, or if any error occurs during download, file saving, or extraction. |
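The download, progress, extract, and delete workflow described above can be sketched with the standard library alone. This is a hypothetical sketch, not the module's actual code: the name download_base_sketch and the chunk_size parameter are illustrative, and tqdm is replaced by a plain percentage printed to stderr so that nothing outside the standard library is needed.

```python
import sys
import urllib.request
import zipfile
from pathlib import Path


def download_base_sketch(data_link, data_file_path_destination, data_folder,
                         destination_folder, chunk_size=64 * 1024):
    """Hypothetical sketch: download a ZIP, show progress, extract it, delete it."""
    zip_path = Path(data_file_path_destination)
    extract_dir = Path(data_folder)
    try:
        zip_path.parent.mkdir(parents=True, exist_ok=True)
        extract_dir.mkdir(parents=True, exist_ok=True)
        # urlopen already raises HTTPError for 4xx/5xx; any other non-200
        # status (e.g. 204) is rejected explicitly here.
        with urllib.request.urlopen(data_link) as response:
            if response.status != 200:
                raise Exception(f"unexpected HTTP status {response.status}")
            total = int(response.headers.get("Content-Length") or 0)
            received = 0
            with open(zip_path, "wb") as out:
                # Stream in chunks so large archives never sit fully in memory.
                while chunk := response.read(chunk_size):
                    out.write(chunk)
                    received += len(chunk)
                    if total:
                        # Plain-text stand-in for the tqdm progress bar.
                        print(f"\r{received * 100 // total}%", end="", file=sys.stderr)
        print(file=sys.stderr)
        # Extract the archive, then remove the compressed file.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(extract_dir)
        zip_path.unlink()
    except Exception as exc:
        # Any failure is surfaced as a single Exception, as documented above.
        raise Exception(f"download of {data_link} failed: {exc}") from exc
    return Path(destination_folder)
```

Streaming in fixed-size chunks is what makes a progress indicator possible: the Content-Length header gives the total, and each chunk advances the counter.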
download_all_census_data_1991(output_data_folder, region_list=[])
Download the complete census and geographic dataset for the 1991 Census.
This function coordinates all the operations needed to obtain the census data and geographic information associated with the 1991 Census. It downloads:
- Tabular census data
- Geodata specific to one or more regions
- Official administrative boundaries
If no value is provided for region_list, geodata for all regions is
downloaded.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Main path where all downloaded and processed data will be saved. |
| region_list | List containing region codes or names for which to download geodata. If empty, all available regions are considered. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the root folder containing the downloaded data. |
Note
This function operates exclusively on the 1991 Census.
It uses support functions such as download_data(),
download_geodata(), and download_administrative_boundaries().
The necessary folder structure is created automatically.
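The coordination described above can be sketched as three calls in sequence. In this hypothetical sketch the three support functions are passed in as parameters so that it runs without network access; the real function calls download_data(), download_geodata(), and download_administrative_boundaries() directly. The documented signature uses region_list=[]; the sketch uses None instead, which avoids Python's shared mutable default argument while keeping the same "empty means all regions" behaviour.

```python
from pathlib import Path


def download_all_census_data_sketch(output_data_folder, census_year,
                                    download_data, download_geodata,
                                    download_administrative_boundaries,
                                    region_list=None):
    """Hypothetical sketch of the per-year orchestration described above."""
    # None instead of [] avoids a shared mutable default between calls.
    regions = list(region_list) if region_list else []
    root = Path(output_data_folder)
    root.mkdir(parents=True, exist_ok=True)  # folder structure created automatically

    # The three steps run in the order listed in the description.
    download_data(root, census_year)
    download_geodata(root, census_year, regions)  # empty list means all regions
    download_administrative_boundaries(root, census_year)
    return root
```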
download_data(output_data_folder, census_year)
Download, organize, and process census data for a specific year.
This function manages the complete workflow for acquiring census data, from the source through to the final CSV files. The following operations are performed:
- Retrieval of the dictionary of links for the census year.
- Download of raw data via the dwn() function.
- Creation of the output folder structure.
- Identification and reading of .xls files.
- Conversion of Excel files to CSV.
- Extraction of tracking metadata (codifications) from the first available file.
- Removal of the original Excel files.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Root folder path where downloaded data will be saved. |
| census_year | Reference year for the census data to process. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the folder containing the downloaded and processed census data. |
| RAISES | DESCRIPTION |
|---|---|
| Exception | If no .xls file is found in the downloaded data folder. |
Note
Conversion from XLS to CSV is performed via the read_xls() function.
Dataset tracking is performed only on the first XLS file found.
XLS files are removed at the end of the process to reduce disk space usage.
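The conversion and cleanup steps above can be sketched as a loop over the .xls files in a folder. The signature of read_xls() is not documented here, so this hypothetical sketch takes a converter callable in its place; the name convert_xls_folder_sketch and the convention that the converter returns a metadata dict are assumptions made for illustration.

```python
from pathlib import Path


def convert_xls_folder_sketch(data_folder, xls_to_csv):
    """Hypothetical sketch: convert every .xls file to CSV, then delete it.

    xls_to_csv(xls_path, csv_path) stands in for read_xls(); it is assumed
    to write the CSV and return a dict of tracking metadata.
    """
    folder = Path(data_folder)
    # "*.xls" does not match ".xlsx"; sorting makes "first file" deterministic.
    xls_files = sorted(folder.glob("*.xls"))
    if not xls_files:
        raise Exception(f"No .xls files found in {folder}")

    tracking = None
    for xls_path in xls_files:
        metadata = xls_to_csv(xls_path, xls_path.with_suffix(".csv"))
        # Tracking metadata is taken only from the first file, as in the notes.
        if tracking is None:
            tracking = metadata
        xls_path.unlink()  # free disk space once the CSV exists
    return folder, tracking
```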
download_all_census_data_2001(output_data_folder, region_list=[])
Download the complete census and geographic dataset for the 2001 Census.
This function coordinates all necessary operations to obtain census data and geographic information associated with the 2001 Census. Specifically, it handles:
- Downloading tabular census data
- Downloading geodata (for all regions or a specified subset)
- Downloading official administrative boundaries
If region_list is empty, geodata for all available regions is downloaded.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Main path where all downloaded and processed data will be saved. |
| region_list | List of region codes or names for which to download geodata. If left empty, the function considers all regions. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the root folder containing the 2001 Census data structure. |
Note
This function is specific to the 2001 Census (the census_year parameter
is fixed internally to 2001).
It uses support functions: download_data(), download_geodata(), and
download_administrative_boundaries().
A preprocessing subfolder is created automatically, defined by the
PREPROCESSING_FOLDER constant.
download_administrative_boundaries(output_data_folder, census_year)
Download the administrative boundaries for a census year and save them to the data structure.
This function retrieves from the census dictionary the URL for administrative boundaries (regions, provinces, municipalities), prepares the folder structure dedicated to the census year, and downloads the associated ZIP file. The content is not extracted: the function only performs the download.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Main path where the census data structure will be created. |
| census_year | Reference census year (e.g., 1991, 2001, 2011). |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the folder containing the downloaded administrative boundaries. |
| RAISES | DESCRIPTION |
|---|---|
| Exception | If an error occurs during folder creation or download. |
Note
Administrative boundaries include municipalities, provinces, and regions.
The file is saved to the folder defined by the BOUNDARIES_DATA_FOLDER
constant.
The returned path is the census year folder, not the individual downloaded file.
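The lookup, folder preparation, and download-only behaviour described above might look like the following hypothetical sketch. The dictionary key "boundaries_url", the value of BOUNDARIES_DATA_FOLDER, and the layout year folder / BOUNDARIES_DATA_FOLDER are all assumptions; the real census dictionary and constants are not documented here. The archive is streamed to disk and deliberately left unextracted.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

BOUNDARIES_DATA_FOLDER = "boundaries"  # hypothetical value of the real constant


def download_boundaries_sketch(output_data_folder, census_year, census_dictionary):
    """Hypothetical sketch: fetch the boundaries ZIP for a year without extracting it."""
    # Key name is an assumption; the real dictionary layout is not documented.
    url = census_dictionary[census_year]["boundaries_url"]
    year_folder = Path(output_data_folder) / str(census_year)
    target_dir = year_folder / BOUNDARIES_DATA_FOLDER
    target_dir.mkdir(parents=True, exist_ok=True)

    # Keep the file name from the URL path.
    target = target_dir / Path(urllib.parse.urlparse(url).path).name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk; no extraction
    # As in the notes, return the census-year folder rather than the file.
    return year_folder
```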
download_all_census_data_2011(output_data_folder, region_list=[])
Download the complete census and geographic dataset for the 2011 Census.
This function coordinates the three fundamental operations necessary to obtain all data for the 2011 Population Census:
- Download of tabular census data.
- Download of geodata for one or more regions (or all regions, if region_list is empty).
- Download of official administrative boundaries.
In addition to downloading files, the function automatically creates the necessary folder structure within the output directory.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Main path where all census data will be saved. |
| region_list | List containing region codes or names for which to download geodata. If not provided or empty, the function downloads data for all regions. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the root folder containing the 2011 Census data structure. |
Note
This function is specific to the 2011 Census (the census_year parameter
is fixed internally to 2011).
It uses support functions: download_data(), download_geodata(),
and download_administrative_boundaries().
The internal folder used for preprocessing is determined by the
PREPROCESSING_FOLDER constant.
download_data(output_data_folder, census_year)
Download census data for a specific year and organize the working folder.
This function retrieves from the census dictionary the link to data for the
specified year, prepares the destination folder structure, and delegates
the actual download to the download_base() function. Upon completion,
it returns the path to the downloaded file (or folder, depending on how
download_base() is implemented).
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Path to the output folder where the census data structure will be created (e.g., the preprocessing folder or the project folder). |
| census_year | Reference year for the census data to download (e.g., 1991, 2001, 2011). |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path returned by download_base(), pointing to the downloaded census data (file or folder). |
| RAISES | DESCRIPTION |
|---|---|
| Exception | If an error occurs during link retrieval, folder creation, or data download. |
download_geodata(output_data_folder, census_year, region_list=[])
Download census geodata for one or more regions for a census year.
This function retrieves URLs for census geodata packages (ZIP) via
get_census_dictionary(), creates the destination folder structure for the
census year, and proceeds to download the ZIP files for the requested regions.
If region_list is empty, data for all available regions in the dictionary
is downloaded (typically 20).
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Root folder where the entire census data structure will be saved. |
| census_year | Reference year for the census (e.g., 1991, 2001, 2011). |
| region_list | List of ISTAT region codes for which to download geodata. If empty, geodata for all predefined regions is downloaded. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the folder containing the downloaded census geodata ZIP files. |
Note
The function does not extract the ZIP files: it only downloads them.
Geodata URLs are obtained from the census dictionary via the
geodata_urls key.
The final data path is organized via the global GEODATA_FOLDER constant.
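The "empty list means all regions" rule can be isolated in a small selection step that runs before any download. This is a hypothetical sketch: it assumes the geodata_urls entry maps ISTAT region codes to ZIP URLs, and raising KeyError for an unknown region is a design choice of the sketch, not documented behaviour. Each selected URL would then be downloaded without extraction.

```python
def select_geodata_urls(geodata_urls, region_list=None):
    """Hypothetical sketch: pick which regional geodata ZIPs to download."""
    # An empty or missing list means every region in the dictionary.
    if not region_list:
        return dict(geodata_urls)
    # Fail early on unknown regions instead of silently skipping them.
    missing = [r for r in region_list if r not in geodata_urls]
    if missing:
        raise KeyError(f"No geodata URL for regions: {missing}")
    return {r: geodata_urls[r] for r in region_list}
```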
download_all_census_data_2021(output_data_folder, region_list=[])
Download the complete census and geographic dataset for the 2021 Census.
This function performs all the operations necessary to obtain the 2021 census data, automatically coordinating:
- Download of tabular census data.
- Download of geodata for the requested regions (or all regions, if region_list is empty).
- Download of official administrative boundaries.
The function automatically creates the necessary folder structure to organize
the data, defined by the PREPROCESSING_FOLDER constant.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Main path where 2021 Census data will be saved. |
| region_list | Region codes or names for which to download geodata. If the list is empty, geodata for all regions is downloaded. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the root folder containing the complete downloaded data structure for the 2021 Census. |
Note
The census_year parameter is fixed at 2021.
It uses support functions: download_data(), download_geodata(),
and download_administrative_boundaries().
download_data(output_data_folder, census_year)
Download, organize, and convert census data to a processable format.
This function manages the complete workflow of downloading and preparing census data for a specific year. It identifies the XLSX files present, converts them to CSV, and removes the original files that are no longer needed.
| PARAMETER | DESCRIPTION |
|---|---|
| output_data_folder | Path to the main folder where downloaded and processed data will be saved. |
| census_year | Reference year for the census data to download. |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the folder containing the downloaded and processed data. |
| RAISES | DESCRIPTION |
|---|---|
| Exception | If no XLSX file is found in the downloaded data folder. |
Note
XLSX files are converted to CSV to facilitate subsequent ETL and
analysis phases.
Original XLSX files are deleted to reduce disk space usage.
The final data structure depends on the global constants
DATA_FOLDER and CENSUS_DATA_FOLDER.
download_census(years, output_data_folder, region_list=[])
Download census data, including geodata and administrative boundaries, for one or more requested years.
This function provides centralized download of census data for years 1991, 2001, 2011, and 2021. For each year, it automatically executes the following procedures:
- Download of tabular census data
- Download of geodata (for all regions or specified ones)
- Download of administrative boundaries
If years is empty, data for all available years is downloaded automatically.
| PARAMETER | DESCRIPTION |
|---|---|
| years | List of census years to download. If the list is empty, data for all census years (1991, 2001, 2011, 2021) will be downloaded. |
| output_data_folder | Path to the folder where all downloaded data will be saved. |
| region_list | Optional list of region codes or names for which to download geodata. If empty, geodata for all available regions is downloaded. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If one of the specified years is not supported or does not exist in the mapping. |
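The year-to-function mapping that download_census() relies on can be sketched as a dictionary dispatch. In this hypothetical sketch, downloaders stands in for the real mapping from each year to its download_all_census_data_<year>() function, and all requested years are validated before anything is downloaded; that up-front check is a design choice of the sketch, and the real function may validate year by year instead.

```python
SUPPORTED_YEARS = (1991, 2001, 2011, 2021)


def download_census_sketch(years, output_data_folder, downloaders, region_list=None):
    """Hypothetical sketch of per-year dispatch with validation."""
    # An empty list means every supported census year.
    requested = list(years) if years else list(SUPPORTED_YEARS)
    unsupported = [y for y in requested if y not in downloaders]
    if unsupported:
        # Validate first so nothing is downloaded for a partly invalid request.
        raise ValueError(f"Unsupported census year(s): {unsupported}")
    for year in requested:
        # Each per-year function takes (output_data_folder, region_list).
        downloaders[year](output_data_folder, region_list or [])
```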