Download Module

The Download module provides a set of functions to download and manage census data and administrative boundaries. The main functions allow downloading data from specific links, organizing files into folder structures, and handling the automatic extraction and deletion of ZIP files.

download_base(data_link, data_file_path_destination, data_folder, destination_folder)

Download a file from a URL, display a progress bar, and extract the resulting ZIP.

This function handles the complete workflow for downloading an archive (typically ZIP format) from an HTTP/HTTPS URL. It saves the file to a local path, displays a progress bar using tqdm, extracts the content to the destination folder, and finally removes the compressed file.

PARAMETER DESCRIPTION
data_link

URL from which to download the file (e.g., census data ZIP archive).

TYPE: str

data_file_path_destination

Complete local path where the downloaded compressed file will be saved.

TYPE: Path

data_folder

Folder where the compressed file content will be extracted.

TYPE: Path

destination_folder

Logical destination folder associated with the downloaded dataset. This path is returned at the end for consistency with the calling workflow.

TYPE: Path

RETURNS DESCRIPTION
Path

Path to destination_folder, usable as a reference to the root of the downloaded and extracted data.

RAISES DESCRIPTION
Exception

If the server returns an HTTP status code other than 200, or if any error occurs during download, file saving, or extraction.
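
The workflow described above (check the status, stream to disk, extract, delete the archive) can be sketched with the standard library alone. This is an illustrative sketch, not the module's implementation: the helper names fetch_archive and extract_and_remove are hypothetical, and the tqdm progress bar is left out so the example has no third-party dependencies.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_archive(url: str, archive_path: Path) -> None:
    """Stream `url` to `archive_path`; raise if the status is not 200."""
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        # urlopen already raises HTTPError for 4xx/5xx responses; this
        # check covers any remaining status other than 200.
        if response.status != 200:
            raise Exception(f"Download failed with HTTP status {response.status}")
        with open(archive_path, "wb") as target:
            # copyfileobj copies in chunks, so the archive is never held
            # fully in memory.
            shutil.copyfileobj(response, target)


def extract_and_remove(archive_path: Path, data_folder: Path) -> None:
    """Extract the ZIP into `data_folder`, then delete the archive."""
    data_folder.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(data_folder)
    # Remove the compressed file only after extraction has succeeded.
    archive_path.unlink()
```

Splitting the download from the extraction keeps the second step usable on archives that are already on disk.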

download_all_census_data_1991(output_data_folder, region_list=[])

Download complete census and geographic dataset for the 1991 Census.

This function coordinates all necessary operations to obtain census data and geographic information associated with the 1991 Census. It enables downloading of:

  • Tabular census data
  • Geodata specific to one or more regions
  • Official administrative boundaries

If no value is provided for region_list, geodata for all regions is downloaded.

PARAMETER DESCRIPTION
output_data_folder

Main path where all downloaded and processed data will be saved.

TYPE: Path

region_list

List containing region codes or names for which to download geodata. If empty, all available regions are considered.

TYPE: list DEFAULT: []

RETURNS DESCRIPTION
Path

Path to the root folder containing the downloaded data.

Note

This function operates exclusively on the 1991 Census. It uses support functions such as download_data(), download_geodata(), and download_administrative_boundaries(). The necessary folder structure is created automatically.
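
The coordination described above amounts to three calls in a fixed order. The sketch below is a hypothetical illustration: download_all_for_year is an invented name, the three helpers are passed in as callables so the example runs on its own, and their signatures are assumed to match the ones documented in this module. It is not the module's actual code.

```python
from pathlib import Path
from typing import Callable, Sequence


def download_all_for_year(
    output_data_folder: Path,
    census_year: int,
    region_list: Sequence = (),
    *,
    download_data: Callable,
    download_geodata: Callable,
    download_administrative_boundaries: Callable,
) -> Path:
    """Run the three download steps for one census year (illustrative)."""
    # Create the root folder up front so every helper can write into it.
    output_data_folder.mkdir(parents=True, exist_ok=True)
    download_data(output_data_folder, census_year)
    # An empty region_list is forwarded unchanged; the geodata helper is
    # documented to treat it as "all regions".
    download_geodata(output_data_folder, census_year, list(region_list))
    download_administrative_boundaries(output_data_folder, census_year)
    return output_data_folder
```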

download_data(output_data_folder, census_year)

Download, organize, and process census data for a specific year.

This function manages the complete workflow for acquiring census data from source through to producing final CSV files. The following operations are performed:

  1. Retrieval of the dictionary of links for the census year.
  2. Download of raw data via the dwn() function.
  3. Creation of the output folder structure.
  4. Identification and reading of .xls files.
  5. Conversion of Excel files to CSV.
  6. Extraction of tracking metadata (codifications) from the first available file.
  7. Removal of original Excel files.

PARAMETER DESCRIPTION
output_data_folder

Root folder path where downloaded data will be saved.

TYPE: Path

census_year

Reference year for the census data to process.

TYPE: int

RETURNS DESCRIPTION
Path

Path to the folder containing the downloaded and processed census data.

RAISES DESCRIPTION
Exception

If no .xls file is found in the data folder.

Note

Conversion from XLS to CSV is performed via the read_xls() function. Dataset tracking is performed only on the first XLS file found. XLS files are removed at the end of the process to reduce disk space usage.
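
The last steps of the workflow above follow a find, convert, delete pattern. A minimal sketch under stated assumptions: convert_folder is a hypothetical name, and the converter is passed in as a callable standing in for read_xls(), whose real signature is not shown in this documentation.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(
    data_folder: Path,
    to_csv: Callable[[Path, Path], None],
    pattern: str = "*.xls",
) -> List[Path]:
    """Convert every matching spreadsheet to CSV and delete the original."""
    sources = sorted(data_folder.rglob(pattern))
    if not sources:
        raise Exception(f"No {pattern} file found in {data_folder}")
    csv_paths = []
    for source in sources:
        csv_path = source.with_suffix(".csv")
        to_csv(source, csv_path)  # stand-in for the real conversion
        source.unlink()           # remove the original only after converting
        csv_paths.append(csv_path)
    return csv_paths
```

In this sketch, passing pattern="*.xlsx" applies the same steps to XLSX inputs.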

download_all_census_data_2001(output_data_folder, region_list=[])

Download complete census and geographic dataset for the 2001 Census.

This function coordinates all necessary operations to obtain census data and geographic information associated with the 2001 Census. Specifically, it handles:

  • Downloading tabular census data
  • Downloading geodata (for all regions or a specified subset)
  • Downloading official administrative boundaries

If region_list is empty, geodata for all available regions is downloaded.

PARAMETER DESCRIPTION
output_data_folder

Main path where all downloaded and processed data will be saved.

TYPE: Path

region_list

List of region codes or names for which to download geodata. If left empty, the function considers all regions.

TYPE: list DEFAULT: []

RETURNS DESCRIPTION
Path

Path to the root folder output_data_folder containing the 2001 Census data structure.

Note

This function is specific to the 2001 Census (the census_year parameter is fixed internally to 2001). It uses support functions: download_data(), download_geodata(), and download_administrative_boundaries(). A preprocessing subfolder is created automatically, defined by the PREPROCESSING_FOLDER constant.

download_administrative_boundaries(output_data_folder, census_year)

Download administrative boundaries for a census year and save them to the data structure.

This function retrieves the URL for administrative boundaries (regions, provinces, municipalities) from the census dictionary, prepares the folder structure dedicated to the census year, and downloads the associated ZIP file. The content is not extracted: the function only performs the download.

PARAMETER DESCRIPTION
output_data_folder

Main path where the census data structure will be created.

TYPE: Path

census_year

Reference census year (e.g., 1991, 2001, 2011).

TYPE: int

RETURNS DESCRIPTION
Path

Path to the folder containing the downloaded administrative boundaries.

RAISES DESCRIPTION
Exception

If an error occurs during folder creation or download.

Note

Administrative boundaries include municipalities, provinces, and regions. The file is saved to the folder defined by the BOUNDARIES_DATA_FOLDER constant. The returned path is the census year folder, not the individual downloaded file.

download_all_census_data_2011(output_data_folder, region_list=[])

Download complete census and geographic dataset for the 2011 Census.

This function coordinates the three fundamental operations necessary to obtain all data for the 2011 Population Census:

  1. Download of tabular census data.
  2. Download of geodata for one or more regions (or all, if region_list is empty).
  3. Download of official administrative boundaries.

In addition to downloading files, the function automatically creates the necessary folder structure within the output directory.

PARAMETER DESCRIPTION
output_data_folder

Main path where all census data will be saved.

TYPE: Path

region_list

List containing region codes or names for which to download geodata. If not provided or empty, the function downloads data for all regions.

TYPE: list DEFAULT: []

RETURNS DESCRIPTION
Path

Path to the root folder containing the 2011 Census data structure.

Note

This function is specific to the 2011 Census (the census_year parameter is fixed internally to 2011). It uses support functions: download_data(), download_geodata(), and download_administrative_boundaries(). The internal folder used for preprocessing is determined by the PREPROCESSING_FOLDER constant.

download_data(output_data_folder, census_year)

Download census data for a specific year and organize the working folder.

This function retrieves the link to the data for the specified year from the census dictionary, prepares the destination folder structure, and delegates the actual download to the download_base() function. Upon completion, it returns the path returned by download_base(), which may be a file or a folder depending on how download_base() is implemented.

PARAMETER DESCRIPTION
output_data_folder

Path to the output folder where the census data structure will be created (e.g., preprocessing folder or project).

TYPE: Path

census_year

Reference year for the census data to download (e.g., 1991, 2001, 2011).

TYPE: int

RETURNS DESCRIPTION
Path

Path returned by download_base(), representing the location of the downloaded census data (file or folder).

RAISES DESCRIPTION
Exception

If an error occurs during link retrieval, folder creation, or data download.

download_geodata(output_data_folder, census_year, region_list=[])

Download census geodata for one or more regions for a census year.

This function retrieves URLs for census geodata packages (ZIP) via get_census_dictionary(), creates the destination folder structure for the census year, and proceeds to download the ZIP files for the requested regions. If region_list is empty, data for all available regions in the dictionary is downloaded (typically 20).

PARAMETER DESCRIPTION
output_data_folder

Root folder where the entire census data structure will be saved.

TYPE: Path

census_year

Reference year for the census (e.g., 1991, 2001, 2011).

TYPE: int

region_list

List of ISTAT region codes for which to download geodata. If empty, geodata for all predefined regions is downloaded.

TYPE: list[int] DEFAULT: []

RETURNS DESCRIPTION
Path

Path to the folder containing the downloaded census geodata ZIP files.

Note

The function does not extract the ZIP files: it only downloads them. Geodata URLs are obtained from the census dictionary via the geodata_urls key. The final data path is organized via the global GEODATA_FOLDER constant.
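
The "empty list means all regions" rule can be sketched as follows. The shape of geodata_urls (a mapping from ISTAT region code to URL), the helper name select_region_urls, and the choice to raise ValueError for unknown codes are assumptions made for illustration; the real dictionary returned by get_census_dictionary() may be structured differently.

```python
from typing import Dict, Iterable


def select_region_urls(
    geodata_urls: Dict[int, str],
    region_list: Iterable[int] = (),
) -> Dict[int, str]:
    """Return the URLs to download: all of them if region_list is empty."""
    requested = list(region_list)
    if not requested:
        # No explicit selection: fall back to every region in the dictionary.
        return dict(geodata_urls)
    unknown = [code for code in requested if code not in geodata_urls]
    if unknown:
        raise ValueError(f"Unknown region codes: {unknown}")
    return {code: geodata_urls[code] for code in requested}
```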

download_all_census_data_2021(output_data_folder, region_list=[])

Download complete census and geographic dataset for the 2021 Census.

This function performs all necessary operations to obtain 2021 census data, automatically coordinating:

  1. Download of tabular census data.
  2. Download of geodata for requested regions (or all, if region_list is empty).
  3. Download of official administrative boundaries.

The function automatically creates the necessary folder structure to organize the data, defined by the PREPROCESSING_FOLDER constant.

PARAMETER DESCRIPTION
output_data_folder

Main path where 2021 Census data will be saved.

TYPE: Path

region_list

Region codes or names for which to download geodata. If the list is empty, geodata for all regions is downloaded.

TYPE: list DEFAULT: []

RETURNS DESCRIPTION
Path

Path to the root folder containing the complete downloaded data structure for the 2021 Census.

Note

This function is specific to the 2021 Census (the census_year parameter is fixed internally to 2021). It uses support functions: download_data(), download_geodata(), and download_administrative_boundaries().

download_data(output_data_folder, census_year)

Download, organize, and convert census data to processable format.

This function manages the complete workflow of downloading and preparing census data for a specific year. It identifies XLSX files present, performs conversion to CSV, and removes the original files that are no longer needed.

PARAMETER DESCRIPTION
output_data_folder

Path to the main folder where downloaded and processed data will be saved.

TYPE: Path

census_year

Reference year for the census data to download.

TYPE: int

RETURNS DESCRIPTION
Path

Path to the folder containing the downloaded and processed data.

RAISES DESCRIPTION
Exception

If no XLSX file is found in the downloaded data folder.

Note

XLSX files are converted to CSV to facilitate subsequent ETL and analysis phases. Original XLSX files are deleted to reduce disk space usage. The final data structure depends on the global constants DATA_FOLDER and CENSUS_DATA_FOLDER.

download_census(years, output_data_folder, region_list=[])

Download census data for one or more requested years, including geodata and boundaries.

This function provides centralized download of census data for years 1991, 2001, 2011, and 2021. For each year, it automatically executes the following procedures:

  • Download of tabular census data
  • Download of geodata (for all regions or specified ones)
  • Download of administrative boundaries

If years is empty, data for all available years is downloaded automatically.

PARAMETER DESCRIPTION
years

List of census years to download. If the list is empty, data for all census years (1991, 2001, 2011, 2021) will be downloaded.

TYPE: list[int]

output_data_folder

Path to the folder where all downloaded data will be saved.

TYPE: Path

region_list

Optional list of region codes or names for which to download geodata. If empty, geodata for all available regions is downloaded.

TYPE: list DEFAULT: []

RAISES DESCRIPTION
ValueError

If one of the specified years is not supported or does not exist in the mapping.
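
The handling of the years argument (empty means all, unsupported raises ValueError) can be sketched as a small validation step. The names SUPPORTED_YEARS and resolve_years are hypothetical and are not taken from the module; the actual mapping from year to download function is not shown in this documentation.

```python
from typing import Iterable, List

# Census years listed in the documentation above.
SUPPORTED_YEARS = (1991, 2001, 2011, 2021)


def resolve_years(years: Iterable[int]) -> List[int]:
    """Expand an empty request to all supported years and validate the rest."""
    requested = list(years)
    if not requested:
        return list(SUPPORTED_YEARS)
    for year in requested:
        if year not in SUPPORTED_YEARS:
            raise ValueError(f"Census year {year} is not supported")
    return requested
```

Each resolved year would then be passed to the matching per-year function, such as download_all_census_data_1991().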