
Process and Pre-process Module

The Process and Pre-process module handles advanced management and processing of census data. It integrates administrative information and combines geographic and tabular data to produce enriched datasets.

finalize_census_data(census_data_path, years, output_data_folder=None, delete_preprocessed_data=False)

Finalize census data by merging geodata and tabular data into a single GeoPackage.

For each specified year, this function:

  1. Reads geographic data (census sections) from layer census<year>.
  2. Reads tabular data from layer data<year>.
  3. Removes unnecessary columns from tabular data.
  4. Performs a join between geographic and tabular data on key SEZ<year>.
  5. Sorts the data and builds a final GeoDataFrame.
  6. Saves the result to a GeoPackage named census_data.gpkg, with one layer per year (census<year>).
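The logic of steps 3 to 5 can be sketched with plain Python records. This is an illustration only, not the library's implementation: the helper join_sections, the sample records, the column names other than SEZ<year>, and the choice of an inner join are all assumptions, and geometries are reduced to placeholder strings.

```python
def join_sections(geo_rows, table_rows, year, columns_to_remove=()):
    """Join geographic and tabular records on the dynamic key SEZ<year>."""
    key = f"SEZ{year}"
    # Step 3: drop unnecessary columns from the tabular records.
    cleaned = [
        {k: v for k, v in row.items() if k not in columns_to_remove}
        for row in table_rows
    ]
    # Index tabular records by the join key; a missing key column raises KeyError.
    by_key = {row[key]: row for row in cleaned}
    # Step 4: keep sections that have a matching tabular record
    # (an inner join is assumed here; the source does not state the join type).
    joined = [
        {**geo, **by_key[geo[key]]}
        for geo in geo_rows
        if geo[key] in by_key
    ]
    # Step 5: sort by section identifier.
    return sorted(joined, key=lambda row: row[key])


# Hypothetical sample data for the 2011 census.
geo_rows = [
    {"SEZ2011": 2, "geometry": "POLYGON B"},
    {"SEZ2011": 1, "geometry": "POLYGON A"},
]
table_rows = [
    {"SEZ2011": 1, "P1": 120, "NOTE": "x"},
    {"SEZ2011": 2, "P1": 85, "NOTE": "y"},
]
result = join_sections(geo_rows, table_rows, 2011, columns_to_remove=("NOTE",))
```

In the real function the records would be GeoDataFrame and DataFrame rows read from the GeoPackage layers; the dictionary version only shows how the year-dependent key drives the join.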

PARAMETER DESCRIPTION
census_data_path

Folder containing the pre-processed GeoPackage file (census.gpkg), generated by previous workflow stages.

TYPE: Path

years

List of census years to finalize (e.g., [1991, 2001, 2011, 2021]).

TYPE: list

output_data_folder

Optional folder where the final GeoPackage census_data.gpkg will be saved. If None, the file is saved in census_data_path.

TYPE: Path | None DEFAULT: None

delete_preprocessed_data

If True, deletes the census.gpkg file after finalization is complete.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
FileNotFoundError

If the main GeoPackage file census.gpkg does not exist in the path specified by census_data_path.

KeyError

If the join column SEZ<year> is not present in the geographic or tabular data, or if the columns to remove are not defined in census_data[year]['data_columns_to_remove'].
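The documented defaults and error conditions can be sketched as a small preflight check. The helpers resolve_finalize_paths and require_join_key are hypothetical names introduced for illustration; only the file names census.gpkg and census_data.gpkg, the SEZ<year> convention, and the exception types come from the description above.

```python
from pathlib import Path
import tempfile


def resolve_finalize_paths(census_data_path, output_data_folder=None):
    """Return (input_gpkg, output_gpkg) following the documented defaults."""
    census_data_path = Path(census_data_path)
    source = census_data_path / "census.gpkg"
    if not source.exists():
        raise FileNotFoundError(f"Pre-processed GeoPackage not found: {source}")
    # If no output folder is given, the result is saved next to the input.
    target_folder = Path(output_data_folder) if output_data_folder else census_data_path
    return source, target_folder / "census_data.gpkg"


def require_join_key(columns, year):
    """Raise KeyError when the dynamic join column SEZ<year> is missing."""
    key = f"SEZ{year}"
    if key not in columns:
        raise KeyError(key)
    return key


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "census.gpkg").touch()  # empty placeholder, not a valid GeoPackage
    source, target = resolve_finalize_paths(folder)
    source_name = source.name
    target_name = target.name
    same_folder = target.parent == folder
```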

Note

The join key between geographic and tabular data is dynamic and follows the SEZ<year> convention. Columns to remove from tabular data are defined in census_data[year]['data_columns_to_remove']. The final GeoPackage may contain multiple layers, one per processed year.

preprocess_census(processed_data_folder, years, output_data_folder=None, delete_download_folder=False, municipalities_code=[])

Preprocess census data for one or more years, integrating geodata, boundaries, and tables.

This function coordinates the entire census preprocessing workflow, executing for each requested year:

  1. Preprocessing of geographic data (shapefiles or similar).
  2. Preprocessing of tabular data (CSV).
  3. Addition of administrative information (optional).
  4. Saving of resulting data into a GeoPackage.
  5. Optional deletion of the pre-processed data folder.

Required inputs are read from the census_data configuration dictionary, which defines paths, columns, mappings, and specific settings for each census year.

PARAMETER DESCRIPTION
processed_data_folder

Folder containing pre-downloaded or pre-processed data (geodata, tabular data, administrative boundaries).

TYPE: Path

years

List of census years to process (e.g., [1991, 2001, 2011, 2021]).

TYPE: list[int]

output_data_folder

Optional folder where processed data will be saved (GeoPackage and associated layers). If None, the parent folder of processed_data_folder is used.

TYPE: Path | None DEFAULT: None

delete_download_folder

If True, deletes the processed_data_folder at the end of the process.

TYPE: bool DEFAULT: False

municipalities_code

Optional list of ISTAT municipality codes to filter in the data (PRO_COM key). If empty, all available municipalities are used.

TYPE: list[int] DEFAULT: []
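The municipality filter can be pictured as follows. This is a minimal sketch under stated assumptions: filter_municipalities is a hypothetical helper, the records are plain dictionaries rather than geodata, and the sample values are illustrative.

```python
def filter_municipalities(rows, municipalities_code=None):
    """Keep rows whose PRO_COM is in municipalities_code; keep all rows if it is empty."""
    if not municipalities_code:
        return list(rows)
    wanted = set(municipalities_code)
    return [row for row in rows if row["PRO_COM"] in wanted]


# Hypothetical census sections keyed by ISTAT municipality code.
sections = [
    {"PRO_COM": 1272, "SEZ2011": 12720000001},
    {"PRO_COM": 15146, "SEZ2011": 151460000001},
    {"PRO_COM": 58091, "SEZ2011": 580910000001},
]
only_two = filter_municipalities(sections, [1272, 58091])
everything = filter_municipalities(sections, [])
```

The sketch uses None as its default so that no mutable list is shared between calls; the documented signature uses [] instead, and both behave the same as long as the list is never modified in place.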

RETURNS DESCRIPTION
Path

Path to the final folder containing the processed data.

RAISES DESCRIPTION
KeyError

If the census_data dictionary does not contain configuration for a requested year.

Exception

For any errors during reading, preprocessing, or saving.

Note

The GeoPackage file is opened and written via sqlite3.connect(). Layers inserted into the GeoPackage follow these conventions:

  - sezioni<year> for the geographic layer
  - data<year> for the tabular dataset
  - tracciato<year> for the record layout (field descriptions)

The municipalities_code parameter is applied during the geodata phase.
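Because a GeoPackage is an SQLite database, its registered layers can be inspected with the standard sqlite3 module by querying the gpkg_contents table. The sketch below uses an in-memory stand-in that mocks only two columns of gpkg_contents. The helper list_layers is hypothetical, and registering data<year> and tracciato<year> with data type "attributes" is an assumption, not something the source confirms.

```python
import sqlite3


def list_layers(connection):
    """Return (table_name, data_type) pairs registered in gpkg_contents."""
    cursor = connection.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return cursor.fetchall()


# In-memory stand-in for a GeoPackage: a real file has more columns and tables.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
year = 2011
connection.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [
        (f"sezioni{year}", "features"),      # geographic layer
        (f"data{year}", "attributes"),       # tabular dataset (assumed type)
        (f"tracciato{year}", "attributes"),  # record layout (assumed type)
    ],
)
layers = list_layers(connection)
connection.close()
```

Against a real output file, the same query would be run on a connection opened with the path of the .gpkg file.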

add_administrative_info(census_data, regions_data_path, regions_target_columns, provinces_data_path, provinces_target_columns, municipalities_data_path, municipalities_target_columns)

Enrich census data with administrative information (municipalities, provinces, regions).

This function integrates corresponding administrative codes and names into the census data, sourced from three external datasets: regional, provincial, and municipal boundaries.

The logical workflow includes:

  1. Standardization of census dataset column names.
  2. Reading of administrative datasets (regions, provinces, municipalities).
  3. Merge of municipalities with provinces.
  4. Merge of result with regions.
  5. Final join with census dataset on the municipal key (PRO_COM).
  6. Cleanup and renaming of final administrative columns.
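The merge chain in steps 3 to 5 can be sketched with dictionaries standing in for DataFrames. This is a simplified illustration, not the function's actual code: add_admin_info is a hypothetical helper, the name columns COMUNE, PROVINCIA, and REGIONE are assumed, and the sample values are illustrative.

```python
def add_admin_info(census_rows, municipalities, provinces, regions):
    """Attach province and region attributes to census rows via PRO_COM."""
    provinces_by_code = {p["COD_PROV"]: p for p in provinces}
    regions_by_code = {r["COD_REG"]: r for r in regions}
    lookup = {}
    for municipality in municipalities:
        # Step 3: merge the municipality with its province.
        merged = {**municipality, **provinces_by_code[municipality["COD_PROV"]]}
        # Step 4: merge the result with its region.
        merged.update(regions_by_code[merged["COD_REG"]])
        lookup[municipality["PRO_COM"]] = merged
    # Step 5: join the census rows on the municipal key PRO_COM.
    return [{**row, **lookup[row["PRO_COM"]]} for row in census_rows]


# Hypothetical administrative tables and one census record.
municipalities = [{"PRO_COM": 1272, "COMUNE": "Torino", "COD_PROV": 1}]
provinces = [{"COD_PROV": 1, "PROVINCIA": "Torino", "COD_REG": 1}]
regions = [{"COD_REG": 1, "REGIONE": "Piemonte"}]
census_rows = [{"SEZ2011": 1, "PRO_COM": 1272, "P1": 120}]

enriched = add_admin_info(census_rows, municipalities, provinces, regions)
```

Building one lookup keyed by PRO_COM before touching the census rows mirrors the documented order: the administrative tables are merged among themselves first, and the census dataset is joined once at the end.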

PARAMETER DESCRIPTION
census_data

Census dataset to which administrative information will be added.

TYPE: DataFrame

regions_data_path

Path to the file containing regional data.

TYPE: Path

regions_target_columns

List of columns to extract from the regional dataset (the first column is used as the index).

TYPE: list

provinces_data_path

Path to the file containing provincial data.

TYPE: Path

provinces_target_columns

List of columns to extract from the provincial dataset (the first column is used as the index).

TYPE: list

municipalities_data_path

Path to the file containing municipal data.

TYPE: Path

municipalities_target_columns

List of columns to extract from the municipal dataset (the first column is used as the index).

TYPE: list

RETURNS DESCRIPTION
DataFrame

Census DataFrame enriched with administrative information on municipalities, provinces, and regions.

Note

Administrative codes used for merges are assumed to be: PRO_COM (municipality), COD_PROV/COD_PRO (province), COD_REG (region). The function uses read_administrative_boundaries() to load and filter administrative datasets.