Process and Pre-process Module

The Process and Pre-process module handles the advanced management and processing of census data. It integrates administrative information and combines geographic and tabular data to produce enriched datasets.

`finalize_census_data(census_data_path, years, output_data_folder=None, delete_preprocessed_data=False)`

Finalize census data by merging geodata and tabular data into a single GeoPackage.

For each specified year, this function:

1. Reads the geographic data (census sections) from layer census<year>.
2. Reads the tabular data from layer data<year>.
3. Removes unnecessary columns from the tabular data.
4. Joins the geographic and tabular data on the key SEZ<year>.
5. Sorts the data and builds a final GeoDataFrame.
6. Saves the result to a GeoPackage named census_data.gpkg, with one layer per year (census<year>).
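The per-year join in steps 3 to 5 can be sketched with plain Python dictionaries standing in for the GeoDataFrame and the tabular DataFrame. This is a minimal, hypothetical illustration: `join_census_year` is not part of the module, and it assumes an inner join and sorting by the section key, neither of which the documentation specifies.

```python
from typing import Any

Row = dict[str, Any]


def join_census_year(geo_rows: list[Row], tab_rows: list[Row], year: int,
                     columns_to_remove: list[str]) -> list[Row]:
    """Hypothetical sketch of steps 3-5: clean, join on SEZ<year>, sort."""
    key = f"SEZ{year}"  # dynamic join key, e.g. "SEZ2011"
    # Step 3: drop unneeded tabular columns, never the join key itself.
    drop = set(columns_to_remove) - {key}
    cleaned = [{k: v for k, v in row.items() if k not in drop} for row in tab_rows]
    # Index tabular rows by section code; a row without the key raises KeyError.
    by_key = {row[key]: row for row in cleaned}
    # Step 4 (ASSUMPTION): inner join, sections without tabular data are dropped.
    joined = [{**geo, **by_key[geo[key]]} for geo in geo_rows if geo[key] in by_key]
    # Step 5 (ASSUMPTION): sort by section code.
    joined.sort(key=lambda r: r[key])
    return joined
```

A missing `SEZ<year>` column surfaces as a `KeyError`, which matches the documented behaviour of `finalize_census_data`.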

PARAMETER	DESCRIPTION
`census_data_path`	Folder containing the pre-processed GeoPackage file (`census.gpkg`), generated by previous workflow stages. TYPE: `Path`
`years`	List of census years to finalize (e.g., [1991, 2001, 2011, 2021]). TYPE: `list`
`output_data_folder`	Optional folder where the final GeoPackage `census_data.gpkg` will be saved. If None, the file is saved in `census_data_path`. TYPE: `Path | None` DEFAULT: `None`
`delete_preprocessed_data`	If True, deletes the `census.gpkg` file after finalization is complete. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`FileNotFoundError`	If the main GeoPackage file `census.gpkg` does not exist in the path specified by `census_data_path`.
`KeyError`	If the join column `SEZ<year>` is not present in the geographic or tabular data, or if the columns to remove are not defined in `census_data[year]['data_columns_to_remove']`.
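The path handling implied by `census_data_path`, `output_data_folder`, `delete_preprocessed_data` and the `FileNotFoundError` above can be sketched with `pathlib`. Both helpers are hypothetical and only illustrate the documented defaults. They are not the module's internals.

```python
from __future__ import annotations

from pathlib import Path


def resolve_finalize_paths(census_data_path: Path,
                           output_data_folder: Path | None = None) -> tuple[Path, Path]:
    """Hypothetical helper: locate census.gpkg and the census_data.gpkg target."""
    source = census_data_path / "census.gpkg"
    if not source.exists():
        # Mirrors the documented FileNotFoundError.
        raise FileNotFoundError(f"{source} not found")
    # Default output location: census_data_path itself.
    target_dir = output_data_folder if output_data_folder is not None else census_data_path
    return source, target_dir / "census_data.gpkg"


def cleanup_preprocessed(source: Path, delete_preprocessed_data: bool) -> None:
    """Delete census.gpkg only when requested, after finalization has succeeded."""
    if delete_preprocessed_data:
        source.unlink()
```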

Note

The join key between geographic and tabular data is dynamic and follows the SEZ<year> convention. Columns to remove from tabular data are defined in census_data[year]['data_columns_to_remove']. The final GeoPackage may contain multiple layers, one per processed year.
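A minimal sketch of how the per-year names and the columns to remove could be derived from the configuration. It assumes that `census_data` is a dictionary keyed by integer year. `finalize_settings` is hypothetical, and `COL_A`/`COL_B` in the usage below are placeholder column names.

```python
from typing import Any


def finalize_settings(census_data: dict[int, dict[str, Any]], year: int) -> dict[str, Any]:
    """Hypothetical helper: derive per-year names from the configuration."""
    # KeyError if the year or its 'data_columns_to_remove' entry is missing.
    columns_to_remove = census_data[year]["data_columns_to_remove"]
    return {
        "join_key": f"SEZ{year}",          # SEZ<year> convention
        "geo_layer": f"census{year}",      # input geographic layer
        "data_layer": f"data{year}",       # input tabular layer
        "output_layer": f"census{year}",   # layer written to census_data.gpkg
        "columns_to_remove": list(columns_to_remove),
    }
```

For example, `finalize_settings({2011: {"data_columns_to_remove": ["COL_A", "COL_B"]}}, 2011)["join_key"]` gives `"SEZ2011"`.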

`preprocess_census(processed_data_folder, years, output_data_folder=None, delete_download_folder=False, municipalities_code=[])`

Preprocess census data for one or more years, integrating geodata, boundaries, and tables.

This function orchestrates the full census preprocessing workflow, running the following steps for each requested year:

Preprocessing of geographic data (shapefiles or similar).
Preprocessing of tabular data (CSV).
Addition of administrative information (optional).
Saving of the resulting data to a GeoPackage.

Once all years have been processed, the pre-processed data folder can optionally be deleted.

Required inputs are read from the census_data configuration dictionary, which defines paths, columns, mappings, and specific settings for each census year.
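The orchestration described above can be sketched as a loop over years that applies a list of step callables to each year's configuration. Everything here is hypothetical. `run_preprocessing` and the `steps` list are illustrative names, and `census_data` is passed as an argument for testability, whereas the documented function reads its own census_data configuration.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Any, Callable

# A step receives the year and that year's configuration (hypothetical shape).
Step = Callable[[int, dict[str, Any]], None]


def run_preprocessing(processed_data_folder: Path, years: list[int],
                      census_data: dict[int, dict[str, Any]], steps: list[Step],
                      output_data_folder: Path | None = None,
                      delete_download_folder: bool = False) -> Path:
    """Hypothetical orchestration loop mirroring preprocess_census."""
    # Default output: the parent of processed_data_folder, as documented.
    output = output_data_folder if output_data_folder is not None else processed_data_folder.parent
    for year in years:
        config = census_data[year]  # KeyError for an unconfigured year
        # e.g. geodata, tabular data, optional admin info, GeoPackage write
        for step in steps:
            step(year, config)
    # Deletion happens once, after every year has been processed.
    if delete_download_folder:
        shutil.rmtree(processed_data_folder)
    return output
```

Keeping the steps as plain callables makes the per-year order explicit and lets the optional administrative step be included or left out of the list.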

PARAMETER	DESCRIPTION
`processed_data_folder`	Folder containing pre-downloaded or pre-processed data (geodata, tabular data, administrative boundaries). TYPE: `Path`
`years`	List of census years to process (e.g., [1991, 2001, 2011, 2021]). TYPE: `list[int]`
`output_data_folder`	Optional folder where processed data will be saved (GeoPackage and associated layers). If None, the parent folder of `processed_data_folder` is used. TYPE: `Path | None` DEFAULT: `None`
`delete_download_folder`	If True, deletes the `processed_data_folder` at the end of the process. TYPE: `bool` DEFAULT: `False`
`municipalities_code`	Optional list of ISTAT municipality codes used to filter the data (matched against the `PRO_COM` key). If empty, all available municipalities are kept. TYPE: `list[int]` DEFAULT: `[]`
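The `municipalities_code` filter reduces to a membership test on `PRO_COM`, where an empty list means no filtering. Below is a minimal sketch on plain dictionaries. `filter_municipalities` is hypothetical, and in the documented workflow the filter is applied during the geodata phase rather than to arbitrary rows.

```python
from collections.abc import Iterable
from typing import Any


def filter_municipalities(rows: Iterable[dict[str, Any]],
                          municipalities_code: list[int]) -> list[dict[str, Any]]:
    """Keep rows whose PRO_COM is listed; an empty list keeps everything."""
    if not municipalities_code:
        return list(rows)
    wanted = set(municipalities_code)  # set gives O(1) membership checks
    return [row for row in rows if row["PRO_COM"] in wanted]
```

The code types must match on both sides. If `PRO_COM` were stored as a zero-padded string while the filter list holds integers, nothing would match, so one side should be normalized first.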

RETURNS	DESCRIPTION
`Path`	Path to the final folder containing the processed data.

RAISES	DESCRIPTION
`KeyError`	If the `census_data` dictionary does not contain configuration for a requested year.
`Exception`	For any errors during reading, preprocessing, or saving.

Note

The GeoPackage file is opened and written via sqlite3.connect(). Layers written to the GeoPackage follow naming conventions such as:

- sezioni<year> for the geographic layer
- data<year> for the tabular dataset
- tracciato<year> for the record layout (the description of the tabular fields)

The municipalities_code filter is applied during the geodata preprocessing step.
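Because a GeoPackage is an SQLite database, the standard-library `sqlite3` module can check which per-year tables exist. This is a minimal, hypothetical sanity check (`census_layers_present` is not part of the module). It only looks at table names in `sqlite_master` and does not validate GeoPackage metadata such as `gpkg_contents`.

```python
import sqlite3


def census_layers_present(conn: sqlite3.Connection, year: int) -> dict[str, bool]:
    """Report which of the per-year tables exist in an SQLite/GeoPackage file."""
    existing = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    # Table names follow the conventions listed in the note above.
    expected = {
        "geographic": f"sezioni{year}",
        "tabular": f"data{year}",
        "record_layout": f"tracciato{year}",
    }
    return {role: table in existing for role, table in expected.items()}
```

Pass it a connection opened with `sqlite3.connect()` on the preprocessed GeoPackage, and close the connection afterwards.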

`add_administrative_info(census_data, regions_data_path, regions_target_columns, provinces_data_path, provinces_target_columns, municipalities_data_path, municipalities_target_columns)`

Enrich census data with administrative information (municipalities, provinces, regions).

This function integrates corresponding administrative codes and names into the census data, sourced from three external datasets: regional, provincial, and municipal boundaries.

The logical workflow includes:

1. Standardization of the census dataset column names.
2. Reading of the administrative datasets (regions, provinces, municipalities).
3. Merge of municipalities with provinces.
4. Merge of the result with regions.
5. Final join with the census dataset on the municipal key (PRO_COM).
6. Cleanup and renaming of the final administrative columns.
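Steps 3 to 5 amount to a chain of key lookups:

- municipality to province (COD_PROV)
- province to region (COD_REG)
- census section to municipality (PRO_COM)

The sketch below uses dictionaries keyed by code in place of DataFrames. `build_admin_table`, `enrich_census` and the attribute names `COMUNE`, `DEN_PROV` and `DEN_REG` are illustrative assumptions. The sketch also assumes a left join, so census rows without a match are kept unchanged. The documentation does not specify the join type.

```python
from typing import Any

Row = dict[str, Any]


def build_admin_table(municipalities: dict[int, Row], provinces: dict[int, Row],
                      regions: dict[int, Row]) -> dict[int, Row]:
    """Steps 3-4: attach province and region attributes to each municipality."""
    admin: dict[int, Row] = {}
    for pro_com, mun in municipalities.items():
        prov = provinces.get(mun.get("COD_PROV"), {})
        reg = regions.get(prov.get("COD_REG"), {})
        admin[pro_com] = {**mun, **prov, **reg}
    return admin


def enrich_census(census: list[Row], admin: dict[int, Row]) -> list[Row]:
    """Step 5 (ASSUMED left join): add admin attributes by PRO_COM."""
    return [{**row, **admin.get(row["PRO_COM"], {})} for row in census]
```

Building the municipality table once and then joining is cheaper than resolving the province and region separately for every census section.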

PARAMETER	DESCRIPTION
`census_data`	Census dataset to which administrative information will be added. TYPE: `DataFrame`
`regions_data_path`	Path to the file containing regional data. TYPE: `Path`
`regions_target_columns`	List of columns to extract from the regional dataset (the first column is used as the index). TYPE: `list`
`provinces_data_path`	Path to the file containing provincial data. TYPE: `Path`
`provinces_target_columns`	List of columns to extract from the provincial dataset (the first column is used as the index). TYPE: `list`
`municipalities_data_path`	Path to the file containing municipal data. TYPE: `Path`
`municipalities_target_columns`	List of columns to extract from the municipal dataset (the first column is used as the index). TYPE: `list`
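The "first column is used as the index" convention can be illustrated with the standard `csv` module. This sketch assumes CSV input with a header row, but the real administrative files may be in another format and are loaded by read_administrative_boundaries(). `read_indexed` is hypothetical.

```python
import csv
from pathlib import Path


def read_indexed(path: Path, target_columns: list[str]) -> dict[str, dict[str, str]]:
    """Hypothetical reader: keep target_columns, keyed by the first one."""
    index_col, *value_cols = target_columns
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        missing = [c for c in target_columns if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        # Values stay strings; convert codes before joining on integer keys.
        return {row[index_col]: {c: row[c] for c in value_cols} for row in reader}
```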

RETURNS	DESCRIPTION
`DataFrame`	Census DataFrame enriched with administrative information on municipalities, provinces, and regions.

Note

Administrative codes used for merges are assumed to be: PRO_COM (municipality), COD_PROV/COD_PRO (province), COD_REG (region). The function uses read_administrative_boundaries() to load and filter administrative datasets.
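Because the province code may appear as either COD_PROV or COD_PRO, the merges need the alias resolved to a single name first. The following is a minimal sketch on a single record. `standardize_columns` and the `PROVINCE_ALIASES` table are hypothetical, and the real function may standardize names differently.

```python
from typing import Any

# Hypothetical alias table: alternative name -> canonical name.
PROVINCE_ALIASES = {"COD_PRO": "COD_PROV"}


def standardize_columns(row: dict[str, Any],
                        aliases: dict[str, str] = PROVINCE_ALIASES) -> dict[str, Any]:
    """Rename aliased keys; if the canonical key already exists, it wins."""
    out: dict[str, Any] = {}
    for name, value in row.items():
        canonical = aliases.get(name, name)
        if canonical != name and canonical in row:
            continue  # keep the existing canonical column, drop the alias
        out[canonical] = value
    return out
```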