GeoData Module
The GeoData module provides tools for managing and preprocessing geographic data, including administrative boundaries and census data. Its functions simplify reading, filtering, and converting data into formats suited to geographic analysis, such as GeoDataFrame and GeoPackage.
preprocess_geodata(census_shp_folder, census_target_columns, census_tipo_loc_mapping, output_folder, census_layer_name, census_column_remapping=None, regions_file_path=None, regions_target_columns=None, regions_index_column=None, regions_column_remapping=None, provinces_file_path=None, provinces_target_columns=None, provinces_index_column=None, provinces_column_remapping=None, municipalities_file_path=None, municipalities_target_columns=None, municipalities_index_column=None, municipalities_column_remapping=None, municipalities_code=[])
Preprocess census geodata and administrative boundaries and save to GeoPackage.
This function executes the complete workflow for preparing geographic data for a census year, combining:
- Reading and normalizing administrative boundaries (regions, provinces, municipalities).
- Optionally correcting missing fields (e.g., COD_PROV for 2021).
- Reading and preparing census data (sections) from shapefiles.
- Joining sections with municipalities, provinces, and regions.
- Optionally filtering for a subset of municipalities (municipalities_code).
- Saving the final result to a GeoPackage.
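The joining and filtering steps listed above can be pictured with plain dictionaries. This is only a minimal sketch under stated assumptions, not the module's implementation: the column names PRO_COM, SEZ, COD_REG, COMUNE, DEN_PROV and DEN_REG, the sample rows, and the helper join_sections are hypothetical; only COD_PROV and municipalities_code appear on this page. It also assumes that an empty municipalities_code means "no filter".

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical lookup tables, each keyed by its (assumed) ISTAT code column.
REGIONS = {3: {"DEN_REG": "Lombardia"}, 12: {"DEN_REG": "Lazio"}}
PROVINCES = {
    15: {"COD_REG": 3, "DEN_PROV": "Milano"},
    58: {"COD_REG": 12, "DEN_PROV": "Roma"},
}
MUNICIPALITIES = {
    15146: {"COD_PROV": 15, "COMUNE": "Milano"},
    58091: {"COD_PROV": 58, "COMUNE": "Roma"},
}


def join_sections(
    sections: Iterable[dict], municipalities_code: Iterable[int] = ()
) -> list[dict]:
    """Enrich each section with municipality, province and region attributes."""
    wanted = set(municipalities_code)  # assumption: empty means "keep everything"
    enriched = []
    for section in sections:
        pro_com = section["PRO_COM"]
        if wanted and pro_com not in wanted:
            continue  # optional filter on a subset of municipalities
        municipality = MUNICIPALITIES[pro_com]
        province = PROVINCES[municipality["COD_PROV"]]
        region = REGIONS[province["COD_REG"]]
        enriched.append({**section, **municipality, **province, **region})
    return enriched


sections = [
    {"SEZ": 151460000001, "PRO_COM": 15146},
    {"SEZ": 580910000001, "PRO_COM": 58091},
]
all_rows = join_sections(sections)
milan_only = join_sections(sections, municipalities_code=[15146])
```

In the real workflow the same chain of lookups would presumably be expressed as table joins on the code columns rather than as dictionary lookups.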
| PARAMETER | DESCRIPTION |
|---|---|
| census_shp_folder | Folder containing census shapefiles (sections). |
| census_target_columns | Columns to select from census data (sections). |
| census_tipo_loc_mapping | Mapping for the … |
| output_folder | Folder where the resulting GeoPackage will be saved. |
| census_layer_name | Name of the census layer (e.g., …). |
| census_column_remapping | Optional mapping to rename census data columns. |
| regions_file_path | Optional path to the regional boundaries vector file. |
| regions_target_columns | Optional columns to select from regional data. |
| regions_index_column | Optional column to use as index for regional data. |
| regions_column_remapping | Optional mapping to rename regional data columns. |
| provinces_file_path | Optional path to the provincial boundaries vector file. |
| provinces_target_columns | Optional columns to select from provincial data. |
| provinces_index_column | Optional column to use as index for provincial data. |
| provinces_column_remapping | Optional mapping to rename provincial data columns. |
| municipalities_file_path | Optional path to the municipal boundaries vector file. |
| municipalities_target_columns | Optional columns to select from municipal data. |
| municipalities_index_column | Optional column to use as index for municipal data. |
| municipalities_column_remapping | Optional mapping to rename municipal data columns. |
| municipalities_code | Optional list of ISTAT municipality codes (…). |
| RETURNS | DESCRIPTION |
|---|---|
| Path | Path to the generated GeoPackage containing the census layer enriched with administrative information. |
Note
The census year is derived from the layer name census_layer_name[6:]
(e.g., census2011 → 2011). For 2021, the COD_PROV column is manually
reconstructed from PRO_COM_T (see repository issue #47). The GeoPackage
is saved as {YEAR_GEODATA_NAME}.gpkg and the layer as
{YEAR_GEODATA_NAME}{census_year}.
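The naming rules in this note can be sketched as follows. The value given to YEAR_GEODATA_NAME is a hypothetical placeholder (the real constant is not shown on this page), the helper names are invented for illustration, and the COD_PROV rule is an assumption: it relies on PRO_COM_T being a zero-padded six-character municipality code whose first three digits identify the province, which may differ from what the fix for issue #47 actually does.

```python
from __future__ import annotations

# Hypothetical placeholder: the real value of this constant is not documented here.
YEAR_GEODATA_NAME = "geodata"


def census_year_from_layer(census_layer_name: str) -> str:
    """Drop the first six characters of the layer name, as in census_layer_name[6:]."""
    return census_layer_name[6:]


def cod_prov_from_pro_com_t(pro_com_t: str) -> int:
    """Assumed rule: the first three digits of PRO_COM_T are the province code."""
    return int(pro_com_t[:3])


def output_names(census_year: str) -> tuple[str, str]:
    """Return (GeoPackage file name, layer name) following the note's pattern."""
    return f"{YEAR_GEODATA_NAME}.gpkg", f"{YEAR_GEODATA_NAME}{census_year}"


year = census_year_from_layer("census2011")
file_name, layer = output_names(year)
```

Because the year is taken by position, a layer name that does not start with a six-character prefix would produce a wrong year without raising an error.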
read_administrative_boundaries(file_path, target_columns, index_column, column_remapping=None, output_folder=None, layer_name=None)
Read administrative boundaries and return a DataFrame or GeoPackage.
This function reads an administrative boundary file (typically a shapefile), selects a subset of columns, and sets a column as the index. Depending on the provided parameters, it can:
- Return a DataFrame without geometry, sorted and indexed; or
- Save the data as a layer in a GeoPackage, preserving the geometry.
The encoding is derived from the .dbf file associated with the shapefile to avoid issues with accented characters or special symbols.
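This page does not say how the encoding is derived from the .dbf file. One possible approach, shown here purely as an assumption, is to read the language driver ID stored at byte 29 of the dBASE header and map it to a codec name. The function dbf_encoding, the partial mapping table, and the utf-8 fallback are all hypothetical choices for this sketch.

```python
import tempfile
from pathlib import Path

# Partial, illustrative map from dBASE language driver IDs to Python codec names.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0x57: "cp1252",  # ANSI
}


def dbf_encoding(shp_path: Path, default: str = "utf-8") -> str:
    """Guess the attribute encoding from the .dbf file next to a shapefile."""
    dbf_path = shp_path.with_suffix(".dbf")
    with dbf_path.open("rb") as handle:
        header = handle.read(32)  # fixed-size part of the dBASE header
    if len(header) < 32:
        return default
    return LDID_TO_CODEC.get(header[29], default)


# Demonstration with a fake 32-byte header whose language driver ID is 0x03.
with tempfile.TemporaryDirectory() as tmp:
    header = bytearray(32)
    header[29] = 0x03
    (Path(tmp) / "regions.dbf").write_bytes(bytes(header))
    detected = dbf_encoding(Path(tmp) / "regions.shp")
```

Many shapefiles leave this byte at zero, in which case a sidecar .cpg file or character detection would be needed instead.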
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Path to the vector file (e.g., shapefile) containing administrative boundaries. |
| target_columns | List of columns to select from the source dataset. The geometry column is added automatically. |
| index_column | Name of the column to use as the DataFrame index (e.g., ISTAT code). |
| column_remapping | Optional dictionary to rename columns (e.g., …). |
| output_folder | Optional output folder where the GeoPackage will be saved. If None, the function returns a DataFrame (without geometry) instead of writing to disk. |
| layer_name | Optional name of the layer to use within the GeoPackage. Must be specified if … |
| RETURNS | DESCRIPTION |
|---|---|
| DataFrame \| Path | Either an indexed and sorted DataFrame without geometry column if … |
Note
The geometry column is automatically added to target_columns via the
GEOMETRY_COLUMN_NAME constant. The GeoPackage is saved with a name
based on the YEAR_GEODATA_NAME constant and contains the layer
specified by layer_name.
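The select, rename, index and sort steps described for this function can be sketched with plain dictionaries standing in for a DataFrame. Everything here is illustrative: the value of GEOMETRY_COLUMN_NAME, the helper select_and_index, the sample columns, and the assumption that index_column refers to the column name before renaming are not confirmed by this page.

```python
GEOMETRY_COLUMN_NAME = "geometry"  # hypothetical value of the constant


def select_and_index(rows, target_columns, index_column,
                     column_remapping=None, keep_geometry=False):
    """Return {index value: selected columns}, sorted by index value."""
    columns = [*target_columns, GEOMETRY_COLUMN_NAME]  # geometry added automatically
    remap = column_remapping or {}
    table = {}
    for row in rows:
        # Keep only the requested columns, applying the optional renaming.
        selected = {remap.get(name, name): row[name] for name in columns}
        if not keep_geometry:
            # Mirrors the "DataFrame without geometry" return variant.
            selected.pop(remap.get(GEOMETRY_COLUMN_NAME, GEOMETRY_COLUMN_NAME))
        key = selected.pop(remap.get(index_column, index_column))
        table[key] = selected
    return dict(sorted(table.items()))


rows = [
    {"COD_REG": 12, "DEN_REG": "Lazio", "SHAPE_LEN": 1.0,
     "geometry": "POLYGON ((0 0, 1 0, 1 1, 0 0))"},
    {"COD_REG": 3, "DEN_REG": "Lombardia", "SHAPE_LEN": 2.0,
     "geometry": "POLYGON ((2 2, 3 2, 3 3, 2 2))"},
]
table = select_and_index(
    rows, ["COD_REG", "DEN_REG"], "COD_REG", {"DEN_REG": "region_name"}
)
```

With keep_geometry=True the same helper keeps the geometry entry, which corresponds to the variant where the data is written to a GeoPackage layer.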
read_census(shp_folder, target_columns, tipo_loc_mapping, column_remapping=None, output_folder=None, layer_name=None)
Read census data from shapefiles and return a GeoDataFrame or GeoPackage.
This function recursively searches for all shapefiles in a folder, reads their
data, selects a subset of columns, corrects invalid geometries, adds the
locality type description (derived from tipo_loc_mapping), and builds a
unified GeoDataFrame with all census sections.
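Adding a locality type description from tipo_loc_mapping amounts to a code lookup per section. The sketch below is an assumption about how that could look: the column names TIPO_LOC and tipo_loc_desc, the sample mapping values, the helper name, and the choice to leave unmapped codes as None are all hypothetical.

```python
# Illustrative mapping of locality type codes to descriptions (sample values).
TIPO_LOC_MAPPING = {
    1: "centro abitato",
    2: "nucleo abitato",
    3: "località produttiva",
    4: "case sparse",
}


def add_locality_description(sections, tipo_loc_mapping,
                             code_column="TIPO_LOC",
                             description_column="tipo_loc_desc"):
    """Return new section records with a description column added."""
    return [
        # Codes missing from the mapping yield None instead of raising.
        {**section, description_column: tipo_loc_mapping.get(section[code_column])}
        for section in sections
    ]


sections = [{"SEZ": 1, "TIPO_LOC": 1}, {"SEZ": 2, "TIPO_LOC": 4}, {"SEZ": 3, "TIPO_LOC": 9}]
described = add_locality_description(sections, TIPO_LOC_MAPPING)
```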
Depending on the parameters, it can:
- Return the resulting GeoDataFrame directly; or
- Save the data as a layer in a GeoPackage (YEAR_GEODATA_NAME.gpkg) and return the path to the created file.
| PARAMETER | DESCRIPTION |
|---|---|
| shp_folder | Path to the folder containing census shapefiles (recursive reading via …). |
| target_columns | List of columns to select from each shapefile (must include or be compatible with the geometry column). |
| tipo_loc_mapping | Mapping of locality codes for the … |
| column_remapping | Optional dictionary to rename selected columns (e.g., …). |
| output_folder | Optional folder where the resulting GeoPackage will be saved. If None, the function does not write to disk and returns the GeoDataFrame directly. |
| layer_name | Optional name of the layer to use within the GeoPackage. Must be specified if … |
| RETURNS | DESCRIPTION |
|---|---|
| GeoDataFrame \| Path | Either a … |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If no shapefile is found in the specified folder. |
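The recursive search and the ValueError above can be sketched with the standard library. This is an assumption about the mechanism, since the page does not show which call performs the recursive search; the helper find_shapefiles and the folder names are hypothetical.

```python
import tempfile
from pathlib import Path


def find_shapefiles(shp_folder):
    """Return every .shp file under shp_folder, searching subfolders too."""
    shapefiles = sorted(Path(shp_folder).rglob("*.shp"))
    if not shapefiles:
        raise ValueError(f"No shapefile found in {shp_folder}")
    return shapefiles


with tempfile.TemporaryDirectory() as tmp:
    # A shapefile nested one level down is still discovered.
    nested = Path(tmp) / "R03"
    nested.mkdir()
    (nested / "sections.shp").touch()
    found = [path.name for path in find_shapefiles(tmp)]

    # A folder without shapefiles triggers the documented error.
    empty = Path(tmp) / "empty"
    empty.mkdir()
    try:
        find_shapefiles(empty)
        raised = False
    except ValueError:
        raised = True
```

Note that the "*.shp" pattern is case-sensitive on most Linux file systems, so files ending in .SHP would need a separate pattern.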
Note
Geometries are validated with make_valid() to reduce issues caused
by invalid polygons. An area_mq column containing the area in square
meters is calculated. The GeoDataFrame index is set to the first column
in df_columns (typically the census section code).
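An area expressed in square meters is only meaningful when the coordinates are in a projected reference system whose unit is the meter. The page does not state how area_mq is computed; the sketch below merely illustrates planar area with the shoelace formula on a hypothetical rectangle, using an invented helper planar_area.

```python
def planar_area(ring):
    """Shoelace formula for a simple polygon ring given as (x, y) pairs."""
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# A 100 m by 50 m rectangle in projected (metric) coordinates.
rectangle = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
area_mq = planar_area(rectangle)  # 5000.0 square meters
```

Applying the same formula to longitude and latitude in degrees would give a number in square degrees, not square meters.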