Release and Version History#
x.y.z (Backlog)#
Features and Improvements
Minor Improvements
Bugfixes
Miscellaneous
0.3.2 (2024-08-31)#
Bugfixes
- Fix a bug where the dataframe was written to the partition folder instead of the root S3 folder when writing to the final datalake.
- Fix a bug where the wrong S3 URI was logged when the data has no partition.
- Fix a bug where validating a datalake in deltalake format used the wrong storage options from the writer.
0.3.1 (2024-08-28)#
💥Breaking Changes
- Removed the following public APIs. Parameters are no longer used to customize the batch_read_snapshot_data_file_func logic; all data transformation logic should be implemented inside the batch_read_snapshot_data_file_func function.
  - dbsnaplake.api.T_EXTRACTOR
  - dbsnaplake.api.DerivedColumn
- Removed the following writers. polars_writer is now used to write parquet files.
  - dbsnaplake.api.write_parquet_to_s3
  - dbsnaplake.api.write_data_file
- Add the polars_writer parameter to the following APIs:
  - dbsnaplake.api.step_2_3_process_partition_file_group_manifest_file
  - dbsnaplake.api.Project
Features and Improvements
- Parquet is no longer the mandatory datalake format. You can now use any format supported by polars.
- Add support for deltalake format.
- Allow skipping the datalake creation. This is useful when the user only wants to export the data without creating the datalake.
- Add the following public APIs:
  - dbsnaplake.api.constants.S3_METADATA_KEY_N_RECORD
0.2.1 (2024-08-16)#
Minor Improvements
- The column parameter is now optional in dbsnaplake.api.validate_datalake.
- Add the following public APIs:
  - dbsnaplake.api.S3Location.s3path_validate_datalake_result
  - dbsnaplake.api.step_3_1_validate_datalake
  - dbsnaplake.api.Project.step_3_1_validate_datalake
0.1.2 (2024-08-16)#
Features and Improvements
- Add the following public APIs that were previously omitted:
  - dbsnaplake.api.ValidateDatalakeResult
  - dbsnaplake.api.validate_datalake
0.1.1 (2024-08-15)#
First release
- Add the following public APIs:
  - dbsnaplake.api.constants
  - dbsnaplake.api.constants.COL_RECORD_ID
  - dbsnaplake.api.constants.COL_CREATE_TIME
  - dbsnaplake.api.constants.COL_UPDATE_TIME
  - dbsnaplake.api.constants.S3_METADATA_KEY_SIZE
  - dbsnaplake.api.constants.S3_METADATA_KEY_N_RECORD
  - dbsnaplake.api.constants.S3_METADATA_KEY_SNAPSHOT_DATA_FILE
  - dbsnaplake.api.constants.S3_METADATA_KEY_STAGING_PARTITION
  - dbsnaplake.api.constants.MANIFESTS_FOLDER
  - dbsnaplake.api.constants.DATALAKE_FOLDER
  - dbsnaplake.api.constants.SNAPSHOT_FILE_GROUPS_FOLDER
  - dbsnaplake.api.constants.STAGING_FILE_GROUPS_FOLDER
  - dbsnaplake.api.constants.PARTITION_FILE_GROUPS_FOLDER
  - dbsnaplake.api.constants.MANIFEST_SUMMARY_FOLDER
  - dbsnaplake.api.constants.MANIFEST_DATA_FOLDER
  - dbsnaplake.api.T_RECORD
  - dbsnaplake.api.T_DF_SCHEMA
  - dbsnaplake.api.T_EXTRACTOR
  - dbsnaplake.api.T_OPTIONAL_KWARGS
  - dbsnaplake.api.repr_data_size
  - dbsnaplake.api.S3Location
  - dbsnaplake.api.Partition
  - dbsnaplake.api.extract_partition_data
  - dbsnaplake.api.encode_hive_partition
  - dbsnaplake.api.get_s3dir_partition
  - dbsnaplake.api.get_partitions
  - dbsnaplake.api.write_parquet_to_s3
  - dbsnaplake.api.write_data_file
  - dbsnaplake.api.read_parquet_from_s3
  - dbsnaplake.api.read_many_parquet_from_s3
  - dbsnaplake.api.group_by_partition
  - dbsnaplake.api.get_merged_schema
  - dbsnaplake.api.harmonize_schemas
  - dbsnaplake.api.dummy_logger
  - dbsnaplake.api.DBSnapshotManifestFile
  - dbsnaplake.api.DBSnapshotManifestFile.split_into_groups
  - dbsnaplake.api.DBSnapshotFileGroupManifestFile
  - dbsnaplake.api.DBSnapshotFileGroupManifestFile.read_all_groups
  - dbsnaplake.api.DerivedColumn
  - dbsnaplake.api.StagingFileGroupManifestFile
  - dbsnaplake.api.T_BatchReadSnapshotDataFileCallable
  - dbsnaplake.api.process_db_snapshot_file_group_manifest_file
  - dbsnaplake.api.extract_s3_directory
  - dbsnaplake.api.PartitionFileGroupManifestFile
  - dbsnaplake.api.PartitionFileGroupManifestFile.plan_partition_compaction
  - dbsnaplake.api.PartitionFileGroupManifestFile.read_all_groups
  - dbsnaplake.api.process_partition_file_group_manifest_file
  - dbsnaplake.api.T_TASK
  - dbsnaplake.api.create_orm_model
  - dbsnaplake.api.step_1_1_plan_snapshot_to_staging
  - dbsnaplake.api.step_1_2_get_snapshot_to_staging_todo_list
  - dbsnaplake.api.step_1_3_process_db_snapshot_file_group_manifest_file
  - dbsnaplake.api.step_2_1_plan_staging_to_datalake
  - dbsnaplake.api.step_2_2_get_staging_to_datalake_todo_list
  - dbsnaplake.api.step_2_3_process_partition_file_group_manifest_file
  - dbsnaplake.api.logger
  - dbsnaplake.api.Project
  - dbsnaplake.api.Project.step_1_1_plan_snapshot_to_staging
  - dbsnaplake.api.Project.step_1_2_process_db_snapshot_file_group_manifest_file
  - dbsnaplake.api.Project.step_2_1_plan_staging_to_datalake
  - dbsnaplake.api.Project.step_2_2_process_partition_file_group_manifest_file