Changelog
Version 0.0.20 (2022-12-30)
Highlights of this release:
Extensive support for H3 (Hexagonal hierarchical geospatial indexing system) via .to_h3 and .H3.*.
>>> import dtoolkit.geoaccessor
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [122, 100], "y": [55, 1]}).from_xy('x', 'y', crs=4326)
>>> df
     x   y                    geometry
0  122  55  POINT (122.00000 55.00000)
1  100   1   POINT (100.00000 1.00000)
# GeoDataFrame -> h3 cell
>>> df_with_h3 = df.to_h3(8)
>>> df_with_h3
                      x   y                    geometry
612845052823076863  122  55  POINT (122.00000 55.00000)
614269156845420543  100   1   POINT (100.00000 1.00000)
# Calculate h3 cell area
>>> df_with_h3.h3.area
612845052823076863    710781.770906
614269156845420543    852134.191671
dtype: float64
# h3 cell -> GeoDataFrame
>>> df_parent_cell = df_with_h3.h3.to_parent()
>>> df_parent_cell
                      x   y                    geometry
608341453197803519  122  55  POINT (122.00000 55.00000)
609765557230632959  100   1   POINT (100.00000 1.00000)
>>> df_parent_cell.h3.to_points()
                      x   y                    geometry
608341453197803519  122  55  POINT (122.00991 55.00606)
609765557230632959  100   1   POINT (100.00504 0.99852)
New features and improvements:
PR#739, PR#800, PR#817, PR#825: New geoaccessor to_h3() to convert geometry to h3 index.
PR#778: Speed up textdistance_matrix().
PR#779, PR#811, PR#819: New geoaccessor H3 to handle h3’s geohash.
PR#801: New accessor for Series invert_or_not().
PR#803: New geoaccessor select_geom_type().
Small bug-fix:
PR#780: Fix to_geoframe()’s geometry is GeoSeries.
PR#816: Fix the missing CRS in the to_geoframe() result.
PR#822: to_geoframe() supports replacing old geometry.
PR#824: Fix repeat() returning DataFrame when the input is a GeoDataFrame.
API changes:
PR#807: get_coordinates() -> coordinates().
PR#814: Drop keyword argument drop.
Documentation:
Maintenance development:
PR#774: pre-commit hooks autoupdate.
PR#798: Remove pygeos dependency from dtoolkit.
PR#805: Remove ci/env/311-latest-shapely2.yaml.
PR#806: Compat pandas 2.x.
PR#810: Remove dtoolkit.accessor.series._getattr_helper.py.
PR#812: Add blank lines.
PR#813: Remove 0.0.19 version warning information.
PR#818: Simplify importing shapely objects, from shapely.geometry import xxx -> from shapely import xxx.
Version 0.0.19 (2022-12-11)
Highlights of this release:
PR#772: Simplify importing, import dtoolkit == import dtoolkit.accessor.
New features and improvements:
PR#724: New accessor for Series to calculate text distance, textdistance().
PR#745: duplicated_geometry()’s predicate supports comparing values directly.
PR#768: New accessor change_axis_type().
Small bug-fix:
PR#576: Fix DataFrame.append’s FutureWarning.
PR#765: Fix sklearn pipeline visualization can’t print OneHotEncoder.
PR#776: Since v0.0.17, the GitHub release page no longer had a tarball file.
API changes:
PR#762: Drop the columns argument of error_report().
Documentation:
Maintenance development:
PR#737: Leave TODO marks for deleting pygeos.
PR#746: All envs will get daily test.
PR#747: Set an env to test that dtoolkit works with only base dependencies.
PR#749: Use .is_monotonic_increasing to replace .is_monotonic.
PR#751, PR#756, PR#773, PR#781, PR#787, PR#795: Compat shapely 2.x.
PR#763: Simplify versioneer updating CI.
PR#764: versioneer updating only works on main branch.
PR#770: Minimal environments only test base features.
PR#775: Remove set-output from github actions yaml files.
PR#791: Compat with pandas 2.x.
Version 0.0.18 (2022-10-14)
New features and improvements:
PR#721: New accessor for Series to convert datetime type, to_datetime().
PR#715: New accessor equal() to compare pandas-object with other.
PR#712: Support using a DataFrame’s column as the distance for geobuffer().
PR#711, PR#713: New geoaccessor for GeoSeries to return tuple of coordinates (x, y), xy().
PR#701, PR#704, PR#705, PR#706: New geoaccessor to generate great circle distances matrix, geodistance_matrix().
PR#699, PR#702, PR#707, PR#735: New geoaccessor to calculate the distance between two coordinates on earth, geodistance().
PR#696: New geoaccessor to handle China webmap offset problem, cncrs_offset().
PR#691, PR#703: New geoaccessor to filter geometry via spatial relationship, filter_geometry().
PR#688: New accessor weighted_mean() for DataFrame.
PR#685: Let Pipeline’s fit_predict and predict support outputting DataFrame.
PR#680, PR#682: New geoaccessor to check whether a Polygon has holes, has_hole().
PR#679: New geoaccessor to count the hole number of Polygon, hole_counts().
PR#668: Add a new option dropna for values_to_dict() to handle nan value.
PR#667: New accessor dropna_index().
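For readers unfamiliar with the idea behind geodistance() and geodistance_matrix(), the sketch below computes a great-circle distance with the haversine formula using only the standard library. It illustrates the concept and is not dtoolkit's implementation; the function name haversine_distance, its signature, and the mean Earth radius value are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed mean Earth radius in meters (illustrative choice).
EARTH_RADIUS_M = 6_371_008.8


def haversine_distance(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine of the central angle between the two points.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * radius * asin(sqrt(a))


# Distance in meters between the two points used in the 0.0.20 example.
distance = haversine_distance(122, 55, 100, 1)
```

A distance matrix in the same spirit would apply this function to every pair of points.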
API changes:
Small bug-fix:
PR#714, PR#716: Fix decompose() can’t collapse dict.
PR#692: Reset non-monotonic index.
Documentation:
PR#732: Add description for jenks_bin().
PR#709: Update toposimplify example.
PR#697: Simplify doc link via klass variable.
PR#693: Reforce pydata-sphinx-theme to v0.9.0.
PR#689: Update author information.
PR#686: Correct link.
PR#553: Add description for pipeline.
Maintenance development:
PR#730, PR#731: Simplify codes (directly select DataFrame, rename Series, and add / for method to only receive positional argument).
PR#720: Add comment for why updating the version of dependencies.
PR#717: Compat Python 3.7 / 3.8 which requires pandas >= 1.2.
PR#710, PR#727: Lint codes (includes top_n(), warning(), and filter_in()).
PR#700: Simplify CodeQL CI.
PR#687: Add new pre-commit hooks.
PR#684: Use official concurrency instead of cancel.yaml.
PR#678, PR#698, PR#718, PR#722: pre-commit hooks autoupdate.
PR#677: Update workflow-run-cleaner option.
PR#673: Merge two test CIs.
PR#672: Small patch to release CI.
PR#671: Don’t lint versioneer.
PR#666: Merge ‘sdist’ and ‘release’ two CIs.
PR#664: Use *.size to replace len(*).
PR#663: Update duplicated_geometry_groups() description and simplify its logic.
PR#661: Update to_series() description and simplify its logic.
PR#660: Set sdist default job name.
Version 0.0.17 (2022-8-15)
Highlights of this release:
Speed up geoaccessor geobuffer() via UTM CRS (PR#638).
Require minimal Python 3.8+ (PR#554).
New features and improvements:
A syntactic sugar to parallelize multi-jobs, parallelize() (PR#635, PR#641).
New geoaccessor to label / drop duplicate geometry: duplicated_geometry_groups(), duplicated_geometry(), and drop_duplicates_geometry() (PR#631, PR#632).
New accessor for Series swap_index_values() (PR#630).
New accessor to group by index, groupby_index() (PR#625).
New geoaccessor for GeoDataFrame toposimplify() (PR#624, PR#649, PR#651).
to_series() given only value_column also returns Series from DataFrame (PR#620).
New accessor for Series jenks_bin() and jenks_breaks() (PR#618, PR#629).
New accessor for Series filter_in() (PR#614).
New geoaccessor for GeoDataFrame to_geoseries() (PR#609).
New geoaccessor to remove the active geometry, drop_geometry() (PR#599).
New geoaccessor for Series from_wkt() (PR#596).
New geoaccessor to get coordinates from addresses, geocode(), and to get addresses from coordinates, reverse_geocode() (PR#591, PR#594, PR#643, PR#636, PR#652).
New geoaccessor from_wkb() (PR#584, PR#598).
New geoaccessor to_geoframe() (PR#568, PR#642, PR#646).
Small bug-fix:
Avoid GeoDataFrame constructor mutating the original (inputting) DataFrame (PR#644).
Avoid fillna_regression() mutating the original dataframe (PR#622).
Compat with sklearn 1.2 stricter class parameters checking (PR#602).
geobuffer() uses the active geometry to generate buffers (PR#583).
Hook accessor method’s attrs into both class and instance (PR#580).
API changes:
Remove warning message and drop inplace option (PR#555).
Use positional-only arguments (/) to limit name (PR#435).
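The positional-only marker (/) is plain Python 3.8+ syntax. A minimal sketch of its effect follows; the function shift and its parameters are hypothetical and are not part of dtoolkit.

```python
def shift(values, /, offset=1):
    """Add offset to each item; 'values' can only be passed positionally."""
    return [value + offset for value in values]


# Positional use works as usual.
shifted = shift([1, 2, 3], offset=10)

# Passing the positional-only parameter by keyword raises TypeError.
try:
    shift(values=[1, 2, 3])
except TypeError:
    keyword_rejected = True
else:
    keyword_rejected = False
```

Because callers cannot rely on the parameter name, it can later be renamed without breaking them.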
Documentation:
Add Raises part for documentation (PR#623).
Apply singular file name style to /doc/* (PR#613).
Remove title ‘.dev0’ and ‘.post0’ suffixes (PR#587).
Beautify the format of inputting dictionary (PR#577).
Maintenance development:
Set timeout for updating versioneer CI (PR#657).
drop_inf/get_inf_range returns set instead of list (PR#656).
Remove ‘fkirc/skip-duplicate-actions’ (PR#655).
Rename arguments of methods (PR#647).
Remove ‘geopy’ from *-minmal.yaml env (PR#621).
Follow Series.nlargest(n=5, keep='first') API (PR#616).
Follow numpy.repeat(repeats, axis) API (PR#615).
Set only positional parameter (/) for (geo)accessor (PR#612).
Add environment.yaml at root path for user (PR#611).
Use pandas.testing.assert_*_equal to replace (Series|DataFrame).equals in testing (PR#607, PR#608).
Use function style rather than OOP (PR#606, PR#633, PR#648, PR#653).
Singular style file name (PR#605).
Correct file name (PR#604).
Rename yaml file *.yml -> *.yaml (PR#603).
(Geo)DataFrame geoaccessor doesn’t return (Geo)Series anymore (PR#601).
Set default coding style via EditorConfig (PR#600).
Suit actions/setup-python@v4 new changing (PR#581).
pre-commit hooks autoupdate (PR#579, PR#595, PR#610, PR#627, PR#634, PR#639).
Move dtoolkit.transformer.pipeline into dtoolkit.pipeline (PR#563).
Typing annotations:
Version 0.0.16 (2022-5-30)
New features and improvements:
New accessor fillna_regression() (PR#556, PR#567).
New unique option for values_to_dict() (PR#548).
Speed up find_stack_level() (PR#546).
filter_in()’s how only works on condition DataFrame’s columns (PR#545).
drop_inf()’s inf option supports + and - (PR#539).
New complement option for filter_in() (PR#533).
New method decompose() for DataFrame (PR#488, PR#573).
API changes:
Add deprecated warning for dtoolkit.transformer.pipeline (PR#558).
Split dtoolkit.transformer scripts into sub-packages (PR#557).
Drop inplace for drop_inf() (PR#540).
Drop generic package (PR#535).
Drop inplace option of filter_in() (PR#518, PR#531, PR#559).
Documentation:
Maintenance development:
Don’t skip dist when ci/** path files changing (PR#570).
Remove TYPE_CHECKING blocks (PR#566).
Remove __future__ useless line importing (PR#564).
Simplify register_method_factory() (PR#552).
Correct name excepted -> expected (PR#547).
Complete the accessor subpackage test suite (PR#544).
Move collapse from _util into expand (PR#541).
Lint importing (PR#536).
Test deprecated_kwargs() (PR#534).
Version 0.0.15 (2022-5-13)
New features and improvements:
New decorator deprecated_kwargs() (PR#525).
Add the index register method register_index_method(), support register method into Index (PR#507).
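As a rough illustration of what a decorator like deprecated_kwargs() can do, the sketch below warns when a caller passes a keyword argument that is scheduled for removal. It is not dtoolkit's implementation: the name warn_deprecated_kwargs, its signature, the message text, and the toy function drop_negative are all assumptions for this example.

```python
import functools
import warnings


def warn_deprecated_kwargs(*names, message="'{name}' is deprecated."):
    """Hypothetical decorator: warn when any of `names` is passed by keyword."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in names:
                if name in kwargs:
                    # stacklevel=2 attributes the warning to the caller's line.
                    warnings.warn(
                        message.format(name=name),
                        DeprecationWarning,
                        stacklevel=2,
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_deprecated_kwargs("inplace")
def drop_negative(values, inplace=False):
    """Toy function whose 'inplace' option is being phased out."""
    return [value for value in values if value >= 0]


# Record warnings so the deprecation message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = drop_negative([1, -2, 3], inplace=False)
```

Calling drop_negative([1, -2, 3]) without the inplace keyword emits no warning.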
API changes:
Add version information for warning (PR#528).
Add DeprecationWarning for dropping axis option of filter_in() (PR#522).
Add DeprecationWarning for dropping generic package (PR#521).
Add DeprecationWarning for dropping inplace option of filter_in() (PR#519).
Drop unique_counts() method (PR#512).
Maintenance development:
Version 0.0.14 (2022-5-1)
New features and improvements:
Replace .shape with .__len__, a 1.6x speed-up over the older method (PR#506).
New option to_list for values_to_dict() (PR#500).
New decorator dtoolkit.util._decorator.deprecated_alias() (PR#498).
New option order for values_to_dict() (PR#495).
Report the place where the error first happens via the stacklevel option (PR#490).
New method from_wkt() (PR#486).
New method drop_or_not() (PR#485).
New decorator warning (PR#484).
API changes:
Drop unique_counts(), use pandas.DataFrame.nunique() instead (PR#502).
Rename values_to_dict()’s argument from few_as_key to ascending (PR#499).
Rename accessor name, get_attr -> getattr, lens -> len (PR#487).
Simplify bin()’s parameters via *args and **kwargs (PR#481).
Documentation:
Maintenance development:
Autoupdate actions (PR#494, PR#496, PR#497, PR#504, PR#510).
Use pd.concat to replace pd.DataFrame.append (PR#491).
Update black version (PR#483).
Split package to scripts, dataframe.py -> dataframe/, series.py -> series/, geodataframe.py -> geodataframe/, geoseries.py -> geoseries/, generic.py -> generic/ (PR#475, PR#480, PR#482).
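The move from DataFrame.append to pd.concat mirrors pandas itself, which deprecated append in 1.4 and removed it in 2.0. A minimal sketch of the replacement, with made-up frames first and second:

```python
import pandas as pd

# Two small illustrative frames (hypothetical data).
first = pd.DataFrame({"x": [1, 2]})
second = pd.DataFrame({"x": [3]})

# Equivalent of the removed first.append(second, ignore_index=True).
combined = pd.concat([first, second], ignore_index=True)
```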
Version 0.0.13 (2022-4-2)
New features and improvements:
Use .loc[:, wrong_keys] instead of .get(wrong_keys) (PR#473).
New method values_to_dict() (PR#470).
New method unique_counts() (PR#469).
to_series() could convert a DataFrame with two or more columns (PR#468).
API changes:
Array in array out (PR#460).
OneHotEncoder’s fit_transform uses the input’s index (PR#458).
Let Pipeline’s fit_transform support Series (PR#457).
Drop dtoolkit.transformer.MinMaxScaler (PR#451).
Small bug-fix:
Fix jupyter notebook can’t render (PR#438).
Documentation:
Rename sphinx project name from ‘dtoolkit’ to ‘my data toolkit’ (PR#454).
Add ‘feature’ section at documentation homepage (PR#452).
Maintenance development:
Update versioneer (PR#471).
Autoupdate pre-commit hooks (PR#464).
Yaml file uses list items to replace [] (PR#461).
Group test suites (PR#459).
Handle GeoSeries FutureWarning (PR#456).
Move all data to conftest.py (PR#453).
Typing annotations:
Version 0.0.12 (2022-2-11)
Highlights of this release:
Specify the minimal pandas version for each Python version (PR#440).
One column data pipeline supports returning Series (PR#431).
API changes:
Add DeprecationWarning for dtoolkit.transformer.MinMaxScaler (PR#449).
Documentation:
New documentation, Register a Method as the Original Attribute of Pandas Object (PR#445).
Correct jupyter link (PR#444).
Maintenance development:
Version 0.0.11 (2022-1-25)
New features and improvements:
Simplify OneHotEncoder examples and inputs (PR#434).
FeatureUnion would merge all into one DataFrame and the index would use the common part (PR#433).
Small bug-fix:
Fix jupyter notebook can’t render (PR#438).
Maintenance development:
Simplify linting workflow (PR#437).
Version 0.0.10 (2022-1-21)
Highlights of this release:
New features and improvements:
Add number and other option for lens() (PR#406).
Documentation:
Fix Readthedocs running excessive memory consumption (PR#436).
Update installation documentation (PR#419).
New documentation, Tips About Accessing Element Attributes of Series (PR#408).
Use jupyter to replace markdown (PR#405, PR#409, PR#410, PR#414).
Remove warning for Series.lens (PR#399).
Maintenance development:
Cancel any previous runs that are not completed (PR#426).
Add skip check job (PR#425).
Use mamba to speed up building env (PR#422, PR#427, PR#436).
Test register_*_method positional arguments (PR#420).
Add some new pre-commit hooks (PR#407).
A tag containing ‘rc’ is treated as ‘pre-release’ (PR#404).
Rename ci/envs/* to ci/env/* (PR#403).
Add skip check to avoid frequently creating versioneer’s autoupdating PR (PR#397).
Version 0.0.9 (2022-1-10)
Highlights of this release:
Use squash merge to keep a clean git commit history (Issue#386).
register_series_method() and register_dataframe_method() support alias (PR#392).
New features and improvements:
points_from_xy() would return GeoSeries if df only has one column (PR#385).
New accessor method to_series() (PR#380).
API changes:
Call lens() via Series.len (PR#394).
Maintenance development:
Draft github-action release then add changelog manually (PR#396).
Fix words, a -> an (PR#387).
Pre-commit hooks autoupdate (PR#384).
Contributing development:
Add pull request template (PR#361).
Documentation:
Correct sphinx method link (PR#390).
Version 0.0.8 (2022-1-1)
Highlights of this release:
API changes:
Remove geographic_buffer() (PR#348).
Maintenance development:
Let git choose the default branch (PR#376).
Update pre-commit commit message (PR#371).
Enable labeled ‘auto-merged’ PR could merge master branch into PR (PR#368, PR#370, PR#372, PR#375).
Github action runner update (PR#365, PR#366, PR#367, PR#369, PR#383).
Pre-commit hooks auto update (PR#359).
Documentation:
Correct package name, MinMaxScaler -> OneHotEncoder (PR#374).
Shorten package path, dtoolkit.accessor.register -> dtoolkit.accessor (PR#373).
New contributors:
Version 0.0.7 (2021-12-30)
Highlights of this release:
New features and improvements:
API changes:
Add DeprecationWarning for geographic_buffer() (PR#341).
Maintenance development:
Import uncommon packages inside the method (PR#343, PR#344).
Tag events also trigger a release to test.pypi.org (PR#340).
Typing annotations:
Version 0.0.6 (2021-12-13)
Highlights of this release:
New features and improvements:
Add columns argument for error_report() (PR#328).
New method points_from_xy() (PR#316).
Bug fixes:
Fix version number showing at sphinx home page (PR#318).
Maintenance development:
Publish to test.pypi.org only when event is ‘push’ (PR#337).
pre-commit autoupdate (PR#324).
Update commit message of bot (PR#321).
Add workflow to automatically update versioneer (PR#319, PR#333).
Documentation:
Documentation patch (PR#329).
Version 0.0.5 (2021-12-6)
Highlights of this release:
Remove test from release package (PR#307).
Use TAG[.postDIST[.dev0]] version style (PR#299, PR#300, PR#306).
Simplify methods of importing dtoolkit.accessor and dtoolkit.geoaccessor (PR#294, PR#295, PR#297, PR#303).
New features and improvements:
Add new method for Series, error_report() (PR#304).
Let expand() support list-like sub-elements (PR#283).
Add new accessor lens() (PR#282).
API changes:
Remove dtoolkit.geography (PR#277).
Maintenance development:
Let CI fetch all git history to get correct version (PR#312).
Add yaml file checker (PR#302).
Update versioneer (PR#296).
Bump version of pre-commit repos (PR#292).
Publish to TestPyPI (PR#291).
Gather information into setup.cfg (PR#298).
Create codeql analysis CI (PR#287).
Add .PHONY into Makefile to avoid name conflict (PR#285).
Adjust tests CI (PR#284, PR#288, PR#290, PR#293, PR#310, PR#311).
Documentation:
Redirect py-modindex.html to reference.html (PR#314).
Update Readme file (PR#313).
Add documentation for generating geographic buffer methods (PR#308).
New contributors:
Version 0.0.4 (2021-11-8)
Highlights of this release:
Let GeoPandas also have Pandas accessor functions (PR#261, PR#265, PR#266, PR#268, PR#271, PR#273, PR#275, PR#276, PR#280, PR#281).
DToolKit requires Pandas >= 1.1.3 to support Python 3.9 (PR#254).
New features and improvements:
API changes:
Add DeprecationWarning for dtoolkit.geography (PR#274).
Keep snake name style, dropinf -> drop_inf and filterin -> filter_in (PR#249, PR#253).
Maintenance development:
Only publish .tar file (PR#246).
Use artifact to save sdist, fixing the problem that different CI jobs can’t exchange data (PR#242).
Documentation:
Use sphinx.ext.autosectionlabel to add anchor (PR#272).
Start to use IPython Sphinx Directive (PR#258, PR#259, PR#260).
Drop python module index html page (PR#256).
Small patches to documentation (PR#245, PR#251, PR#255, PR#257, PR#262, PR#263, PR#267, PR#286).
Fix these docs that don’t exist in dtoolkit (PR#244).
Fix documentation building environment (PR#243).
Version 0.0.3 (2021-10-21)
New features and improvements:
Documentation:
Add a new documentation about Workflow, see Automated Pipeline: AutoML (PR#236, PR#237).
Add a new documentation about AutoML, see Transformer and Pipeline Brief Description (PR#235, PR#237).
Generate sphinx python module index, see py-modindex (PR#231).
Add an introduction for geographic_buffer(), see What is the Geographic Buffer? (PR#229).
Let class method docs show (PR#226).
Add CHANGELOG.md file (PR#222).
Change API URL, from reference/api/geography/dtoolkit.geography.geographic_buffer.html to reference/api/dtoolkit.geography.geographic_buffer.html (PR#225).
Maintenance development:
Version 0.0.2 (2021-9-2)
Highlights of this release:
Now DToolKit supports py3.9, works with Python >= 3.7 (PR#211).
New features and improvements:
Add transform_series_to_frame() to convert Series to DataFrame, so the data structure in the pipeline data stream stays a DataFrame (PR#202).
Make a generic array to frame transform function (PR#193, PR#198).
Simplify base Transformer, move Transformer’s __init__ and fit to MethodTF (PR#192).
Let update_invargs() use the old arguments when the new ones are empty (PR#191).
API changes:
Bug fixes:
Fix wrong typing that caused the vscode plugin to fail to show a function’s documentation (PR#203, PR#205).
Fix the wrong homepage name shown by pip show dtoolkit (PR#201).
Typing annotations:
Add OneDimArray and TwoDimArray typing (PR#209).
Add GeoSeriesOrGeoFrame typing (PR#207).
Add SeriesOrFrame typing (PR#206).
Specify that make_union() input is a list of Transformer (PR#199).
Enrich transform()’s annotations (PR#197).
Fix multi_if_else()’s if_condition_return parameter annotation (PR#195).
Remove PandasType and GeoPandasType (PR#190).
Fix dtoolkit.transformer._util.isin()’s annotation (PR#188).
Let dtoolkit.transformer._util.isin()’s axis accept str type (PR#188).
Documentation:
Maintenance development: