Some utility scripts for working with excel files; pandas is used for accessing excel files, with openpyxl engine.
(works unexpected slowly on WSL2 but quickly on a MAC...)
# first time: create virtual env
python3 -m venv .venv
# venv activate (each usage)
source .venv/bin/activate
# first time: install requirements
pip3 install -r requirements.txtusage: exceldiff.py [-h] --excel1 EXCEL_FILE1 [--excel2 EXCEL_FILE2]
Ever had to modify an excel in a few places, then report back to the owner
what changed? This script compares all sheets of an excel file, and reports
the cells that changed.
options:
-h, --help show this help message and exit
--excel1 EXCEL_FILE1, -f EXCEL_FILE1
file name of first excel input file
--excel2 EXCEL_FILE2, -t EXCEL_FILE2
file name of second excel input file
Usage:
usage: excel2json.py [-h] --excel EXCEL_FILE [--tab EXCEL_TAB] --json OUT_FILE
--row_from ROW_FROM --col_from COL_FROM
[--filter USE_COLUMNS]
Extract a table from an excel file into a json file. The json file is an array
of records, where each row stands for a record in the json. The row-record
consists of name-value pairs, where the name is the table header and the value
is the cell value for this row. The initial position is the start of he header
line. All data lines are taken, until end of excel, or until the first line
that does not have any values Example: python3 excel2json.py --excel=tbl.xlsx
--col_from=3 --row_from=1 --json=out.json Extract the table where the header
line starts from column 3 (one is the first column) and row 1 (one is the
first column) python3 excel2json.py --excel=tbl.xlsx --json=out.json
options:
-h, --help show this help message and exit
--excel EXCEL_FILE, -i EXCEL_FILE
file name of excel input file
--tab EXCEL_TAB, -t EXCEL_TAB
tab name of excel tab
--json OUT_FILE, -o OUT_FILE
output file file
--row_from ROW_FROM, -x ROW_FROM
starting row of range (one based)
--col_from COL_FROM, -y COL_FROM
starting column of range (one based or excel characte
notation)
--filter USE_COLUMNS, -f USE_COLUMNS
filter a subset of column (comma delimited list of
column names)
Example:
For an input table of the following form
| a col | b col | c col | ||
| 1 | 2 | 3 | ||
| a | b | c | ||
| 4 | 5 | 6 | ||
| e | f | g |
This command
python3 excel2json.py --excel=tbl.xlsx --col_from=4 --row_from=2 --json=out.json
We will get a json of the following form
[
{
"a-col": "1",
"b-col": "2",
"c-col": "3"
},
{
"a-col": "a",
"b-col": "b",
"c-col": "c"
},
{
"a-col": "4",
"b-col": "5",
"c-col": "6"
},
{
"a-col": "e",
"b-col": "f",
"c-col": "g"
}
]
The following command will filter a subset of the columns a-col and c-col
python3 tabletojson.py --excel=tbl.xlsx --col_from=3 --row_from=1 --json=out.json --filter=a-col,c-col
And result in:
[
{
"a-col": "1",
"c-col": "3"
},
{
"a-col": "a",
"c-col": "c"
},
{
"a-col": "4",
"c-col": "6"
},
{
"a-col": "e",
"c-col": "g"
}
]
usage: excelgrep.py [-h] --dir SEARCH_DIR [--regex REGEX_SEARCH]
[--search STRING_SEARCH]
for all files (with extension .xlsx) in a given directory/suddirectory: Search
all cells in all tabs for a given search term and report matching lines
options:
-h, --help show this help message and exit
--dir SEARCH_DIR, -d SEARCH_DIR
directory to search for excel files
--regex REGEX_SEARCH, -r REGEX_SEARCH
Regular expression to search for
--search STRING_SEARCH, -s STRING_SEARCH
String to search for
Warning! This program uses pandas to read and write an excel workbook. This works fine for excel files with data only - without formatting / scripts.
Help text
Example usage
python3 excelsed.py --excel a.xlsx --col_from=4 --row_from=1 --row_to=5 --py="re.sub(r'[0-9]','_', arg)"
Help text
usage: excelsed.py [-h] --excel EXCEL_FILE [--tab EXCEL_TAB] --row_from
ROW_FROM --col_from COL_FROM [--row_to ROW_TO]
[--col_to COL_TO] --py PY_CODE
apply python expression on range of excel cells Allows to apply regex
substitution over a bunch of excel cells The python code has access to
packages re and math, the input value is in global variable arg Example usage:
python3 excelsed.py --excel a.xlsx --col_from=4 --row_from=1 --row_to=5
--py="re.sub(r'[0-9]','_', arg)"
options:
-h, --help show this help message and exit
--excel EXCEL_FILE, -i EXCEL_FILE
file name of excel input file
--tab EXCEL_TAB, -t EXCEL_TAB
tab name of excel tab
--row_from ROW_FROM, -x ROW_FROM
starting row of range (one based)
--col_from COL_FROM, -y COL_FROM
starting column of range (one based)
--row_to ROW_TO ending row of range (must have at least --row_to or
--col_to) (one based)
--col_to COL_TO ending row of range (must have at least --row_to or
--col_to) (one based)
--py PY_CODE, -p PY_CODE
python expression to transform each cell. The current
cell value is in global variable arg
```