- Add SharedStorageMonitor thread to periodically verify shared storage:
* Writes a temp file to the shared location and validates MD5 from all nodes.
* Skips nodes with unstable recent heartbeats; retries once; defers decision if any node is unreachable.
* Updates a cluster-wide stateful flag (shared_storage_on) only on conclusive checks.
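
The check described above can be sketched roughly as follows. This is a minimal illustration, not the SharedStorageMonitor code: `md5_of_file`, `write_probe_file` and `evaluate_checks` are hypothetical names, and the per-node results are assumed to arrive as `True` (MD5 matched), `False` (mismatch) or `None` (node unreachable).

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def write_probe_file(shared_dir):
    """Write a random probe file into shared_dir; return (path, md5)."""
    fd, path = tempfile.mkstemp(prefix='shared_probe_', dir=shared_dir)
    with os.fdopen(fd, 'wb') as f:
        f.write(os.urandom(1024))
    return path, md5_of_file(path)


def evaluate_checks(node_results):
    """Combine per-node results (hypothetical shape: node -> True/False/None).

    Returns None when any node was unreachable (decision deferred),
    otherwise True only if every node saw the expected MD5.
    """
    if any(result is None for result in node_results.values()):
        return None
    return all(node_results.values())
```

Returning `None` for the inconclusive case keeps "do not touch the flag" distinct from "turn the flag off", which is the property the last bullet asks for.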
- New CMAPI endpoints:
* PUT /cmapi/{ver}/cluster/check-shared-storage — orchestrates cross-node checks.
* GET /cmapi/{ver}/node/check-shared-file — validates a given file’s MD5 on a node.
* PUT /cmapi/{ver}/node/stateful-config — fast path to distribute stateful config updates.
- Introduce in-memory stateful config (AppStatefulConfig) with versioned flags (term/seq) and shared_storage_on flag:
* Broadcast via helpers.broadcast_stateful_config and enhanced broadcast_new_config.
* Config PUT is now validated with Pydantic models; supports stateful-only updates and set_mode requests.
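
One way to read the term/seq versioning is as a lexicographic comparison, where an incoming stateful config replaces the local one only if its (term, seq) pair is strictly higher. The sketch below assumes that ordering; `StatefulFlags` and `merge` are hypothetical names, not the AppStatefulConfig API, and a plain dataclass stands in for the Pydantic models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatefulFlags:
    """Hypothetical stand-in for the versioned stateful flags."""
    term: int
    seq: int
    shared_storage_on: bool

    def version(self):
        # Tuples compare lexicographically: term first, then seq.
        return (self.term, self.seq)


def merge(current, incoming):
    """Keep whichever config has the higher (term, seq); ties keep current."""
    return incoming if incoming.version() > current.version() else current
```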
- Failover behavior:
* NodeMonitor keeps failover inactive when shared_storage_on is false or cluster size < 3.
* Rebalancing DBRoots becomes a no-op when shared storage is OFF (safety guard).
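
The failover condition above reduces to a small predicate. The sketch below is illustrative only; `failover_allowed` and `MIN_FAILOVER_CLUSTER_SIZE` are hypothetical names, not NodeMonitor internals.

```python
# Assumed from the bullet above: failover needs at least three nodes.
MIN_FAILOVER_CLUSTER_SIZE = 3


def failover_allowed(shared_storage_on, cluster_size):
    """Return True only when shared storage is ON and the cluster has >= 3 nodes."""
    return bool(shared_storage_on) and cluster_size >= MIN_FAILOVER_CLUSTER_SIZE
```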
- mcl status improvements: per-node 'state' (online/offline), better timeouts and error reporting.
- Routing/wiring: add dispatcher routes for new endpoints; add ClusterModeEnum.
- Tests: cover shared-storage monitor (unreachable nodes, HB-based skipping), node manipulation with shared storage ON/OFF, and server/config flows.
- Dependencies: add pydantic; minor cleanups and logging.
* feat(cmapi): add read_only param for API add node endpoint
* style(cmapi): fixes for string length and quotes
Add dbroots of other nodes to the read-only node
On every node change adjust dbroots in the read-only nodes
Fix logging (trace level) in tests
Remove ExeMgr from constants
Fix tests
Manually remove read-only node from ReadOnlyNodes on node removal (because nodes are only deactivated)
Review fixes (mostly switching to a StrEnum analog for Python versions before 3.11, plus changes in the ruff config)
Read-only nodes are now consistently called read replicas
Don't write hostname into IP fields of the config like PMSx/IPAddr, pmx_WriteEngineServer/IPAddr
We calculate ReadReplicas by finding PMs without WriteEngineServer
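
As a rough sketch of that rule, assuming the PM addresses and the WriteEngineServer addresses have already been read from the config into two lists (the function name and input shape are hypothetical):

```python
def find_read_replicas(pm_addrs, wes_addrs):
    """Return PM addresses that have no WriteEngineServer, preserving order."""
    writers = set(wes_addrs)
    return [addr for addr in pm_addrs if addr not in writers]
```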
In _replace_localhost, replace local IP addresses with the resolved IP addresses and local hostnames with the resolved hostnames.
ModuleHostName/ModuleIPAddr is kept intact.
Keep only IPv4 in ActiveNodes/DesiredNodes/InactiveNodes
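
One possible reading of the IPv4-only rule is a filter that drops hostnames and IPv6 addresses from a node list. The sketch below uses the standard `ipaddress` module; `keep_ipv4` is a hypothetical helper, not a function from the CMAPI code.

```python
import ipaddress


def keep_ipv4(entries):
    """Return only the entries that parse as IPv4 addresses, preserving order."""
    result = []
    for entry in entries:
        try:
            ipaddress.IPv4Address(entry)
        except ValueError:
            # Hostnames and IPv6 addresses are not valid IPv4; skip them.
            continue
        result.append(entry)
    return result
```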
feat: add mock DNS resolution builder for testing hostname/IP mappings
* Fix _add_node_to_PMS: if the node is already in PMS, keep it among the existing items so it is not lost when the list is rebuilt
* Make tests independent from CWD
Fixes for _add_Module_entries
Fixed node removal and tests
Fixes for node manipulation tests
Don't log trace_params in tracing logger, because it already has all this data
Don't print span attrs, they can contain lots of headers
Save small part of the response into the span, if the response was a JSON string
Added JSON logging of trace details into a separate file (to avoid spamming the main log with machine-readable output)
Record part of the response into the span
Set duration attribute in server spans
Log 404 errors
Colorize the traces (each span slightly changes the color of the parent span)
Improve trace visualization with duration formatting and notes for request/response pairs
Tracing requests
Custom log factory adds all trace values as one log record parameter (it will be empty if trace values are empty, like in MainThread where there are no incoming requests)
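
A log record factory of this kind can be sketched with the standard `logging` API. This is an assumption-laden illustration: the `current_trace` context variable, the `trace_info` attribute and `install_trace_factory` are hypothetical names, and the real factory may store the values differently.

```python
import contextvars
import logging

# Hypothetical holder for the trace values of the request being handled.
current_trace = contextvars.ContextVar('current_trace', default=None)


def install_trace_factory():
    """Wrap the current record factory so every record gets a trace_info field."""
    base_factory = logging.getLogRecordFactory()

    def factory(*args, **kwargs):
        record = base_factory(*args, **kwargs)
        values = current_trace.get() or {}
        # Empty string when there is no active request (e.g. MainThread).
        record.trace_info = ' '.join(f'{k}={v}' for k, v in values.items())
        return record

    logging.setLogRecordFactory(factory)
```

A formatter can then reference the field as `%(trace_info)s` without failing on records created outside a request.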
* feat(cmapi): NetworkManager class for some IP and hostname operations.
* fix(cmapi): Use NetworkManager class to resolve ip and hostname in node_manipulation.add_node function
* fix(cmapi): Minor docstring and formatting fixes
* chore(mcs, scripts): extra/columnstore_review.sh with scripts/columnstore_review.sh with 1.4.13 version
* feat(mcs): add review command to the Tools section. It's the wrapper for columnstore_review.sh
* feat(mcs): add review command implementation to tools.py file + constants.py
* chore(mcs): add separator argument to cook_sh_arg function
* docs(mcs): updated README.md and mcs.1 man file
* fix(mcs, wrapper): default is None for --backup-location, --backup-destination, --storage, --parallel, --highavailability, --skip-save-brm, --skip-polls, --skip-locks, --skip-mariadb-backup, --skip-bucket-data, --name-backup, --quiet, --no-verify-ssl, --poll-interval, --poll-max-wait, --retention-days, -scp, -bb, -url, -f, -m, -aro, -li and most other arguments in backup_commands.py and restore_commands.py
* fix(mcs, helpers): cook_sh_arg parser function now detects None as a value
* fix(mcs, wrapper): list -> li in typer command argument name for both backup_commands.py and restore_commands.py
* docs(mcs, wrapper): --parallel arg help message was rewritten to be simpler and more readable
* fix(mcs, wrapper): removing -no- prefixed flag variants for all bool arguments in backup_commands.py and restore_commands.py
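
The None-default convention in the wrapper fixes above could work along these lines. The sketch is not the real cook_sh_arg: `to_sh_arg` is a hypothetical helper whose signature and flag format are assumptions, shown only to illustrate how a None default lets the wrapper omit an option so the shell script falls back to its own default.

```python
def to_sh_arg(name, value, separator=' '):
    """Return a shell argument string, or None if the option should be omitted."""
    # None (not set by the user), False booleans and empty strings are skipped.
    if value is None or value is False or value == '':
        return None
    flag = '--' + name.replace('_', '-')
    if value is True:
        # Boolean flags are passed by name only.
        return flag
    return f'{flag}{separator}{value}'
```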
[fix] CEJPasswordHandler class methods to use directory for cskeys file
[fix] CEJPasswordHandler.encrypt_password to return password in hex format
[fix] CEJPasswordHandler key_length
[fix] CEJPasswordHandler os.urandom call typo
[upd] mcs cli README.md and man page
[upd] mcs cli README_DEV.md
[fix] mcs_cluster_tool/decorators.py to handle typer.Exit exception
[add] various docstrings
[add] distribute .secrets file to all nodes while adding a new node
[add] encrypt_password, generate_secrets_data, save_secrets to CEJPasswordHandler
[add] tools section to mcs cli tool
[add] mcs_cluster_tool/tools_commands.py file with cskeys and cspasswd commands
[add] cskeys and cspasswd commands to tools section of mcs cli
[mv] backup/restore commands to tools section mcs cli
[fix] minor imports ordering
[fix] constants