- Add SharedStorageMonitor thread to periodically verify shared storage:
* Writes a temp file to the shared location and validates MD5 from all nodes.
* Skips nodes with unstable recent heartbeats; retries once; defers decision if any node is unreachable.
* Updates a cluster-wide stateful flag (shared_storage_on) only on conclusive checks.
- New CMAPI endpoints:
* PUT /cmapi/{ver}/cluster/check-shared-storage — orchestrates cross-node checks.
* GET /cmapi/{ver}/node/check-shared-file — validates a given file’s MD5 on a node.
* PUT /cmapi/{ver}/node/stateful-config — fast path to distribute stateful config updates.
- Introduce in-memory stateful config (AppStatefulConfig) with versioned flags (term/seq) and shared_storage_on flag:
* Broadcast via helpers.broadcast_stateful_config and enhanced broadcast_new_config.
* Config PUT is now validated with Pydantic models; supports stateful-only updates and set_mode requests.
- Failover behavior:
* NodeMonitor keeps failover inactive when shared_storage_on is false or cluster size < 3.
* Rebalancing DBRoots becomes a no-op when shared storage is OFF (safety guard).
- mcl status improvements: per-node 'state' (online/offline), better timeouts and error reporting.
- Routing/wiring: add dispatcher routes for new endpoints; add ClusterModeEnum.
- Tests: cover shared-storage monitor (unreachable nodes, HB-based skipping), node manipulation with shared storage ON/OFF, and server/config flows.
- Dependencies: add pydantic; minor cleanups and logging.
* feat(cmapi): add read_only param for API add node endpoint
* style(cmapi): fixes for string length and quotes
Add dbroots of other nodes to the read-only node
On every node change adjust dbroots in the read-only nodes
Fix logging (trace level) in tests
Remove ExeMgr from constants
Fix tests
Manually remove read-only node from ReadOnlyNodes on node removal (because nodes are only deactivated)
Review fixes (mostly switching to StrEnum analog before py3.11, also changes in ruff config)
Read-only nodes are now called read replica consistently
Don't write hostname into IP fields of the config like PMSx/IPAddr, pmx_WriteEngineServer/IPAddr
We calculate ReadReplicas by finding PMs without WriteEngineServer
In _replace_localhost, replace local IP addrs with resolved IP addrs and local hostnames -- with the resolved hostnames.
ModuleHostName/ModuleIPAddr is kept intact.
Keep only IPv4 in ActiveNodes/DesiredNodes/InactiveNodes
feat: add mock DNS resolution builder for testing hostname/IP mappings
* Fix _add_node_to_PMS: if node is already in PMS, save it to existing items to not miss it during the reconstruction of the list
* Make tests independent from CWD
Fixed for _add_Module_entries
Fixed node removal and tests
Fixes for node manipulation tests
[add] distribute .secrets file to all nodes while adding a new node
[add] encrypt_password, generate_secrets_data, save_secrets to CEJPasswordHandler
[add] tools section to mcs cli tool
[add] mcs_cluster_tool/tools_commands.py file with cskeys and cspasswd commands
[add] cskeys and cspasswd commands to tools section of mcs cli
[mv] backup/restore commands to tools section mcs cli
[fix] minor imports ordering
[fix] constants
MAJOR: Some logic inside node remove changed significantly using active
nodes list from Columnstore.xml to broadcast config after remove.
[fix] TransactionManager passsing extra, remove and optional nodes arguments to start_transaction function
[fix] commit and rollback methods of TransactionManager adding nodes argument
[fix] TransactionManager using success_txn_nodes inside
[fix] remove node logic to use Transaction manager
[fix] cluster set api key call using totp on a top level cli call
[add] missed docstrings
[fix] cluster shutdown timeout for next release
[add] cluster api client class
[fix] cli cluster_app using cluster api client
[add] ClusterAction Enum
[add] toggle_cluster_state function to reduce code duplication
refactor(ci): switch S3 uploading to AWS CLI for logging S3 link
docs: update README with to reflect changes in Drone pipeline
feat(env): add activate script for portable Python
fix(logging): log RequestException response body
* MCOL-5594: Interactive "mcs cluster stop" command for CMAPI. (#3024)
* MCOL-5594: Interactive "mcs cluster stop" command for CMAPI.
[add] NodeProcessController class to handle Node operations
[add] two endpoints: stop_dmlproc (PUT) and is_process_running (GET)
[add] NodeProcessController.put_stop_dmlproc method to separately stop DMLProc on primary Node
[add] NodeProcessController.get_process_running method to check if specified process running or not
[add] build_url function to helpers.py. It needed to build urls with query_params
[add] MCSProcessManager.gracefully_stop_dmlproc method
[add] MCSProcessManager.is_service_running method as a top level wrapper to the same method in dispatcher
[fix] MCSProcessManager.stop by using new gracefully_stop_dmlproc
[add] interactive option and mode to mcs cluster stop command
[fix] requirements.txt with typer version to 0.9.0 where supports various of features including "Annotated"
[fix] requirements.txt click version (8.1.3 -> 8.1.7) and typing-extensions (4.3.0 -> 4.8.0). This is dependencies for typer package.
[fix] multiple minor formatting, docstrings and comments
* MCOL-5594: Add new CMAPI transaction manager.
- [add] TransactionManager ContextDecorator to manage transactions in less code and in one place
- [add] TransactionManager to cli cluster stop command and to API cluster shutdown command
- [fix] id -> txn_id in ClusterHandler class
- [fix] ClusterHandler.shutdown class to use inside existing transaction
- [add] docstrings in multiple places
* MCOL-5594: Review fixes.
* fix(cmapi): MCOL-5594: Fix after testing.. (#3150)
- [fix] wrong invoke already imported constants and function from signal lib