Memory Across the Many Faces of OpenSees
- silviamazzoni
- 5 days ago
How Tcl, Python, and parallel patterns change what "using more memory" actually means
This post is another postcard from one of my travels through the looking glass with ChatGPT — which means it’s not always obvious what’s solid ground and what might be a GPT hallucination.
My goal here is not to hand you a final answer, but to give you enough structure that you start questioning what you think you know about memory in OpenSees.
In this second memory post, I’m zooming out from Jupyter and focusing on the engines and parallel patterns themselves.
At a high level, we really have six distinct ways of running the same C++ core:
Tcl engines
OpenSees (classic, single process)
OpenSeesMP (MPI, many ranks)
OpenSeesSP (single process with parallel solver / domain decomposition)
Python engines
OpenSeesPy (single process, Python + C++)
OpenSeesPy + concurrent.futures (many workers on one node)
OpenSeesPy + mpi4py (many MPI ranks, each with Python + C++)
All six share the same C++ OpenSees engine, so the kind of memory they allocate—domains, elements, matrices, solvers, histories—is fundamentally similar. What changes is how many times you pay for it and what extra layers sit on top.
The focus of this post is only memory. Not “which one is fastest,” not “which syntax is nicer,” but:
How many processes exist (one vs many MPI ranks vs many Python workers)
How many copies of the domain exist (one per rank / per worker vs shared)
Whether there’s a language runtime sitting on top (Python for OpenSeesPy)
How you tend to use them (notebooks, scripts, Tapis jobs, MPI batch runs)
The usual question is: “Is memory important?” And the usual honest answer is: it depends. My goal here is to make it clearer what it depends on, so that when a job dies with an out-of-memory error, you have a mental model of why.
With that framing, let’s start with the big-picture comparison, then work our way down into the details and rules of thumb.
I. Big Picture Comparison
OpenSees, OpenSeesMP, OpenSeesSP, and OpenSeesPy all use the same core C++ engine (OpenSees), so the type of memory they allocate (domains, matrices, solvers, histories) is similar.
What really changes is:
How many processes exist (one vs many MPI ranks vs many Python workers)
How many copies of the domain exist (single shared domain vs one per rank/worker)
Whether there’s a language runtime on top (Tcl only vs Python + C++)
How you tend to run them (scripts, notebooks, MPI jobs, parametric sweeps, etc.)
From a memory perspective, each of the six patterns is basically:
(OpenSees domain memory + optional Python runtime and data) × (number of ranks or workers that hold a domain)
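That per-pattern formula is easy to turn into a back-of-the-envelope estimator. This is just a sketch; all the numbers below are placeholders you would calibrate against your own model.

```python
def estimate_total_gb(domain_gb, n_units, python_runtime_gb=0.0, python_data_gb=0.0):
    """Rough total-RAM estimate: (domain + optional Python layers) x units.

    domain_gb         : memory of one OpenSees domain (measure it!)
    n_units           : ranks or workers that each hold a domain
    python_runtime_gb : ~0 for Tcl engines, >0 for OpenSeesPy patterns
    python_data_gb    : NumPy arrays, plots, histories held per unit
    """
    return (domain_gb + python_runtime_gb + python_data_gb) * n_units

# Plain OpenSees: one process, no Python layer
print(estimate_total_gb(8.0, 1))    # 8.0
# OpenSeesMP, naive full replication on 8 ranks
print(estimate_total_gb(8.0, 8))    # 64.0
# OpenSeesPy + mpi4py: same domains plus a Python layer per rank
print(estimate_total_gb(8.0, 8, python_runtime_gb=0.3, python_data_gb=0.5))  # 70.4
```

The same three calls reappear as the worked examples later in this post: one domain, naively replicated domains, and replicated domains with Python stacked on top.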
II. High-Level Comparison Table
| Engine / Pattern | Execution Model | Parallelism Type | Memory per "unit" | Scaling Behavior | Typical Use |
| --- | --- | --- | --- | --- | --- |
| OpenSees | Single native C++ process | None (serial) | 1 domain, 1 process | Memory ∝ model size | Local / HPC |
| OpenSeesSP | Single process, distributed solve (domain decomposition / parallel solver) | Shared-memory / hybrid-ish | 1 domain, augmented solver structures | Slightly > OpenSees for same model | HPC node / cluster |
| OpenSeesMP | Multiple MPI ranks | MPI (distributed) | 1 domain per rank (unless coded otherwise) | Memory ∝ (model size × # ranks) + comm | HPC clusters only |
| OpenSeesPy | Python + C++ domain in-process | Serial (unless you bolt on MPI) | C++ domain plus Python runtime + arrays + plotting | Memory ∝ model size + Python overhead | Jupyter, Python scripts |
| OpenSeesPy + concurrent.futures | N Python workers (threads or processes) | Task-level parallel | Threads: 1 domain shared (if safe); Procs: 1 domain per worker | Threads: similar to OpenSeesPy; Procs: ≈ N_workers × OpenSeesPy | Param sweeps |
| OpenSeesPy + mpi4py | N MPI ranks, each w/ Python + C++ | Distributed MPI | ~1 domain per rank + Python runtime per rank | ≈ N_ranks × (OpenSees memory + Python) | HPC / clusters |
A. OpenSees (classic Tcl interpreter)
Execution & memory model
1 OS process:
Tcl interpreter + C++ OpenSees engine in the same process.
Memory consumers:
Domain: nodes, elements, materials, constraints, recorders.
Solver: stiffness matrices, factorizations, solver workspace.
Analysis objects: integrator, algorithm, system, convergence tests.
Tcl objects (script, variables) – usually small compared to FE data.
Memory behavior
Scales roughly linearly with:
Number of DOF
Number of elements/materials
Complexity of solver (Newton + Krylov vs simple LU, etc.).
No domain replication: exactly one domain in memory.
When the process exits, the OS returns everything.
Typical failure mode
On large models: malloc/new failure, or OS OOM-kill.
Usually clean – when it dies, memory is gone.
Best use case (memory-wise)
Large but single-run FE models where you want lean, predictable memory.
Good baseline for “this is the minimum memory this model needs”.
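One way to measure that baseline is to run the solver as a child process and read the peak resident set size afterwards. This is a sketch: the `resource` module is POSIX-only, `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS, and the OpenSees command shown in the comment is a hypothetical path you would adapt.

```python
import resource
import subprocess
import sys

def peak_child_rss_mb(cmd):
    """Run cmd to completion, then return the peak RSS of child
    processes in MB (Linux units: ru_maxrss is in KB)."""
    subprocess.run(cmd, check=True)
    kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return kb / 1024

# Stand-in command so this sketch is self-contained; in practice you
# would point it at your solver, e.g. ["OpenSees", "model.tcl"].
mb = peak_child_rss_mb([sys.executable, "-c", "x = [0.0] * 1_000_000"])
print(f"peak child RSS ~ {mb:.0f} MB")
```

Running your real model once this way gives you the "minimum memory this model needs" number that the rest of the scaling arithmetic builds on.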
B. OpenSeesSP
(Details depend on the exact SP implementation, but conceptually):
Execution & memory model
Still one executable process on each participating node, but:
Uses a parallel solver / domain decomposition strategy.
Domain often stored once, but solver data structures are bigger/staged differently.
Memory behavior
Compared to plain OpenSees (same model):
Domain memory is similar or slightly larger (extra bookkeeping).
Solver memory (global system matrices, parallel data structures) is larger, especially for:
Iterative parallel solvers
Overlapping subdomains
Ghost nodes / interface DOF
Still not replicated per rank the way OpenSeesMP is.
Pros vs OpenSees
Can handle somewhat larger systems on the same hardware.
Better use of node resources for big linear algebra.
Cons vs OpenSees
More opaque memory footprint.
Harder to predict exact RAM needs because solver internals matter more.
C. OpenSeesMP (MPI)
This is where memory really changes.
Execution & memory model
N MPI ranks ⇒ N OS processes (one per rank).
In the common use pattern:
Each rank holds its own domain (or a largely complete copy).
Each rank has its own solver objects, matrices, vectors, etc.
You may also have:
Duplicate input: each rank reads similar input files.
Redundant state: recorders or outputs per rank.
Memory behavior
For many OpenSeesMP workflows:
Effective memory usage ≈ N_ranks × (memory of "equivalent" OpenSees model) + overhead for communication buffers, MPI, etc.
So if a model takes ~8 GB in plain OpenSees and you naïvely run it on 8 ranks with full domain replication, total memory across the node can easily approach ~64 GB (plus overhead).
Even if domain decomposition is used more cleverly, the pattern is:
Per-rank memory is non-trivial.
Total memory across node grows significantly with rank count.
Where this bites you
On HPC nodes with many ranks:
You can hit node memory limits even if each rank is under some per-process mental threshold.
On Jupyter or shared nodes, this is basically untenable (hence “don’t run OpenSeesMP from a notebook”).
Best use case (memory-wise)
When you genuinely need distributed memory parallelism and:
Each rank handles a controllable portion of the model, or
You’ve designed your partitioning so per-rank memory is modest.
Best deployed with careful MPI design + batch jobs, not exploratory runs.
D. OpenSeesPy (single sequential)
Execution & memory model
1 OS process:
CPython interpreter + C++ OpenSees engine in the same process.
Memory consumers:
C++ OpenSees domain (same flavor as Tcl OpenSees).
Python runtime:
Interpreter, garbage collector, internal structures.
Python objects:
NumPy arrays, Pandas DataFrames, Python lists/dicts.
Copies of geometry, loads, results, etc.
Plotting / post-processing libraries:
Matplotlib, Plotly, PyVista, etc. can be very memory-hungry.
Notebook overhead (if in Jupyter):
Stored outputs, hidden references (Out[], _, __, etc.).
Memory behavior vs Tcl OpenSees
Baseline FE memory (nodes/elements/matrices) ≈ same order as OpenSees.
On top of that, you can easily double or triple the footprint if you:
Store full time histories in NumPy arrays.
Keep multiple versions of geometry/results in memory.
Create many plots and never close them.
Rebuild the domain multiple times in one kernel session.
Lifetimes & leaks
C++ domain memory lifespan:
Created when model()/elements are defined.
Partially cleaned by wipe(), but not always fully returned.
Fully freed only when the Python process exits (e.g., restart kernel).
Python objects:
Subject to reference counting + garbage collector.
Live until all references disappear.
Result: in a notebook, memory tends to creep upward more than in pure Tcl OpenSees for the same model/use pattern.
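The Python side of that creep is ordinary reference counting: a results object stays alive as long as anything, including a hidden notebook reference, still points at it. A minimal illustration (the `Results` class is a made-up stand-in for a big array or figure):

```python
import gc
import weakref

class Results:
    """Stand-in for a large results object (arrays, figures, ...)."""
    def __init__(self):
        self.history = [0.0] * 1_000_000

r = Results()
hidden = r                  # e.g. a notebook's Out[] cache or a stray variable
probe = weakref.ref(r)      # lets us observe the object without keeping it alive

del r                       # the obvious name is gone...
assert probe() is not None  # ...but the object survives via `hidden`

del hidden                  # drop the last reference
gc.collect()
assert probe() is None      # only now can the memory actually be returned
print("freed only after the last reference disappeared")
```

This is why "I deleted the variable" often doesn't lower notebook RAM: some other reference, visible or not, is still holding on.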
Best use case (memory-wise)
Small to medium models.
Heavy post-processing where the Python ecosystem is worth the overhead.
Teaching, debugging, prototyping, and post-processing HPC outputs.
E. OpenSeesPy + concurrent.futures
This pattern usually shows up in parametric studies, ensemble runs, or small independent analyses where you want to farm out multiple runs from one Python driver.
Execution & memory model
There are two distinct modes:
a. ThreadPoolExecutor
from concurrent.futures import ThreadPoolExecutor

def run_case(params):
    # uses OpenSeesPy in the same process
    ...

with ThreadPoolExecutor(max_workers=4) as ex:
    ex.map(run_case, cases)
1 OS process, multiple threads.
Single CPython interpreter.
Single OpenSeesPy C++ domain per process, unless you explicitly create/destroy different domains per thread (which is generally not thread-safe and not recommended).
Threads share:
Python heap
C++ domain
Global OpenSees state (this is the scary part).
Memory behavior:
Domain memory: essentially the same as plain OpenSeesPy.
Extra memory: thread stacks, extra Python objects per task.
Total RAM footprint is not multiplied dramatically, but:
You can easily create a logical mess (races in the OpenSees domain).
OpenSees is not designed as a thread-safe engine for multiple simultaneous analyses in a shared domain.
Conclusion:
Memory-efficient, but not safe for independent simultaneous analyses.
Reasonable for lightweight parallel post-processing of results already on disk, where you don’t touch the OpenSees domain.
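A sketch of that safe pattern, where threads only touch result files on disk and never the OpenSees domain. The file layout and names here are fabricated so the example is self-contained; your recorder output format will differ.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def peak_response(path):
    """Read one recorder-style (time, value) file; return the peak |value|."""
    with open(path, newline="") as f:
        return max(abs(float(row[1])) for row in csv.reader(f))

# Fabricate a few tiny 'recorder' files so the sketch runs on its own.
tmp = tempfile.mkdtemp()
paths = []
for i, values in enumerate([(0.1, -0.5), (0.2, 0.9), (-1.2, 0.3)]):
    p = os.path.join(tmp, f"case_{i}.out")
    with open(p, "w") as f:
        for t, v in enumerate(values):
            f.write(f"{t},{v}\n")
    paths.append(p)

# Threads are fine here: pure file IO + Python math, no OpenSees state touched.
with ThreadPoolExecutor(max_workers=4) as ex:
    peaks = list(ex.map(peak_response, paths))

print(peaks)   # [0.5, 0.9, 1.2]
```

Memory-wise this stays close to a single OpenSeesPy process, which is exactly why it's the one thread-based pattern worth keeping.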
b. ProcessPoolExecutor
(what most people actually want for parallel FE runs)
from concurrent.futures import ProcessPoolExecutor

def run_case(params):
    # each worker is a fresh Python process
    # often: build domain + run + write results
    ...

with ProcessPoolExecutor(max_workers=4) as ex:
    ex.map(run_case, cases)

N OS processes, each with:
its own Python interpreter
its own C++ OpenSeesPy domain
No shared memory between workers (beyond OS tricks like copy-on-write at fork).
Memory behavior:
Each worker behaves like an independent OpenSeesPy run.
Total memory ≈ N_workers × (domain memory + Python overhead + data/plots per worker).
If one worker’s model uses ~4 GB and you launch 4 workers on a 16 GB node:
You’re at ~16 GB just for domains, plus Python + overhead → likely OOM.
DesignSafe / Jupyter implications:
Easy to blow through your memory allocation if you oversubscribe workers.
In a notebook, ProcessPoolExecutor is extra fragile:
Workers inherit the notebook’s already-inflated state at fork.
RAM usage can jump immediately by N_workers × baseline_kernel_RAM.
c. Best practice for FE with concurrent.futures:
Prefer ProcessPoolExecutor, but only:
for relatively small models per worker, or
when workers run on different nodes via batch jobs, not inside a single Jupyter kernel.
Treat it like running N separate OpenSeesPy jobs.
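One way to honor that "N separate jobs" mindset is to budget `max_workers` against node RAM before launching the pool. A sketch; `safe_worker_count` is a made-up helper, and the per-run estimate is something you measure, not guess.

```python
import os

def safe_worker_count(node_ram_gb, per_run_gb, headroom_gb=2.0):
    """Cap max_workers so N x per-run memory fits under node RAM.

    headroom_gb leaves room for the driver process and the OS.
    Also never exceed the CPU count -- more workers would not help anyway.
    """
    by_memory = int((node_ram_gb - headroom_gb) // per_run_gb)
    by_cpu = os.cpu_count() or 1
    return max(1, min(by_memory, by_cpu))

# 16 GB node, ~4 GB per OpenSeesPy run -> at most 3 workers, not 4
print(safe_worker_count(16, 4))
```

The point of the headroom term is exactly the failure mode above: four 4 GB workers on a 16 GB node leave nothing for Python, the OS, or the driver, so the budget-aware answer is three.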
F. OpenSeesPy + mpi4py
This pattern is conceptually similar to OpenSeesMP, but with a Python layer on top. Typical structure:
from mpi4py import MPI
from openseespy.opensees import *
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
# Each rank builds its own domain or a partition of a domain
Execution & memory model
mpiexec -n N python script.py → N OS processes.
Each process:
runs a CPython interpreter,
imports OpenSeesPy,
constructs its own C++ domain (full or partial),
communicates via mpi4py (wrapping MPI).
Memory per rank:
C++ domain memory (similar to OpenSees/OpenSeesMP for that subdomain).
Python runtime memory (~hundreds of MB depending on imports and data structures).
Python-side arrays / data structures (e.g., for assembling results, controlling workflow).
Total memory across the job:
Roughly: Total RAM ≈ N_ranks × (OpenSees domain + Python runtime + Python data)
If you do one full domain per rank (naïve pattern), this is like:
OpenSeesMP’s full replication cost plus Python per rank.
If you implement a proper domain decomposition:
Each rank’s domain is smaller, so:
Domain memory per rank drops.
But Python overhead stays per-rank.
Result: for large N, Python overhead is non-negligible, especially on memory-constrained nodes.
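That per-rank overhead is easy to see numerically. In this sketch (placeholder numbers), the decomposition is assumed ideal, so the domain share stays fixed while the Python share grows linearly with rank count.

```python
def mpi4py_total_gb(total_domain_gb, n_ranks, python_per_rank_gb=0.3):
    """Total RAM under ideal decomposition: the domain is split across
    ranks (fixed total), but a full Python runtime is paid once per rank."""
    return total_domain_gb + n_ranks * python_per_rank_gb

# Same 8 GB model, increasingly many ranks:
for n in (4, 16, 64):
    print(n, mpi4py_total_gb(8.0, n))   # 9.2, 12.8, 27.2 GB
```

At 64 ranks the Python runtimes (19.2 GB) cost more than the model itself (8 GB), which is the sense in which the overhead is "non-negligible" for large N.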
Jupyter / DesignSafe implications
Like OpenSeesMP, you generally do not launch this from a notebook.
You run mpiexec in:
a batch job script, or
a terminal session (e.g., on Stampede3).
Inside a Jupyter kernel, trying to do MPI with mpi4py:
Is brittle (multi-process inside a managed kernel).
Can confuse Jupyter’s IO, signals, etc.
Can saturate node memory quickly if you build large domains per rank.
Where it makes sense
When you want:
MPI-level parallelism, plus
Python-level orchestration (smart control logic, adaptive workflows, etc.).
Good for:
sophisticated parametric or multi-physics workflows
where each rank runs OpenSeesPy as a “worker” controlled by Python.
But: memory-wise, you pay:
Domain memory per rank (like OpenSeesMP)
Python memory per rank
So you need to be even more careful with:
rank count
model size per rank
node memory capacity
III. Where These Two Fit in the Overall Story
a. OpenSeesPy + concurrent.futures (ProcessPool) vs OpenSeesMP
For FE-heavy workloads:
OpenSeesMP:
C++ only, no Python per rank.
Memory per rank = domain + solver + MPI.
Total memory scaling is still high for full replication, but leaner than Python.
OpenSeesPy + ProcessPoolExecutor:
Multiple Python interpreters + domains on one node.
Memory per worker = OpenSees domain + Python + data.
Usually heavier per rank/worker than OpenSeesMP for the same domain.
So if the goal is “big single model distributed across ranks,” OpenSeesMP is the more memory-efficient choice.
concurrent.futures shines when:
Each worker runs a small to mid-sized, independent model.
You’re doing embarrassingly parallel param sweeps, not partitioning one massive domain.
b. OpenSeesPy + mpi4py vs OpenSeesMP
OpenSeesMP:
MPI & domain decomposition implemented in C++.
No Python overhead per rank.
OpenSeesPy + mpi4py:
Everything OpenSeesMP does (conceptually), plus:
Python runtime per rank
Python-side logic
Python arrays & state
Memory-wise, for the same domain partitioning:
Total RAM for OpenSeesPy + mpi4py ≥ OpenSeesMP (same domain memory, plus extra Python overhead × ranks).
You choose OpenSeesPy + mpi4py when:
You need Python for:
adaptive control
complex IO
integration with other Python tools
And you’re willing to pay extra in RAM and complexity.
IV. Quick “Memory-Conscious Choice” Guide
If your main constraint is RAM on a node / job, then very roughly:
Best (leanest) memory per DOF:
OpenSees (Tcl)
OpenSeesSP
OpenSeesMP
OpenSeesPy
OpenSeesPy + concurrent.futures (ProcessPool)
OpenSeesPy + mpi4py (Python + MPI + domain per rank)
When to prefer each:
Use OpenSees / OpenSeesSP / OpenSeesMP for:
biggest models
production HPC runs
when memory is tight and you don’t need Python during the solve.
Use OpenSeesPy (single process) for:
small/medium models
interactive work, teaching, and post-processing.
Use OpenSeesPy + concurrent.futures (ProcessPool) for:
many small independent runs (param sweeps) where each run is modest in size.
Use OpenSeesPy + mpi4py for:
advanced MPI workflows where Python orchestration is a must,
and nodes have enough memory to carry the Python overhead per rank.
V. How They Compare in Practice (Rules of Thumb)
For a given model size:
A. Small models (e.g., < 20k DOF)
At this scale:
Memory footprints are small across all engines.
Differences are dominated by your workflow, not the engine architecture.
Python overhead (OpenSeesPy) is negligible in absolute terms.
Parallel strategies typically don’t matter for RAM:
Engines that work perfectly fine:
OpenSees (Tcl)
OpenSeesSP
OpenSeesMP (though unnecessary)
OpenSeesPy
OpenSeesPy + concurrent.futures (threads or processes)
Workers are small, so even ProcessPool duplication is manageable.
OpenSeesPy + mpi4py
Per-rank Python overhead is modest when DOF is small.
Bottom line: You can use anything here — memory is almost never the limiting factor.
B. Medium models (≈ 20k–200k DOF)
Here memory overhead begins to matter. Patterns start to diverge.
Most memory-efficient and stable:
OpenSees (Tcl) Leanest per DOF.
OpenSeesSP Slightly larger than Tcl OpenSees, but efficient for repeated runs.
Still good, but requires care:
OpenSeesPy
Python adds overhead but still manageable.
Avoid storing large NumPy arrays or full-history data in RAM.
Restart kernel between major experiments to avoid C++ residual memory.
Use cautiously:
OpenSeesMP
Can be overkill memory-wise unless truly parallelized.
If each rank holds a full domain copy, memory multiplies quickly.
OpenSeesPy + concurrent.futures (ProcessPool)
Each worker = its own Python + domain copy → memory ≈ N_workers × (OpenSeesPy footprint)
Use only if:
workers handle small-to-mid models, OR
each worker runs on a separate HPC node via SLURM/Tapis.
OpenSeesPy + mpi4py
More memory-heavy than OpenSeesMP for the same decomposition because each rank has: C++ domain + Python interpreter + Python-side data
Reasonable if Python-level control logic is essential, but pricier in RAM.
Bottom line:
Tcl OpenSees and OpenSeesSP remain memory winners; OpenSeesPy is fine with discipline; MPI-based Python workflows require active RAM budgeting.
C. Large models (hundreds of thousands to millions of DOF)
Here memory is the governing constraint, and the engine choice critically affects feasibility.
The only practical option for distributed-memory scalability:
OpenSeesMP
True MPI domain decomposition.
Necessary for multi-million DOF models.
BUT:
Expect large total memory across ranks.
Requires careful domain partitioning & batch scripting.
Strong scaling depends on high-quality MPI decomposition.
A possible bridge (depending on node memory and solver type):
OpenSeesSP
Useful on well-provisioned nodes.
More memory-efficient than Python-based MPI solutions.
Still single-process, but with parallel solver acceleration.
Generally not suitable for large-model solving:
OpenSeesPy
Python overhead + large C++ domain → too heavy for million-DOF scale.
Feasible only for post-processing (reading results) at this size.
Usually inappropriate for solving large models:
OpenSeesPy + concurrent.futures (ProcessPool)
Each worker replicates the entire large domain → explosive memory usage, unusable on shared nodes.
OpenSeesPy + concurrent.futures (threads)
Threading not safe for shared-domain FE analyses; solver and domain structures are not thread-safe.
OpenSeesPy + mpi4py
In theory, could handle large models with proper MPI decomposition.
In practice:
Per-rank Python overhead is high.
Python memory scales with rank count.
Total RAM can exceed OpenSeesMP by a large margin.
Only viable on large-memory HPC nodes and with careful design.
Bottom line:
For large FE models:
Solve with OpenSeesMP or OpenSeesSP (depending on scale).
Use OpenSeesPy only for preprocessing or post-processing.
Avoid Python-based multi-rank approaches for solving unless you have large-memory nodes and a very specific need for Python control logic.
Flowchart: Choosing the Right OpenSees Engine Based on Memory, Model Size, and Workflow
┌──────────────────────────┐
│ Start: What are you │
│ trying to do? │
└─────────────┬────────────┘
│
┌────────────────┴─────────────────┐
│ │
Solve a LARGE FE model Solve a SMALL/MEDIUM model
( ≥ 200k–1M+ DOF ) ( ≤ 200k DOF )
│ │
▼ ▼
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ Need distributed-memory MPI? │ │ Running inside Jupyter? │
└─────────────┬───────────────────┘ └──────────────┬──────────────┘
│ │
Yes │ No Yes
│ │
▼ ▼
┌──────────────────────────────────────┐ ┌──────────────────────────────────┐
│ Use **OpenSeesMP** │ │ Use **OpenSeesPy** │
│ - Multi-rank MPI │ │ - Good for interactive use │
│ - Requires batch/Tapis jobs │ │ - Restart kernel often │
│ - Most memory-efficient MPI option │ │ - Avoid large in-memory results │
└──────────────────────────────────────┘ └──────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ Need solver acceleration on one │ │ Need parallelizing MANY small runs?│
│ large-memory node? │ └────────────────┬───────────────────┘
└──────────────┬──────────────────────┘ │
│ Yes │ No
▼ ▼
┌─────────────────────────────────────┐ ┌──────────────────────────────────────┐
│ Use **OpenSeesSP** │ │ Use **OpenSeesPy + concurrent.futures** │
│ - Parallel solvers in one process │ │ (ProcessPool) │
│ - Moderately higher mem than Tcl │ │ - Independent small/medium models │
│ - Good for large static & modal │ │ - Avoid in a notebook (RAM blow-up) │
└─────────────────────────────────────┘ └──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ Considering Python + MPI logic? │
└───────────────────┬──────────────────┘
│
Yes │ No
▼
┌──────────────────────────────────────┐
│ Use **OpenSeesPy + mpi4py** │
│ - Domain per rank + Python per rank │
│ - More RAM than OpenSeesMP │
│ - Only viable for HPC batch jobs │
└──────────────────────────────────────┘
Cheat Sheet: Memory Behavior & Recommended Use Cases
This is the compact comparison table you can place right after your flowchart.
A. Memory Footprint Behavior (relative, same model size)
| Engine / Pattern | Memory Footprint (Relative) | Why |
| --- | --- | --- |
| OpenSees (Tcl) | ★ Lowest | No Python; one domain; lean C++ |
| OpenSeesSP | ★★ Low–Medium | Parallel solver data structures |
| OpenSeesMP | ★★★ Medium–High | Domain often replicated per rank |
| OpenSeesPy | ★★ Medium | Python runtime + domain |
| OpenSeesPy + concurrent.futures (threads) | ★★ Medium (unsafe) | Shared domain → not thread-safe |
| OpenSeesPy + concurrent.futures (processes) | ★★★ High | Python + domain per worker |
| OpenSeesPy + mpi4py | ★★★★ Highest | MPI ranks + Python per rank + domain per rank |
B. Best Use Cases
| Engine / Pattern | Best For | Avoid When |
| --- | --- | --- |
| OpenSees (Tcl) | Large single-run models; production HPC; memory-constrained runs | Complex Python-based workflows |
| OpenSeesSP | Large static/eigen problems on one node; moderate memory growth | Truly distributed problems |
| OpenSeesMP | Very large models; strong MPI scaling; HPC batch runs | Jupyter, low-memory nodes |
| OpenSeesPy | Medium models; prototyping; teaching; post-processing; Jupyter | Large models; huge arrays; long-run analyses |
| OpenSeesPy + concurrent.futures (threads) | Parallel post-processing (not FE solving) | Any FE solve (domain not thread-safe) |
| OpenSeesPy + concurrent.futures (processes) | Independent small/medium parametric runs | Notebook environments; large models per worker |
| OpenSeesPy + mpi4py | MPI workflows + Python control logic; advanced research | Memory-constrained nodes; multi-million DOF solves |
C. Simple “When to Use What” Summary
| Model Size | Most Memory-Efficient Engine | Engines Allowed | Engines to Avoid |
| --- | --- | --- | --- |
| Small (<20k DOF) | OpenSees (Tcl) | All engines/code paths fine | None (memory-wise) |
| Medium (20k–200k DOF) | OpenSees (Tcl), OpenSeesSP | OpenSeesPy (carefully), OpenSeesMP (with partitioning) | OpenSeesPy ProcessPool if many workers |
| Large (>200k–1M DOF) | OpenSeesMP, OpenSeesSP | Tcl OpenSees (if single node), OpenSeesPy (only for post-processing) | OpenSeesPy ProcessPool, OpenSeesPy + mpi4py in tight memory |
| Huge (1M+ DOF) | OpenSeesMP | OpenSeesSP (if node RAM permits) | All Python-based solve workflows |
D. Jupyter/DesignSafe Quick Reference
| Scenario | Recommended | Not Recommended |
| --- | --- | --- |
| Interactive small model | OpenSeesPy | Any MPI or multi-process approach |
| Teaching / demos | OpenSeesPy | OpenSeesMP |
| Post-processing HPC output | OpenSeesPy | None |
| Solving a large model | Submit a Tapis job running OpenSees or OpenSeesMP | Running solve inside Jupyter |
| Parametric sweep of many small models | OpenSeesPy + ProcessPool, but only in batch / HPC mode | Running ProcessPool inside a notebook |
