Memory Across the Many Faces of OpenSees
- silviamazzoni
- 5 days ago
How Tcl, Python, and parallel patterns change what "using more memory" actually means
This post is another postcard from one of my travels through the looking glass with ChatGPT — which means it’s not always obvious what’s solid ground and what might be a GPT hallucination.
My goal here is not to hand you a final answer, but to give you enough structure that you start questioning what you think you know about memory in OpenSees.
In this second memory post, I’m zooming out from Jupyter and focusing on the engines and parallel patterns themselves.
At a high level, we really have six distinct ways of running the same C++ core:
Tcl engines
OpenSees (classic, single process)
OpenSeesMP (MPI, many ranks)
OpenSeesSP (single process with parallel solver / domain decomposition)
Python engines
OpenSeesPy (single process, Python + C++)
OpenSeesPy + concurrent.futures (many workers on one node)
OpenSeesPy + mpi4py (many MPI ranks, each with Python + C++)
All six share the same C++ OpenSees engine, so the kind of memory they allocate—domains, elements, matrices, solvers, histories—is fundamentally similar. What changes is how many times you pay for it and what extra layers sit on top.
The focus of this post is only memory. Not “which one is fastest,” not “which syntax is nicer,” but:
How many processes exist (one vs many MPI ranks vs many Python workers)
How many copies of the domain exist (one per rank / per worker vs shared)
Whether there’s a language runtime sitting on top (Python for OpenSeesPy)
How you tend to use them (notebooks, scripts, Tapis jobs, MPI batch runs)
The usual question is: “Is memory important?” And the usual honest answer is: it depends. My goal here is to make it clearer what it depends on, so that when a job dies with an out-of-memory error, you have a mental model of why.
With that framing, let’s start with the big-picture comparison, then work our way down into the details and rules of thumb.
I. Big Picture Comparison
OpenSees, OpenSeesMP, OpenSeesSP, and OpenSeesPy all use the same core C++ engine (OpenSees), so the type of memory they allocate (domains, matrices, solvers, histories) is similar.
What really changes is:
How many processes exist (one vs many MPI ranks vs many Python workers)
How many copies of the domain exist (single shared domain vs one per rank/worker)
Whether there’s a language runtime on top (Tcl only vs Python + C++)
How you tend to run them (scripts, notebooks, MPI jobs, parametric sweeps, etc.)
From a memory perspective, each of the six patterns is basically:
(OpenSees domain memory + optional Python runtime and data) × (number of ranks or workers that hold a domain)
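That per-pattern formula is easy to turn into a back-of-the-envelope estimator. This is just a sketch; all the numbers below are placeholders you would calibrate against your own model.

```python
def estimate_total_gb(domain_gb, n_units, python_runtime_gb=0.0, python_data_gb=0.0):
    """Rough total-RAM estimate: (domain + optional Python layers) x units.

    domain_gb         : memory of one OpenSees domain (measure it!)
    n_units           : ranks or workers that each hold a domain
    python_runtime_gb : ~0 for Tcl engines, >0 for OpenSeesPy patterns
    python_data_gb    : NumPy arrays, plots, histories held per unit
    """
    return (domain_gb + python_runtime_gb + python_data_gb) * n_units

# Plain OpenSees: one process, no Python layer
print(estimate_total_gb(8.0, 1))    # 8.0
# OpenSeesMP, naive full replication on 8 ranks
print(estimate_total_gb(8.0, 8))    # 64.0
# OpenSeesPy + mpi4py: same domains plus a Python layer per rank
print(estimate_total_gb(8.0, 8, python_runtime_gb=0.3, python_data_gb=0.5))  # 70.4
```

The same three calls reappear as the worked examples later in this post: one domain, naively replicated domains, and replicated domains with Python stacked on top.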
II. High-Level Comparison Table
| Engine / Pattern | Execution Model | Parallelism Type | Memory per "unit" | Scaling Behavior | Typical Use |
| --- | --- | --- | --- | --- | --- |
| OpenSees | Single native C++ process | None (serial) | 1 domain, 1 process | Memory ∝ model size | Local / HPC |
| OpenSeesSP | Single process, distributed solve (domain decomposition / parallel solver) | Shared-memory / hybrid-ish | 1 domain, augmented solver structures | Slightly > OpenSees for same model | HPC node / cluster |
| OpenSeesMP | Multiple MPI ranks | MPI (distributed) | 1 domain per rank (unless coded otherwise) | Memory ∝ (model size × # ranks) + comm | HPC clusters only |
| OpenSeesPy | Python + C++ domain in-process | Serial (unless you bolt on MPI) | C++ domain plus Python runtime + arrays + plotting | Memory ∝ model size + Python overhead | Jupyter, Python scripts |
| OpenSeesPy + concurrent.futures | N Python workers (threads or processes) | Task-level parallel | Threads: 1 domain shared (if safe); Procs: 1 domain per worker | Threads: similar to OpenSeesPy; Procs: ≈ N_workers × OpenSeesPy | Param sweeps |
| OpenSeesPy + mpi4py | N MPI ranks, each w/ Python + C++ | Distributed MPI | ~1 domain per rank + Python runtime per rank | ≈ N_ranks × (OpenSees memory + Python) | HPC / clusters |
A. OpenSees (classic Tcl interpreter)
Execution & memory model
1 OS process:
Tcl interpreter + C++ OpenSees engine in the same process.
Memory consumers:
Domain: nodes, elements, materials, constraints, recorders.
Solver: stiffness matrices, factorizations, solver workspace.
Analysis objects: integrator, algorithm, system, convergence tests.
Tcl objects (script, variables) – usually small compared to FE data.
Memory behavior
Scales roughly linearly with:
Number of DOF
Number of elements/materials
Complexity of solver (Newton + Krylov vs simple LU, etc.).
No domain replication: exactly one domain in memory.
When the process exits, the OS returns everything.
Typical failure mode
On large models: malloc/new failure, or OS OOM-kill.
Usually clean – when it dies, memory is gone.
Best use case (memory-wise)
Large but single-run FE models where you want lean, predictable memory.
Good baseline for “this is the minimum memory this model needs”.
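One way to measure that baseline is to run the solver as a child process and read the peak resident set size afterwards. This is a sketch: the `resource` module is POSIX-only, `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS, and the OpenSees command shown in the comment is a hypothetical path you would adapt.

```python
import resource
import subprocess
import sys

def peak_child_rss_mb(cmd):
    """Run cmd to completion, then return the peak RSS of child
    processes in MB (Linux units: ru_maxrss is in KB)."""
    subprocess.run(cmd, check=True)
    kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return kb / 1024

# Stand-in command so this sketch is self-contained; in practice you
# would point it at your solver, e.g. ["OpenSees", "model.tcl"].
mb = peak_child_rss_mb([sys.executable, "-c", "x = [0.0] * 1_000_000"])
print(f"peak child RSS ~ {mb:.0f} MB")
```

Running your real model once this way gives you the "minimum memory this model needs" number that the rest of the scaling arithmetic builds on.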
B. OpenSeesSP
(Details depend on the exact SP implementation, but conceptually):
Execution & memory model
Still one executable process on each participating node, but:
Uses a parallel solver / domain decomposition strategy.
Domain often stored once, but solver data structures are bigger/staged differently.
Memory behavior
Compared to plain OpenSees (same model):
Domain memory is similar or slightly larger (extra bookkeeping).
Solver memory (global system matrices, parallel data structures) is larger, especially for:
Iterative parallel solvers
Overlapping subdomains
Ghost nodes / interface DOF
Still not replicated per rank the way OpenSeesMP is.
Pros vs OpenSees
Can handle somewhat larger systems on the same hardware.
Better use of node resources for big linear algebra.
Cons vs OpenSees
More opaque memory footprint.
Harder to predict exact RAM needs because solver internals matter more.
C. OpenSeesMP (MPI)
This is where memory really changes.
Execution & memory model
N MPI ranks ⇒ N OS processes (one per rank).
In the common use pattern:
Each rank holds its own domain (or a largely complete copy).
Each rank has its own solver objects, matrices, vectors, etc.
You may also have:
Duplicate input: each rank reads similar input files.
Redundant state: recorders or outputs per rank.
Memory behavior
For many OpenSeesMP workflows:
Effective memory usage ≈ N_ranks × (memory of "equivalent" OpenSees model) + overhead for communication buffers, MPI, etc.
So if a model takes ~8 GB in plain OpenSees and you naïvely run it on 8 ranks with full domain replication, total memory across the node can easily approach ~64 GB (plus overhead).
Even if domain decomposition is used more cleverly, the pattern is:
Per-rank memory is non-trivial.
Total memory across node grows significantly with rank count.
Where this bites you
On HPC nodes with many ranks:
You can hit node memory limits even if each rank is under some per-process mental threshold.
On Jupyter or shared nodes, this is basically untenable (hence “don’t run OpenSeesMP from a notebook”).
Best use case (memory-wise)
When you genuinely need distributed memory parallelism and:
Each rank handles a controllable portion of the model, or
You’ve designed your partitioning so per-rank memory is modest.
Best deployed with careful MPI design + batch jobs, not exploratory runs.
D. OpenSeesPy (single sequential)
Execution & memory model
1 OS process:
CPython interpreter + C++ OpenSees engine in the same process.
Memory consumers:
C++ OpenSees domain (same flavor as Tcl OpenSees).
Python runtime:
Interpreter, garbage collector, internal structures.
Python objects:
NumPy arrays, Pandas DataFrames, Python lists/dicts.
Copies of geometry, loads, results, etc.
Plotting / post-processing libraries:
Matplotlib, Plotly, PyVista, etc. can be very memory-hungry.
Notebook overhead (if in Jupyter):
Stored outputs, hidden references (Out[], _, __, etc.).
Memory behavior vs Tcl OpenSees
Baseline FE memory (nodes/elements/matrices) ≈ same order as OpenSees.
On top of that, you can easily double or triple the footprint if you:
Store full time histories in NumPy arrays.
Keep multiple versions of geometry/results in memory.
Create many plots and never close them.
Rebuild the domain multiple times in one kernel session.
Lifetimes & leaks
C++ domain memory lifespan:
Created when model()/elements are defined.
Partially cleaned by wipe(), but not always fully returned.
Fully freed only when the Python process exits (e.g., restart kernel).
Python objects:
Subject to reference counting + garbage collector.
Live until all references disappear.
Result: in a notebook, memory tends to creep upward more than in pure Tcl OpenSees for the same model/use pattern.
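The Python side of that creep is ordinary reference counting: a results object stays alive as long as anything, including a hidden notebook reference, still points at it. A minimal illustration (the `Results` class is a made-up stand-in for a big array or figure):

```python
import gc
import weakref

class Results:
    """Stand-in for a large results object (arrays, figures, ...)."""
    def __init__(self):
        self.history = [0.0] * 1_000_000

r = Results()
hidden = r                  # e.g. a notebook's Out[] cache or a stray variable
probe = weakref.ref(r)      # lets us observe the object without keeping it alive

del r                       # the obvious name is gone...
assert probe() is not None  # ...but the object survives via `hidden`

del hidden                  # drop the last reference
gc.collect()
assert probe() is None      # only now can the memory actually be returned
print("freed only after the last reference disappeared")
```

This is why "I deleted the variable" often doesn't lower notebook RAM: some other reference, visible or not, is still holding on.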
Best use case (memory-wise)
Small to medium models.
Heavy post-processing where the Python ecosystem is worth the overhead.
Teaching, debugging, prototyping, and post-processing HPC outputs.
E. OpenSeesPy + concurrent.futures
This pattern usually shows up in parametric studies, ensemble runs, or small independent analyses where you want to farm out multiple runs from one Python driver.
Execution & memory model
There are two distinct modes:
a. ThreadPoolExecutor
from concurrent.futures import ThreadPoolExecutor

def run_case(params):
    # uses OpenSeesPy in the same process
    ...

with ThreadPoolExecutor(max_workers=4) as ex:
    ex.map(run_case, cases)
1 OS process, multiple threads.
Single CPython interpreter.
Single OpenSeesPy C++ domain per process, unless you explicitly create/destroy different domains per thread (which is generally not thread-safe and not recommended).
Threads share:
Python heap
C++ domain
Global OpenSees state (this is the scary part).
Memory behavior:
Domain memory: essentially the same as plain OpenSeesPy.
Extra memory: thread stacks, extra Python objects per task.
Total RAM footprint is not multiplied dramatically, but:
You can easily create a logical mess (races in the OpenSees domain).
OpenSees is not designed as a thread-safe engine for multiple simultaneous analyses in a shared domain.
Conclusion:
Memory-efficient, but not safe for independent simultaneous analyses.
Reasonable for lightweight parallel post-processing of results already on disk, where you don’t touch the OpenSees domain.
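A sketch of that safe pattern, where threads only touch result files on disk and never the OpenSees domain. The file layout and names here are fabricated so the example is self-contained; your recorder output format will differ.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def peak_response(path):
    """Read one recorder-style (time, value) file; return the peak |value|."""
    with open(path, newline="") as f:
        return max(abs(float(row[1])) for row in csv.reader(f))

# Fabricate a few tiny 'recorder' files so the sketch runs on its own.
tmp = tempfile.mkdtemp()
paths = []
for i, values in enumerate([(0.1, -0.5), (0.2, 0.9), (-1.2, 0.3)]):
    p = os.path.join(tmp, f"case_{i}.out")
    with open(p, "w") as f:
        for t, v in enumerate(values):
            f.write(f"{t},{v}\n")
    paths.append(p)

# Threads are fine here: pure file IO + Python math, no OpenSees state touched.
with ThreadPoolExecutor(max_workers=4) as ex:
    peaks = list(ex.map(peak_response, paths))

print(peaks)   # [0.5, 0.9, 1.2]
```

Memory-wise this stays close to a single OpenSeesPy process, which is exactly why it's the one thread-based pattern worth keeping.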
b. ProcessPoolExecutor
(what most people actually want for parallel FE runs)
from concurrent.futures import ProcessPoolExecutor

def run_case(params):
    # each worker is a fresh Python process
    # often: build domain + run + write results
    ...

with ProcessPoolExecutor(max_workers=4) as ex:
    ex.map(run_case, cases)

N OS processes, each with:
its own Python interpreter
its own C++ OpenSeesPy domain
No shared memory between workers (beyond OS tricks like copy-on-write at fork).
Memory behavior:
Each worker behaves like an independent OpenSeesPy run.
Total memory ≈ N_workers × (domain memory + Python overhead + data/plots per worker).
If one worker’s model uses ~4 GB and you launch 4 workers on a 16 GB node:
You’re at ~16 GB just for domains, plus Python + overhead → likely OOM.
DesignSafe / Jupyter implications:
Easy to blow through your memory allocation if you oversubscribe workers.
In a notebook, ProcessPoolExecutor is extra fragile:
Workers inherit the notebook’s already-inflated state at fork.
RAM usage can jump immediately by N_workers × baseline_kernel_RAM.
c. Best practice for FE with concurrent.futures:
Prefer ProcessPoolExecutor, but only:
for relatively small models per worker, or
when workers run on different nodes via batch jobs, not inside a single Jupyter kernel.
Treat it like running N separate OpenSeesPy jobs.
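One way to honor that "N separate jobs" mindset is to budget `max_workers` against node RAM before launching the pool. A sketch; `safe_worker_count` is a made-up helper, and the per-run estimate is something you measure, not guess.

```python
import os

def safe_worker_count(node_ram_gb, per_run_gb, headroom_gb=2.0):
    """Cap max_workers so N x per-run memory fits under node RAM.

    headroom_gb leaves room for the driver process and the OS.
    Also never exceed the CPU count -- more workers would not help anyway.
    """
    by_memory = int((node_ram_gb - headroom_gb) // per_run_gb)
    by_cpu = os.cpu_count() or 1
    return max(1, min(by_memory, by_cpu))

# 16 GB node, ~4 GB per OpenSeesPy run -> at most 3 workers, not 4
print(safe_worker_count(16, 4))
```

The point of the headroom term is exactly the failure mode above: four 4 GB workers on a 16 GB node leave nothing for Python, the OS, or the driver, so the budget-aware answer is three.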
F. OpenSeesPy + mpi4py
This pattern is conceptually similar to OpenSeesMP, but with a Python layer on top. Typical structure:
from mpi4py import MPI
from openseespy.opensees import *
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
# Each rank builds its own domain or a partition of a domain
Execution & memory model
mpiexec -n N python script.py → N OS processes.
Each process:
runs a CPython interpreter,
imports OpenSeesPy,
constructs its own C++ domain (full or partial),
communicates via mpi4py (wrapping MPI).
Memory per rank:
C++ domain memory (similar to OpenSees/OpenSeesMP for that subdomain).
Python runtime memory (~hundreds of MB depending on imports and data structures).
Python-side arrays / data structures (e.g., for assembling results, controlling workflow).
Total memory across the job:
Roughly: Total RAM ≈ N_ranks × (OpenSees domain + Python runtime + Python data)
If you do one full domain per rank (naïve pattern), this is like:
OpenSeesMP’s full replication cost plus Python per rank.
If you implement a proper domain decomposition:
Each rank’s domain is smaller, so:
Domain memory per rank drops.
But Python overhead stays per-rank.
Result: for large N, Python overhead is non-negligible, especially on memory-constrained nodes.
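That per-rank overhead is easy to see numerically. In this sketch (placeholder numbers), the decomposition is assumed ideal, so the domain share stays fixed while the Python share grows linearly with rank count.

```python
def mpi4py_total_gb(total_domain_gb, n_ranks, python_per_rank_gb=0.3):
    """Total RAM under ideal decomposition: the domain is split across
    ranks (fixed total), but a full Python runtime is paid once per rank."""
    return total_domain_gb + n_ranks * python_per_rank_gb

# Same 8 GB model, increasingly many ranks:
for n in (4, 16, 64):
    print(n, mpi4py_total_gb(8.0, n))   # 9.2, 12.8, 27.2 GB
```

At 64 ranks the Python runtimes (19.2 GB) cost more than the model itself (8 GB), which is the sense in which the overhead is "non-negligible" for large N.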
Jupyter / DesignSafe implications
Like OpenSeesMP, you generally do not launch this from a notebook.
You run mpiexec in:
a batch job script, or
a terminal session (e.g., on Stampede3).
Inside a Jupyter kernel, trying to do MPI with mpi4py:
Is brittle (multi-process inside a managed kernel).
Can confuse Jupyter’s IO, signals, etc.
Can saturate node memory quickly if you build large domains per rank.
Where it makes sense
When you want:
MPI-level parallelism, plus
Python-level orchestration (smart control logic, adaptive workflows, etc.).
Good for:
sophisticated parametric or multi-physics workflows
where each rank runs OpenSeesPy as a “worker” controlled by Python.
But: memory-wise, you pay:
Domain memory per rank (like OpenSeesMP)
Python memory per rank
So you need to be even more careful with:
rank count
model size per rank
node memory capacity
III. Where These Two Fit in the Overall Story
a. OpenSeesPy + concurrent.futures (ProcessPool) vs OpenSeesMP
For FE-heavy workloads:
OpenSeesMP:
C++ only, no Python per rank.
Memory per rank = domain + solver + MPI.
Total memory scaling is still high for full replication, but leaner than Python.
OpenSeesPy + ProcessPoolExecutor:
Multiple Python interpreters + domains on one node.
Memory per worker = OpenSees domain + Python + data.
Usually heavier per rank/worker than OpenSeesMP for the same domain.
So if the goal is “big single model distributed across ranks,” OpenSeesMP is the more memory-efficient choice.
concurrent.futures shines when:
Each worker runs a small to mid-sized, independent model.
You’re doing embarrassingly parallel param sweeps, not partitioning one massive domain.
b. OpenSeesPy + mpi4py vs OpenSeesMP
OpenSeesMP:
MPI & domain decomposition implemented in C++.
No Python overhead per rank.
OpenSeesPy + mpi4py:
Everything OpenSeesMP does (conceptually), plus:
Python runtime per rank
Python-side logic
Python arrays & state
Memory-wise, for the same domain partitioning:
Total RAM for OpenSeesPy + mpi4py ≥ OpenSeesMP (same domain memory, plus extra Python overhead × ranks).
You choose OpenSeesPy + mpi4py when:
You need Python for:
adaptive control
complex IO
integration with other Python tools
And you’re willing to pay extra in RAM and complexity.
IV. Quick “Memory-Conscious Choice” Guide
If your main constraint is RAM on a node / job, then very roughly:
Best (leanest) memory per DOF:
OpenSees (Tcl)
OpenSeesSP
OpenSeesMP
OpenSeesPy
OpenSeesPy + concurrent.futures (ProcessPool)
OpenSeesPy + mpi4py (Python + MPI + domain per rank)
When to prefer each:
Use OpenSees / OpenSeesSP / OpenSeesMP for:
biggest models
production HPC runs
when memory is tight and you don’t need Python during the solve.
Use OpenSeesPy (single process) for:
small/medium models
interactive work, teaching, and post-processing.
Use OpenSeesPy + concurrent.futures (ProcessPool) for:
many small independent runs (param sweeps) where each run is modest in size.
Use OpenSeesPy + mpi4py for:
advanced MPI workflows where Python orchestration is a must,
and nodes have enough memory to carry the Python overhead per rank.
V. How They Compare in Practice (Rules of Thumb)
For a given model size:
A. Small models (e.g., < 20k DOF)
At this scale:
Memory footprints are small across all engines.
Differences are dominated by your workflow, not the engine architecture.
Python overhead (OpenSeesPy) is negligible in absolute terms.
Parallel strategies typically don’t matter for RAM:
Engines that work perfectly fine:
OpenSees (Tcl)
OpenSeesSP
OpenSeesMP (though unnecessary)
OpenSeesPy
OpenSeesPy + concurrent.futures (threads or processes)
Workers are small, so even ProcessPool duplication is manageable.
OpenSeesPy + mpi4py
Per-rank Python overhead is modest when DOF is small.
Bottom line: You can use anything here — memory is almost never the limiting factor.
B. Medium models (≈ 20k–200k DOF)
Here memory overhead begins to matter. Patterns start to diverge.
Most memory-efficient and stable:
OpenSees (Tcl) Leanest per DOF.
OpenSeesSP Slightly larger than Tcl OpenSees, but efficient for repeated runs.
Still good, but requires care:
OpenSeesPy
Python adds overhead but still manageable.
Avoid storing large NumPy arrays or full-history data in RAM.
Restart kernel between major experiments to avoid C++ residual memory.
Use cautiously:
OpenSeesMP
Can be overkill memory-wise unless truly parallelized.
If each rank holds a full domain copy, memory multiplies quickly.
OpenSeesPy + concurrent.futures (ProcessPool)
Each worker = its own Python + domain copy → memory ≈ N_workers × (OpenSeesPy footprint)
Use only if:
workers handle small-to-mid models, OR
each worker runs on a separate HPC node via SLURM/Tapis.
OpenSeesPy + mpi4py
More memory-heavy than OpenSeesMP for the same decomposition because each rank has: C++ domain + Python interpreter + Python-side data
Reasonable if Python-level control logic is essential, but pricier in RAM.
Bottom line:
Tcl OpenSees and OpenSeesSP remain memory winners; OpenSeesPy is fine with discipline; MPI-based Python workflows require active RAM budgeting.
C. Large models (hundreds of thousands to millions of DOF)
Here memory is the governing constraint, and the engine choice critically affects feasibility.
The only practical option for distributed-memory scalability:
OpenSeesMP
True MPI domain decomposition.
Necessary for multi-million DOF models.
BUT:
Expect large total memory across ranks.
Requires careful domain partitioning & batch scripting.
Strong scaling depends on high-quality MPI decomposition.
A possible bridge (depending on node memory and solver type):
OpenSeesSP
Useful on well-provisioned nodes.
More memory-efficient than Python-based MPI solutions.
Still single-process, but with parallel solver acceleration.
Generally not suitable for large-model solving:
OpenSeesPy
Python overhead + large C++ domain → too heavy for million-DOF scale.
Feasible only for post-processing (reading results) at this size.
Usually inappropriate for solving large models:
OpenSeesPy + concurrent.futures (ProcessPool)
Each worker replicates the entire large domain → explosive memory usage, unusable on shared nodes.
OpenSeesPy + concurrent.futures (threads)
Threading not safe for shared-domain FE analyses; solver and domain structures are not thread-safe.
OpenSeesPy + mpi4py
In theory, could handle large models with proper MPI decomposition.
In practice:
Per-rank Python overhead is high.
Python memory scales with rank count.
Total RAM can exceed OpenSeesMP by a large margin.
Only viable on large-memory HPC nodes and with careful design.
Bottom line:
For large FE models:
Solve with OpenSeesMP or OpenSeesSP (depending on scale).
Use OpenSeesPy only for preprocessing or post-processing.
Avoid Python-based multi-rank approaches for solving unless you have large-memory nodes and a very specific need for Python control logic.
Flowchart: Choosing the Right OpenSees Engine Based on Memory, Model Size, and Workflow
┌──────────────────────────┐
│ Start: What are you │
│ trying to do? │
└─────────────┬────────────┘
│
┌────────────────┴─────────────────┐
│ │
Solve a LARGE FE model Solve a SMALL/MEDIUM model
( ≥ 200k–1M+ DOF ) ( ≤ 200k DOF )
│ │
▼ ▼
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ Need distributed-memory MPI? │ │ Running inside Jupyter? │
└─────────────┬───────────────────┘ └──────────────┬──────────────┘
│ │
Yes │ No Yes
│ │
▼ ▼
┌──────────────────────────────────────┐ ┌──────────────────────────────────┐
│ Use **OpenSeesMP** │ │ Use **OpenSeesPy** │
│ - Multi-rank MPI │ │ - Good for interactive use │
│ - Requires batch/Tapis jobs │ │ - Restart kernel often │
│ - Most memory-efficient MPI option │ │ - Avoid large in-memory results │
└──────────────────────────────────────┘ └──────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ Need solver acceleration on one │ │ Need parallelizing MANY small runs?│
│ large-memory node? │ └────────────────┬───────────────────┘
└──────────────┬──────────────────────┘ │
│ Yes │ No
▼ ▼
┌─────────────────────────────────────┐ ┌──────────────────────────────────────┐
│ Use **OpenSeesSP** │ │ Use **OpenSeesPy + concurrent.futures** │
│ - Parallel solvers in one process │ │ (ProcessPool) │
│ - Moderately higher mem than Tcl │ │ - Independent small/medium models │
│ - Good for large static & modal │ │ - Avoid in a notebook (RAM blow-up) │
└─────────────────────────────────────┘ └──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ Considering Python + MPI logic? │
└───────────────────┬──────────────────┘
│
Yes │ No
▼
┌──────────────────────────────────────┐
│ Use **OpenSeesPy + mpi4py** │
│ - Domain per rank + Python per rank │
│ - More RAM than OpenSeesMP │
│ - Only viable for HPC batch jobs │
└──────────────────────────────────────┘
Cheat Sheet: Memory Behavior & Recommended Use Cases
This is the compact comparison table you can place right after your flowchart.
A. Memory Footprint Behavior (relative, same model size)
| Engine / Pattern | Memory Footprint (Relative) | Why |
| --- | --- | --- |
| OpenSees (Tcl) | ★ Lowest | No Python; one domain; lean C++ |
| OpenSeesSP | ★★ Low–Medium | Parallel solver data structures |
| OpenSeesMP | ★★★ Medium–High | Domain often replicated per rank |
| OpenSeesPy | ★★ Medium | Python runtime + domain |
| OpenSeesPy + concurrent.futures (threads) | ★★ Medium (unsafe) | Shared domain → not thread-safe |
| OpenSeesPy + concurrent.futures (processes) | ★★★ High | Python + domain per worker |
| OpenSeesPy + mpi4py | ★★★★ Highest | MPI ranks + Python per rank + domain per rank |
B. Best Use Cases
| Engine / Pattern | Best For | Avoid When |
| --- | --- | --- |
| OpenSees (Tcl) | Large single-run models; production HPC; memory-constrained runs | Complex Python-based workflows |
| OpenSeesSP | Large static/eigen problems on one node; moderate memory growth | Truly distributed problems |
| OpenSeesMP | Very large models; strong MPI scaling; HPC batch runs | Jupyter, low-memory nodes |
| OpenSeesPy | Medium models; prototyping; teaching; post-processing; Jupyter | Large models; huge arrays; long-run analyses |
| OpenSeesPy + concurrent.futures (threads) | Parallel post-processing (not FE solving) | Any FE solve (domain not thread-safe) |
| OpenSeesPy + concurrent.futures (processes) | Independent small/medium parametric runs | Notebook environments; large models per worker |
| OpenSeesPy + mpi4py | MPI workflows + Python control logic; advanced research | Memory-constrained nodes; multi-million DOF solves |
C. Simple “When to Use What” Summary
| Model Size | Most Memory-Efficient Engine | Engines Allowed | Engines to Avoid |
| --- | --- | --- | --- |
| Small (<20k DOF) | OpenSees (Tcl) | All engines/code paths fine | None (memory-wise) |
| Medium (20k–200k DOF) | OpenSees (Tcl), OpenSeesSP | OpenSeesPy (carefully), OpenSeesMP (with partitioning) | OpenSeesPy ProcessPool if many workers |
| Large (>200k–1M DOF) | OpenSeesMP, OpenSeesSP | Tcl OpenSees (if single node), OpenSeesPy (only for post-processing) | OpenSeesPy ProcessPool, OpenSeesPy + mpi4py in tight memory |
| Huge (1M+ DOF) | OpenSeesMP | OpenSeesSP (if node RAM permits) | All Python-based solve workflows |
D. Jupyter/DesignSafe Quick Reference
| Scenario | Recommended | Not Recommended |
| --- | --- | --- |
| Interactive small model | OpenSeesPy | Any MPI or multi-process approach |
| Teaching / demos | OpenSeesPy | OpenSeesMP |
| Post-processing HPC output | OpenSeesPy | None |
| Solving a large model | Submit a Tapis job running OpenSees or OpenSeesMP | Running solve inside Jupyter |
| Parametric sweep of many small models | OpenSeesPy + ProcessPool, but only in batch / HPC mode | Running ProcessPool inside a notebook |
