Background: HPC problems potentially amenable to database techniques

Exascale HPC platforms will share characteristics with large-scale data processing platforms: relatively small main memory per node, relatively slow communication between nodes, and IO a limiting factor for performance.

Further, HPC applications frequently experience “data challenges” beyond just IO performance. For example, some chemistry codes may produce a very small output but construct and query enormous distributed data structures during processing.

Finally, as HPC computations become longer-running, there is a need to make them more interactive. Ad hoc science questions such as “Which regions of the domain exhibit temperatures above some threshold?” and ad hoc systems monitoring questions such as “Which network links are congested?” can potentially be modeled, expressed, and optimized as queries using an approach similar to previous work on high performance querying of sensor networks.