Tuesday 15th July 2014 10:15:49 UTC
Felix Höfling writes:

 > We have had some discussion on the time dataset in the context of Monte
 > Carlo simulations. If I remember well, the outcome was that in the case
 > of no physical time, time is simply identical (=linked) to step.

That's not an option in H5MD 1.0.0 because "step" must be an integer and
"time" a float.

 > Our intention was to have no optional parts in the core H5MD
 > element---for the sake of making reading simple.

The whole "particles" and "observables" groups are optional!

 > Whether such a decision was wise or not, I don't know. But it has
 > been fixed now for H5MD 1.x. Making step or time optional would
 > break compatibility with 1.0 and would make 1.0 basically
 > obsolete.

Right. But sooner or later, that will happen. As people start using
H5MD for more and more applications, weaknesses in the definition will
appear and need to be fixed in a new version. Otherwise people will
simply bend the rules and create somewhat non-conforming files. That's
what has happened with the PDB format.

I think it would be useful to collect feedback from H5MD users and
compile a list of recommendations: How should I represent ??? in H5MD?
Then, after a while, see which solutions are not really good ones,
and take them into account in a revision of H5MD.

 > Thus I don't think it is a good idea. Nevertheless, I am
 > open to extending the interpretation of step/time (but the fields must
 > be present). For example, step could also just enumerate the
 > snapshots stored, without reference to any simulation order.

I have fewer problems with "step" than with "time", although I do see
situations where, like Olaf described, the step values are made up and
meaningless. But at least numbering steps from 1 to N doesn't create
any false illusions about what information is available. Making up
time values is worse because it suggests to the reader that there is
some meaningful time-like quantity in the simulation.

Pierre de Buyl writes:

 > As a more general point about step/time, I have had an idea for a long
 > time. I didn't want it for H5MD 1.0 to avoid any confusion. But storing
 > step and time when simply step[i] = STEP_SIZE*i and
 > time[i] = STEP_SIZE*DT*i is a bit of a waste. We could define a proper
 > setup for regularly sampled data, for which step[0], STEP_SIZE, time[0]
 > and DT should be given.

Good idea, and not just to avoid wasting space. It would also contain
the message to the reader "this is regularly sampled data". For some
analyses this makes a big difference. For example, computing time
correlation functions of regularly sampled data is straightforward and
efficient, whereas it is cumbersome, slow, and imprecise for irregular
time series.
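
To make the proposal concrete, here is a minimal sketch of how a reader
could rebuild the labels from the four values Pierre lists. The function
name and argument names are hypothetical, and nothing like this is part
of H5MD 1.0; it only illustrates the idea.

```python
def regular_step_and_time(n, step0, step_size, time0, dt):
    """Rebuild step and time labels for n regularly sampled frames.

    step0, step_size, time0 and dt stand for step[0], STEP_SIZE, time[0]
    and DT from the proposal; the names are illustrative assumptions.
    """
    steps = [step0 + step_size * i for i in range(n)]
    # Derive each time from the integer index instead of adding dt
    # repeatedly, so that round-off error does not accumulate.
    times = [time0 + step_size * dt * i for i in range(n)]
    return steps, times


steps, times = regular_step_and_time(4, step0=0, step_size=10, time0=0.0, dt=0.5)
print(steps)
print(times)
```

With only four numbers stored, the file itself states that the data are
regularly sampled, and no per-frame labels need to be inspected.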

Right now, the only way to check if a time series is regular is to
check all the time labels. However, these are floats and thus subject
to round-off error. I'll bet that in practice, analysis software will
simply assume the time series to be equally spaced and not bother to
check. I'll also bet that sooner or later this will lead to wrong
results being published.
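
For illustration, this is roughly what such a check has to look like
today. It is a sketch under my own assumptions: the function name and
the tolerance are arbitrary choices, not anything defined by H5MD.

```python
def is_regular(times, tol=1e-6):
    """Return True if the time labels are equally spaced within tol.

    tol is relative to the mean spacing; its default value is an
    arbitrary assumption made for this sketch.
    """
    if len(times) < 3:
        return True
    n = len(times) - 1
    dt = (times[-1] - times[0]) / n  # mean spacing
    # Compare every label with its position on the ideal regular grid.
    return all(
        abs(t - (times[0] + i * dt)) <= tol * abs(dt)
        for i, t in enumerate(times)
    )


times = [0.1 * i for i in range(10)]
# Exact comparison of successive differences fails because of round-off:
distinct_spacings = {b - a for a, b in zip(times, times[1:])}
print(len(distinct_spacings) > 1)
print(is_regular(times))
print(is_regular([0.0, 0.1, 0.2, 0.4]))
```

The need to pick a tolerance is exactly the problem: the file cannot
say whether small deviations are round-off or real irregularity.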

