
From: Hart, David Blaine <dbhart-4OHPYypu0djtX7QSmKvirg <at> public.gmane.org>
Subject: Re: [EXTERNAL] Re: topology
Newsgroups: gmane.science.simulation.h5md.user
Date: Friday 2nd May 2014 18:43:10 UTC (over 4 years ago)
Hi Pierre and Konrad,

> > Before discussing the technicalities, please define the scope of what
> > you call "topology". Which kinds of molecular models do you wish to
> > cover? Which categories of systems do you want to handle? And what are
> > the use cases for the information you plan to store?
> The scope is not defined yet. I would like to discuss the needs and
> experiences before we can decide something. There might be no universal
> solution, a case in which there could be different topology modules.
> I described my use case: store lists of indices that represent bonded
> interactions in coarse-grained simulations. I don't expect my use case to
> be generic :-)

The group I work with does a fair amount of analysis on bond lengths, angle
distributions and dihedral angle distributions as validation for force
field development. So our main use case for a 'topology' module would be to
store the pairs, 3- and 4-tuples of atoms that define the bonds, angles and
dihedrals.

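As a minimal sketch of the analysis use case above, the snippet below computes a bond length, a bond angle and a dihedral angle from index tuples. The function names and the `pos` layout (a list of `(x, y, z)` coordinates indexed by particle ID) are assumptions made for illustration, and periodic boundaries are ignored.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(pos, pair):
    """Distance between the two particles of a 2-tuple."""
    i, j = pair
    return _norm(_sub(pos[j], pos[i]))


def bond_angle(pos, triple):
    """Angle in degrees at the middle particle j of a 3-tuple (i, j, k)."""
    i, j, k = triple
    u = _sub(pos[i], pos[j])
    v = _sub(pos[k], pos[j])
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral_angle(pos, quad):
    """Signed dihedral in degrees for a 4-tuple (i, j, k, l); 0 is cis."""
    i, j, k, l = quad
    b1 = _sub(pos[j], pos[i])
    b2 = _sub(pos[k], pos[j])
    b3 = _sub(pos[l], pos[k])
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With the tuples stored as plain index lists, a distribution is then just a loop, for example `[bond_length(pos, p) for p in bond_pairs]`, where `bond_pairs` is a hypothetical list of 2-tuples read from the file.
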
That said, I have use cases for wanting to store lists of non-bonded pairs
of atoms such as opposing carbons on a ring or designated 'endpoint' atoms
that can be used to represent the overall orientation of a larger molecule,
and in this use case a specific 'bond' list would not really be
> >  > Depending on the direction of the discussion this could become
> >  > either a module or a part of the specification itself.
> >
> > Unless we can come up with something good enough for all kinds of
> > particle-based simulations (which I doubt), it's better to make it a
> > module in order to allow for alternatives. One of the nice aspects of
> > H5MD is its generality.
> No problem.
> >  > 2. Within the groups, H5MD elements store bonds as
> >  > [N_bonds][bond_order] data.
> >  > For pairs, bond_order=2, for instance. This allows to store angles
> >  > and
> >
> > Please note that the term "bond order" is already used in chemistry
> > for something different: the number of electrons implied in a covalent
> > bond. A chemist would take bond_order=2 for a double bond.
> Noted. I have no specific name in mind for this, though.
> > I have spent a lot of time thinking about these issues for MOSAIC, and
> > a part of the background behind the decisions that led to MOSAIC 1.0
> > is described in the paper (free download at
> > http://pubs.acs.org/articlesonrequest/AOR-dADBta6jVTVtVb6bbGmJ,
> > you need to create an ACS account for that). Note that the scope of
> > MOSAIC is different from H5MD, so the considerations to apply are not
> > the same, but there are many common points nevertheless.
> >
> >
> > One of the most important lessons from MOSAIC design, which I think
> > carries over to H5MD, is the need for both generic data structures and
> > precisely defined data items. For the example of chemical bonds, that
> > would mean a generic data structure for storing pairs (or even
> > N-tuples) of particle indices, with some way of attaching semantic
> > information such as a text label. A bond list would then be stored as
> > a list of pairs with the "bonds" label.
> >
> > If you provide only the generic data structure for pairs, then
> > everyone will come up with a different label for bonds, creating chaos
> > without any real gain in flexibility. If you provide only a bond list
> > but not a generic pair list, people will abuse the bond list for other
> > pair-related applications. The history of the PDB format provides lots
> > of examples of such abuses due to a lack of flexibility.

I am totally guilty of abusing the PDB (and CAR and MDF) formats to sneak
in extra information. :-)

> >
> > H5MD actually applies this principle very well until now, so let's
> > keep that spirit for defining additional data items.
> Keeping all of that in mind, it would be beneficial to have a
> "storage scheme" (such as "[N_bonds][bond_order]" from my message) for
> connectivity-related modules, to avoid duplicating the work.
> Pierre

I would definitely agree that a standard storage scheme would be useful,
even if the structure of a 'topology' section varies by module. That way,
even if the location of the lists varies, at least the same routines can be
used to read/write the data when it is located. As a possible example:

  +-- type:  String
  +-- dimension:  Integer[]
  \-- values:  Integer[n-tuples][D]

Where a pair list would have dimension (D) = 2, and the values in the list
would be the particle IDs. Given what Konrad pointed out about flexibility vs.
specificity, the "type" string could have a specific list of acceptable
values, much like the boundary attribute in the 'box' item. For my
particular use cases, I can see the following types of atom-tuple lists
being useful as topology/connectivity information: bonds, angles,
dihedrals, chains, and 'other'.
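
To make the scheme above concrete, here is a minimal in-memory sketch of such a tuple list with a restricted "type" label. The class name `TupleList`, the constants, and the rule that bonds, angles and dihedrals have widths 2, 3 and 4 are assumptions for illustration only; nothing here is part of H5MD, and "dimension" is treated as a single integer.

```python
from dataclasses import dataclass, field

# Hypothetical set of accepted labels, taken from the list in this message.
ALLOWED_TYPES = ("bonds", "angles", "dihedrals", "chains", "other")

# Assumed tuple widths for the labels whose width is fixed by their meaning.
# "chains" and "other" are left unconstrained in this sketch.
FIXED_DIMENSION = {"bonds": 2, "angles": 3, "dihedrals": 4}


@dataclass
class TupleList:
    """A labelled list of particle-ID tuples, all of the same width."""
    type: str
    dimension: int
    values: list = field(default_factory=list)

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown tuple-list type: {self.type!r}")
        expected = FIXED_DIMENSION.get(self.type)
        if expected is not None and self.dimension != expected:
            raise ValueError(
                f"{self.type!r} requires dimension {expected}, "
                f"got {self.dimension}"
            )
        # Normalise every row to a tuple of ints and check its width.
        rows = []
        for row in self.values:
            row = tuple(int(p) for p in row)
            if len(row) != self.dimension:
                raise ValueError(
                    f"row {row} does not have {self.dimension} entries"
                )
            rows.append(row)
        self.values = rows
```

For example, `TupleList("angles", 3, [(0, 1, 2), (1, 2, 3)])` is accepted, while an unknown label or a row of the wrong width raises `ValueError`. Because every list shares the same three fields, one reader/writer routine can serve any module regardless of where the list is located in the file.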
