One of the main goals of the SFM project is to contribute to scientific development, providing the community with public resources.

We have developed and tested two algorithms to study the structure of young stellar objects in star-forming regions: S2D2 and INDICATE. Here, we describe each of them and provide updated links to references and available implementations.

S2D2: Small Significant DBSCAN Detection

General Description

S2D2 is a clustering tool that applies DBSCAN to the search for local, compact substructure in star-forming regions. Such structures are interesting because they may retain some information on the structure of the parent cloud and its fragmentation process.

S2D2 chooses the two DBSCAN parameters, epsilon and Nmin, so that the structures it retrieves are both small and significant.
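To make the role of the two parameters concrete, here is a minimal brute-force DBSCAN sketch using only the Python standard library. It is not the S2D2 code: the function name `dbscan`, the toy coordinates, and the values of `eps` and `n_min` are illustrative assumptions.

```python
import math

def dbscan(points, eps, n_min):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n              # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < n_min:
            labels[i] = -1           # provisional noise, may become a border point
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= n_min:
                queue.extend(neighbours[j])  # j is also a core point: keep expanding
    return labels

# Toy data (hypothetical): one compact group of four points and two isolated points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (9.0, 1.0)]
labels = dbscan(points, eps=0.2, n_min=3)
print(labels)
```

A smaller `eps` restricts the search to more compact groups, while a larger `n_min` demands more members before a group counts as a structure. In scikit-learn's `DBSCAN` class the same two parameters are called `eps` and `min_samples`.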

Epsilon calculation: Small
We calculate the DBSCAN length scale (epsilon) using the one-point correlation function, OPCF (Joncour et al. 2017). The OPCF compares the first-nearest-neighbour distance distribution of the sample with that of a homogeneous random distribution (complete spatial randomness, CSR) whose intensity rho is derived from the mean of the sixth-nearest-neighbour distance distribution of the sample, as described in González et al. 2020.
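The ingredients of this comparison can be sketched as follows for a two-dimensional sample. This is an illustration under stated assumptions, not the S2D2 implementation: the function names are hypothetical, rho is obtained by inverting the CSR expectation of the k-th nearest-neighbour distance (one plausible reading of "derived from the mean"), and the OPCF is approximated as a binned ratio of densities. The sketch stops at the OPCF itself; the rule used to read epsilon off it is given in the cited papers and is not reproduced here.

```python
import math
import random

def kth_nn_distances(points, k):
    """Distance from every point to its k-th nearest neighbour (brute force)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out

def csr_intensity(mean_rk, k):
    """Intensity rho of a 2D Poisson process whose mean k-th NN distance is mean_rk."""
    # Under CSR in 2D, E[r_k] = Gamma(k + 1/2) / (Gamma(k) * sqrt(pi * rho)).
    c = math.exp(math.lgamma(k + 0.5) - math.lgamma(k))
    return (c / mean_rk) ** 2 / math.pi

def csr_nn1_pdf(r, rho):
    """First-nearest-neighbour distance density under CSR in 2D."""
    return 2.0 * math.pi * rho * r * math.exp(-math.pi * rho * r * r)

def opcf(points, k=6, n_bins=20):
    """Return (bin_centres, ratio, rho); ratio = empirical NN1 density / CSR density."""
    r1 = kth_nn_distances(points, 1)
    rk = kth_nn_distances(points, k)
    rho = csr_intensity(sum(rk) / len(rk), k)
    width = max(r1) / n_bins
    counts = [0] * n_bins
    for r in r1:
        counts[min(int(r / width), n_bins - 1)] += 1
    centres = [(b + 0.5) * width for b in range(n_bins)]
    density = [c / (len(r1) * width) for c in counts]
    ratio = [d / csr_nn1_pdf(r, rho) for d, r in zip(density, centres)]
    return centres, ratio, rho

# Hypothetical test sample: 300 uniformly random points in the unit square.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(300)]
centres, ratio, rho = opcf(points)
```

For a sample that is itself close to random, the ratio scatters around 1. An excess above 1 at short distances indicates more close pairs than CSR predicts, which is the regime in which compact substructure is sought.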

Nmin calculation: Significance
We iteratively calculate the significance of a structure of scale epsilon containing a fixed number of points k, for successive values of k, until we reach a chosen significance level (usually ~99.85%, above 3-sigma). As described in Joncour et al. 2018, the significance of a structure of size epsilon with k points is given by the probability of having k-1 nearest neighbours within an epsilon neighbourhood under a homogeneous random distribution with intensity rho.
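The iteration can be sketched as follows. This is a minimal illustration, assuming a two-dimensional CSR field in which the number of neighbours inside a disc of radius epsilon is Poisson distributed with mean rho·π·epsilon², and taking the significance of a k-point structure to be the probability that a random field has fewer than k-1 neighbours inside that disc. The function names and this exact form of the significance are assumptions; Joncour et al. 2018 give the definition actually used.

```python
import math

def significance(k, eps, rho):
    """P(fewer than k-1 neighbours within eps of a point) under 2D CSR."""
    lam = rho * math.pi * eps ** 2   # expected number of neighbours in the disc
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k - 1))

def find_nmin(eps, rho, threshold=0.9985, k_max=1000):
    """Smallest k whose significance reaches the threshold."""
    for k in range(2, k_max + 1):
        if significance(k, eps, rho) >= threshold:
            return k
    raise ValueError("threshold not reached; eps or rho may be too large")

# Hypothetical values: one source per unit area and a search radius of 0.5.
nmin = find_nmin(eps=0.5, rho=1.0)
print(nmin, significance(nmin, 0.5, 1.0))
```

Because the significance grows monotonically with k for fixed epsilon and rho, the loop stops at the smallest group size that a random field would rarely produce by chance.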


We offer 3 public implementations of S2D2.

An online basic tool will be available at:

The complete code is available for advanced users in Python 3 and R.


A catalog of the significant substructure found in four star-forming regions (Taurus, IC348, Upper Scorpius and Carina), described in González et al. 2020, is available at:


González, M., et al. 2020, A&A.

Joncour, I., et al. 2018, A&A, 620, A27

Joncour, I., et al. 2017, A&A, 599, A14


INDICATE

General Description

The INDICATE tool was developed using the Anaconda distribution of Python v2.7 as part of the SFM Project.

INDICATE is a local clustering statistic that quantifies the degree of association of each point in a discrete dataset of two or more dimensions by comparing it with an evenly spaced control field of the same size. Arguably INDICATE’s greatest strength is that, unlike most established clustering tools, it requires no a priori knowledge of the size, shape or substructure present in a distribution.

When applied to a dataset of size S, INDICATE derives an index I_{j,N} for every data point using:

I_{j,N} = Nr / N

where N is the nearest neighbour number (a user-defined integer) and Nr is the number of neighbours of data point j that lie within a radius r. Here r is the mean Euclidean distance from every data point to its Nth nearest neighbour in the control field. The index is a unitless ratio in the range 0 ≤ I_{j,N} ≤ (S − 1)/N; the higher the value, the more spatially clustered the data point.

For each dataset, the user calibrates the index so that significant values can be identified. The user should generate a hundred realisations of a random distribution of the same size S, and in the same parameter space, as the dataset. INDICATE is then applied to the random samples to identify typical index values, I_{ran,N}, of randomly distributed data points. Point j is considered spatially clustered if I_{j,N} >> I_{ran,N}.
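The index and its calibration can be sketched as follows for a two-dimensional dataset. This is a minimal illustration, not the released INDICATE code. The function names, the way the control grid is laid out over the bounding box, and the threshold used to express ">>" (mean plus three standard deviations of the random index values) are all assumptions made for the example.

```python
import math
import random
import statistics

def control_grid(n_points, xmin, xmax, ymin, ymax):
    """Evenly spaced grid with roughly n_points nodes over the bounding box."""
    nx = max(2, round(math.sqrt(n_points * (xmax - xmin) / (ymax - ymin))))
    ny = max(2, round(n_points / nx))
    return [(xmin + (i + 0.5) * (xmax - xmin) / nx,
             ymin + (j + 0.5) * (ymax - ymin) / ny)
            for i in range(nx) for j in range(ny)]

def indicate(points, n_nb, bounds):
    """Index I_{j,N} = Nr / N for every point j, with N = n_nb."""
    grid = control_grid(len(points), *bounds)
    # r: mean distance from each data point to its N-th nearest control point.
    r = statistics.fmean(
        sorted(math.dist(p, g) for g in grid)[n_nb - 1] for p in points)
    index = []
    for j, p in enumerate(points):
        # Nr: number of other data points within r of point j.
        n_r = sum(1 for i, q in enumerate(points)
                  if i != j and math.dist(p, q) <= r)
        index.append(n_r / n_nb)
    return index

def calibrate(n_points, n_nb, bounds, n_real=100, seed=0):
    """Threshold from random realisations: mean + 3 sigma (assumed criterion)."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds
    values = []
    for _ in range(n_real):
        sample = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                  for _ in range(n_points)]
        values.extend(indicate(sample, n_nb, bounds))
    return statistics.fmean(values) + 3.0 * statistics.pstdev(values)

# Hypothetical dataset: a tight group of 20 points plus 40 uniform field points.
rng = random.Random(1)
bounds = (0.0, 1.0, 0.0, 1.0)
group = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(20)]
field = [(rng.random(), rng.random()) for _ in range(40)]
points = group + field

index = indicate(points, n_nb=5, bounds=bounds)
threshold = calibrate(len(points), n_nb=5, bounds=bounds)
clustered = [value > threshold for value in index]
```

The brute-force distance loops keep the sketch dependency free but scale quadratically with S; for large samples a spatial index such as a k-d tree would be the usual replacement.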

Further details of the underlying concepts are described in Buckner et al. (2019).


INDICATE is publicly available to download at:

It will also be possible to use it online at:


Buckner, A. S. M., Khorrami, Z., Khalaj, P., et al. 2019, A&A, 622, A184