One of the main goals of the SFM project is to contribute to scientific development, providing the community with public resources.

We have developed and tested two algorithms, **S2D2** and **INDICATE**, to study the spatial distribution of young stellar objects in star-forming regions. Here we describe each of them and provide up-to-date links to references and available implementations.

**S2D2**: **S**mall **S**ignificant **D**BSCAN **D**etection

**General Description**

**S2D2** is a clustering tool that applies DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to search for local, compact substructures in star-forming regions. Such structures are interesting because they may retain some information about the structure of the parent cloud and its fragmentation process.

**S2D2** chooses the two DBSCAN parameters, the length scale epsilon and the minimum number of points Nmin, so that the structures it finds are both small and significant.
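To make the roles of epsilon and Nmin concrete, here is a minimal sketch of the DBSCAN step itself. It is not the S2D2 code: the function name `dbscan` is hypothetical, distances are computed by brute force, and we assume that Nmin counts the point itself (as scikit-learn's `min_samples` does).

```python
import numpy as np

def dbscan(points, eps, nmin):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    A point is a core point when at least `nmin` points, itself included,
    lie within distance `eps`; clusters are grown from core points.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Full pairwise distance matrix (fine for a few thousand sources).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= nmin for nb in neighbours])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster from this unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not extend it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels
```

For example, `dbscan([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [10, 0]], eps=0.2, nmin=3)` finds two three-point groups and labels the isolated point as noise. A small epsilon keeps the groups compact; a larger Nmin demands denser groups.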

**Epsilon calculation: Small**

We calculate the DBSCAN length scale, epsilon, using the one-point correlation function (OPCF; Joncour et al. 2017). The OPCF compares the first-nearest-neighbour distance distribution of the sample with that of a homogeneous random distribution (complete spatial randomness, CSR) whose intensity rho is derived from the mean of the sample's 6th-nearest-neighbour distance distribution, as described in González et al. 2020.
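The comparison can be sketched as follows. This is an illustration under several assumptions, not the procedure of González et al. 2020: we convert the mean 6th-neighbour distance into rho through the 2D Poisson expectation E[r_k] = Γ(k + 1/2) / (Γ(k) √(π rho)), we use the CSR first-neighbour density 2π rho r exp(−π rho r²), and we adopt a hypothetical rule that takes epsilon as the first histogram bin where the observed density no longer exceeds the CSR density. All function names are hypothetical.

```python
import math
import numpy as np

def knn_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return d[:, k]

def csr_intensity(points, k=6):
    """Intensity rho of a 2D CSR process whose expected k-th NN distance,
    Gamma(k + 1/2) / (Gamma(k) * sqrt(pi * rho)), equals the sample mean."""
    mean_rk = knn_distances(points, k).mean()
    gamma_ratio = math.exp(math.lgamma(k + 0.5) - math.lgamma(k))
    return (gamma_ratio / mean_rk) ** 2 / math.pi

def csr_nn_pdf(r, rho):
    """PDF of the first-nearest-neighbour distance under 2D CSR."""
    return 2.0 * math.pi * rho * r * np.exp(-math.pi * rho * r ** 2)

def opcf_epsilon(points, bins=30, k=6):
    """Return (epsilon, rho) from the ratio of observed to CSR 1NN densities."""
    rho = csr_intensity(points, k)
    d1 = knn_distances(points, 1)
    hist, edges = np.histogram(d1, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore", invalid="ignore"):
        opcf = hist / csr_nn_pdf(centres, rho)  # > 1: more close pairs than CSR
    # Hypothetical rule: epsilon is the first scale where the excess over CSR ends.
    below = np.flatnonzero(opcf <= 1.0)
    eps = centres[below[0]] if below.size else centres[-1]
    return float(eps), rho
```

The bin count matters: coarse bins can hide a narrow excess at small separations, while very fine bins make the ratio noisy, so a real analysis would want a more careful estimate of the observed distribution.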

**Nmin calculation: Significance**

We iteratively calculate the significance of a structure of scale epsilon containing k points, increasing k until the significance reaches a chosen level (usually ~99.85%, larger than 3 sigma); this k sets Nmin. As described in Joncour et al. 2018, the significance of a structure of size epsilon with k points is given by the probability of having k-1 nearest neighbours within an epsilon neighbourhood under a homogeneous random distribution with intensity rho.
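A minimal sketch of this iteration, under the assumption (ours, not necessarily that of Joncour et al. 2018) that the number of CSR neighbours within epsilon follows a Poisson distribution with mean λ = rho π epsilon², and that the significance of a k-point structure is the probability that CSR yields fewer than k-1 such neighbours. The function names are hypothetical.

```python
import math

def poisson_cdf_below(m, lam):
    """P(X < m) for X ~ Poisson(lam), summed term by term to avoid overflow."""
    term = math.exp(-lam)  # P(X = 0)
    total = 0.0
    for i in range(m):
        total += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return total

def significant_nmin(eps, rho, target=0.9985, k_max=1000):
    """Smallest k whose structure (k points within eps) reaches `target` significance."""
    lam = rho * math.pi * eps ** 2  # expected CSR neighbours inside an eps disc
    for k in range(2, k_max + 1):
        significance = poisson_cdf_below(k - 1, lam)  # P(fewer than k-1 neighbours)
        if significance >= target:
            return k, significance
    raise ValueError("target significance not reached; increase k_max")
```

For instance, with λ = 1 (`significant_nmin(1.0, 1 / math.pi)`) the loop stops at k = 7, where the CSR probability of at most five neighbours is about 0.9994.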

**Implementations**

We offer three public implementations of **S2D2**.

A basic online tool will be available at: https://gavip.esac.esa.int.

The complete code is available for advanced users in Python 3 (https://github.com/martaGG/S2D2) and R (https://github.com/martaGG/S2D2_R).

**References**

González, M., et al. 2020, A&A.

Joncour, I., et al. 2017, A&A, 599, A14

Joncour, I., et al. 2018, A&A, 620, A27

**INDICATE**

**General Description**

The **INDICATE** tool was developed using the Anaconda distribution of Python v2.7 as part of the SFM Project.

**INDICATE** is a local clustering statistic that quantifies the degree of association of each point in a discrete dataset of two or more dimensions by comparison with an evenly spaced control field of the same size. Arguably, **INDICATE**'s greatest strength is that, unlike most established clustering tools, it requires no a priori knowledge of the size, shape or substructure present in a distribution.

When applied to a dataset of size S, **INDICATE** derives an index I_{j,N} for every data point j using:

$$I_{j,N} = \frac{N_r}{N}$$

where N is the nearest neighbour number (a user-defined integer) and N_r is the number of nearest neighbours of data point j within a radius r, with r the mean Euclidean distance of every data point to its Nth nearest neighbour in the control field. The index is a unitless ratio with a value in the range $0 \le I_{j,N} \le (S-1)/N$, such that the higher the value, the more spatially clustered a data point.
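The index definition can be sketched directly. This is not the code from the INDICATE repository: the function names are hypothetical, distances are brute force, and we assume the control field is a regular lattice spanning the data's bounding box with about S^(1/d) nodes per axis (it may contain slightly more than S nodes).

```python
import numpy as np

def control_field(points):
    """Evenly spaced lattice over the bounding box (assumed stand-in for the control field)."""
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    side = round(n ** (1.0 / d))
    if side ** d < n:
        side += 1  # ensure at least S lattice nodes
    axes = [np.linspace(pts[:, i].min(), pts[:, i].max(), side) for i in range(d)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([m.ravel() for m in mesh])

def pairwise(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def indicate_index(points, n_nn):
    """I_{j,N} = N_r / N for every data point j."""
    pts = np.asarray(points, dtype=float)
    grid = control_field(pts)
    # r: mean distance from every data point to its N-th nearest control-field node
    r = np.sort(pairwise(pts, grid), axis=1)[:, n_nn - 1].mean()
    # N_r: other data points within r of each point (subtract 1 for the point itself)
    n_r = (pairwise(pts, pts) <= r).sum(axis=1) - 1
    return n_r / n_nn
```

Because N_r can be at most S-1, the returned values respect the bound 0 ≤ I_{j,N} ≤ (S-1)/N stated above.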

For each dataset, the user calibrates the index so that significant values can be identified. The user should generate one hundred realisations of a random distribution with the same size S, and in the same parameter space, as the dataset. **INDICATE** is then applied to these random samples to identify the typical index values, I_{ran,N}, of randomly distributed data points. Point j is considered spatially clustered if I_{j,N} ≫ I_{ran,N}.
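The calibration step might look like the sketch below. It assumes uniform random realisations over the data's bounding box and, as a hypothetical way of turning "≫" into a number, flags points whose index exceeds the random mean by more than three standard deviations; Buckner et al. (2019) should be consulted for the criterion actually used. A compact copy of the index (with an assumed lattice control field) is included so the block runs on its own; all function names are hypothetical.

```python
import numpy as np

def _dist(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def indicate_index(points, n_nn):
    """Compact I_{j,N} = N_r / N with an evenly spaced lattice as control field."""
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    side = round(n ** (1.0 / d))
    if side ** d < n:
        side += 1
    axes = [np.linspace(pts[:, i].min(), pts[:, i].max(), side) for i in range(d)]
    grid = np.column_stack([m.ravel() for m in np.meshgrid(*axes, indexing="ij")])
    r = np.sort(_dist(pts, grid), axis=1)[:, n_nn - 1].mean()
    return ((_dist(pts, pts) <= r).sum(axis=1) - 1) / n_nn

def random_threshold(points, n_nn, n_real=100, n_sigma=3.0, seed=0):
    """Mean random index I_ran,N and a hypothetical mean + n_sigma * std threshold."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    rng = np.random.default_rng(seed)
    # Same size S and same parameter space (bounding box) as the dataset.
    ran = np.concatenate([
        indicate_index(rng.uniform(lo, hi, size=pts.shape), n_nn)
        for _ in range(n_real)
    ])
    return ran.mean(), ran.mean() + n_sigma * ran.std()
```

Typical use is `i_ran, thresh = random_threshold(pts, n_nn)` followed by `clustered = indicate_index(pts, n_nn) > thresh`.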

Further details of the underlying concepts are described in Buckner et al. (2019).

**Implementations**

**INDICATE** is publicly available to download at: https://github.com/abuckner89/INDICATE

It will also be possible to use it online at: https://gavip.esac.esa.int

**References**

Buckner, A. S. M., Khorrami, Z., Khalaj, P., et al. 2019, A&A, 622, A184