Day: January 30, 2025

What is Data SGP?

Data SGP is a suite of classes, functions, and datasets for calculating student growth percentiles (SGP) and percentile growth projections/trajectories from large-scale, longitudinal education assessment data. SGP analyses use quantile regression to estimate the conditional density of a student’s current achievement given their achievement history, and then build a growth trajectory that projects how much a student needs to grow to reach a specific performance target.
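To make the idea of a growth percentile concrete, here is a minimal Python sketch. It is not the SGP package's method: instead of quantile regression, it uses a much cruder stand-in that ranks each student's current score among "peers" whose prior score falls in the same band. The function name, the band_width parameter, and the sample records are all hypothetical and exist only for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def growth_percentiles(records, band_width=10):
    """Toy growth percentile (NOT quantile regression).

    records: list of (student_id, prior_score, current_score) tuples.
    Returns {student_id: percentile in 1..99}, where the percentile is the
    share of same-band peers who scored strictly below the student.
    """
    # Group current scores by prior-score band: these are the "academic peers".
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior // band_width].append(current)
    for scores in peers.values():
        scores.sort()

    result = {}
    for student_id, prior, current in records:
        scores = peers[prior // band_width]
        below = bisect_left(scores, current)  # peers with a lower current score
        pct = round(100 * below / len(scores))
        result[student_id] = min(99, max(1, pct))  # clamp to the 1..99 range
    return result


# Hypothetical sample data: four students with similar prior scores, one outlier.
records = [
    ("a", 200, 210),
    ("b", 203, 230),
    ("c", 205, 220),
    ("d", 208, 250),
    ("e", 300, 310),
]
print(growth_percentiles(records))
# {'a': 1, 'b': 50, 'c': 25, 'd': 75, 'e': 1}
```

Student "e" has no peers in the same band, so the toy percentile is not informative there. That is one reason a real analysis models the whole conditional distribution rather than binning prior scores.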

Compared with other kinds of statistical analysis, SGP analyses are relatively simple and quick to run. They do, however, depend on accurate data preparation, so expect some back and forth between data preparation and analysis. Fortunately, most problems encountered when running SGP analyses trace back to data preparation and can be fixed relatively quickly.

The most common error messages that SGP users receive revolve around incorrect or missing values in the data. These errors are usually the result of an oversight during the data preparation process and can be corrected by reviewing the SGP documentation on GitHub and following the suggested steps to correct these problems.
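A quick pre-flight check can catch many of these problems before an analysis is run. The following Python sketch is only an illustration and is not part of the SGP tooling. It assumes long-format rows stored as dictionaries with the column names ID, CONTENT_AREA, YEAR, GRADE, and SCALE_SCORE. Treat those names and the three checks as assumptions to adapt to your own data.

```python
# Assumed column names for a long-format record (adapt to your own data).
REQUIRED_FIELDS = ("ID", "CONTENT_AREA", "YEAR", "GRADE", "SCALE_SCORE")


def find_data_problems(rows):
    """Return (row_index, message) pairs for rows likely to cause errors."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Check 1: every required field is present and non-empty.
        missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
        if missing:
            problems.append((i, "missing: " + ", ".join(missing)))
            continue
        # Check 2: the score must be a number, not text such as "N/A".
        if not isinstance(row["SCALE_SCORE"], (int, float)):
            problems.append((i, "SCALE_SCORE is not numeric"))
        # Check 3: at most one record per student, content area, and year.
        key = (row["ID"], row["CONTENT_AREA"], row["YEAR"])
        if key in seen:
            problems.append((i, "duplicate student/content area/year record"))
        seen.add(key)
    return problems


# Hypothetical sample rows: one clean, one with a missing score, one duplicate.
rows = [
    {"ID": "s1", "CONTENT_AREA": "MATHEMATICS", "YEAR": "2024", "GRADE": "4", "SCALE_SCORE": 412},
    {"ID": "s2", "CONTENT_AREA": "MATHEMATICS", "YEAR": "2024", "GRADE": "4", "SCALE_SCORE": None},
    {"ID": "s1", "CONTENT_AREA": "MATHEMATICS", "YEAR": "2024", "GRADE": "4", "SCALE_SCORE": 415},
]
for index, message in find_data_problems(rows):
    print(index, message)
```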

SGP analyses are very sensitive to the type and number of variables used. For this reason, it is important to carefully review the SGP documentation and the example data sets in the SGPdata package when setting up data for an SGP analysis. This documentation is available on GitHub and also includes a helpful flow chart that illustrates the process of creating an SGP analysis.

There are many different ways to configure SGP analyses, and the SGPdata package provides data sets for trying out these configurations. The sgptData_LONG data set is an anonymized panel data set with 8 windows (3 windows annually) of assessment data in long format for 3 content areas (Early Literacy, Math, Science). The sgpData_WIDE data set is similar to the sgptData_LONG set but excludes the student demographic/category variables.

For both of these data sets, there is a series of lower-level SGP functions that can be run, along with higher-level wrapper functions that call them. For the simplest one-off analyses, WIDE-formatted data may be suitable, but for operational, year-after-year analyses, LONG-formatted data is generally recommended because it offers numerous preparation and storage benefits.
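The difference between the two layouts is easiest to see by reshaping one into the other. The Python sketch below is a hypothetical illustration, not code from the SGP tooling. It assumes a wide layout with one row per student and year-suffixed columns such as GRADE_2023 and SCALE_SCORE_2023, and it produces one long row per student per tested year. Both the column naming scheme and the wide_to_long helper are assumptions made for this example.

```python
def wide_to_long(wide_rows, content_area):
    """Reshape one-row-per-student records into one row per student per year.

    Assumes wide columns named like "GRADE_<year>" and "SCALE_SCORE_<year>".
    """
    long_rows = []
    for row in wide_rows:
        # Discover the years present from the score column names.
        years = sorted(
            key.split("_")[-1] for key in row if key.startswith("SCALE_SCORE_")
        )
        for year in years:
            score = row.get("SCALE_SCORE_" + year)
            if score is None:
                continue  # not tested that year: long format simply has no row
            long_rows.append({
                "ID": row["ID"],
                "CONTENT_AREA": content_area,
                "YEAR": year,
                "GRADE": row.get("GRADE_" + year),
                "SCALE_SCORE": score,
            })
    return long_rows


# Hypothetical wide data: student s2 has no 2023 score.
wide = [
    {"ID": "s1", "GRADE_2023": 3, "SCALE_SCORE_2023": 400,
     "GRADE_2024": 4, "SCALE_SCORE_2024": 430},
    {"ID": "s2", "GRADE_2023": 3, "SCALE_SCORE_2023": None,
     "GRADE_2024": 4, "SCALE_SCORE_2024": 410},
]
for record in wide_to_long(wide, "MATHEMATICS"):
    print(record)
```

Adding a new year to wide data means adding columns to every row, while long data only needs new rows appended. This is one way to read the storage and preparation benefits mentioned above.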

When interpreting SGPs, it is important to remember that the percentiles are recalculated each year, so differences between years should be viewed with caution. In general, differences of fewer than 10 points are not considered significant.
