Data SGP – Getting Started With SGP Data

The SGP package contains classes, functions and data used to calculate student growth percentiles and percentile growth projections/trajectories from large-scale longitudinal education assessment data. It uses quantile regression to estimate the conditional density associated with each student’s assessment score history, and then uses the coefficient matrices derived from those regressions to produce projections/trajectories for each student.
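To make the idea behind quantile regression more concrete, here is a minimal Python sketch of the check (pinball) loss that quantile regression minimizes. It is only an illustration: the scores are made up, the function names are hypothetical, and the sketch fits a single constant rather than the regression models the SGP package actually estimates.

```python
def pinball_loss(values, q, tau):
    """Average check (pinball) loss when q is used as the tau-th quantile."""
    total = 0.0
    for value in values:
        diff = value - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(values)


def best_constant_quantile(values, tau):
    """Return the observed value that minimizes the pinball loss for tau."""
    candidates = sorted(set(values))
    return min(candidates, key=lambda q: pinball_loss(values, q, tau))


# Made-up scale scores, for illustration only.
scores = [410, 425, 430, 445, 450, 460, 475, 480, 495, 510, 520]

median_score = best_constant_quantile(scores, 0.5)
upper_score = best_constant_quantile(scores, 0.9)
print(median_score, upper_score)
```

Minimizing this loss for tau = 0.5 recovers the median, and other values of tau recover other quantiles. Quantile regression applies the same loss while letting the fitted quantile depend on prior scores.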

The student growth percentile (SGP) is a statistic that indicates how much a student’s performance improved from one year to the next compared to “academic peers.” Academic peers are other students in Washington State in the same grade and assessment subject who have statistically similar prior scores to the individual being assessed, i.e., they have followed a similar assessment score path. SGPs are reported as percentiles ranging from 1 to 99 and are meant to be considered along with scaled scores and achievement levels when interpreting a student’s academic skills.

SGPs allow us to fairly compare students based solely on their score histories, without regard for demographics or program participation. The calculations for SGPs are complex, but the interpretation is straightforward: a student’s SGP indicates whether they grew more than, about the same as, or less than their academic peers.
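The interpretation can be sketched with a deliberately simplified Python example. It is not the method the SGP package uses: instead of quantile regression, it assumes that "academic peers" are students whose prior score falls within a fixed band of the student's prior score, and it ranks the student's current score among those peers. The data, the function name, and the band parameter are all hypothetical.

```python
def simple_growth_percentile(prior, current, peers, band=10):
    """Rank `current` among peers whose prior score is within `band` of `prior`.

    `peers` is a list of (prior_score, current_score) pairs.
    Returns a value from 1 to 99, or None if no peers are found.
    """
    peer_currents = [c for p, c in peers if abs(p - prior) <= band]
    if not peer_currents:
        return None
    below = sum(1 for c in peer_currents if c < current)
    percentile = round(100 * below / len(peer_currents))
    # Keep the result on the 1 to 99 scale used for reporting.
    return min(99, max(1, percentile))


# Made-up (prior, current) score pairs, for illustration only.
peers = [(400, 410), (405, 430), (395, 420), (402, 450), (398, 440), (500, 520)]

high_growth = simple_growth_percentile(400, 445, peers)
low_growth = simple_growth_percentile(400, 400, peers)
print(high_growth, low_growth)
```

In this sketch a student who started at 400 and reached 445 outscored four of five peers with similar starting points, while a student who stayed at 400 outscored none of them.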

To conduct SGP analyses, you need a computer running R, a free software environment. R is available for Windows, macOS and Linux and can be downloaded from the CRAN repository. Most of the time spent on SGP analyses goes into data preparation. Once the data is prepared correctly, most SGP analyses follow a two-step process.

Getting Started with SGP Data

The SGPdata package provides four exemplar data sets that can be used for student growth percentile (SGP) analysis. The first of these, sgpData, specifies data in the WIDE format that is used with the lower-level SGP functions studentGrowthPercentiles and studentGrowthProjections. The next two data sets, sgpData_LONG and sgptData_LONG, specify data in the LONG format that is used by higher-level SGP functions like abcSGP, prepareSGP, and analyzeSGP. The last of these, sgpData_INSTRUCTOR_NUMBER, is a teacher-student lookup table that is used to produce teacher-level aggregates.
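The difference between the WIDE and LONG formats can be illustrated with a short Python sketch. In a WIDE layout each student occupies one row with a column per year, while in a LONG layout each row holds one student-year record. The column names and values below are hypothetical and chosen for illustration; they are not taken from the SGPdata data sets.

```python
def wide_to_long(rows, years):
    """Turn one-row-per-student records into one-row-per-student-year records."""
    long_rows = []
    for row in rows:
        for year in years:
            score = row.get(f"SS_{year}")
            if score is None:
                # Skip years in which the student has no score.
                continue
            long_rows.append({
                "ID": row["ID"],
                "YEAR": year,
                "GRADE": row.get(f"GRADE_{year}"),
                "SCALE_SCORE": score,
            })
    return long_rows


# Made-up WIDE records: one row per student, one set of columns per year.
wide_rows = [
    {"ID": "S001", "GRADE_2022": 3, "GRADE_2023": 4, "SS_2022": 410, "SS_2023": 445},
    {"ID": "S002", "GRADE_2022": 3, "GRADE_2023": 4, "SS_2022": 430, "SS_2023": None},
]

long_rows = wide_to_long(wide_rows, [2022, 2023])
for record in long_rows:
    print(record)
```

The LONG layout grows by adding rows rather than columns, which makes it easier to append a new year of data.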

The SGP functions in the SGP package rely on quantile regression, which is flexible enough to be applied to a wide range of assessment outcomes. Fitting these models across many grades, subjects and years can be computationally demanding on large data sets. The function prepareSGP does not calculate SGPs itself; it checks and structures LONG-format data so that functions such as analyzeSGP can carry out the calculations.