Big Data at SGP

The term "big data" has become a buzzword for datasets too large for traditional data management applications. While the SGP research data is large by any measure, it is far from being a full community database like GenBank or EarthChem. Rather, it is a carefully managed collection of data that has been consolidated to address the specific research questions of the Working Groups. As a result, it is not as straightforward to access and analyze as a traditional database, and it requires a more specialized approach.

SGP has a data management system that provides an integrated suite of functions for working with longitudinal student assessment data. This system, dubbed SGPdata, allows researchers to run a wide variety of analyses using WIDE or LONG formatted data. In the WIDE format, each row represents a unique student, and the columns hold the variables associated with that student at different times. In the LONG format, each row represents a unique student at a single point in time, so a student appears once per assessment occasion. The lower-level functions that perform the calculations (studentGrowthPercentiles and studentGrowthProjections) require WIDE formatted data, while the higher-level wrapper functions built on top of them require LONG formatted data. The SGPdata package, installed alongside the SGP package, provides exemplar data sets in both formats to help you determine which format suits your analysis.
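The difference between the two formats can be illustrated with a minimal sketch. It is written in plain Python purely to show the reshaping idea and is not part of the SGP or SGPdata packages. The helper wide_to_long, the column names (ID, GRADE_2021, SS_2021, and so on), and the sample values are all hypothetical; the sketch assumes every non-ID column is named VARIABLE_YEAR.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, None]
Row = Dict[str, Value]


def wide_to_long(wide_rows: List[Row], id_key: str = "ID", sep: str = "_") -> List[Row]:
    """Reshape WIDE rows (one per student) into LONG rows (one per student and year).

    Hypothetical helper: assumes every non-ID column is named VARIABLE<sep>YEAR.
    """
    long_rows: List[Row] = []
    for row in wide_rows:
        per_year: Dict[str, Row] = {}
        for column, value in row.items():
            if column == id_key:
                continue
            if sep not in column:
                raise ValueError(f"column {column!r} does not follow VARIABLE{sep}YEAR")
            # Split on the last separator so variable names may contain it too.
            variable, _, year = column.rpartition(sep)
            per_year.setdefault(year, {})[variable] = value
        # Emit one LONG record per year, in chronological order.
        for year in sorted(per_year):
            record: Row = {id_key: row[id_key], "YEAR": year}
            record.update(per_year[year])
            long_rows.append(record)
    return long_rows


# Made-up sample data in WIDE format: one row per student.
wide = [
    {"ID": "A1", "GRADE_2021": 3, "GRADE_2022": 4, "SS_2021": 410, "SS_2022": 455},
    {"ID": "B2", "GRADE_2021": 5, "GRADE_2022": 6, "SS_2021": 520, "SS_2022": 541},
]

long_rows = wide_to_long(wide)
for record in long_rows:
    print(record)
```

Each student in the WIDE input becomes one LONG record per year, which is why the LONG table has more rows and fewer columns than the WIDE one.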

If you are interested in gaining access to the full set of data available for a Working Group, you can contact that Working Group directly to request access. In addition, SGP is working to create a community database that will make the data widely accessible. While the goal is to eventually have all of the data in this database, the SGP project currently has a limited scope and limited resources.

Currently, the SGP data is only available to those who have been approved as data custodians by each Working Group. These individuals have been trained in how to manage and analyze the data and will be able to answer questions about the content of the data. The SGP data custodians also have access to technical support to help them with any problems they may encounter while working with the SGP data.

SGP has developed a series of tutorial videos that give an overview of the SGP data management system and demonstrate how to perform various analyses. These videos are available on our YouTube channel. SGP has also published a document describing the data structures and naming conventions for the SGPdata software package. This document can be found on the SGPdata website. If you have any further questions, please contact the SGPdata team at sgpdata@umd.edu. We look forward to helping you with your questions!