Introduction
Motor skill development during early childhood plays a foundational role in children’s physical, cognitive, and socio-emotional growth, directly influencing their academic achievement, executive functioning, social participation, and lifelong physical activity behaviours [1, 2]. Research over the last five years has increasingly emphasised that well-developed fundamental motor skills (FMS) – such as running, hopping, throwing, and catching – are critical for school readiness and long-term health outcomes [3–5]. Despite their importance, studies show that many children enter school with underdeveloped motor competence, especially in socially vulnerable populations [6, 7], which reinforces the need for early, structured, and accessible motor interventions.
Fundamental motor skills are commonly classified into locomotor and object control skills, and they serve as the building blocks for more complex movements involved in physical education, recreational activities, and sports [8]. The effectiveness of motor skill acquisition depends not only on children’s opportunities for practice but also on the quality of instructional strategies employed by educators [9]. Traditional approaches – often based on verbal explanation and physical demonstration – may fall short when addressing individual differences in attention, comprehension, and engagement, particularly in early grade classrooms. In this context, digital learning tools have gained attention as complementary resources that enhance the instructional process. These tools refer to interactive technologies such as mobile applications or platforms that integrate animations, video modelling, and feedback systems to support learning and skill acquisition [10].
Recent studies have explored the use of digital interventions to improve motor development in children, showing promising outcomes in terms of both performance and motivation [11, 12]. Apps with animated movement models can help children visualise correct motor patterns more clearly, reduce cognitive load, and improve their attentional focus [13]. Moreover, such tools can enhance engagement and support differentiated instruction, especially when teaching in large or mixed-ability groups [14]. One notable example of a motor skill assessment adapted into digital format is the Test of Gross Motor Development – Third Edition (TGMD-3). The TGMD-3 is a widely validated tool used internationally to assess fundamental motor skills in children aged 3 to 10 years [15]. Recent adaptations have integrated TGMD-3 protocols into app-assisted instructional formats, using animations to demonstrate task execution while guiding evaluators and educators in a standardised way [4].
In addition to supporting children’s motor learning, digital tools also benefit teachers, therapists, and coaches by streamlining lesson planning, enhancing instructional consistency, and offering real-time feedback [16]. These tools not only contribute to the learning environment but also play a significant role in assessment. Digital applications have the potential to reduce variability in evaluator judgement and maintain children’s attention during testing procedures, which is often challenging in field-based educational settings [17].
Although there is growing evidence on the instructional benefits of app-assisted tools, few studies have examined their influence on the accuracy and consistency of motor skill assessments, particularly regarding inter-rater reliability. Variability in scoring – whether due to evaluator inexperience or inconsistencies in observation – can compromise both the validity of assessment outcomes and the quality of pedagogical decisions. Therefore, it is essential to investigate whether digital tools not only enhance motor performance but also improve the reliability of evaluators’ scoring in applied contexts.
Building on this gap, the present study aims to examine the effects of app-assisted instruction on both the development and evaluation of fundamental motor skills in early grade children. Given the rapid advancement of educational technologies, it is essential to determine whether digital tools effectively enhance children’s motor performance and improve the consistency of assessments conducted by different raters. Specifically, this study investigates whether app-supported instruction improves children’s performance and increases inter-rater reliability compared to traditional instructional approaches. By addressing these dual outcomes – skill acquisition and assessment consistency – this research contributes to the ongoing effort to integrate effective, equitable, and evidence-based digital strategies into early childhood education. We hypothesised that app-assisted instruction would lead to improved motor performance and greater inter-rater reliability in the evaluation of motor skills.
Material and methods
Study design
This study employed a quasi-experimental, within-subject crossover design. Although not randomised, the within-subject structure ensured that each participant served as their own control, reducing inter-individual variability. Participants completed two intervention protocols (traditional and app-assisted) in counterbalanced order, separated by a 30-day interval to minimise learning or fatigue effects.
Sample characteristics
Participants were 62 children aged 6–9 years (mean age = 8.03 years, SD = 1.38), enrolled in a public school in Campo Grande/MS, Brazil. The sample included 32 boys (mean age = 7.89 years, SD = 1.33) and 30 girls (mean age = 8.09 years, SD = 1.41).
Inclusion criteria were: (a) enrolment in regular schooling; (b) no diagnosed motor or cognitive impairments; (c) no previous exposure to structured motor skill intervention programs; and (d) signed informed consent and assent forms. Exclusion criteria included current or prior participation in other motor training interventions or diagnosed neurodevelopmental disorders that could affect motor performance. Participants were selected using a convenience sampling method, based on their availability and institutional authorisation from the school. The sample size was calculated using G*Power 3.1.9.7, assuming a medium effect size (f = 0.25), α = 0.05, and power = 0.80, yielding a minimum of 52 participants. To account for potential attrition, 62 participants were recruited.
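Note that G*Power's f = 0.25 belongs to the F-test family, so the reported minimum of 52 participants is not reproducible from a simple t-test formula. Purely to illustrate the logic of an a priori power analysis, the sketch below (a hypothetical helper, not the study's actual computation) applies the standard normal-approximation formula for a two-sided paired t-test:

```python
import math

from scipy.stats import norm


def n_paired_ttest(dz, alpha=0.05, power=0.80):
    """Approximate n for a two-sided paired t-test via the normal
    approximation: n = ((z_{1-alpha/2} + z_{power}) / dz) ** 2, where
    dz is Cohen's d computed on the paired differences."""
    z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = norm.ppf(power)          # ~0.84 for power = 0.80
    return math.ceil(((z_alpha + z_power) / dz) ** 2)


print(n_paired_ttest(0.25))  # → 126: far above 52, showing the test family matters
print(n_paired_ttest(0.50))  # → 32
```

Because the required n is highly sensitive to the assumed effect size and test family, reporting the exact G*Power module and parameters, as the authors do, is essential for reproducibility.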
Participants’ demographic information is presented in Table 1.
Instruments
Test of Gross Motor Development – third edition (TGMD-3)
The Test of Gross Motor Development – third edition (TGMD-3) [15], validated for Brazilian children [18], was used to assess motor skill proficiency. The TGMD-3 evaluates locomotor skills (i.e., running, galloping, hopping, skipping, horizontal jump, and sliding) and ball skills (i.e., striking with two hands, striking with one hand, dribbling, catching, kicking, and overhand and underhand throwing). Validity evidence for Brazilian children showed adequate psychometric properties, including content validity by experts (content validity index [CVI] = 0.75 to 1.00; k = 0.77 to 0.97) and professionals (99% agreement), internal consistency (locomotor skills: α = 0.63; ball skills: α = 0.76), inter- and intra-rater reliability (ICC = 0.60–0.90), test-retest reliability (locomotor: r = 0.60–0.82; ball skills: r = 0.71–0.86), and fit indices for the two-factor model (RMSEA = 0.07, 90% CI = 0.06–0.08; CFI = 0.90; NFI = 0.87; TLI = 0.94; GFI = 0.94; AGFI = 0.91).
Motor Skills Sequential Pictures (MSSP)
The Motor Skills Sequential Pictures (MSSP), validated in a previous study [19], is a set of sequential illustrations depicting each TGMD-3 skill, including locomotor skills (running, galloping, hopping, skipping, jumping, sliding) and ball skills (striking with one and two hands, dribbling, catching, kicking, overhand and underhand throwing), along with their performance criteria (3–5 per skill) [15]. These illustrations assist in the instruction of the motor skills included in the TGMD-3.
Procedures
After receiving authorisation from the school principal and collecting the required signed consent forms from the parents/guardians and children, data collection was initiated. The assessments were conducted in a court adjacent to the school, and the children were familiar with the evaluation setting. In the first stage, the traditional protocol was used. All participants completed the TGMD-3 assessment following Ulrich’s [15] standardised procedures. Initially, the evaluator demonstrated the skill to be performed, after which each child made three attempts: one for familiarisation and two valid attempts for scoring. Thirty days later, the second stage of the study was conducted. All children who had completed the traditional protocol were then assessed using the app-assisted protocol. In this stage, the evaluators used a smartphone to show animated demonstrations of the skills (via the MSSP app), shown twice before the child performed the skill. Again, the children completed three trials – one familiarisation and two valid performance attempts. All TGMD-3 assessments were conducted by trained researchers who were experienced with the instrument, and supervised by two additional professionals with expertise in the area. A master’s-level researcher in motor behaviour, blind to the study purpose, scored all assessments based on video recordings, strictly following the TGMD-3 scoring criteria. Inter-rater reliability was checked on 100% of the assessments by an external doctoral-level researcher in motor behaviour, resulting in 86.59% agreement.
Statistical analysis
Descriptive statistics (mean and standard deviation) were used to summarise the locomotor and ball skill performance according to instructional modality (traditional and app-assisted) and sex (male and female). To assess the inter-rater agreement, Krippendorff’s alpha coefficient was calculated. This robust index is appropriate for both categorical and continuous data, providing a comprehensive measure of consistency among evaluators. Interpretation of alpha values followed standard classification: 0.00–0.20 = poor; 0.21–0.40 = fair; 0.41–0.60 = moderate; 0.61–0.80 = substantial; and 0.81–1.00 = almost perfect.
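Krippendorff’s alpha is computed from a coincidence matrix of paired ratings. The sketch below is illustrative code for the nominal case (assuming 0/1-style item scores; the study may have used an interval variant), not the authors’ actual analysis pipeline:

```python
from collections import Counter


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of units (e.g., TGMD-3 items), each a list of the
    ratings given by the raters (None marks a missing rating).
    """
    # Keep only units with at least two pairable ratings.
    units = [[v for v in u if v is not None] for u in units]
    units = [u for u in units if len(u) >= 2]

    # Coincidence matrix: each ordered pair within a unit, weighted by 1/(m - 1).
    o = Counter()
    for u in units:
        m = len(u)
        for i, c in enumerate(u):
            for j, k in enumerate(u):
                if i != j:
                    o[(c, k)] += 1.0 / (m - 1)

    n_c = Counter()  # marginal totals per category
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())

    # Observed vs. expected disagreement (nominal delta: 1 when c != k).
    d_o = sum(w for (c, k), w in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e


print(krippendorff_alpha_nominal([[1, 1], [0, 0], [1, 1]]))  # → 1.0 (perfect agreement)
print(krippendorff_alpha_nominal([[0, 1], [0, 1]]))          # systematic disagreement → negative
```

Values at or below zero indicate agreement no better than chance, which is why confidence intervals for several skills can extend below zero.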
To examine whether the instructional modality significantly affected motor performance, paired-samples t-tests were conducted for each TGMD-3 skill, comparing scores between traditional and app-assisted instruction. Assumptions of normality were evaluated using the Shapiro–Wilk test and Q–Q plot inspection, with no significant violations observed (p > 0.05). Effect sizes were calculated using Cohen’s d, with thresholds of 0.2 (small), 0.5 (medium), and 0.8 (large). All statistical analyses were performed using IBM SPSS Statistics version 27, with the significance level set at α = 0.05.
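The per-skill comparison described above can be sketched with SciPy as follows. The score arrays are hypothetical stand-ins (the study’s data are not reproduced here), and Cohen’s dz is computed from the paired differences, as is conventional for within-subject designs:

```python
import numpy as np
from scipy import stats

# Hypothetical raw scores for one TGMD-3 skill under the two conditions.
traditional = np.array([6, 5, 7, 8, 6, 7, 5, 6, 7, 8], dtype=float)
app_assisted = np.array([7, 6, 7, 8, 7, 8, 6, 7, 7, 8], dtype=float)

# Paired-samples t-test (two-sided).
t_stat, p_value = stats.ttest_rel(app_assisted, traditional)

# Cohen's dz for paired data: mean difference / SD of the differences.
diff = app_assisted - traditional
cohens_dz = diff.mean() / diff.std(ddof=1)

print(f"t({len(diff) - 1}) = {t_stat:.2f}, p = {p_value:.3f}, dz = {cohens_dz:.2f}")
```

The dz variant divides by the standard deviation of the paired differences rather than a pooled SD, so its value is not directly comparable to a between-groups Cohen’s d.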
Results
Paired samples t-tests were conducted to compare performance on locomotor skills between the app-assisted and traditional instruction methods. Descriptive statistics and effect sizes are presented in Table 2. For locomotor skills, no significant differences were found in running, galloping, hopping, skipping, or lateral run (p > 0.05), although small effect sizes were observed in favour of the app-assisted method (e.g., d = 0.205 for the lateral run). However, a significant difference was found for the horizontal jump, t(61) = –3.04, p = 0.003, with participants performing better under traditional instruction (d = –0.386).
Regarding ball skills (Table 3), most comparisons showed no significant differences (p > 0.05). The only statistically significant difference was found in catching performance, p = 0.004, favouring the traditional instruction method (d = –0.463). Effect sizes across other tasks were negligible to small, both positive and negative, suggesting minimal practical differences between methods. These findings suggest that, while both instructional methods yield comparable outcomes for most skills, specific tasks such as the horizontal jump and catching may benefit from traditional instruction approaches.
As presented in Table 4, the inter-rater reliability for the locomotor skill items, assessed using Krippendorff’s alpha (α), showed generally modest levels of agreement among raters across tasks, both in the traditional instruction (Trad) and app-assisted instruction (App) conditions. The alpha coefficients ranged from very low to fair, with none of the skills reaching the commonly recommended thresholds for observational assessments.
Horizontal jump exhibited the lowest reliability (α = 0.031), with a confidence interval spanning zero (95% CI: –0.233 to 0.297), indicating limited agreement among evaluators. This suggests that this specific task may involve some challenges in achieving consistent scoring, potentially due to subjective interpretation of performance components, regardless of the instruction method.
Table 2
Descriptive statistics and effect sizes for locomotor skills under traditional and app-assisted instruction
Table 3
Descriptive statistics and effect sizes for ball skills under traditional and app-assisted instruction
Table 4
Inter-rater reliability for locomotor skills under traditional and app-assisted instruction
Table 5
Inter-rater reliability for ball skills under traditional and app-assisted instruction
Galloping also showed low reliability (α = 0.068; 95% CI: –0.201 to 0.279), suggesting some difficulty among evaluators in consistently applying the criteria across both the Trad and App conditions. Similarly, hopping (α = 0.155; 95% CI: –0.147 to 0.407) and skipping (α = 0.179; 95% CI: 0.076 to 0.400) demonstrated modest levels of agreement. The wider confidence interval for hopping, in particular, points to greater uncertainty in the estimate, possibly reflecting variability in how the raters interpreted the movement patterns depending on the instructional context.
Running yielded an alpha value of 0.216 (95% CI: 0.081 to 0.482), which, although still below optimal reliability standards, represents a somewhat more stable and precise estimate. This may indicate slightly better consensus among the evaluators, perhaps facilitated by the clarity of task execution in both instruction modes.
Among all locomotor tasks, the lateral run presented the highest inter-rater reliability (α = 0.345; 95% CI: 0.050 to 0.570), reaching the lower boundary of what is considered fair agreement. Although modest, this result suggests the evaluators were relatively more consistent in judging this skill, which may be related to its more easily observable criteria or a more structured task format in both the Trad and App approaches.
Taken together, these findings highlight variation in scoring consistency depending on the locomotor skill and indicate that regardless of the instruction method, the agreement among the raters was generally modest. These results point to opportunities for enhancing rater training, refining scoring criteria, or integrating supports such as visual models or digital tools to improve standardisation and objectivity in observational motor skill assessments.
As presented in Table 5, the inter-rater reliability for the ball skill items, assessed by Krippendorff’s alpha (α), ranged from poor to fair, with variability in consistency depending on the task. Despite the generally similar mean performance between the traditional instruction (Trad) and app-assisted instruction (App), the agreement among the evaluators remained limited for most skills.
The strike with two hands showed low reliability (α = 0.074; 95% CI: –0.192 to 0.282), indicating considerable variation in how the evaluators interpreted and scored the performance, regardless of the instructional method. A similar trend was observed for the one-hand strike, which had the lowest reliability among all ball skills (α = 0.040; 95% CI: –0.216 to 0.300), suggesting challenges in the consistent application of the scoring criteria.
Dribbling demonstrated the highest inter-rater reliability within this skill domain (α = 0.281; 95% CI: 0.043 to 0.470), indicating a comparatively greater level of evaluator agreement. Although still below the recommended thresholds, the narrower and positive confidence interval may reflect a more structured and easily observable task.
Catching also presented a modest reliability estimate (α = 0.193; 95% CI: 0.092 to 0.456), while kicking showed low agreement among the raters (α = 0.094; 95% CI: –0.177 to 0.356), with scores reflecting minimal variation across the instructional conditions.
The overhand throw yielded poor but slightly improved agreement (α = 0.201; 95% CI: 0.095 to 0.488), and the underhand throw had one of the highest reliability estimates among the ball skills (α = 0.284; 95% CI: 0.016 to 0.500), suggesting relatively greater consistency in evaluator scoring for this task.
Taken together, these results suggest that while app-assisted instruction did not substantially enhance the inter-rater reliability across all ball skills, tasks such as dribbling and underhand throw exhibited more stable agreement. This may point to the potential benefits of incorporating standardised visual cues or structured demonstrations to support scoring consistency in observational motor assessments.
Discussion
The findings of this study suggest that app-assisted instruction can serve as a complementary approach to traditional motor assessment methods. It proved useful in standardising instructions and potentially increasing children’s motivation, but it should not replace traditional approaches. Although the average scores between the two methods were similar, the inter-rater reliability ranged from poor to fair, indicating that the way the instructions were delivered may have affected the performance outcomes. These results align with previous research on both the challenges and benefits of using technology in the motor performance assessment of children.
The literature indicates that digital tools, such as interactive apps, can facilitate motor learning by offering dynamic and structured visual instructions, which may reduce children’s cognitive load and improve task focus [20, 21]. Nevertheless, the reliability of scores obtained through app-assisted instruction has yet to reach ideal levels of agreement across all motor skills, suggesting that children may interpret tasks differently depending on how the instructions are presented. Some authors argue that adapting the Test of Gross Motor Development – third edition (TGMD-3) into a digital format improves the structure of the assessment but does not eliminate the need for evaluator mediation to ensure accurate scoring [19]. More recently, studies [7] suggest that hybrid approaches – combining digital apps with in-person instruction – may enhance motor learning and optimise skill transfer to real-life contexts.
In terms of locomotor skills, the highest reliability was found for lateral running (α = 0.345, fair), while skills like horizontal jumping (α = 0.031, poor) and galloping (α = 0.068, poor) showed the least consistency in scoring. These findings indicate that certain skills may be more susceptible to variations in how performance criteria are interpreted based on app-based instructions, reinforcing the need for evaluator training to maintain standardisation in test application [22].
Regarding ball skills, fair reliability was found in underhand throwing (α = 0.284) and dribbling (α = 0.281), suggesting that app-assisted instruction may help standardise the performance on these tasks by offering a clearer visual model. However, striking with one hand (α = 0.040, poor) and striking with two hands (α = 0.074, poor) showed low reliability, indicating that these movements may rely more heavily on direct, in-person feedback and individual adjustment – an interpretation supported by [23].
Beyond reliability, another relevant factor is the impact of technology-assisted instruction on children’s motivation. Studies have shown that animations and visual feedback can increase engagement and concentration during motor tasks [20–24]. However, it should be emphasised that technology should be integrated as a complement to traditional methods, as interaction with the evaluator is still crucial for refining motor skills and correcting movement patterns [25].
The results of this study support the use of app-assisted instruction as a valuable tool for motor assessment, particularly for improving instructional consistency and structure. However, the variability in reliability across different skills suggests that the app should be viewed as an additional resource rather than a substitute for traditional methods. Future research should investigate strategies to improve score agreement in technology-based assessments, including refining how instructions are presented and enhancing evaluator training. It is also important to explore the effects of this approach across different age groups and educational contexts. In this regard, [26] conducted a comparative analysis of motor assessments supported by smartphone applications and traditional observational methods. Their findings showed that although the app offered a faster and more standardised evaluation process, there were no significant differences in diagnostic accuracy between the two methods. They concluded that technology, when used as a complement to traditional assessments, may be particularly effective in contexts that demand precision and standardisation in motor variable measurement. One study demonstrated that interactive games and visual feedback apps can also enhance children’s motivation – especially in motor intervention programs [27]. These technologies make the assessment process more playful and engaging, potentially resulting in improved motor performance. However, effective assessment still depends on a careful balance between technological tools and human observation, as digital methods may not fully capture the complexity of human movement [27].
The findings suggest app-assisted instruction supports instructional clarity but does not eliminate evaluator variability. This has implications for motor learning frameworks, suggesting the potential utility of hybrid methods in physical education curricula. The study’s generalisability is limited by its single-site design, modest sample, and lack of longitudinal tracking. Further research should investigate diverse contexts and test-retest reliability. Future directions include adaptive apps with real-time feedback and expanded evaluation protocols tailored for diverse populations.
Conclusions
The findings of this study suggest that app-assisted instruction can serve as a valuable complement to traditional methods of motor skill assessment. This technological approach proved effective in standardising instructions, structuring the assessment process, and potentially increasing children’s motivation and engagement. Nonetheless, the mean performance scores between the two instructional modalities were largely similar, and the inter-rater reliability ranged from poor to fair. These results indicate that the mode of instruction delivery may influence how children perform specific motor tasks.
Rather than replacing traditional methods, app-assisted instruction should be viewed as a supportive tool that enhances instructional clarity and may contribute to greater consistency in motor skill assessment. Its integration into educational practice can facilitate more structured evaluations and support pedagogical efforts in motor learning. App-assisted instruction enhances structure and engagement, but evaluator oversight remains essential.
Future research should focus on refining digital instruction strategies to improve scoring reliability, including enhanced evaluator training and optimised delivery of visual and verbal cues. It is also important to assess the effectiveness of this approach across different age groups, developmental levels, and educational settings. In addition, long-term studies should explore the sustained impact of app-assisted instruction on motor development and examine its integration with adaptive technologies, such as artificial intelligence, to enhance personalisation and scoring accuracy.
