M. Valle Torre | TU Delft Repository

JELAI: Integrating AI and Learning Analytics in Jupyter Notebooks

Conference paper (2025) - M. Valle Torre, T. van der Velden, M.M. Specht, Catharine Oertel

Generative AI offers potential for educational support, but often lacks pedagogical grounding and awareness of the student’s learning context. Furthermore, researching student interactions with these tools within authentic learning environments remains challenging. To address this, we present JELAI, an open-source platform architecture designed to integrate fine-grained Learning Analytics (LA) with Large Language Model (LLM)-based tutoring directly within a Jupyter Notebook environment. JELAI employs a modular, containerized design featuring JupyterLab extensions for telemetry and chat, alongside a central middleware handling LA processing and context-aware LLM prompt enrichment. This architecture enables the capture of integrated code interaction and chat data, facilitating real-time, context-sensitive AI scaffolding and research into student behaviour. We describe the system’s design and implementation, and demonstrate its feasibility through system performance benchmarks and two proof-of-concept use cases illustrating its capabilities for logging multi-modal data, analysing help-seeking patterns, and supporting A/B testing of AI configurations. JELAI’s primary contribution is its technical framework, providing a flexible tool for researchers and educators to develop, deploy, and study LA-informed AI tutoring within the widely used Jupyter ecosystem. ...

The Sequence Matters in Learning - A Systematic Literature Review

Conference paper (2024) - M. Valle Torre, Catharine Oertel, M.M. Specht

Describing and analysing learner behaviour using sequential data and analysis is becoming more and more popular in Learning Analytics. Nevertheless, we found a variety of definitions of learning sequences, as well as choices regarding data aggregation and the methods implemented for analysis. Furthermore, sequences are used to study different educational settings and serve as a base for various interventions. In this literature review, the authors aim to generate an overview of these aspects to describe the current state of using sequence analysis in educational support and learning analytics. The 74 included articles were selected based on the criteria that they conduct empirical research on an educational environment using sequences of learning actions as the main focus of their analysis. The results enable us to highlight different learning tasks where sequences are analysed, identify data mapping strategies for different types of sequence actions, differentiate techniques based on purpose and scope, and identify educational interventions based on the outcomes of sequence analysis. ...

An investigation on integration of computational thinking into engineering curriculum at Delft University of Technology

Conference paper (2022) - X. Zhang, M. Valle Torre, M.M. Specht

Our life is surrounded by digital devices. Engineering education is one of the cornerstones of higher education for future generations, and computational thinking (CT) is deemed a core component in various engineering curricula. The Delft University of Technology (TU Delft) is the largest technical university in the Netherlands, and computing, computational concepts and activities have been integrated into the curriculum at TU Delft for years. However, there has been no comprehensive investigation of the integration of CT into the engineering curriculum. This paper presents a case study of Master’s level engineering curricula investigating: 1) to what extent CT components are integrated; 2) in what way CT is interpreted and integrated in the curriculum; 3) what educational and assessment methods have been used. The results show that CT has been largely integrated into the investigated curriculum, mostly with lectures as the educational method and programming assignments as the assessment method. Our analysis shows that understanding the context and patterns in problems and solutions was important in different courses and engineering disciplines, indicating possible directions for integration of CT into the curriculum. ...

Using Social Network Analysis to explore Learning networks in MOOCs discussion forums

Conference paper (2022) - A. Soleymani, L.C.M. Itard, Maarten de Laat, M. Valle Torre, M.M. Specht

Learning and educational challenges in the field of indoor climate and building services like energy systems are mainly due to the transformation of professional practices and learning networks, a big shift in the way in which people work, communicate, and share their knowledge, and the need for additional workforce, either juniors or people coming from other disciplines. One of the most important factors influencing professional development and workplace learning is networked learning. Our goal in this study is to understand the characteristics of learning networks and their patterns of interaction using Social Network Analysis techniques in three MOOC discussion forums. The results of this study show not only the importance of learning networks and peer support for the professionalization of learners, but also how the pedagogical approach of instructors in MOOCs can foster learning networks. This novel approach to developing learning networks and communities can not only help connect young professionals and experienced practitioners digitally, but also promote professional development and innovation in the energy installation sector. ...

Note the Highlight: Incorporating Active Reading Tools in a Search as Learning Environment

Conference paper (2021) - N. Roy, M. Valle Torre, Ujwal Gadiraju, D.M. Maxwell, C. Hauff

Active reading strategies, such as content annotations (through the use of highlighting and note-taking, for example), have been shown to yield improvements to a learner's knowledge and understanding of the topic being explored. This has been especially notable in long and complex learning endeavours. With web search engines nowadays used as the primary gateway for learners (or users) to find content that helps them realise their learning goals, they are often poorly equipped with the necessary tools to aid in sense-making, an important aspect of the Search as Learning (SAL) process. Within the Information Retrieval (IR) community, research efforts have explored ways to keep track of users' search context by providing a notepad-like interface for the collection of relevant articles, and aid them during the exploratory search process. However, these studies did not explicitly measure the effect that such tools have on knowledge and understanding during a complex, learning-oriented search task. In this paper, we address this research gap by carrying out an Interactive IR experiment with highlighting and note-taking tools built into the search interface. We conducted a crowdsourced between-subjects study (N=115), where participants were assigned to one of four conditions: (i) control (a standard web search interface); (ii) high (highlighting enabled); (iii) note (note-taking enabled); and (iv) highnote (both highlighting and note-taking enabled). We assess participants' learning with a recall-oriented vocabulary learning task, and a cognitively more taxing essay writing task. We find that (i) active reading tools do not aid in the vocabulary learning task. However, (ii) participants in high covered 34% more subtopics, and participants in note covered 34% more facts in their essays when compared to control. Furthermore, (iii) we observed that incorporating active learning tools significantly changed the search behaviour of participants across a number of measures.
This is the first work that sheds light on the effect of active reading tools on the SAL process, with important design implications for learning-oriented search systems. ...

Quantum of choice

How learners' feedback monitoring decisions, goals and self-regulated learning skills are related

Conference paper (2021) - Ioana Jivet, Jacqueline Wong, Maren Scheffel, Manuel Valle Torre, Marcus Specht, Hendrik Drachsler

Learning analytics dashboards (LADs) are designed as feedback tools for learners, but until recently, learners rarely have had a say in how LADs are designed and what information they receive through LADs. To overcome this shortcoming, we have developed a customisable LAD for Coursera MOOCs on which learners can set goals and choose indicators to monitor. Following a mixed-methods approach, we analyse 401 learners' indicator selection behaviour in order to understand the decisions they make on the LAD and whether learner goals and self-regulated learning skills influence these decisions. We found that learners overwhelmingly chose indicators about completed activities. Goals are not associated with indicator selection behaviour, while help-seeking skills predict learners' choice of monitoring their engagement in discussions and time management skills predict learners' interest in procrastination indicators. The findings have implications for our understanding of learners' use of LADs and their design. ...

How Do Active Reading Strategies Affect Learning Outcomes in Web Search?

Conference paper (2021) - N. Roy, M. Valle Torre, Ujwal Gadiraju, D.M. Maxwell, C. Hauff

Prior work in education research has shown that various active reading strategies, notably highlighting and note-taking, benefit learning outcomes. Most of these findings are based on observational studies where learners learn from a single document. In a Search as Learning (SAL) context where learners have to iteratively scan and explore a large number of documents to address their learning objective, the effect of these active reading strategies is largely unexplored. To address this research gap, we carried out a crowd-sourced user study, and explored the effects of different highlighting and note-taking strategies on learning during a complex, learning-oriented search task. Out of five hypotheses derived from the education literature we could confirm three in the SAL context. Our findings have important design implications on aiding learning through search. Learners can benefit from search interfaces equipped with active reading tools—but some learning strategies employing these tools are more effective than others. (This research has been supported by DDS (Delft Data Science) and NWO projects SearchX (639.022.722) and Aspasia (015.013.027).) ...

EdX log data analysis made easy

Introducing ELAT: An open-source, privacy-aware and browser-based edX log data analysis tool

Conference paper (2020) - Manuel Valle Torre, Esther Tan, Claudia Hauff

Massive Open Online Courses (MOOCs), delivered on platforms such as edX and Coursera, have led to a surge in large-scale learning research. MOOC platforms gather a continuous stream of learner traces, which can amount to several gigabytes per MOOC, that learning analytics researchers use to conduct exploratory analyses as well as to evaluate deployed interventions. edX has proven to be a popular platform for such experiments, as the data each MOOC generates is easily accessible to the institution running the MOOC. One of the issues researchers face is the preprocessing, cleaning and formatting of those large-scale learner traces. It is a tedious process that requires considerable computational skills. To reduce this burden, a number of tools have been proposed and released with the aim of simplifying this process. Those tools, though, still have a significant setup cost, are already out-of-date or require already preprocessed data as a starting point. In contrast, in this paper we introduce ELAT, the edX Log file Analysis Tool, which is browser-based (i.e., no setup costs), keeps the data local (i.e., no server is necessary and the privacy-sensitive learner data is not sent anywhere) and takes edX data dumps as input. ELAT not only processes the raw data, but also generates semantically meaningful units (learner sessions instead of just click events) that are visualized in various ways (learning paths, forum participation, video watching sequences). We report on two evaluations we conducted: (i) a technological evaluation and (ii) a user study with potential end users of ELAT. ELAT is open-source and available at https://mvallet91.github.io/ELAT/. ...

Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content

Conference paper (2019) - Sepideh Mesbah, Jie Yang, Robert-Jan Sips, Manuel Valle Torre, Christoph Lofi, Alessandro Bozzon, Geert-Jan Houben

Social media provides a timely yet challenging data source for adverse drug reaction (ADR) detection. Existing dictionary-based, semi-supervised learning approaches are intrinsically limited by the coverage and maintainability of laymen health vocabularies. In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset, and subsequently, to automatically generate a large labeled training set from a small set of labeled samples. This allows for efficient social-media ADR detection with low training and re-training costs to adapt to the changes and emergence of informal medical laymen terms. An extensive evaluation performed on Twitter and Reddit data shows that our approach matches the performance of fully-supervised approaches while requiring only 25% of training data. ...

Perceptual relational attributes

Navigating and discovering shared perspectives from user-generated reviews

Conference paper (2019) - Manuel Valle Torre, Mengmeng Ye, Christoph Lofi

Effectively modelling and querying experience items like movies, books, or games in databases is challenging because these items are better described by their resulting user experience or perceived properties than by factual attributes. However, such information is often subjective, disputed, or unclear. Thus, social judgments like comments, reviews, discussions, or ratings have become a ubiquitous component of most Web applications dealing with such items, especially in the e-commerce domain. However, they usually do not play a major role in the query process, and are typically just shown to the user. In this paper, we discuss how to use unstructured user reviews to build a structured semantic representation of database items such that these perceptual attributes are (at least implicitly) represented and usable for navigational queries. In particular, we argue that a central challenge when extracting perceptual attributes from social judgments is respecting the subjectivity of expressed opinions. We claim that no representation consisting of only a single tuple will be sufficient. Instead, such systems should aim at discovering shared perspectives, representing dominant perceptions and opinions, and exploiting those perspectives for query processing. ...

TSE-NER

An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications

Conference paper (2018) - Sepideh Mesbah, Christoph Lofi, Manuel Valle Torre, Alessandro Bozzon, Geert-Jan Houben

Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. “WebKB”, “StatSnowball”) are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases; this approach is not suitable for long-tail entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets and Methods in computer science publications, and Proteins in biomedical publications. ...

Concept Focus

Semantic Meta-Data For Describing MOOC Content

Conference paper (2018) - Sepideh Mesbah, Guanliang Chen, Manuel Valle Torre, Alessandro Bozzon, Christoph Lofi, Geert-Jan Houben

MOOCs promised to herald a new age of open education. However, efficient access to MOOC content is still hard, thus unnecessarily complicating many use cases like efficient re-use of material, or tailored access for life-long learning scenarios. One of the reasons for this lack of accessibility is the shortage of meaningful semantic meta-data describing MOOC content and the resulting learning experience. In this paper, we explore Concept Focus, a new type of meta-data for describing a perceptual facet of modern video-based MOOCs, capturing how focused a learning resource is topic-wise, which is often an indicator of clarity and understandability. We provide the theoretical foundations of Concept Focus and outline a methodical workflow of how to automatically compute it for MOOC lectures. Furthermore, we show that the learners’ consumption behavior is correlated with a MOOC lecture’s Concept Focus, thus underlining that this type of meta-data is indeed relevant for user-centric querying, personalizing or even designing the MOOC experience. For showing this, we performed an extensive study with real-life MOOCs and 12,849 learners over the duration of three months. ...

Perceptual Perspectives for Experience Items: Representation and Query Processing

Abstract (2017) - Christoph Lofi, Manuel Valle Torre