T.C.M. de Koning
2 records found
The gap between CPU and memory performance continues to widen. Together with growing memory pressure caused by higher CPU core counts and multi-tenant systems, this creates a need for new memory technologies. Recently, various such technologies, for example non-volatile RAM, have become available for commercial use. These technologies generally have different characteristics from traditional DRAM. To fully utilise the potential of these new memory types, a better understanding of memory usage in modern systems is required. One way to gain this understanding is through memory traces.
Solutions that are currently available either do not support multi-core architectures or cause a severe slowdown. Therefore, this thesis presents a novel approach to gather full-system after-cache memory access traces. The proposed system is a hybrid framework which consists of the QEMU emulator combined with a custom distributed cache and page table simulator. A modified version of QEMU, called QMEMU, is devised to improve tracing performance and allow tracing instruction fetches.
By leveraging QEMU's existing tracing functionality, only a small number of modifications to QEMU are required. The traces produced by QMEMU contain virtual addresses. However, accurate cache simulation requires physical addresses. Tracing the physical address instead of the virtual address for each memory access is shown to cause a 70% slowdown when using QMEMU.
To find the physical addresses for the traced accesses, a novel approach is employed. This approach simulates the guest page tables outside the critical path for memory tracing and therefore does not decrease tracing performance. Using QMEMU, traces can be gathered with a speedup of up to 42.6 times over the gem5 simulator for benchmarks from the PARSEC suite.
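The idea of resolving physical addresses away from the tracing path can be illustrated with a small sketch. It is not the thesis implementation: the event format (`"map"`, `"unmap"`, `"access"`), the class name `SimulatedPageTable`, the single flat mapping, and the 4 KiB page size are all assumptions made for illustration. The trace consumer replays page-table updates into its own table and translates each virtual address afterwards, so the tracer itself only has to emit virtual addresses.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1    # low bits hold the offset within a page


class SimulatedPageTable:
    """Hypothetical flat page table kept by the trace consumer."""

    def __init__(self):
        self._frames = {}  # virtual page number -> physical frame number

    def map_page(self, vaddr, paddr):
        self._frames[vaddr >> PAGE_SHIFT] = paddr >> PAGE_SHIFT

    def unmap_page(self, vaddr):
        self._frames.pop(vaddr >> PAGE_SHIFT, None)

    def translate(self, vaddr):
        """Return the physical address, or None if the page is unmapped."""
        frame = self._frames.get(vaddr >> PAGE_SHIFT)
        if frame is None:
            return None
        return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)


def translate_trace(events):
    """Replay a trace of hypothetical events and translate every access.

    Events are tuples: ("map", vaddr, paddr), ("unmap", vaddr),
    or ("access", vaddr).
    """
    table = SimulatedPageTable()
    physical = []
    for event in events:
        kind = event[0]
        if kind == "map":
            table.map_page(event[1], event[2])
        elif kind == "unmap":
            table.unmap_page(event[1])
        elif kind == "access":
            physical.append(table.translate(event[1]))
    return physical
```

A real guest has multi-level, per-process page tables; the flat dictionary here only shows why translation can be deferred to the consumer once page-table updates are part of the trace.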
In the second part of the framework, which performs memory, cache, and page table simulation, cache simulation is found to be the most computationally intensive task. Therefore, in the proposed framework, cache simulation is performed in a parallel and distributed manner. Because most modern systems use set-associative caches, simulation can be parallelised without reducing accuracy by dividing the memory access traces based on these cache sets. Using this approach, the simulator can process 10 million accesses per second when simulating a single modern cache hierarchy. When simulating 7 different cache hierarchies concurrently, a throughput of 6 million accesses per second is reached. The simulated guest page tables provide additional information, such as the number of accesses or the virtual memory size for each process of the guest workload. This information can be used to narrow the semantic gap between memory traces and their meaning. The proposed framework is evaluated by comparing it to CMP$im and gem5 using the PARSEC benchmark suite.
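Why partitioning by cache set preserves accuracy can be shown with a minimal sketch. It assumes a single-level cache with LRU replacement, 64-byte lines, 64 sets, and 4 ways; these parameters and all function names are illustrative choices, not taken from the thesis. Because an access can only affect the set it maps to, splitting the trace by set index while keeping the order within each partition yields the same hit and miss counts as a sequential run.

```python
from collections import OrderedDict

LINE_SIZE = 64   # assumption: 64-byte cache lines
NUM_SETS = 64    # assumption: 64 sets
WAYS = 4         # assumption: 4-way set-associative


def set_index(addr):
    """Set that a physical address maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


def simulate(addresses):
    """Simulate an LRU set-associative cache; return (hits, misses)."""
    sets = {}
    hits = misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        lru = sets.setdefault(line % NUM_SETS, OrderedDict())
        if line in lru:
            lru.move_to_end(line)        # mark as most recently used
            hits += 1
        else:
            misses += 1
            lru[line] = True
            if len(lru) > WAYS:
                lru.popitem(last=False)  # evict least recently used line
    return hits, misses


def partition_by_set(addresses, num_workers):
    """Split a trace so that each cache set belongs to exactly one worker."""
    parts = [[] for _ in range(num_workers)]
    for addr in addresses:
        parts[set_index(addr) % num_workers].append(addr)
    return parts


def simulate_partitioned(addresses, num_workers):
    """Simulate each partition independently and sum the counters.

    The partitions are processed one after another here; since they share
    no state, each could instead be handed to a separate process or machine.
    """
    hits = misses = 0
    for part in partition_by_set(addresses, num_workers):
        part_hits, part_misses = simulate(part)
        hits += part_hits
        misses += part_misses
    return hits, misses
```

For any trace, `simulate(trace)` and `simulate_partitioned(trace, n)` return the same counts in this model. A multi-level hierarchy whose levels have different set counts needs a more careful partitioning than this single-level sketch shows.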
GPS Location Tracker
Collecting data for sports visualisation
Bachelor thesis
(2017)
-
Bryan van Wijk, Dorian de Koning, Jochem Lugtenburg, Marco Zuñiga Zamalloa, Ronald Steen, Huijuan Wang
A start-up creates videos which users can watch to experience their running or cycling activity all over again. Currently, the company depends on external data sources to generate a video. To be less dependent on these sources, the company wants to create its own tracking solution. This solution has to fit into its existing smartphone application, which is available for iOS and Android. The company wants to remain flexible; therefore, the tracking application has to be developed in such a way that it can also be used in other products the company might develop in the future. The goal is for the data to result in visually pleasing videos for a large user base.
Based on an experimental app developed during the research phase, raw smartphone GPS data was found to be unsuitable for video rendering. To improve this data, a Kalman filter is used in combination with a smoothing algorithm. The system has been designed to allow code sharing between iOS and Android where possible, and has been implemented in Objective-C, Java, and TypeScript. Separating the system into three blocks enables code reuse, which improves the maintainability of the system. The filter has been integrated as shared code in the TypeScript implementation, which allows filtering to happen on the device. Users of the resulting React Native module are free to retrieve both the unprocessed and the processed data.
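How a Kalman filter dampens noisy position readings can be sketched in a few lines. This is a generic one-dimensional filter with a constant-position model, applied to a single coordinate; it is not the project's shared TypeScript filter, and the function name and the two variance parameters are assumptions chosen for illustration. Python is used here only to keep the sketch short.

```python
def kalman_1d(measurements, process_var=1e-5, measurement_var=1e-3):
    """Filter one coordinate (e.g. latitude) with a scalar Kalman filter.

    process_var and measurement_var are hypothetical tuning values: the
    first says how much the true position may drift between samples, the
    second how noisy each GPS reading is assumed to be.
    """
    if not measurements:
        return []
    estimate = measurements[0]      # start from the first reading
    error_var = measurement_var     # uncertainty of that first estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: position assumed unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered
```

Calling `kalman_1d` once on the latitude series and once on the longitude series gives a smoothed track. A larger `measurement_var` relative to `process_var` trusts the readings less and produces a smoother but more delayed result.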
The system has been tested by means of unit tests in all three programming languages used. Tests have been executed on a continuous integration server, which tests each pull request against the current code base to ensure quality. The testing phase included integrating the React Native module into the client's smartphone application to demonstrate its use. The application has been sent to a number of test participants to collect data from different routes and activities. The project can be seen as a success, since all important requirements have been implemented.