Familiarity with mathematics, probability theory, statistics, and algorithms is expected, at the level at which these subjects are typically introduced in bachelor-level computer science or engineering programs.

Hendrik Blockeel's research interests lie mostly within artificial intelligence, with a focus on machine learning and data mining, and on the use of AI-based modeling in other sciences. Blockeel has made a variety of contributions on topics such as inductive logic programming, probabilistic-logical learning, and decision trees.
He is an action editor for the journals Machine Learning and Data Mining and Knowledge Discovery, and a member of the editorial boards of several other journals. He is a EurAI Fellow.

When more than one target variable has to be predicted, we talk about multi-target prediction. Predictive modeling problems may also be complex in other ways. The course will first give an introduction to the different tasks of multi-target prediction, such as multi-target classification and regression, hierarchical versions thereof, and versions of these tasks that involve additional complexity, such as semi-supervised multi-target regression.
It will then present methods, first basic and then advanced, for solving such tasks. Finally, it will review different applications of multi-target prediction, ranging from gene function prediction, through image annotation, to space exploration. A basic understanding of machine learning and data mining would be helpful, but is not strictly necessary. Appropriate references will be provided in the lecture notes (slides) for the course.
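To make the task concrete, the following is a minimal, dependency-free sketch of multi-target prediction: a toy 1-nearest-neighbour "model" that predicts a vector of target variables per example rather than a single value. The function name, the data, and the model choice are all illustrative assumptions, not material from the course.

```python
# Hypothetical sketch of multi-target prediction: a 1-nearest-neighbour
# predictor that returns a *vector* of targets for each query example.

def predict_multi_target(train_X, train_Y, x):
    """Return the target vector of the training example closest to x."""
    best_i = min(range(len(train_X)),
                 key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_Y[best_i]

# Each training example has 2 input features and 2 target variables.
train_X = [(0.0, 0.0), (1.0, 1.0), (4.0, 5.0)]
train_Y = [(0.0, 1.0), (1.0, 0.0), (9.0, 2.0)]

print(predict_multi_target(train_X, train_Y, (0.9, 1.1)))  # → (1.0, 0.0)
```

The point of the sketch is only that the output is a tuple of targets; real multi-target methods (e.g., multi-target decision trees) exploit dependencies between the targets rather than treating them independently.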
He leads a twenty-strong research group that investigates machine learning and data mining methods (including structured output prediction and automated modeling of dynamic systems), as well as their applications in the environmental sciences.

Two major trends in computing systems are the growth in high performance computing (HPC), with an international exascale initiative, and the big data phenomenon, with an accompanying cloud infrastructure of well-publicized, dramatically increasing size and sophistication.
This tutorial weaves these trends together using some key building blocks. We aim to use the major open-source Big Data software environments, while developing the principles that allow HPC software and hardware to be used to achieve good performance. We give several examples of software (for example, Hadoop and Heron) and of algorithms implemented in this software, including clustering, topic modelling, and dimension reduction, together with their visualization using a framework called Harp.
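As an illustration of one of the analytics kernels named above, here is a dependency-free k-means clustering sketch. Harp itself is a Java-based parallel framework, so this Python fragment is only a stand-in for the kind of iterative computation such frameworks parallelize; the data and parameters are invented for the example.

```python
# Minimal k-means on 2-D points: alternate an assignment step and a
# centroid-update step. Not Harp code; an illustrative stand-in only.

def kmeans(points, k, iters=10):
    centers = points[:k]                       # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        centers = [                            # update step (keep old center
            (sum(p[0] for p in g) / len(g),    # if a group ends up empty)
             sum(p[1] for p in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))
```

In a Harp-style setting, the assignment step would be data-parallel across workers and the centroid update would be a collective (allreduce-like) communication step.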
Whether to attract, service, or maintain customers, businesses position data mining at the cornerstone of customer relations.

Eindhoven University of Technology.

Srivastava co-founded Ninja Metrics. Lecture 3: Intermediate Concepts II: recent advances in community discovery; graph sparsification and sampling strategies; a deep dive into recent advances in stochastic flow clustering of networks.
This allows an understanding of what type of hardware and software is needed for which type of exhibited features; it allows one either to unify or to distinguish applications across the simulation and Big Data regimes. We show that supporting a broad range of applications requires a variety of capabilities, which seem best packaged as a reconfigurable toolkit, Twister2. Some familiarity with parallel computing algorithms and software is helpful.
Some familiarity with data analytics is helpful.

Geoffrey Fox received a Ph.D. in theoretical physics from Cambridge University. He has supervised the Ph.D. work of numerous students.
His analytics work focuses on scalable parallelism. He is an expert on streaming data and robot-cloud interactions, and he is involved in several projects to enhance the capabilities of Minority Serving Institutions.

Effective Big Data analytics needs to rely on algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources.
Such streams arise naturally in emerging large-scale event-monitoring applications; for instance, in network-operations monitoring in large ISPs, where usage information from numerous network devices needs to be continuously collected and analyzed for interesting trends and for real-time reaction to different scenarios. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying communication infrastructure.
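A minimal illustration of the one-pass, bounded-memory constraint described above is reservoir sampling, which maintains a uniform random sample of fixed size k from a stream seen only once, using O(k) memory. This example is my own illustration, not taken from the course materials.

```python
# Reservoir sampling: after seeing n items, each item is in the
# k-slot reservoir with probability k/n, using only O(k) memory.
import random

def reservoir_sample(stream, k, seed=42):
    rng = random.Random(seed)          # fixed seed for reproducibility
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)        # fill the reservoir
        else:
            j = rng.randrange(n)       # keep new item with prob. k/n
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1_000_000), 5))
```

Note that the stream is traversed exactly once and never stored, matching the streaming model's single-pass, fixed-order access pattern.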
This course will provide an overview of some key algorithmic tools for supporting effective, real-time analytics over streaming data. Our primary focus will be on small-space sketch synopses for approximating continuous data streams, and their applicability in both centralized and distributed settings.

Outline:
1. Introduction and Motivation
2. Data Streaming Models and Mathematical Tools
3. …
…
Conclusions and Looking Forward
7. Time-permitting: Hands-on Experience with Streaming Tools

Prerequisites: database management systems, design and analysis of algorithms, randomized algorithms.
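One canonical example of the small-space sketch synopses this course focuses on is the Count-Min sketch; the following is an illustrative, self-contained implementation of my own (an assumption as to a representative technique, not the course's code or exact toolset).

```python
# Count-Min sketch: estimate item frequencies in a stream using d hash
# rows of width w, so memory is O(d*w) regardless of stream length.
# Hash collisions can only inflate counts, so estimates never undercount.
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash function per row by salting with the row id.
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._index(r, item)] += count

    def estimate(self, item):
        # True count <= estimate; take the min across rows to
        # minimize the overcount from collisions.
        return min(self.table[r][self._index(r, item)]
                   for r in range(self.depth))

cms = CountMinSketch()
for token in ["a"] * 50 + ["b"] * 7 + ["c"]:
    cms.add(token)
print(cms.estimate("a"))  # at least 50; equals 50 unless collisions occur
```

The same structure also supports the distributed setting mentioned above: per-site sketches can be summed cell-wise at a coordinator, trading a small approximation error for large communication savings.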
He has held research positions in Santa Clara, CA, and has published numerous scientific papers in top-tier international conferences and journals in these areas; Google Scholar reports a high citation count and h-index for his work.

This seminar introduces the R language via data visualization, a.k.a. computer graphics, in the context of a discussion of best practices and considerations for the analysis of big data.
Code to generate the graphs is presented in terms of R base graphics, Hadley Wickham's ggplot2 package, and the author's lessR package. The content of the seminar is summarized in R Markdown files that include commentary on, and implementations of, all the code presented in the seminar, available to all participants. These explanatory examples serve as templates for applications to new data sets.
David Gerbing, Ph.D., has authored R Data Analysis without Programming, which describes his lessR package, as well as many articles on statistical techniques and their application in journals spanning several academic disciplines.

Semantic technologies may promote new ways of managing data within an organization. In particular, the paradigm of ontology-based data management provides techniques for accessing, using, and maintaining data by means of an ontology, i.e., a conceptual representation of the domain of interest.
This paradigm aims at addressing one important challenge of modern information systems, namely managing the autonomous, distributed, and heterogeneous data sources of an organization, and devising tools for deriving useful information and knowledge from them. On the other hand, many of today's organizations face, among other issues, the problem of publishing Open Data. Despite the current interest in this subject, a formal and comprehensive methodology that supports an organization in deciding which data to publish, and that provides precise procedures for publishing high-quality data, is still missing.
In the course, we first provide an introduction to ontology-based data management (OBDM), then discuss the main techniques for using an ontology to access the data layer of an information system, and finally illustrate the basic elements of a methodology for ontology-based Open Data publishing. Topics: introduction to OBDM; languages for OBDM; query answering in OBDM; meta-modeling and higher-order ontology languages; the problem of Open Data publishing; ontology-based Open Data publishing.
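To give a flavour of query answering over an ontology, here is a toy sketch of my own (not the course's formalism, which typically uses query rewriting over richer languages): subclass axioms from the ontology are applied to the data assertions by saturation, and a query is answered over the inferred assertions. All names and axioms are invented for the example.

```python
# Ontology (TBox): subclass axioms, e.g. Professor is-a Employee.
subclass_of = {"Professor": "Employee", "Employee": "Person"}

# Data (ABox): class-membership assertions from the data layer.
assertions = {("alice", "Professor"), ("bob", "Employee")}

def saturate(assertions, subclass_of):
    """Close the assertions under the subclass axioms."""
    inferred = set(assertions)
    changed = True
    while changed:
        changed = False
        for ind, cls in list(inferred):
            sup = subclass_of.get(cls)
            if sup and (ind, sup) not in inferred:
                inferred.add((ind, sup))
                changed = True
    return inferred

def query(cls, assertions, subclass_of):
    """Answers to 'which individuals belong to cls?', ontology-aware."""
    return sorted(ind for ind, c in saturate(assertions, subclass_of)
                  if c == cls)

print(query("Person", assertions, subclass_of))  # → ['alice', 'bob']
```

Note that querying the raw data for "Person" would return nothing; it is the ontology layer that makes alice and bob visible as Persons, which is the essence of accessing data through a conceptual view.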