Addressing management issues in data science
Data science problems and systems naturally involve management issues. These concern the management of data science input (data, resources, models and methods), of data science projects and processes, and of data science output (the resultant models and methods; identified knowledge, insights and intelligence; and deliverable data products).
Data science input management involves data management, resource management, requirement management, and objective management.
Data management and resource management collect and manage the relevant data, information, resources, devices, equipment, infrastructure, and computing facilities that may be required to conduct a data science project.
Requirement management acquires and manages the functional and nonfunctional requirements related to data science problem-solving and projects.
Objective management is concerned with the objectives and goals of a data science project and the milestones for achieving them.
Data science projects and processes involve project management, data team management, model management, process management, communication management, risk management, and quality management.
Project management is concerned with project scope, goal definition, milestone definition and checking, workload assignment, costs, budgets, and related issues, as well as project planning, progress review, and timetabling.
Data team management involves the definition, selection, and management of data science roles and responsibilities in a data science project, as well as collaboration, workload assignment, work patterns, hierarchical leadership, reporting, scheduling, performance, and reflection.
Model management is concerned with the models, methods, algorithms, systems, and programming that directly manipulate data to achieve the defined business objectives.
Process management deals with the definition of processes, goals and activities for each step, the personnel required to undertake each activity, connection and transfer between procedures, and the data, information, documentation, and models associated with each step.
Communication management handles the documentation of the project, process, roles, input and output, and drives the communication between roles, activities, and stakeholders.
Risk management handles the likelihood and severity of risk, including the effects associated with each step of the process and the impact on personnel, the data selected, the modeling method, the communication design, and the evaluation mechanisms. Risk may be evaluated for a specific aspect of the project or for the project as a whole, from the perspective of resources, technical or economic components, timing, market value, and more.
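One common way to make the likelihood and severity of risk concrete is a simple risk register that scores each entry as likelihood times severity. The sketch below is an illustration under stated assumptions, not a method given in the source: the `Risk` class, the 1 to 5 scales, the choice of the maximum score as the per-aspect summary, and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    step: str        # process step the risk is attached to
    aspect: str      # perspective, e.g. "resources", "technical", "timing"
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    severity: int    # hypothetical scale: 1 (negligible) to 5 (critical)

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by severity.
        return self.likelihood * self.severity


def rank_risks(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def worst_by_aspect(risks):
    """Summarize each aspect by its highest-scoring risk."""
    worst = {}
    for r in risks:
        worst[r.aspect] = max(worst.get(r.aspect, 0), r.score)
    return worst


# Hypothetical register entries, for illustration only.
register = [
    Risk("data selection", "resources", likelihood=3, severity=4),
    Risk("modeling", "technical", likelihood=2, severity=5),
    Risk("evaluation", "timing", likelihood=4, severity=2),
]

for risk in rank_risks(register):
    print(risk.step, risk.aspect, risk.score)
```

Ranking individual entries corresponds to evaluating a specific aspect of the project, while the per-aspect summary is one possible view of overall risk.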
Quality management defines existing or potential quality issues and develops measures for quantifying them.
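As one illustration of a measure that quantifies a quality issue, the sketch below computes per-field completeness (the fraction of non-missing values) and flags fields that fall below a threshold. This is an assumed example, not a measure named in the source: the function names, the threshold, and the sample records are hypothetical.

```python
def completeness(records, fields):
    """Return the fraction of non-missing (non-None) values for each field."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) is not None) / total
        for f in fields
    }


def flag_quality_issues(records, fields, threshold=0.9):
    """List the fields whose completeness falls below the threshold."""
    scores = completeness(records, fields)
    return sorted(f for f, s in scores.items() if s < threshold)


# Hypothetical sample data, for illustration only.
sample = [
    {"age": 30, "income": None},
    {"age": None, "income": None},
    {"age": 41, "income": 5000},
    {"age": 25, "income": 4200},
]

print(completeness(sample, ["age", "income"]))
print(flag_quality_issues(sample, ["age", "income"], threshold=0.7))
```

Completeness is only one candidate measure; a project would define the measures that match its own quality issues.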
Data science output management consists of knowledge management, product management, testing management, and, where applicable, deployment management.
Knowledge management administers the resultant intellectual property of the delivered models, algorithms, code and systems; the identified findings, including patterns, rules, exceptions, and other analytical results; the lessons and insights gained in a data science project concerning optimal operation, management, and decision-making; and the evaluation of and reflection on the scope and objectives, team, process, communication, risk control, and quality assurance of a project.
Product management refers to the management of data products or broader data-related deliverables. This may involve product definition and specification, as well as issues related to lifecycle, quality, end users, usability, market value, and market segment.
Testing management of data science output oversees the various testing methods at different granularities, test specifications, test result analysis, and testing-based adjustment, refinement and optimization.
Deployment management covers matters related to deploying data science output. This may involve the definition and specification of the output or data product, scheduling, stakeholder relationships, modes of execution, and problem reporting and management during deployment.
 Note: Excerpted from "L. Cao. Understanding Data Science, Springer, 2018"