DATA SCIENCE AND APPLICATIONS

Author: Adam                            Updated: 24/05/2019

1. Data Science Strategy

We are living in a connected world, but why does it feel that way? By 2020, 1.7 MB of data will be generated every second for every human being. Today we are going to talk about infrastructure data and building data.

Siemens Smart Infrastructure connects electricity with buildings and a range of industries through ingenious innovation to create environments that care.

Why do we even care about data science at Siemens SI? One reason is that data scientists harvest cost-saving potential while maintaining uptime. They also act in a timely manner when a building needs attention. Another reason is that operational status can be automated around the user. And thanks to data, occupants can feel comfortable and safe.

The cloud does the work behind the magic. The lowest level is the field level: sensors, meters, switches, and other field devices. Above that is the management level, the building management system. The top layer is a connectivity gateway (BACnet, Modbus, SNMP).

Now that we have the data from the cloud, we can do analytics. The most basic step is descriptive analytics, which uses information based on traditional BI. The next step is diagnostic analytics. The third is predictive analytics, and the last is prescriptive analytics.

Think about the case of self-adaptive buildings. Early on, we lived in caves. Then came the traditional building, whose sub-systems (e.g. the energy system) work in silos, with no data science involved. Next is the automated building, fully integrated and connected to IoT devices; this corresponds to the diagnostic step of data analytics. The cyber-physical building is probably the kind of building we live in now: a smart building combining the digital and physical worlds. It is also a starting point for AI and corresponds to the predictive step. What we may live in in the future is the living building. This kind of building acts like an organism that anticipates needs, much like the prescriptive step.

In order to achieve this, we need not only a lot of data, but also a lot of data science work.

Let us now look back at the data science strategy.

  1. Frame the problem: ask questions, turn fuzzy requests into clear problems to be solved, and match the problem with business strategy and opportunities.
  2. Collect the data, in the right format and at the right resolution, by integrating internal and external data.
  3. Process the data by checking data quality and cleaning the data.
  4. Explore the data: play with the data, identify patterns, and extract features.
  5. Perform in-depth analysis by creating a model.
  6. Communicate results by visualizing the findings and validating, with a good story, how the solution solves the problem.

2. Data Quality

Data quality is the ability of a set of data to contribute to an intended purpose; low-quality data is not helpful, or not helpful to the degree needed for what you wish to do with it. We need data quality measures to grasp the quality of a given data set. Data quality measures and metrics provide an effective framework for monitoring performance with respect to data quality control, management, and evaluation.

The measures can be divided into five pillars: frequency, completeness, accuracy, integrity, and availability.

Accuracy is the count of accurate values divided by the sum of the counts of accurate and inaccurate values, multiplied by 100%. Frequency is the change over time, i.e. the difference in amount between t+1 and t. Integrity is the count of valid data divided by the count of invalid data. Completeness is the amount of stored data divided by the amount of data that would be 100% complete.
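
As a minimal sketch of these definitions in Python (the function names and inputs are illustrative, not from the talk):

    def accuracy(n_accurate, n_inaccurate):
        # Accuracy = accurate / (accurate + inaccurate) * 100%
        return 100.0 * n_accurate / (n_accurate + n_inaccurate)

    def frequency(amount_t, amount_t_plus_1):
        # Frequency = change in amount between t and t+1
        return amount_t_plus_1 - amount_t

    def integrity(n_valid, n_invalid):
        # Integrity = valid data divided by invalid data, as defined above
        return n_valid / n_invalid

    def completeness(n_stored, n_complete):
        # Completeness = stored data divided by the 100%-complete amount
        return n_stored / n_complete

    print(accuracy(98, 2))          # 98.0
    print(completeness(900, 1000))  # 0.9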

3. APP: Comfy

Comfy is an app that provides the control and choice employees expect. It has an insights dashboard to preview the results workplace teams need, and it offers expertise and service to implement, engage, and support workplace teams and employees.

4. APP: Filthy Filter

The objective of Filthy Filter is to detect the dirty level of AHU air filters. By forecasting the future dirty level and the filter replacement time, Filthy Filter avoids fixed-interval filter replacement and replaces filters only when necessary. It reduces the aggregated costs across labor costs, filter hardware costs, and energy costs.

The problem is that it is difficult to build clean dirty-level curves using robust least squares fitting. It is also difficult to minimize the aggregated costs and to estimate the filter replacement time based on air quality.

In order to deal with the problem, the solution is data cleaning, following the steps below; after the data cleaning we conduct a hierarchical least squares regression (a sketch follows the list):

  1. Start from the raw data.
  2. Generate the one-day filter curve, which is noisy.
  3. Calculate the dirty filter index (Df) without data cleaning.
  4. 1st smoothing: Df regression after removing the wild points.
  5. 2nd smoothing: use domain knowledge by eliminating fan ramp-up and ramp-down data and by identifying sensor-offline cases.
  6. Detect falling edges only, using a Sobel filter. The peaks are the filter replacement dates.
  7. Find the date: automatic threshold decision.
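
A minimal sketch of the wild-point removal behind steps 4 and 5 (assuming Df is a daily NumPy series; the polynomial degree, iteration count, and cut-off k are illustrative choices, not the project's actual parameters):

    import numpy as np

    def robust_fit(t, df, degree=1, n_iter=3, k=2.5):
        # Iteratively fit a polynomial to the dirty filter index Df,
        # dropping "wild points" whose residual exceeds k standard deviations.
        mask = np.ones_like(df, dtype=bool)
        for _ in range(n_iter):
            coeffs = np.polyfit(t[mask], df[mask], degree)
            resid = df - np.polyval(coeffs, t)
            sigma = resid[mask].std()
            mask = np.abs(resid) < k * sigma
        # In the real pipeline, domain knowledge (fan ramp-up/down periods,
        # sensor-offline cases) would shrink the mask further before refitting.
        return coeffs, mask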

The Sobel filter, or Sobel edge detection, is usually used in image processing, but it can also be applied here to reveal the filter replacement dates. Air quality is used as an input in order to estimate the frequency of replacement.
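
Outside of images, the same derivative kernel works on a 1-D time series. A minimal sketch (assuming the smoothed Df series is a 1-D array; the fixed threshold here stands in for the automatic threshold decision of step 7):

    import numpy as np
    from scipy.ndimage import sobel

    def replacement_dates(df_smoothed, threshold):
        # The Sobel derivative along the time axis is strongly negative
        # where Df drops sharply, i.e. at a falling edge caused by a
        # filter replacement.
        edges = sobel(np.asarray(df_smoothed, dtype=float))
        return np.where(edges < -threshold)[0]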

The conclusion is to minimize a cost function that sums the labor cost of changing a filter and the extra energy consumption cost of not changing it.
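
A minimal sketch of that trade-off (the linear energy model and every parameter value are illustrative assumptions, not the model from the project):

    import numpy as np

    def aggregated_cost(period_days, labor_cost, filter_cost, energy_rate,
                        horizon_days=365):
        # Cost of changing: one (labor + hardware) charge per replacement.
        change_cost = (horizon_days / period_days) * (labor_cost + filter_cost)
        # Cost of not changing: extra fan energy assumed to grow linearly
        # with dirtiness, so the average daily penalty is energy_rate * p / 2.
        energy_cost = horizon_days * energy_rate * period_days / 2.0
        return change_cost + energy_cost

    # Pick the replacement period that minimizes the aggregated cost.
    periods = np.arange(7, 181)
    costs = [aggregated_cost(p, labor_cost=80.0, filter_cost=40.0,
                             energy_rate=0.05) for p in periods]
    best_period = periods[int(np.argmin(costs))]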

5. RaVi (Rapid Visualization Tool) / E-Conan

E-Conan is a special investigator of potential customer data.

6. SuperSet

Superset is an open-source business intelligence tool: a modern, enterprise-ready business intelligence web application for visualizing rich sets of data.

It makes data-set upload easy (CSV or Excel), has an easy-to-use interface for exploring data, and offers a highly granular permission model.