1b.app

Minute cron stopped working

On this page: https://baza.cn.ua/admin/shop/statistic/
The minute cron had always run once a minute, and then it froze at 2021-01-25 18:22:13.
More than 25 minutes have passed and it still isn't running.
1. Please solve the problem
2. Explain why it happened
Original question is available on version: ru

Answers:

This happens when an external service does not close the connection but stops sending data back.
Here the cron process is connected to an external server, and the connection is in the ESTABLISHED state:
php 19222 bazacnua 9u IPv4 166670705 0t0 TCP none:47926->ec2-52-214-51-78.eu-west-1.compute.amazonaws.com:https (ESTABLISHED)
But no data is being transferred:
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450225, tv_nsec=785639629}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450225, tv_nsec=785759689}) = 0
poll([{fd=9, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450225, tv_nsec=786256316}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450225, tv_nsec=786320364}) = 0
poll([{fd=9, events=POLLIN}], 1, 1000) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450226, tv_nsec=787662606}) = 0
poll([{fd=9, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450226, tv_nsec=788096833}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450226, tv_nsec=788179420}) = 0
poll([{fd=9, events=POLLIN}], 1, 1000) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450227, tv_nsec=789450028}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=5450227, tv_nsec=789555074}) = 0
poll([{fd=9, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
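A minimal Python sketch of the loop visible in the trace, with the missing overall deadline added. It assumes a POSIX system (select.poll is not available on Windows); the helper name read_with_deadline and its parameters are hypothetical and are not taken from the actual cron code, which the thread does not show. A socketpair whose other end never writes stands in for the silent remote server. Each empty result from poller.poll() corresponds to a "= 0 (Timeout)" line in the trace; without the deadline check, such a loop keeps polling for as long as the peer keeps the connection ESTABLISHED.

```python
import select
import socket
import time


def read_with_deadline(sock: socket.socket, total_timeout: float,
                       poll_interval_ms: int = 1000) -> bytes:
    """Wait for data in poll() slices, like the loop in the trace,
    but give up once total_timeout seconds have passed overall."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    deadline = time.monotonic() + total_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Without this check the loop would poll forever on a silent peer.
            raise TimeoutError(f"no data within {total_timeout} s")
        # Never wait past the overall deadline (at least 1 ms per slice).
        slice_ms = min(poll_interval_ms, int(remaining * 1000) + 1)
        if poller.poll(slice_ms):  # empty list == "0 (Timeout)" in strace
            return sock.recv(4096)


if __name__ == "__main__":
    client, silent_server = socket.socketpair()  # server end never writes
    start = time.monotonic()
    try:
        read_with_deadline(client, total_timeout=0.5, poll_interval_ms=100)
    except TimeoutError as exc:
        print(f"gave up after {time.monotonic() - start:.1f} s: {exc}")
    finally:
        client.close()
        silent_server.close()
```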
26.01.2021, 01:29
Original comment available on version: ru

Куприян Владислав Валерьевич
Baza.cn.ua / Integrator (FOP Kupriyan)
OK, so we have found the cause; now we need to figure out:
1. Which of the minute cron's actions is to blame? I assume it is "rozetka_auto_action_import_orders", since at exactly 19:22 an order came in from Rozetka and everything "stopped".
2. How can we make sure that such external problems do not break the minute actions?
26.01.2021, 10:59

1. All we know is that this is an external integration. Everything configured in the system was configured by you.
We don't know, and have no reason to know, in which data centers and with whose help other companies build their IT infrastructure.
We are responsible for our own product.
2. If you do not use integrations with external services, nothing will break for these reasons.
But the bottom line is that we are talking about working with an external resource (the order came from Rozetka).
When we work with an external resource, we are working with a software product that is built and maintained
by people, so both technical problems (heavy load on the resource, its servers, networks, ...)
and the human factor (errors that appear when certain conditions coincide) are possible.
Therefore, we do not have a ready-made solution, and I doubt that anyone does.
26.01.2021, 15:14

Куприян Владислав Валерьевич
Baza.cn.ua / Integrator (FOP Kupriyan)

Tasun Sergey Vladimirovich
Employee wrote:
1. All we know is that this is an external integration. Everything configured in the system was configured by you.
We don't know, and have no reason to know, in which data centers and with whose help other companies build their IT infrastructure.
We are responsible for our own product.

You are generalizing the problem too broadly; it feels like a reluctance to get to the bottom of it. I understand your pain, but please be more specific so that I know which external resource has the problem and whom to contact so that such problems do not happen again. Can you help?

Tasun Sergey Vladimirovich
Employee wrote:
2. If you do not use integrations with external services, nothing will break for these reasons.
But the bottom line is that we are talking about working with an external resource (the order came from Rozetka).
When we work with an external resource, we are working with a software product that is built and maintained
by people, so both technical problems (heavy load on the resource, its servers, networks, ...)
and the human factor (errors that appear when certain conditions coincide) are possible.
Therefore, we do not have a ready-made solution, and I doubt that anyone does.

You have a strange position: "my hut is on the edge" (that is, "not my problem").
Besides, I don't understand the logic of your system. I thought every "author" of code should make sure that the code keeps working under all exceptions and edge cases, yet in your case it turns out that someone sent a "clumsy" XML or JSON response and your whole system stopped.
Or did I misunderstand the situation? If so, please explain it differently.
26.01.2021, 18:10


Kupriyan Vladislav Valerievich
Baza.cn.ua / Integrator (FOP Kupriyan) wrote:
You are generalizing the problem too broadly; it feels like a reluctance to get to the bottom of it. I understand your pain

I'm not generalizing the problem. Companies always keep their IT infrastructure private to reduce the number of
unpleasant situations for themselves.
You asked me to explain why this happened, and I showed you
what happened and why it stopped working.
But you were not satisfied and are now demanding to be given the "culprit", because, and I quote:
"this "eu-west-1.compute.amazonaws.com" doesn't tell me anything, I want to figure it out..."
You want to figure it out, yet for some reason you are asking us to do it. A strange approach, don't you think?
This resource is api.privatbank.ua
As for pain: nothing hurts on our side, and if you feel something, that is your pain.

Kupriyan Vladislav Valerievich
Baza.cn.ua / Integrator (FOP Kupriyan) wrote:
You have a strange position: "my hut is on the edge" (that is, "not my problem").
Or did I misunderstand the situation? If so, please explain it differently.

You understood it correctly.
Our position is this: our system has a large number of integrations with external resources, and they
work according to the documentation those resources provide.
Monitoring whether an external resource's API is working correctly at any given moment,
and whether the resource is available at all, is not our company's responsibility.
When an external resource officially changes its API, we update our product accordingly.
27.01.2021, 01:15

Куприян Владислав Валерьевич
Baza.cn.ua / Integrator (FOP Kupriyan)

Tasun Sergey Vladimirovich
Employee wrote:
This resource is api.privatbank.ua

Thanks

Tasun Sergey Vladimirovich
Employee wrote:
You understood it correctly.
Our position is this: our system has a large number of integrations with external resources, and they
work according to the documentation those resources provide.
Monitoring whether an external resource's API is working correctly at any given moment,
and whether the resource is available at all, is not our company's responsibility.
When an external resource officially changes its API, we update our product accordingly.

OK, maybe you misunderstood me. Let me try an abstract example.
Imagine that your OneBox product is a family doctor with a quota of patients to see.
He has to see a certain number of patients per minute, per hour, per day.
Then some unusual patient (api.privatbank.ua) comes in and starts telling him things and demanding things that have little to do with the doctor's specialty; as a result, the doctor "goes crazy" and leaves his workplace for a couple of hours.
As a result, none of the other patients get their questions (problems) resolved.
It seems to me that the doctor in this example is not acting very professionally: he should not drop out of work because of one unusual patient; he should simply show him the door (or send him to hell) and keep working through his quota.
If you understood me correctly, I would like to hear your opinion and then close the issue.
27.01.2021, 16:12

Let me explain using your example.
This patient (api.privatbank.ua) comes in all the time and everything is fine with him; he is an ordinary patient.
But today he decided to tell the doctor about a new problem. The family doctor has to listen to him,
realize that today this patient is no longer a standard one, and refer him to another specialist.
That is where those couple of hours are lost (this is exactly how it works in real life, by the way; think of a trip to the dentist).
Now to our case.
The workflow is as follows: we knock on the third-party service's door and pass it our registration data (credentials)
so that the service knows who has come and why; the service receives the data, processes it, and returns an answer,
after which the connection (door) closes.
In our case, however, the service opened the connection (door), accepted all the data, and went off to get an answer... and the connection (door) never closed.
So we stand there waiting, hoping to be given what we asked for.
You may ask, "Why wait so long?" My answer is that the third-party service's API documentation
says nothing about such a situation being possible at all,
nor that there is no point in waiting if no answer arrives within a certain time.
That's all there is to it.
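The request cycle described above (connect, send the data, wait for the answer, close) can be sketched with Python's standard urllib, assuming the integration's HTTP client lets you set a timeout. The fetch_with_timeout and start_silent_server helpers, the 0.5 s value, and the local stand-in server are hypothetical illustrations, not the product's code. The stand-in accepts the connection and reads the request but never answers, which is the "door opened, data accepted, no reply" case; with a timeout set, the blocked read raises instead of waiting indefinitely.

```python
from __future__ import annotations

import socket
import threading
import urllib.request


def fetch_with_timeout(url: str, timeout: float) -> bytes | None:
    """Return the response body, or None if the server fails or stays silent.

    urllib applies `timeout` to each blocking socket operation (connect,
    each read), not to the request as a whole.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return None


def start_silent_server() -> tuple[socket.socket, int, threading.Event]:
    """Hypothetical stand-in for a stalled API: accept, read, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    release = threading.Event()

    def serve() -> None:
        conn, _ = srv.accept()
        with conn:
            conn.recv(65536)  # take the request ("accepted all the data")...
            release.wait()    # ...then hold the door open without answering

    threading.Thread(target=serve, daemon=True).start()
    return srv, srv.getsockname()[1], release


if __name__ == "__main__":
    srv, port, release = start_silent_server()
    body = fetch_with_timeout(f"http://127.0.0.1:{port}/", timeout=0.5)
    print("no answer in time, moving on" if body is None else body)
    release.set()
    srv.close()
```

Because the timeout here is per socket operation, a server that trickles data slowly could still hold the request open for longer than 0.5 s in total; an overall deadline would need an additional check.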
27.01.2021, 16:46

Куприян Владислав Валерьевич
Baza.cn.ua / Integrator (FOP Kupriyan)

Tasun Sergey Vladimirovich
Employee wrote:
Let me explain using your example.
This patient (api.privatbank.ua) comes in all the time and everything is fine with him; he is an ordinary patient.
But today he decided to tell the doctor about a new problem. The family doctor has to listen to him,
realize that today this patient is no longer a standard one, and refer him to another specialist.
That is where those couple of hours are lost (this is exactly how it works in real life, by the way; think of a trip to the dentist).
Now to our case.
The workflow is as follows: we knock on the third-party service's door and pass it our registration data (credentials)
so that the service knows who has come and why; the service receives the data, processes it, and returns an answer,
after which the connection (door) closes.
In our case, however, the service opened the connection (door), accepted all the data, and went off to get an answer... and the connection (door) never closed.
So we stand there waiting, hoping to be given what we asked for.
You may ask, "Why wait so long?" My answer is that the third-party service's API documentation
says nothing about such a situation being possible at all,
nor that there is no point in waiting if no answer arrives within a certain time.
That's all there is to it.

Thanks for the detailed and clear answer!
In my view, waiting that long is a mistake (there should be some kind of timeout), but judging by your answer you are being lenient with the external service, and that leniency harms other processes. Still, that is a question of design logic.
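The trade-off raised above (tolerating a slow external service versus protecting the other processes) is often handled by giving each scheduled action its own time budget. A minimal Python sketch, assuming each minute action can be launched as a separate command; the run_actions helper, the 1-second budget, and the demo commands are hypothetical and say nothing about how OneBox actually runs its cron. subprocess.run kills the child and raises TimeoutExpired when the budget runs out, so the loop moves on to the next action.

```python
from __future__ import annotations

import subprocess
import sys


def run_actions(actions: dict[str, list[str]], budget_s: float) -> dict[str, str]:
    """Run each action as a separate process with its own time limit.

    A hung action is killed when its budget runs out and the loop moves on,
    so one silent external API cannot block the rest of the run.
    """
    results = {}
    for name, cmd in actions.items():
        try:
            proc = subprocess.run(cmd, timeout=budget_s, capture_output=True)
            results[name] = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            results[name] = "killed after timeout"
    return results


if __name__ == "__main__":
    py = sys.executable
    demo = {
        # Hypothetical stand-ins for cron actions.
        "hung_import": [py, "-c", "import time; time.sleep(60)"],
        "quick_report": [py, "-c", "print('done')"],
    }
    print(run_actions(demo, budget_s=1.0))
```

A per-process limit like this complements, rather than replaces, a timeout on the HTTP request itself: it only guarantees that a hung action is stopped before it holds up everything scheduled after it.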
27.01.2021, 20:59
