
eCapture consuming too much memory when dealing with a big file in a single long connection #718

Open
chilli13 opened this issue Jan 10, 2025 · 6 comments · May be fixed by #731
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), improve

Comments

@chilli13
Contributor

Important: @dosu AI robot

Describe the bug
ecapture tls -m text uses more than 1 GB of memory when a long-lived connection downloads a large file.

To Reproduce
Steps to reproduce the behavior:

  1. Run ecapture tls on an HTTPS server
  2. From a client, download a big file (1 GB or more) over a single long-lived connection, ensuring the connection's 5-tuple remains unchanged during the transfer
  3. Watch eCapture's memory consumption

Some eCapture output; the Length field is 241283072:

{"time":"2025-01-10T13:37:24+08:00","message":"UUID:3118798_3118798_nginx_32_1_192.168.10.122:44818-192.168.10.41:443, Name:HTT
PResponse, Type:3, Length:241283072\nHTTP/1.1 200 OK\r\nContent-Length: 965141578\r\nAccept-Ranges: bytes\r\nConnection: keep-a
live\r\nContent-Type: text/plain\r\nDate: Fri, 10 Jan 2025 05:36:04 GMT\r\nEtag: \"67809e48-3986e44a\"\r\nLast-Modified: Fri, 1
0 Jan 2025 04:12:56 GMT\r\nServer: nginx/1.21.5\r\n\r\n.......

Linux Server/Android (please complete the following information):

  • Device: Linux Server x86 VM, openEuler 20.03
  • Kernel Info: Linux abc 4.19.90-2412.1.0.0306.oe2003sp4.x86_64
  • eCapture Version: v0.9.2

Additional context
code
As long as events keep arriving on ew.incoming before ew.tickerCount reaches MaxTickerCount, the worker keeps calling ew.writeEvent(e), and each ew.payload.Write(e.Payload()) grows the payload buffer, so a single long response can allocate a huge amount of memory. In some cases it may even trigger an OOM.
Is it possible to add a configuration mechanism to limit the number of times a single UUID worker executes ew.writeEvent(e)?

func (ew *eventWorker) Run() {
	for {
		select {
		case <-ew.ticker.C:
			// output the packet
			if ew.tickerCount > MaxTickerCount {
				//ew.processor.GetLogger().Printf("eventWorker TickerCount > %d, event closed.", MaxTickerCount)
				ew.processor.delWorkerByUUID(ew)

				for {
					select {
					case e := <-ew.incoming:
						ew.writeEvent(e)
					default:
						if ew.IfUsed() {
							time.Sleep(10 * time.Millisecond)
							continue
						}
						ew.Close()
						return
					}
				}
			}
			ew.tickerCount++
		case e := <-ew.incoming:
			// reset tickerCount
			ew.tickerCount = 0
			ew.writeEvent(e)
		}
	}

}


func (ew *eventWorker) writeEvent(e event.IEventStruct) {
	if ew.status != ProcessStateInit {
		_ = ew.writeToChan("write events failed, unknow eventWorker status")
		return
	}
	ew.payload.Write(e.Payload())
}

dosubot bot commented Jan 10, 2025

Hey there, @chilli13! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

To address the issue of excessive memory usage in eCapture when dealing with large files over a long connection, you can consider implementing a mechanism to limit the size of the payload buffer in the eventWorker. Here are a few potential approaches:

  1. Limit Payload Size: Introduce a maximum payload size for the ew.payload buffer. Before writing to the buffer, check if adding the new payload would exceed this limit. If it does, you can either discard the data, log a warning, or handle it according to your application's needs.

  2. Chunked Processing: Instead of writing the entire payload to the buffer at once, process the data in smaller chunks. This can help manage memory usage by not holding the entire payload in memory simultaneously.

  3. Configuration Option: Add a configuration option to set a maximum buffer size or a maximum number of events that can be processed for a single connection. This allows users to adjust the settings based on their environment and requirements.

  4. Garbage Collection and Buffer Reset: Regularly reset the buffer using ew.payload.Reset() after processing a certain amount of data to free up memory. Ensure that the data is processed or stored elsewhere before resetting.

Implementing one or more of these strategies can help mitigate the memory consumption issue you're experiencing.
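As a rough illustration of the first suggestion, here is a minimal sketch of a capped variant of writeEvent. It assumes ew.payload is a bytes.Buffer and that a hypothetical maxPayloadSize field exists on eventWorker; neither the field name nor the discard policy comes from the current eCapture code.

// writeEventCapped is a hypothetical sketch, not the existing eCapture method.
func (ew *eventWorker) writeEventCapped(e event.IEventStruct) {
	if ew.status != ProcessStateInit {
		_ = ew.writeToChan("write events failed, unknow eventWorker status")
		return
	}
	p := e.Payload()
	// maxPayloadSize is an assumed per-worker limit; once the buffer would
	// exceed it, stop appending instead of growing without bound.
	if ew.payload.Len()+len(p) > ew.maxPayloadSize {
		return
	}
	ew.payload.Write(p)
}

A variant of this could flush and Reset() the buffer instead of dropping data, which would combine suggestions 1 and 4.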


@dosubot (bot) added the 🐞 bug (Something isn't working) label on Jan 10, 2025
@cfc4n
Member

cfc4n commented Jan 18, 2025

Is it possible to add a configuration mechanism to limit the number of times a single UUID worker executes ew.writeEvent(e)?

This seems like a customized requirement rather than a general one. I suggest you hack it yourself.

@chilli13
Contributor Author

Unlimited writing to the payload may trigger an OOM, so I suggest setting an output threshold to limit the length of a single message. A long response may then be split across multiple outputs, but it avoids allocating a large amount of memory for a single message. See the fuller sketch after the snippet below.

case <-ew.ticker.C:
			// output the packet
			// The threshold 128 can be adjusted based on experience.
			if ew.tickerCount > MaxTickerCount || writeCount >= 128 {
				//ew.processor.GetLogger().Printf("eventWorker TickerCount > %d, event closed.", MaxTickerCount)
				ew.processor.delWorkerByUUID(ew)

case e := <-ew.incoming:
			// reset tickerCount
			// writeCount limits the length of a single message.
			ew.tickerCount = 0
			writeCount++
			ew.writeEvent(e)
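For context, a fuller sketch of the same idea in place in Run(); writeCount is a hypothetical local counter and 128 is only an example threshold, not an existing eCapture option or the final patch:

func (ew *eventWorker) Run() {
	// writeCount is an assumed counter of buffered events for this worker.
	writeCount := 0
	for {
		select {
		case <-ew.ticker.C:
			// flush once the worker has been idle long enough,
			// or once it has buffered too many events
			if ew.tickerCount > MaxTickerCount || writeCount >= 128 {
				ew.processor.delWorkerByUUID(ew)
				for {
					select {
					case e := <-ew.incoming:
						ew.writeEvent(e)
					default:
						if ew.IfUsed() {
							time.Sleep(10 * time.Millisecond)
							continue
						}
						ew.Close()
						return
					}
				}
			}
			ew.tickerCount++
		case e := <-ew.incoming:
			// reset tickerCount; count this event toward the flush threshold
			ew.tickerCount = 0
			writeCount++
			ew.writeEvent(e)
		}
	}
}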

@yuweizzz
Contributor

I think we can handle this part when we read from the HTTP payload?
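If I read the idea correctly, it would bound the body while parsing the HTTP response out of the captured bytes rather than in eventWorker. A rough sketch under that assumption, using a hypothetical helper and an illustrative maxBodyBytes limit (this is not the actual eCapture parser code):

import (
	"bufio"
	"io"
	"net/http"
)

// maxBodyBytes is an illustrative cap, not an existing eCapture option.
const maxBodyBytes = 4 * 1024 * 1024 // 4 MiB

// readLimitedResponse is a hypothetical helper: it parses one HTTP response
// and keeps at most maxBodyBytes of the body in memory.
func readLimitedResponse(r *bufio.Reader) (*http.Response, []byte, error) {
	resp, err := http.ReadResponse(r, nil)
	if err != nil {
		return nil, nil, err
	}
	defer resp.Body.Close()
	// io.LimitReader stops after maxBodyBytes, so a 1 GB response body
	// no longer needs a 1 GB in-memory buffer.
	body, err := io.ReadAll(io.LimitReader(resp.Body, maxBodyBytes))
	return resp, body, err
}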

@cfc4n added the enhancement (New feature or request), help wanted (Extra attention is needed), and improve labels and removed the 🐞 bug (Something isn't working) label on Feb 6, 2025
@cfc4n
Member

cfc4n commented Feb 6, 2025

I think we can handle this part when we read from the HTTP payload?

Can you provide an optimization solution?

@yuweizzz
Contributor

yuweizzz commented Feb 7, 2025

I will try it.
