
If you don’t mind me asking: which popular LLM(s) have you been using for this and how are you providing the code base into the context window?


Not OP but Aider provides a repo map to the LLM as context, which consists of the directory tree, filenames, and important symbols in each file. It can use the popular LLMs as well as Ollama.

https://aider.chat/docs/repomap.html

Aider hosts a leaderboard that rates LLMs on performance, including a section on refactoring.

https://aider.chat/docs/leaderboards/refactor.html


AI-generated images can be good, and even reasonable to use for branding. Slapping an image right at the top of the page that says "Abstract Synxex Tree" with a meaningless graph and an absolutely expressionless and useless humanoid robot is a great way to immediately lose my interest in anything they have to say though. The homepage would be more interesting as a wall of text.


Agreed, mostly, but this is not a homepage. On the homepage, there's a video demo and a wall of text (https://aider.chat/). Still, that Synxex Tree should disappear :)


Not sure, but I believe you may have unintentionally read the parent comment in the inverse? I might be wrong, but I believe you posited that the parent comment's desired focus is to manage people, whereas I think it's the opposite: they don't want to manage people (especially career-wise); instead they want to manage and guide the work of teams and people across one or many teams.

My interpretation was: pursue the type of work you focus on as an Engineering Manager, in terms of getting the most out of teams for the goals of the organization, but without the need to manage people directly. Which I would agree is nice, since it's really hard to wear the technical hat and also directly manage people; the separation of concerns makes for far less context switching and lets folks naturally align with the part of the job they do best. I also agree with this definition since it's how I think of it too.


I took the post you're replying to as interpreting the "doesn't want to be a manager or director" idea as: there are times when others or a team may be foisted upon them, without such a people-management role becoming the majority of the person's time during, say, a year.

I'd also add "team lead" to the list of possible titles that indicate a primary focus on people management vs. individual contribution.


> I took the post you're replying to as interpreting the "doesn't want to be a manager or director" idea as: there are times when others or a team may be foisted upon them, without such a people-management role becoming the majority of the person's time during, say, a year.

Yes, this. There may come a point in your career where your old manager is advancing and they need a new manager for your team. The position is offered to you, maybe repeatedly. It's clear that either you take it or you and your teammates will soon have a new manager who is fresh to the team or company. Maybe you've seen the latter go badly before. Maybe your outgoing manager expresses their own anxiety about that possibility. You finally get the hint and become a manager.

Now you can change from the engineering ladder to the management ladder, but you don't have to. You can be an Engineering Manager, commit all your time to this, and hope to advance to Engineering Director. Or you can stay a Staff (Software) Engineer and also manage a few people. With some help from, say, a good PM, you can be a good manager to a small team without giving up the technical aspects. (I assert part-time managers are actually better for small, high-performing teams; no idle time for micromanagement.)

> I'd also add "team lead" to the list of possible titles that indicate a primary focus on people management vs. individual contribution.

Maybe. At some companies (e.g. Google), a team's tech lead and manager can be two different people. If so, the tech lead doesn't have reports according to HR. At promo time, they don't do the manager reviews, although they likely put a fair bit of time into writing peer reviews and participate in the promo and calibration committees, so they're not entirely without what many smaller and/or more traditional companies would consider manager responsibilities.


> a team's tech lead and manager can be two different people

Yes, I've worked in an environment like that. I had a "functional manager" that took care of the HR side of things and a "team lead" that kind of led an effort for a specific project. They also interacted with each other about how things were going. I also agree that for small, high-performing teams it can be a very nice arrangement for everyone.

But my point was that the "team lead" role ends up requiring some amount of people management while not taking over as the primary focus for the person in that role; "no idle time for micromanagement", as you said. "A primary focus" may have been a lazy construction on my part: while I don't think people management is >50% of a team lead's time in the structure I worked in or imagined in this discussion, I do think it can be around 20%, depending on how they feel about people management, the dynamics between those involved, and the size of the team being led.


> Moreover, we encountered some rough edges in the metrics-related functionality of the Go SDK referenced above. Ultimately, we had to write a conversion layer on top of the OTel metrics API that allowed for simple, Prometheus-like counters, gauges, and histograms.

Have encountered this a lot from teams attempting to use the metrics SDK.

Are you open to commenting on specifics here, and also on what kind of shim you had to put in front of the SDK? It would be great to continue to receive feedback so that we, as a community, have a good idea of what remains before it's possible to use the SDK in anger for real-world production use cases. Just wiring up the setup in your app used to be fairly painful, but that has gotten somewhat better over the last 12-24 months. I'd love to also hear what is currently causing compatibility issues with the metric types themselves, which requires a shim, and what the shim is doing to achieve compatibility.


Sure, happy to provide more specifics!

Our main issue was the lack of a synchronous gauge. The officially supported asynchronous API of registering a callback function to report a gauge metric is very different from how we were doing things before, and would have required lots of refactoring of our code. Instead, we wrote a wrapper that exposes a synchronous-like API: https://gist.github.com/yolken-airplane/027867b753840f7d15d6....
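
Roughly, the shape of the wrapper (a simplified sketch against the current otel/metric Go API, not our exact code):

```go
import (
	"context"
	"math"
	"sync/atomic"

	"go.opentelemetry.io/otel/metric"
)

// SyncGauge exposes a synchronous Set() on top of the callback-based
// observable gauge API: Set stores the latest value as atomic float64
// bits, and the registered callback reports it at each collection.
type SyncGauge struct {
	bits atomic.Uint64
}

func NewSyncGauge(meter metric.Meter, name string) (*SyncGauge, error) {
	g := &SyncGauge{}
	_, err := meter.Float64ObservableGauge(name,
		metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error {
			o.Observe(math.Float64frombits(g.bits.Load()))
			return nil
		}))
	return g, err
}

// Set records the current value; safe to call from any goroutine.
func (g *SyncGauge) Set(v float64) {
	g.bits.Store(math.Float64bits(v))
}
```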

It seems like this is a common feature request across many of the SDKs, and it's in the process of being fixed in some of them (https://github.com/open-telemetry/opentelemetry-specificatio...)? I'm not sure what the plans are for the golang SDK specifically.

Another, more minor issue is the lack of support for "constant" attributes that are applied to all observations of a metric. We use these to identify the app, among other use cases, so we added wrappers around the various "Add", "Record", "Observe", etc. calls that automatically add these. (It's totally possible that this is supported and I missed it, in which case please let me know.)
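
The wrappers look roughly like this (illustrative sketch for a counter; the helper names are hypothetical):

```go
import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// ConstCounter pins a fixed attribute set (e.g. identifying the app)
// onto every Add call, alongside any per-call attributes.
type ConstCounter struct {
	counter metric.Float64Counter
	attrs   metric.MeasurementOption
}

func NewConstCounter(meter metric.Meter, name string, attrs ...attribute.KeyValue) (*ConstCounter, error) {
	c, err := meter.Float64Counter(name)
	if err != nil {
		return nil, err
	}
	return &ConstCounter{counter: c, attrs: metric.WithAttributes(attrs...)}, nil
}

func (c *ConstCounter) Add(ctx context.Context, v float64, extra ...attribute.KeyValue) {
	c.counter.Add(ctx, v, c.attrs, metric.WithAttributes(extra...))
}
```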

Overall, the SDK was generally well-written and well-documented, we just needed some extra work to make the interfaces more similar to the ones we were using before.


Thanks for the detailed response.

I am surprised there is no gauge update API yet (instead of callback only); this is a common use case and I don't think folks should be expected to implement their own, especially since it will lead to potentially allocation-heavy bespoke implementations, given the mutex, callback, and other structures that likely need to be heap-allocated (vs. a simple int64 wrapper with atomic update/load APIs).

Also, the fact that the APIs differ a lot from the more popular Prometheus client libraries raises the question of whether we need more complicated APIs that folks have a harder time using. Now is the time to modernize these, before everyone is instrumented with some generation of a client library that would need to change/evolve. The whole idea of an OTel SDK is to instrument once and then avoid needing to re-instrument when making changes to your observability pipeline and where it's pointed. This becomes a hard sell if the OTel SDK needs to shift fairly significantly to support more popular and common use cases with more typical APIs, leaving behind a bunch of OTel-instrumented code that would need to be migrated to a different-looking API.


The official SDKs will only support an API once there's a spec that allows it.

For const attributes, generally these should be defined at the resource/provider level: https://pkg.go.dev/go.opentelemetry.io/otel/sdk/metric#WithR...
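
e.g. something like (sketch; the semconv version is illustrative):

```go
import (
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

// Attributes set on the resource are associated with every metric the
// provider's meters produce, so app-identifying attributes only need to
// be declared once.
func newProvider() *sdkmetric.MeterProvider {
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName("my-app"),
	)
	return sdkmetric.NewMeterProvider(sdkmetric.WithResource(res))
}
```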


One feature I'd love to see is a transformer that, instead of providing a random value, provides a cryptographic one-way hash of the data (i.e. SHA-2). That way key uniqueness stays the same (avoiding violations of unique constraints on columns), and the same value used in one place will match the transformed value in another table, which more accurately reflects the "shape" of the data.
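
Something like this sketch, say; using a keyed hash (HMAC-SHA-256) rather than bare SHA-2 would also make low-entropy fields like phone numbers harder to brute-force:

```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// anonymize is a deterministic one-way transform: identical inputs always
// produce identical outputs, so unique constraints and cross-table joins
// keep the same shape, while the original value is not recoverable
// (assuming the per-dataset secret stays secret).
func anonymize(secret []byte, value string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(value))
	return hex.EncodeToString(mac.Sum(nil))
}
```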


We do this via Copycat (https://github.com/snaplet/copycat). We generate static "fake values" by hashing your original value to a number and mapping that to a fake value.
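
The core idea, very roughly (illustrative Go sketch, not Copycat's actual implementation):

```go
import (
	"crypto/sha256"
	"encoding/binary"
)

// fakeValue hashes the original value to a number and uses it to pick
// from a pool of realistic fake values, so the same input always maps to
// the same output. Note that with a small pool the mapping is not
// injective, so collisions can break unique constraints.
func fakeValue(original string, pool []string) string {
	h := sha256.Sum256([]byte(original))
	idx := binary.BigEndian.Uint64(h[:8]) % uint64(len(pool))
	return pool[idx]
}
```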


This will not work, at least not if we're talking PII as it is defined by Somewhat Sane (TM) privacy legislation.

Sure, passwords and credit card info are obscured with your methodology, but names, dates of birth, sexual orientation, telephone numbers, email, and IP will remain unique. This uniqueness is what allows you to potentially identify a person given enough data.


> Sure, passwords and credit card info are obscured with your methodology

Even that's problematic, because there may be code that depends on the data being somewhat "real". Credit cards, for example, may need to pass Luhn checks, or have valid BIN sections, etc.
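
For reference, the Luhn check that "real-looking" card numbers have to satisfy (sketch):

```go
// luhnValid reports whether a numeric string passes the Luhn checksum:
// from the rightmost digit, double every second digit (subtracting 9 if
// the result exceeds 9) and require the total to be divisible by 10.
func luhnValid(number string) bool {
	if number == "" {
		return false
	}
	sum, double := 0, false
	for i := len(number) - 1; i >= 0; i-- {
		d := int(number[i] - '0')
		if d < 0 || d > 9 {
			return false // non-digit character
		}
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}
```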


I suppose that what you’d have to do is change the data and then hash it. But once you’ve changed the data it’s no longer PII, so there’s no reason to hash it.

Of course, given enough changed data, one could potentially deduce how the data was changed and thus revert it, at which point it would become PII again and you'd have a problem... but that's probably a fringe scenario.


I hate to be so self-promoting (I swear I'm just trying to be helpful), but Gretel has that as a transformer you can use[0]. You can test out a lot of our stuff without payment info through our console[1] if you just want to mess around and see if tools like it (and Replibyte, of course :)) would fit your use case. That being said, you can run into issues using direct transforms like this, depending on the correlated data, because of various known deanonymization attacks. There are some pretty gnarly examples out there if you Google around.

[0]https://docs.gretel.ai/gretel.ai/transforms/transforms-model...

[1]https://console.gretel.cloud/login


What you're asking for is similar to what goes by the term "tokenization"[1], a technique often used by payment processors to avoid leaking credit card numbers and similar sensitive data. Using the proper transformer might provide the behavior you need.

[1] https://www.tokenex.com/resource-center/what-is-tokenization
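
A toy sketch of the idea (a real tokenization system keeps the vault in hardened, access-controlled storage):

```go
import (
	"crypto/rand"
	"encoding/hex"
)

// Vault swaps each sensitive value for a random token and remembers the
// mapping, so authorized systems can look the original back up (unlike a
// one-way hash). The same value always yields the same token, preserving
// joins and unique constraints.
type Vault struct {
	byValue map[string]string
	byToken map[string]string
}

func NewVault() *Vault {
	return &Vault{byValue: map[string]string{}, byToken: map[string]string{}}
}

func (v *Vault) Tokenize(value string) (string, error) {
	if t, ok := v.byValue[value]; ok {
		return t, nil // same value in, same token out
	}
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	t := hex.EncodeToString(buf)
	v.byValue[value] = t
	v.byToken[t] = value
	return t, nil
}

// Detokenize recovers the original value for an issued token.
func (v *Vault) Detokenize(token string) (string, bool) {
	orig, ok := v.byToken[token]
	return orig, ok
}
```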


Hi, author of Replibyte here. Feel free to open an issue and explain what your use case is. I will be happy to consider a solution with the community.


Right, exactly. As a point of reference, within M3DB each unique time series has a list of "in-order" compressed timestamp/float64 tuple streams. When a datapoint is written, the series finds an encoder it can append to while keeping the stream in order (timestamp ascending); if no such stream exists, a new stream is created and becomes writeable for any datapoints that arrive with timestamps greater than the last written point.

At query time these streams are read by evaluating the next timestamp of all written streams for a block of time and then taking the datapoint with the lowest timestamp across the streams.

M3DB also runs a background tick, targeted to complete within a few minutes each run, to amortize CPU. During this process each series merges any streams that have sibling streams created due to out-of-order writes, producing one single in-order stream. This is done by the same process used at query time to read the datapoints in order; they are consequently written to a new single compressed stream. This way the extra computation due to out-of-order writes is amortized, and only if a large percentage of series are written in time-descending order do you end up with a significant overhead at write and read time. It also reduces the cost of persisting the current mutable data to a volume on disk (whether for a snapshot or for persisting data for a completed time window).
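
Conceptually, both the query-time read and the tick-time merge are a streaming k-way merge by timestamp; very roughly (types are illustrative, not M3DB's actual ones):

```go
// Datapoint and Stream are illustrative stand-ins for M3DB's types.
type Datapoint struct {
	UnixNanos int64
	Value     float64
}

type Stream interface {
	Peek() (Datapoint, bool) // next datapoint without consuming it
	Next() (Datapoint, bool) // consume and return the next datapoint
}

// MergeInOrder repeatedly emits the lowest-timestamp datapoint across all
// streams, yielding one time-ascending sequence; the same walk can feed a
// fresh encoder to produce the single merged compressed stream.
func MergeInOrder(streams []Stream, emit func(Datapoint)) {
	for {
		best := -1
		var bestDP Datapoint
		for i, s := range streams {
			dp, ok := s.Peek()
			if !ok {
				continue // stream exhausted
			}
			if best == -1 || dp.UnixNanos < bestDP.UnixNanos {
				best, bestDP = i, dp
			}
		}
		if best == -1 {
			return // all streams exhausted
		}
		streams[best].Next()
		emit(bestDP)
	}
}
```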


Hey jaren, hope things are well at Robinhood. Good question; there's a diagram of what a default deployment looks like alongside the M3 v1.0 announcement https://medium.com/chronosphere/m3-v1-0-released-a-productio... and in-depth documentation on the website https://m3db.io/docs/overview/.

Storage, aggregation, and compute are all separate and scale up/down independently. The coordinator and query services are both stateless and you just add more instances; DB nodes do not do compression/decompression, for instance, as all of that happens as part of computation on the query service.

M3DB for storage has a k8s operator that can manage clusters (expansion, etc), and the M3 aggregator can be deployed as a stateful set in k8s and also can be independently expanded.


Modules that rely on global state for anything other than memory pooling or what have you should be avoided. It's a lot cleaner and more testable to return a high-level data structure that contains any state you would otherwise have held globally in the module, and have that be the "context", or just the parent data structure, for any others that are spawned.

Global state makes things like parallel unit tests that all use another module impossible, as does changing something like "MaxConcurrency" in a way that is synchronized across goroutines that might already be calling into the third-party module.
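
For example, the shape of the alternative (hypothetical module, for illustration):

```go
import "sync"

// Client is a hypothetical module handle: state lives here instead of in
// package-level globals, so parallel tests can each construct their own
// instance, and settings can be changed safely at runtime.
type Client struct {
	mu             sync.Mutex
	maxConcurrency int
}

func New(maxConcurrency int) *Client {
	return &Client{maxConcurrency: maxConcurrency}
}

// SetMaxConcurrency is synchronized, so it can be changed while other
// goroutines are already calling into the module.
func (c *Client) SetMaxConcurrency(n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.maxConcurrency = n
}
```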

My 2c.


Of course. I'm not advocating global state. It's just that most real systems have things like file systems, config files, databases, listening network sockets, etc. These are inherently non-local.


M3 is meant to be an open source, central, horizontally scalable metrics store - but your mileage may vary. Either way, check it out: https://m3db.io


And M3DB too, if you want to cluster and scale out vs. using a cloud store.


Curious: what is your strategy on replication? Is it some form of synchronous replication, or asynchronous (i.e. active/passive with potential for data loss in the event of a hard loss of the primary)? Also curious why you might look at UDP replication, given that unless you use a protocol like QUIC on top of it, UDP replication would be inherently lossy (i.e. not even eventually consistent).


The strategy is to multicast data to several nodes simultaneously. Data packets are sequenced to allow the receiver to identify data loss. When loss is detected, the receiver finds breathing space to send a NACK. The packet and the NACK identify the missing data chunk with O(1) complexity, and the sender then re-sends. Overall this method is lossless and avoids the overhead of contacting nodes individually and sending the same data over the network multiple times. This is useful in scenarios where several nodes participate in query execution and getting them up to date quickly is important.
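
Receiver-side, the gap detection looks something like this (the wire format and buffering strategy here are illustrative, not our actual implementation):

```go
// Packet is a sequenced multicast datagram; NACK names a missing range.
type Packet struct {
	Seq     uint64
	Payload []byte
}

type NACK struct {
	FromSeq, ToSeq uint64 // inclusive range of missing packets
}

type Receiver struct {
	nextSeq uint64            // next sequence number expected in order
	pending map[uint64]Packet // buffered out-of-order packets
	nacks   chan<- NACK       // channel back to the sender
}

func (r *Receiver) OnPacket(p Packet, deliver func([]byte)) {
	switch {
	case p.Seq < r.nextSeq:
		return // duplicate of an already-delivered packet
	case p.Seq > r.nextSeq:
		// Gap detected: buffer this packet and ask for the hole.
		r.pending[p.Seq] = p
		r.nacks <- NACK{FromSeq: r.nextSeq, ToSeq: p.Seq - 1}
		return
	}
	deliver(p.Payload)
	r.nextSeq++
	// Drain any buffered packets that are now in order.
	for buf, ok := r.pending[r.nextSeq]; ok; buf, ok = r.pending[r.nextSeq] {
		delete(r.pending, r.nextSeq)
		deliver(buf.Payload)
		r.nextSeq++
	}
}
```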


This reminds me a bit of Aeron (https://github.com/real-logic/aeron) which is a reliable UDP uni/multicast transport library with built-in flow control. It's written in Java and seems to have superb performance (I haven't used it myself). Might be an interesting alternative if you don't want to write it all yourself.

