
This is the static archive of the Massassi Forums. The forums are closed indefinitely. Thanks for all the memories!

You can also download Super Old Archived Message Boards from when Massassi first started.

"View" counts are as of the day the forums were archived, and will no longer increase.

Computer Science and Math and Stuff
2018-01-10, 9:18 AM #41
Originally posted by Zloc_Vergo:
I've rarely met a CS student who struck me as good at mathematics, which makes me feel all kinds of uneasy about work that they do. Is this field really just 90% jargon and knowing libraries?
Math is 90% jargon and knowing theorems.... your Cauchy sequence is my file descriptor.

Something can be complicated to reason about without involving complicated mathematics. In the case of neural networks, the hard part isn’t the neural network, it’s making the neural network fast enough to be practical. That’s what your libraries and frameworks are abstracting away for you. If you had to do that work yourself you would be churning away on it for years.
2018-01-10, 9:41 AM #42
Originally posted by Jon`C:
A really fancy Markov chain


no, no, no, it's a decision process, completely different.
I had a blog. It sucked.
2018-01-10, 9:48 AM #43
Originally posted by Jon`C:
Something can be complicated to reason about without involving complicated mathematics. In the case of neural networks, the hard part isn’t the neural network, it’s making the neural network fast enough to be practical. That’s what your libraries and frameworks are abstracting away for you. If you had to do that work yourself you would be churning away on it for years.

This is fair. My concern is that my introduction to inference was through a very intense course that doubled as an introduction to information theory, where inference was framed in terms of the geometry accompanying KL divergence. I don't have faith that most computer scientists I've met could reproduce the material from that course, despite it seeming to be critical in a lot of this research (from my external perspective).
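(Sketching the core fact from memory, in my own framing rather than the course's: fitting a model by maximum likelihood is the same as minimizing a KL divergence from the empirical distribution,

\mathrm{KL}(\hat{p} \,\|\, q_\theta) = \sum_x \hat{p}(x) \log \frac{\hat{p}(x)}{q_\theta(x)} = -H(\hat{p}) - \sum_x \hat{p}(x) \log q_\theta(x),

since the entropy term doesn't depend on \theta, and the geometry of that divergence was the lens the whole course used.)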
I had a blog. It sucked.
2018-01-10, 10:17 AM #44
Originally posted by Zloc_Vergo:
This is fair. My concern is that my introduction to inference was through a very intense course that doubled as an introduction to information theory, where inference was framed in terms of the geometry accompanying KL divergence. I don't have faith that most computer scientists I've met could reproduce the material from that course, despite it seeming to be critical in a lot of this research (from my external perspective).


Most computer scientists don’t work in ML and don’t have much use for information theory. So of course most computer scientists couldn’t reproduce that work from first principles, among other things because they’d never be interested in doing it.

I think we discussed this earlier in the thread, but I’m not sure so I’ll repeat it here. CS is way too broad a field for generalizations to be useful. Like, ML, computational geometry, and formal languages are all considered CS, but none of those researchers would have much of anything to discuss with each other professionally. And those are all on the mathy side of things. Why should a systems researcher know anything about it? Or an HCI researcher? Or a software engineering researcher? When CS becomes more mature they won’t even be considered the same field.
2018-01-10, 1:51 PM #45
FWIW I started going through this class today and got through the first few lecture pages: http://cs231n.github.io/

and that's roughly the level I need right now. It's not as basic as Andrew Ng's Coursera course, it isn't jargon-filled, but it does drop in the key terms that connect with knowledge I already have (e.g. softmax and how it arises from cross-entropy, and the different interpretations therein), and the notation isn't awful.
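To pin down the specific fact I mean (my own summary, not lifted from the notes): with scores z, softmax probabilities p, and a one-hot label y,

p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad L(z, y) = -\sum_i y_i \log p_i, \qquad \frac{\partial L}{\partial z_i} = p_i - y_i,

i.e. the gradient is just "prediction minus target", which is what makes the cross-entropy framing so convenient.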
I had a blog. It sucked.
2018-01-10, 3:05 PM #46
99%. The jargon goes out the window pretty quickly, too. Maybe it's different for the "big" companies. I've worked at a number of places in varying industries (content management, ecommerce, biotech, ad tech, security) and with the exception of the biotech there wasn't much science going on anywhere. I probably self-select based on the jobs I qualify for but read the "who's hiring" threads on Hacker News to get similar insight. The smartest people I've worked with are the ones that could cut through the **** (unfortunately earlier in my life I wasn't smart enough to recognize them).
2018-01-10, 3:21 PM #47
I don't know about machine learning really, except that it can be useful when analyzing persistent homology data in topological data analysis. My impression is that it's not nearly as cool as futurologists make it seem.
2018-01-10, 4:20 PM #48
Originally posted by Reid:
I don't know about machine learning really, except that it can be useful when analyzing persistent homology data in topological data analysis. My impression is that it's not nearly as cool as futurologists make it seem.


Just admit that you think this is cool because of the sheer surprise that it's actually possible to cash out on a highly abstract topic from your pure math education. :P

If I ever learned about it I'd do it because it was cool.

Otherwise I'd just use linear algebra and calculus.
2018-01-10, 4:25 PM #49
I think it’s constructive to note the difference between computer scientists and practitioners. When I talk about computer science (and when academics talk about it) the term exclusively refers to the activities of professional computer science researchers, e.g. people with doctorates working in labs. It especially doesn’t refer to people working to create software products, even if they might have a lesser degree in computer science or use knowledge or research techniques they learned in college.

I think some people ITT may have been using the term differently, so I wanted to clarify it.

When you talk about the application of any academic material to professional software development, the ratio pretty quickly drops off to nothing. (Outside of job interviews, at least, which are structured as a series of 45 minute computer science oral exams.)

That doesn’t mean the knowledge is useless. It’s incredibly useful, in fact. It rarely comes up, though, because for it to happen you need to have a software developer on your team who has the skill, paired up with a manager who knows how to employ those skills. That’s not a common situation. In my direct personal experience, though, you can build incredible things when it does happen.
2018-01-10, 4:27 PM #50
Originally posted by Brian:
99%. The jargon goes out the window pretty quickly, too. Maybe it's different for the "big" companies. I've worked at a number of places in varying industries (content management, ecommerce, biotech, ad tech, security) and with the exception of the biotech there wasn't much science going on anywhere. I probably self-select based on the jobs I qualify for but read the "who's hiring" threads on Hacker News to get similar insight. The smartest people I've worked with are the ones that could cut through the **** (unfortunately earlier in my life I wasn't smart enough to recognize them).


The best mathematicians are the ones that put in enough thought to make things simple to state and understand. People who propagate jargon for its own sake are blindly following a cargo cult of notation (just like the worst programmers don't realize that the language or even the syntax isn't the most important thing), whereas if they knew what they were doing they'd just cut to the chase.

That said, I don't think Sutton made his book look complicated because of this; he's a researcher and has totally different aims than a mere practitioner. Just like in physics, you don't need to know every proof to be a good user of mathematics.
2018-01-10, 4:38 PM #51
Originally posted by Reverend Jones:
The best mathematicians are the ones that put in enough thought to make things simple to state and understand. People who propagate jargon for its own sake are blindly following a cargo cult of notation (just like the worst programmers don't realize that the language or even the syntax isn't the most important thing), whereas if they knew what they were doing they'd just cut to the chase.

That said, I don't think Sutton made his book look complicated because of this; he's a researcher and has totally different aims than a mere practitioner. Just like in physics, you don't need to know every proof to be a good user of mathematics.


Software engineering is pure artifice. It’s abstraction layered upon abstraction, jargon all the way down. How do you decide which layer of abstraction is the “simplest”, which one “cuts to the chase”?

Ridiculous, borderline offensive notion.
2018-01-10, 5:01 PM #52
Well I was talking about math. In math the artifice doesn't exist except in the brain of a human being, and usually drawing a simple picture or diagram can capture the most important thing.

I have no idea what software is like.
2018-01-10, 5:08 PM #53
Incidentally, this is probably why mathematical proofs so often have mistakes hidden deep inside them, with counterexamples emerging years later. Folks like the late Fields medalist and homotopy type theory co-founder Vladimir Voevodsky, as well as computer scientist Leslie Lamport, advocated that mathematicians adopt a more structured approach to building their proofs, deliberately constructing that kind of artifice, in order to catch bugs early. But by and large mathematics remains something of an oral tradition conveyed in impressionistic diagrams and pictures, while written mathematical language is invariably bursting with typos.
2018-01-10, 5:14 PM #54
Originally posted by Reverend Jones:
Incidentally, this is probably why mathematical proofs so often have mistakes hidden deep inside them, with counterexamples emerging years later.


Can’t imagine what that would be like
2018-01-10, 5:16 PM #55
Originally posted by Reverend Jones:
Well I was talking about math. In math the artifice doesn't exist except in the brain of a human being, and usually drawing a simple picture or diagram can capture the most important thing.

I have no idea what software is like.


A cross between mathematics and art history.
2018-01-10, 5:43 PM #56
Originally posted by Jon`C:
Can’t imagine what that would be like


Lol
2018-01-10, 9:13 PM #57
Originally posted by Reverend Jones:
Just admit that you think this is cool because of the sheer surprise that it's actually possible to cash out on a highly abstract topic from your pure math education. :P

If I ever learned about it I'd do it because it was cool.

Otherwise I'd just use linear algebra and calculus.


Today I listened to a talk by some Japanese guy who works for Toyota. Using topological data analysis methods he was able to take photographs of steel stock and determine its quality and probable failure points, and used machine learning algorithms to increase the accuracy to over 90%.

Pretty cool ****, the problem being it's hard to get cool jobs like that.
2018-01-10, 9:20 PM #58
Industrial chair.
2018-01-10, 9:27 PM #59
I've broken at least two crappy swivel chairs by leaning back on them too far.
2018-01-10, 9:36 PM #60
Not that kind of industrial chair.

That's how you get that sort of job. In western countries, I mean. You get a phd and a professorship at some mid-high tier university. Then you look at all of the companies operating in your area, and think really hard about how your research might solve a problem they have. You approach them, and ask them to fund your research. They create a funded chair, get a tax benefit, get their name on all of your papers and business cards, and maybe benefit from your research some day. You get food and a good excuse to work on a cool problem for a few decades.

Just don't watch the sequel, where you deliver the innovation and your industrial partner doesn't have the interest or resources to productize it. You approach their OEMs and they actively refuse to engage with you, if not threaten you, because an expensive problem for their customers is a lucrative business unit for them.

Or you can go work for Toyota, Hitachi, Komatsu,....
2018-01-10, 10:16 PM #61
I'm pretty sure the aluminum in my chair could have benefited from some topological data analysis. I'm not even fat like mb
2018-01-10, 10:47 PM #62
i met mb once but he didn't know who i was. fun-size fact: he's short
2018-01-10, 10:53 PM #63
You used to be someone different in your past than you used to be now?
2018-01-10, 10:57 PM #64
Yes; it's kind of complicated.
2018-01-11, 7:36 PM #65
Today I spent time with a friend who's an undergrad, listening to talks on UG research topics, so that if something struck his fancy we could try doing some research.

Then went to some talks on homotopy type theory and later on geometric group theory, which I will not attempt to describe here.
2018-01-12, 12:43 AM #66
C++ tip:

If you use std::move() to pass an lvalue to a function that accepts an rvalue reference, the lvalue you moved is in an unknown state. That doesn't just mean the lvalue is in some potentially invalid 'moved-from' state. It means unknown. An rvalue reference is just like any other mutable reference, with an optional invitation that the callee may take ownership of the object. That means if you move an lvalue into some code you don't control, and that code decides it doesn't really want what you've passed it, you are still the owner of that object.
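A minimal sketch of the failure mode (the names here are made up for illustration):

#include <string>
#include <utility>

// A callee that only sometimes takes ownership of the rvalue reference it was given.
void maybe_take(std::string&& s, bool want)
{
    if (want) {
        std::string mine = std::move(s); // ownership actually transfers here
        // ... do something with mine ...
    }
    // if !want, s was never touched; the caller still owns a fully intact object
}

int main()
{
    std::string res = "pretend this is something whose prompt destruction matters";
    maybe_take(std::move(res), /*want=*/false);
    // Despite the std::move at the call site, nothing was moved. 'res' still
    // owns its contents, and it's still this scope's job to destroy or clear
    // it promptly if something else is waiting on that.
}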

Lessons.

1.) Avoid passing stuff like RAII objects using std::move/std::forward.
2.) If it's unavoidable, make sure to manually destroy any unmoved objects that you need promptly destroyed.
3.) If you are writing the callee, prefer pass by value + uncopyable objects when possible. Only accept rvalue references if you are forced to accept copyable types, or if you genuinely need an optional move.
4.) Remember this for the future, if your code ever deadlocks while read()ing a pipe whose write end you moved away.
2018-01-15, 11:56 PM #67
C++ tip:

Visual C++ is hot garbage. There are a lot of reasons why it's hot garbage, so let me be more specific.

Visual C++ has roughly a billion dials, a trillion knobs, and a googolplex of moving parts. You have the usual pieces - compiler, linker, archiver, build system - but there is just so much more on top of that. So much that just... no other toolchain even tries doing, even comes close to doing.

So for example, by default CL (the compiler) cannot run in parallel with other CL processes. CL shares state with other processes by performing unsynchronized writes to a file on disk (a PDB file). You may optionally use the CL /FS flag to synchronize writes via a different program called MsPdbSrv, but it's buggy as ****. For one thing, MsPdbSrv is undocumented and hangs around however long it wants, so it's roughly the last process you ever want to see pop up from a continuous integration job. On top of that, though, CL sometimes seems to just totally ignore the /FS flag for no reason. Running CL processes in parallel for the same project seems completely untested. I don't think you can even enable it under Visual Studio, but it's out there, documented, ready and waiting to **** up your day.

Okay, so parallel CL processes for the same project are out. Fortunately, project-level parallelism is fine. That's what you're supposed to do. Take all of the source files in your project, build them all at once in a single CL command. Don't overthink things, either: normally the build system is responsible for knowing which objects need to be rebuilt after source code changes, but CL does this too. If any source file in your project has been changed, rebuild the whole thing. Compile 'em all and let CL sort 'em out. This is the default in Visual Studio, so it's ostensibly well tested.

What if your build has one or two really big projects, though? Well hey, CL's got you covered there, too! You can optionally tell CL to parallelize itself using the CL /MP option. See, normally build systems are responsible for parallel scheduling of compilers, but here CL handles it too. This is actually a great thing because process creation is much slower on Windows than it is on Linux. Assuming it works. Given how hidden the option is in Visual Studio, I'm thinking it probably isn't tested well. You should split up your projects into smaller ones and link them together instead.

Let's keep count. So far I've mentioned three different, optional/secret ways of parallelizing your Visual C++ builds. Object-level parallelism using /FS and MsPdbSrv, manual project-level parallelism, and CL /MP.

If you're familiar with the Visual Studio UI, you've probably noticed the conspicuous "maximum number of parallel project builds" setting. That means Microsoft has apparently blessed project-level parallelism as the true solution, although it doesn't work exactly like you might think. If you're using MsBuild, that option tells MsBuild to create that many child processes. Projects are dispatched across those child processes for building. This is where the project-level parallelism comes from. It's not just that easy, though. MsBuild lets you specify different compiler options per source file, so instead of just saying something like "take all of my source files and build them", it has to snoop around for CL invocations and merge them together if they have the same command line arguments. It's all very complicated, but in case you can't tell that's supposed to be the theme for this post.

Speaking of complicated, do you want to know how MsBuild /t:Clean works? When you run MsBuild it fires up a daemon called Tracker. This nosey little ****er hooks into every program your build runs, even non-Microsoft ones, and writes down the names of every file they ever created. It stores the information in about a billion files, scattered everywhere. When you run MsBuild /t:Clean it loads those files and deletes everything listed. As far as I can tell there's no way to get rid of Tracker once it's poisoned your process tree. There are pros and cons here. The pro is that Tracker keeps idiots from accidentally persisting out of date files across clean rebuilds. The con is that Tracker makes persisting files across clean rebuilds impossible even when you want to, for something as simple as collecting data across CI builds.

That's the second daemon, by the way. First was MsPdbSrv, then Tracker. Then there's also vctip, which spies on you for Microsoft. That's a new one I just discovered today. And, of course, if you're using Visual Basic or C# post-Roslyn, you also have vbcscompiler, a long-running compiler server which starts itself when needed and stops itself whenever it feels like. Which is... well, I don't even know. It's necessary, because the C# compiler takes too long to start up now. But it's also horrible that it's necessary. And it means you have four opaque, undocumented, arbitrary-lifespan servers running on your computer - or worse, your build server.

Might be getting petty at this point, but I'll add this. You all know how the Windows shell is bad at Unicode, right? Yet Visual Studio supports Unicode compiler error messages. How does it do this? It doesn't read diagnostics from stdout. The text in the output box actually comes from a pipe that Visual Studio passes to its child processes over an environment variable, and that pipe does support Unicode. It's not just text, either. Visual Studio parses the data from that pipe somehow, and uses it to detect whether some descendant process has failed -- and then it terminates the build. So let's say, hypothetically, that your project has a test, and during that test it runs CL against a source file that contains an error. On purpose, because your code runs CL, and you want to test how your code handles CL errors. You'd run your test under Visual Studio and boom, your whole test suite process tree vanishes into an abyss. If you didn't know about this environment variable you would be mystified. Terrified, even. You'd think you had done something horribly wrong, you'd run your test harness under a debugger, you'd log every damn thing your program does, and nothing. It just goes away. The reason I know about this is because I spent days for a former employer specifically reverse-engineering how Visual Studio does this. If you are just some schmuck writing tests, there'd be no hope for you.

And I haven't even gotten into the frontend yet.
2018-01-16, 2:55 PM #68
**** comcast
2018-01-16, 7:45 PM #69
Originally posted by Reid:
**** comcast


So like an idiot I thought "surely the comcast router can't be that bad" since I was only going to live in this apartment for a year before finding a new place.

Turns out the router had some setting enabled to download router settings directly from Comcast (****ing why?). Well, since Comcast is a damn mess, it was downloading settings from completely random people's accounts.

Figuring out what the hell was going on, and how to start the router in a way that would actually let me access the panel to disable the setting was a nightmare.

But I have functioning internet again.
2018-01-16, 8:12 PM #70
What. The. ****.

Are you sure that's what happened? Downloading other people's random configurations?? I can't imagine why this would even be close to intended behavior of a router.
2018-01-16, 8:57 PM #71
Originally posted by Reverend Jones:
What. The. ****.

Are you sure that's what happened? Downloading other people's random configurations?? I can't imagine why this would even be close to intended behavior of a router.


Yeah. Different SSID and a list of devices with MAC addresses I didn't know of.
2018-01-17, 11:02 PM #72
So a quick update on my WoW hacking adventure: I've developed a teleport hack and the server I'm testing on apparently gives no ****s when you teleport distances, so I wrote some scripts which teleport around an instance looting what chests and items are available.

I'm fabulously wealthy in game now. The challenge is sorta gone now unless I wanna try some other things.
2018-01-17, 11:14 PM #73
Originally posted by Reid:
So a quick update on my WoW hacking adventure: I've developed a teleport hack and the server I'm testing on apparently gives no ****s when you teleport distances, so I wrote some scripts which teleport around an instance looting what chests and items are available.

I'm fabulously wealthy in game now. The challenge is sorta gone now unless I wanna try some other things.


Is this on an official server, or on an emulator?
2018-01-17, 11:25 PM #74
Emulator. I don't think I'm skilled enough to break an official server; Blizzard can be pretty crafty
2018-01-18, 6:20 AM #75
Oh. Consider me Less Impressed now, Reid!!!
2018-01-18, 8:12 AM #76
Originally posted by saberopus:
Oh. Consider me Less Impressed now, Reid!!!


I know enough about reverse engineering to slowly figure out a block of code, I don't know enough to identify and subvert anti-cheat detection systems. I don't want to piss away money getting accounts banned for small things until I'm more sure of what I'm doing.
2018-01-20, 12:06 AM #77
Originally posted by Jon`C:
Test coverage is often collected with a profiler, or something very similar to a profiler. However, it’s not by itself concerned with performance.

You aren’t measuring your code when you collect coverage data, you’re measuring your tests. You’re trying to determine how well your tests exercise the code.

No. Test coverage tells you what parts of your code are definitely untested.

Here’s a formal definition of test coverage I came up with a few years ago, if it helps.

Take some set of source code locations, L, which maps bijectively onto the lines of code in some collection of source code (e.g. each element is an ordered pair of source file name and line number). We say that line x in L is run (in R) if the line of code at location x is executed during any test. We say that line x in L is tested (in T) if any single test both executes the line and asserts the result of that execution (N.B. T is a subset of R.)


Okay, got it. The math language helps clear it up.
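Writing the definition out as sets just to be sure I have it: T \subseteq R \subseteq L, a line in T is executed and asserted on by at least one test, and anything outside R never ran under any test at all.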

Originally posted by Jon`C:
Test coverage partitions L into three disjoint subsets: covered, uncovered, and uncoverable. Covered is a subset of R. Uncovered is disjoint from R. No other information is known.


Wouldn't uncoverable be a subset of uncovered?

Originally posted by Jon`C:
GCC builds the lookup table. It has to be regenerated any time the translation unit changes, but it can be cached, and indeed ccache does so.

Some of them might slow you down initially, for sure. Code reviews definitely have a high perceived cost, for example. But so does testing. And so does writing a spec. And so does using an SCM, a 40 hour work week, and so-forth. You have to decide where to draw the line, which tools or processes are worth that up-front cost.

fwiw, though, roughly 100% of the software industry presently draws that line faaaaaaar short of the break-even point.


Okay, so you seem to reeeally think that good code review is extremely important. It seems the GNU Compiler has a bunch of this stuff built in; should I learn Git along with whatever tools GCC/G++ have?

Originally posted by Jon`C:
They are exceedingly mathy.


That seems clear given it's a proof. I guess I'm not sure what Reverend Jones' point was about them. Is it typical for comp sci undergraduates to do this kind of theoretical proof?
2018-01-20, 12:21 AM #78
Originally posted by Reid:
The sentiment was also expressed in the video, which expressed disdain for mathematics as being obfuscatory or otherwise hard to penetrate. Is this sentiment really that common among computer scientists? I'll admit that, in my limited experiences with undergraduates, I didn't think highly of their ability to reason mathematically™ (in the proper sense), although many had some competency in math. But the thing is, the topics the Intro are decrying as "too mathy" are hardly more difficult than lower division or intro upper division math topics, and not particularly challenging ones at that. To me, it feels frustrating that frankly elementary topics should be treated as soooo hard by CS people. Not sure why this is, honestly.


Originally posted by Reid:
That seems clear given it's a proof. I guess I'm not sure what Reverend Jones' point was about them. Is it typical for comp sci undergraduates to do this kind of theoretical proof?


I wouldn't know. Also, I wasn't aware that computer science was taught at the undergraduate level.
2018-01-20, 12:33 AM #79
https://pron.github.io/posts/correctness-and-complexity
2018-01-20, 12:34 AM #80
Quote:
Deductive Proofs

Deductive proof (namely, use M⊢φ to show M⊨φ) is another general approach. Like model checkers, it is precise, and therefore it is computationally no easier than the model checking problem (in fact, the two can be made equivalent in many cases).

But even though it, too, is subject to the same worst-case complexity bounds it can be more scalable than model-checkers, because it can be tailored to very specific problems, and programs verified with deductive methods are usually written with deductive proofs in mind. Those who use deductive proofs don’t write programs and then prove them. They write and prove hand-in-hand, often severely limiting their program, so that proofs will be possible.

There are interesting advancements in automated proof tools (in particular, SMT solvers), that can assist with many, usually simple theorems, but in general, currently humans do the bulk of the work.

Empirical evidence suggests that this approach – while it can be made to work – is very costly, but not more than current high-assurance software development methodologies, used, say, in the defense industry. But now we know why they're so hard: Someone must pay the complexity price, and if it's not the computer, it's us. In the industry, manual deductive proofs are usually used to tie together results from automated proofs and model checkers.

Even if we put aside the difficulty in finding a proof, a feasible proof – i.e. one of polynomial length in the size of the program – may not even exist. It is a well-known result (by Cook and Reckhow) that even in the simple propositional calculus – simplest of all logics, just boolean variables and boolean connectives, and, or, not – a short proof exists for all tautologies iff NP=coNP, which is assumed to not be the case.


(Emphasis added.)