Wikipedia:Reference desk/Archives/Computing/2016 September 28

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


September 28

Average-case time complexity of a maze backtracker solver

Hello,

Given a simply-connected maze on a 2-D grid (there exists exactly one path between any two points; no loops), how can the average-case time complexity of a DFS backtracking solution, and of a BFS solution, be found? I understand this is a problem of tree traversal and I have looked at this StackExchange answer, but I failed to understand the direction suggested there. Thanks — Preceding unsigned comment added by 77.126.23.147 (talk) 07:47, 28 September 2016 (UTC)

Depth-first search and breadth-first search are the relevant articles, but I don't know if they answer your Q. StuRat (talk) 14:43, 28 September 2016 (UTC)
This is a fantastic question, and I believe its answer is much harder to pin down than it might superficially appear. I agree that your StackExchange link is not taking you towards an easy-to-understand answer, let alone a mathematically sound solution.
It is very hard to determine statistics for the number of times you will revisit each square, on average, across the set of all possible simply-connected mazes. But if you can determine those statistics, you have your answer. I'm still thinking of a good way to analyze this problem simply and correctly, and of which resources might be good to reference. Nimur (talk) 15:14, 28 September 2016 (UTC)
I teach this when covering Big O complexity. This is how I approach it: assume the solution to your maze is X steps long. A breadth-first search will check all paths with 1 step, then all with 2 steps, then all with 3 steps, and so on. All of those checks are guaranteed to be a waste of time, because we won't find the solution until we get to the paths with X steps. Even then, we will check, on average, half the paths of X steps before finding the solution. A depth-first search will get to X steps faster. Assume the longest path is actually X steps: you will immediately jump to searching paths of X steps, so depth-first is clearly the way to go... but only in that case. Now assume that most paths are longer than X steps. You will be searching a lot of paths that are too long before eventually getting to the correct X-length path. If X is small compared to the average path length, you would be better off only looking at short paths - a breadth-first search. Of course, a random maze doesn't make it easy to know how long the solution path is. So, you have to guess. You turn it into a universe of paths, one of which is the correct one. How do you trim out the ones you don't want and focus on those you do want? For example, if I have a 5x5 grid, the starting point is on the left, and the exit is on the right, I know that the absolute shortest possible path is 5. I can do a breadth-first search starting with paths of length 5. Getting a clear "this is better" answer is not really possible, because it depends on the length of the solution compared to the lengths of all possible paths. 209.149.113.4 (talk) 16:06, 28 September 2016 (UTC)
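(An editorial aside: the two strategies being compared can be made concrete with a short Python sketch. The grid representation - a set of open cells `cells` and 4-way moves - is an assumption for illustration, not something specified in the question.)

    # Minimal sketch of the two searches on a 2-D grid maze.
    # Assumed representation: `cells` is a set of open (x, y) cells;
    # legal moves are the 4 orthogonal neighbours.
    from collections import deque

    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def neighbours(cell, cells):
        x, y = cell
        return [(x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) in cells]

    def dfs_backtrack(start, goal, cells, visited=None):
        """Depth-first backtracker: push one path as deep as it goes,
        then back up. Returns a path as a list of cells, or None.
        (Recursive for clarity; an explicit stack avoids recursion limits.)"""
        if visited is None:
            visited = {start}
        if start == goal:
            return [start]
        for nxt in neighbours(start, cells):
            if nxt not in visited:
                visited.add(nxt)
                rest = dfs_backtrack(nxt, goal, cells, visited)
                if rest is not None:
                    return [start] + rest
        return None  # dead end: backtrack

    def bfs(start, goal, cells):
        """Breadth-first search: explores all cells at distance 1, then 2,
        and so on. Returns a shortest path, or None."""
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in neighbours(path[-1], cells):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(path + [nxt])
        return None

On a simply-connected maze the backtracker walks each corridor at most twice (once forward, once while backtracking), which is exactly the quantity the average-case discussion below tries to pin down.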
209..., I like your approach, but my concern rests exactly with this statement: "we will check, on average, half the paths..." Are you certain that this is actually a true statement? Doesn't it depend on some pretty difficult mathematics describing the topology of a maze? Isn't this detail even more important in the case of a depth-first search, where (because paths in a simply-connected maze must never overlap or cross - that would create a loop!) some search paths guarantee that 100% of their sub-paths are unsuitable?
The statistical incidence rate (and depth) of any such "guaranteed-to-fail" sub-path depends on the specific layout of an individual maze, and the question seeks the average run-time over the set of all possible simply-connected mazes. So we need some pretty heavy graph-theory math to analyze this.
So... your quantitative statement about the probability of choosing the correct path might be completely valid, but I am not certain, and I'm still brooding over a good method for proving or disproving it. Nimur (talk) 19:29, 28 September 2016 (UTC)
My statement that you will check half the paths is correct. Assume you have P paths of length X, and you also (somehow) know that the solution has length X but know nothing more about the maze. You have to start choosing paths and testing them. You might find the solution on the first check; you might find it on the last. Each position from 1 to P has a 1/P chance of being the one that finds the solution, so the expected number of checks is (1 + 2 + ... + P)/P = (P+1)/2 - on average, about half the paths. In class this argument wouldn't come up, because we would have already covered sorting algorithms and done many "on average" cases, so I'm not used to explaining it in more detail. Of course, if you did in fact know more about the maze, you could trim the search space. Assume I can cut the search space in half by kicking out P/2 candidates: I'd have a new P of possible candidates, and I would still have to check half of those on average to find the solution. All I really did is replace P with a new P; I didn't change the fact that, on average, I will check half the candidates before finding the solution. 209.149.113.4 (talk) 11:26, 29 September 2016 (UTC)
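(Editorial aside: the (P+1)/2 expectation is easy to check empirically. The sketch below places the solution uniformly at random among P candidates and counts checks; the function name and parameters are illustrative, not from the post.)

    # Expected number of checks when one of P candidate paths is the
    # solution and the checking order carries no information about it.
    import random

    def average_checks(P, trials=100_000):
        total = 0
        for _ in range(trials):
            position = random.randrange(P)  # solution's position, uniform
            total += position + 1           # checks needed to reach it
        return total / trials

    print(average_checks(100))  # prints roughly 50.5, i.e. (P + 1) / 2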
That argument assumes that there is only a single solution. If there is more than one, you will, on average, have to check a smaller fraction of search level X. --Stephan Schulz (talk) 11:56, 29 September 2016 (UTC)
The assumption of a single solution seems appropriate for a "simply-connected" maze, unless I misunderstand that definition.
209's justification seems solid enough... I'm convinced that he has demonstrated "50%" as an average-case bound.
I assert that it is possible to get an even tighter algorithmic upper bound, because each path searched has the potential to reveal information about the maze topology. Testing some paths and failing to find the solution may provide enough geometric information to guide the selection of the next test-path, which improves the method beyond relying on "random chance." The only bit I am not certain about is whether such a geometry-guided depth-first search would actually beat the "big O" complexity class of the "random" path choice that 209 suggested; that depends on how much algorithmic complexity is required to analyze previous-path geometry.
Nimur (talk) 14:48, 29 September 2016 (UTC)
Another way to frame this argument is to convert it to set theory. In our scenario, we have a set of paths and only one is the solution (we have no idea which one). We have a very simple O(1) test that tells us whether a path is a solution - we try the path and it either works or fails. So, no matter how I order the set of candidates, each attempt is equally likely to be the one that finds the solution. From that, it follows that I will check half the candidates on average before finding the solution. I use this approach early on in Big O because I think people handle sets or collections better than abstract solutions. (As for the "what if there is more than one solution" argument - that is impossible here because there are no loops.) 209.149.113.4 (talk) 16:15, 29 September 2016 (UTC)
I suspect that the exact method used to create the "random" maze may be quite important in figuring out the optimal solution method. I always favor a BFS approach that starts at both ends, rather than just one. While not guaranteed to be quicker, imagine a case where an infinite 2D maze is just an open grid with no walls (but we don't know this ahead of time, so can't use it). If our starting and ending points are r = 10 spaces apart, then growing a circle from one point until it hits the other would fill in about π·10² ≈ 314 spaces, while two circles, each with a radius of 5, would fill in about 2·π·5² ≈ 157 - half as many spaces. In a 3D maze, that goes down to one quarter as many spaces, and in higher dimensions it becomes even more efficient (the ratio doubling with each added dimension). Also note that the puzzle may not be a maze. Imagine a chess board in the current position and the initial position, where your goal is to find the moves used to get from one to the other. Starting from both ends could be much quicker there too, assuming we just use a brute-force BFS. (Of course, if there are more than a few moves, a pruning method really is needed.) StuRat (talk) 18:27, 28 September 2016 (UTC)
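(Editorial aside: StuRat's two-ended idea is bidirectional search. Here is a minimal sketch, reusing the `neighbours` helper and grid representation assumed in the earlier sketch: grow one BFS frontier from each end and stop when they touch. On an open grid this fills roughly 2·π·(r/2)² cells instead of π·r².)

    # Bidirectional BFS sketch: one frontier from each end; stop when
    # the frontiers meet. Returns the number of cells explored, or None
    # if start and goal are disconnected.
    def bidirectional_bfs(start, goal, cells):
        frontiers = [{start}, {goal}]
        seen = [{start}, {goal}]
        explored = 2
        while frontiers[0] and frontiers[1]:
            for side in (0, 1):
                if frontiers[side] & seen[1 - side]:
                    return explored  # the two searches have met
                nxt = set()
                for cell in frontiers[side]:
                    for nb in neighbours(cell, cells):
                        if nb not in seen[side]:
                            seen[side].add(nb)
                            nxt.add(nb)
                frontiers[side] = nxt
                explored += len(nxt)
        return None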
You can use the A* algorithm, or some other heuristically guided method. Plain depth-first search is only guaranteed to terminate if the maze is finite and there are no loops (or you do the bookkeeping to handle them). BFS will always find the shortest route, so it naturally handles loops and infinite mazes. --Stephan Schulz (talk) 19:01, 28 September 2016 (UTC)
The problem with using a heuristic approach to maze-solving is that it's not obvious which moves are better or worse until the maze has been solved. If you move closer to the target, that may be a worse move, because you are moving towards a dead end. StuRat (talk) 20:11, 28 September 2016 (UTC)
If you knew which moves were better, it wouldn't be a search, heuristic or otherwise. The heuristic in A* is not arbitrary: see admissible heuristic. For a grid maze, the Manhattan distance to the goal is an admissible heuristic. A simple breadth-first search is equivalent to A* with a distance heuristic of 0, which is admissible for any search but inferior to Manhattan distance in this case. -- BenRG (talk) 21:29, 29 September 2016 (UTC)
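(Editorial aside: for concreteness, here is a minimal A* sketch with the Manhattan-distance heuristic, again reusing the assumed `neighbours` helper from the first sketch. Replacing the heuristic with 0 recovers the breadth-first behaviour BenRG describes.)

    # A* on a grid maze with the admissible Manhattan heuristic.
    import heapq

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def a_star(start, goal, cells):
        """Returns the length of a shortest path, or None."""
        open_set = [(manhattan(start, goal), 0, start)]  # (g + h, g, cell)
        best_g = {start: 0}
        while open_set:
            f, g, cell = heapq.heappop(open_set)
            if cell == goal:
                return g
            if g > best_g.get(cell, float("inf")):
                continue  # stale queue entry
            for nb in neighbours(cell, cells):
                if g + 1 < best_g.get(nb, float("inf")):
                    best_g[nb] = g + 1
                    heapq.heappush(open_set,
                                   (g + 1 + manhattan(nb, goal), g + 1, nb))
        return None

Because Manhattan distance never overestimates the true remaining distance on a 4-connected grid, the first time the goal is popped from the queue its g value is optimal.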
It's not obvious that being physically closer, whether in Euclidean distance or taxicab distance, is an indication that you are on the correct path. Consider a maze that is a giant spiral, plus a few dead-end offshoots here and there, with the center of the spiral being the start and the outside being the finish. In such a maze you would need to move away from the target almost as often as you move towards it. StuRat (talk) 02:01, 1 October 2016 (UTC)

Help with MIDI files in WP

When I click "Play" on a MIDI file in WP such as this one: Play, the file doesn't play. Instead I'm asked to save it to my local hard disk. Is this inevitable? Does this happen to everyone? Or is there something I'm doing wrong? Is there something I can do to have it play in the browser as I'm reading the article? Thanks. Basemetal 19:30, 28 September 2016 (UTC)

See WP:Media help (MIDI) for details. Tevildo (talk) 19:53, 28 September 2016 (UTC)
Thanks. That help page says nothing about Chrome. Does anyone know what to do about this issue in Chrome? Basemetal 20:02, 28 September 2016 (UTC)
You'll need a browser extension - however, Google isn't proving helpful about obtaining one. Does anyone else have a suggestion? Tevildo (talk) 22:04, 29 September 2016 (UTC)
Exactly. Thanks, Tevildo. Are there really no Chrome users here who've solved the problem for themselves? Maybe I'll check at the Village Pump. But don't hesitate to reply if you suddenly get a brainstorm, y'all. Basemetal 15:31, 1 October 2016 (UTC)