Posts in the 'MIT 6.824: Distributed Systems' category

Lecture 7: Fault Tolerance: Raft (2)

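Raft Sync
Assume the following situation (columns are log indices, each cell is the term of that entry):
index        10  11  12  13  14
S1            3
S2            3   3   4
S3 (leader)   3   3   5   6
Now suppose S3 puts a new value at index 13. When the leader sends a log request to the followers, it also sends prevLogIndex (12) and prevLogTerm (5). Neither S1 nor S2 has an entry matching prevLogIndex (12) and prevLogTerm (5), so both reject the request. The leader then updates nextIndex => nextIndex[S1]=12, nextIndex[S2]=12 (rolled back by 1). On the next log request it sends prevLogIndex (11) and prevLogTerm (3), and the leader the 12th..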

MIT 6.824: Distributed Systems 2024.02.21

Lecture 6: Fault Tolerance: Raft (1)

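Raft is one way to implement replication.
It runs on an odd number of servers:
- Raft elects a leader by majority vote, and the leader receives client requests => the majority is counted over all servers, dead or alive (2f+1 servers -> tolerates up to f failures)
- an odd count breaks symmetry: when a network partition splits the servers into two groups, exactly one group can reach a majority
Request handling
Instead of executing a client request right away, the leader sends out a log entry. Each AppendEntries message from the leader also tells the followers how far the log has been committed, and each follower uses that to decide how far it can commit (piggyba..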

MIT 6.824: Distributed Systems 2024.02.07

Lecture 5: Go, Threads, and Raft

Closure
bad.go
package main

import "sync"

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			sendRPC(i)
			wg.Done()
		}()
	}
	wg.Wait()
}

func sendRPC(i int) {
	println(i)
}
///////////////////////////////
$ go run bad.go
4
5
5
5
5
All the goroutines share the single loop variable i. By the time a goroutine calls sendRPC, the loop has already changed i, so sendRPC is called with an unexpected value. (Note: when the main goroutine exits, all other goroutines are terminated automatically.)
loop.go
package main

import "sync"

func main() {
	v..

MIT 6.824: Distributed Systems 2024.02.05

Lab 1: MapReduce

https://pdos.csail.mit.edu/6.824/labs/lab-mr.html
It was too hard at first, so I read through someone else's GitHub repo before doing it: https://github.c..

MIT 6.824: Distributed Systems 2024.01.30

Lecture 4: Primary-Backup Replication

What failure means in replication
- fail-stop faults: the machine simply stops if anything goes wrong (unplugged, CPU overheats)
- bugs in the software or the design can't be masked by replication (every replica has the same bug), so they are not considered
- failures of the primary and the backup must be independent
Replication approaches
State transfer
- send the state (contents of RAM / memory)
- the backup can copy it as-is, without modification
Replicated state machine
- send only the external events (inputs)
- external events are usually smaller than the state
Considerations
What sta..

MIT 6.824: Distributed Systems 2024.01.25

Lecture 3: GFS

Why is BIG storage so hard?
high performance => sharding (distributed processing)
sharding causes faults (many servers, so something is always failing) => fault tolerance needed
fault tolerance => replication
replication => inconsistency
consistency => low performance
* strong consistency: making the system look as if clients were talking to a single server
Bad replication design
c1 -> write request to s1, s2
c2 -> write request to s1, s2
consistency not guaranteed (s1 and s2 may receive the two writes in different orders)
GFS advantages
- big & fast
- sharding: throughput incr..
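The bad design above can be replayed deterministically. This Go sketch uses hypothetical names (`write`, `replay`) and models each server as a single variable where the last write wins; real servers would see the interleaving nondeterministically. The last part shows one way out: a single point fixes one order for every replica. That is roughly the role GFS gives the primary for a chunk.

```go
package main

import "fmt"

// write is a hypothetical write request: client sets x to value.
type write struct {
	client string
	value  int
}

// replay applies writes to one server in its arrival order and returns x.
func replay(arrivals []write) int {
	x := 0
	for _, w := range arrivals {
		x = w.value // last write wins
	}
	return x
}

func main() {
	c1 := write{"c1", 1}
	c2 := write{"c2", 2}

	// No coordination: each server sees the requests in whatever order
	// the network happens to deliver them.
	s1 := replay([]write{c1, c2})
	s2 := replay([]write{c2, c1})
	fmt.Println(s1, s2) // 2 1: the replicas disagree

	// One fix: a single point (e.g. a primary) picks one order for everyone.
	order := []write{c1, c2}
	fmt.Println(replay(order), replay(order)) // 2 2
}
```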

MIT 6.824: Distributed Systems 2024.01.24

Lecture 2: RPC and Threads

Thread
- I/O concurrency: while one thread handles I/O (e.g. a client request, waiting for a disk read), other threads can do other work
- Multicore performance
- Convenience: e.g. a master periodically checking whether each worker is still alive
An event-driven design can do many things on a single thread, but it cannot get a multi-core speedup.
Threading challenges
Sharing data safely (race condition): different threads use the same resource at the same time
-> use locks (Go's sync.Mutex)
-> or avoid sharin..

MIT 6.824: Distributed Systems 2024.01.23

개발일기장

